Introduction to Data Mining (Theory)

Paper Code: 24DAC331
Credits: 4
Periods/week: 2
Max. Marks: 100.00
Objective: 

This course enables students to learn basic data mining concepts and helps them understand the analytical procedures used in analytics through a data mining approach.

 

Course Outcomes: 

Course Code: 24DAC331

Course Title: Introduction to Data Mining (Theory)

Course outcomes (at course level), with learning and teaching strategies and assessment strategies:

CO1. Identify basic applications, concepts, and techniques of data mining, and differentiate between supervised and unsupervised techniques.

CO2. Create association rules and build trees/graphs from market basket datasets using the Apriori and FP-tree algorithms.

CO3. Analyze large datasets to gain business understanding and apply classification, prediction, and clustering algorithms.

CO4. Evaluate classification/prediction models using metrics such as accuracy, ROC, RMSE, and the confusion matrix.

CO5. Create quantitative analysis reports and perform comparative analysis of algorithms for decision making.

CO6. Contribute effectively in course-specific interaction.

Approach in teaching:

Interactive Lectures, Discussion, Demonstrations, Group activities, Teaching using advanced IT audio-video tools. 

Learning activities for the students:

Self-learning assignments, Effective questions, Seminar presentation, Giving tasks.

Assessment Strategies

Class test, Semester end examinations, Quiz, Solving problems in tutorials, Assignments, Presentation

 

6.00
Unit I: 
Introduction to Data Warehousing:

Architecture of Data Warehouse, Data Preprocessing: Need, Data Cleaning, Data Integration & Transformation, Data Reduction, Machine Learning, Pattern Matching. Introduction to Data Mining: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Query Language, Applications of Data Mining.

 

6.00
Unit II: 
Data Mining Techniques:

Frequent item-sets and Association rule mining: Apriori algorithm, Use of sampling for frequent item-set, FP tree algorithm, Graph Mining, Frequent sub-graph mining. Market Basket Analysis and Association Analysis, Market Basket Data, Stores, Customers, Orders, Items, Order Characteristics, Product Popularity, Tracking Marketing Interventions.
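
As an illustration of the Apriori topic listed above, the following is a minimal sketch in plain Python, not a prescribed course implementation. The function names (`apriori`, `confidence`), the toy baskets, and the choice of an absolute support count as the threshold are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: scan the transactions for each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical market basket data (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)
print(confidence(frequent, {"beer"}, {"diapers"}))  # 1.0
print(confidence(frequent, {"diapers"}, {"beer"}))  # 0.75
```

The two confidence values differ because confidence is directional: every basket containing beer also contains diapers, but only three of the four diaper baskets contain beer.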

 

6.00
Unit III: 
Classification & Prediction:

Decision tree learning: Construction, performance, attribute selection. Issues: Over-fitting, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0.
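
The attribute selection measures named in this unit can be sketched in a few lines of plain Python. This is an illustrative example only; the helper names (`entropy`, `gini`, `information_gain`) and the four-row toy dataset are assumptions, not course material.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on a categorical attribute."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the partitions produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy dataset (illustrative only).
rows = [
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain",  "windy": True,  "play": "yes"},
    {"outlook": "rain",  "windy": False, "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0
print(information_gain(rows, "windy", "play"))    # 0.0
print(gini([r["play"] for r in rows]))            # 0.5
```

In this toy data, splitting on `outlook` separates the classes perfectly, so a tree builder that selects attributes by information gain would choose it over `windy`.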

6.00
Unit IV: 
Bayesian Classification and ANN:

Bayes' Theorem, Naïve Bayes classifier, Bayesian Networks: Inference, Parameter and structure learning. Linear classifiers: Least squares, logistic, perceptron and SVM classifiers. Prediction: Linear regression, Non-linear regression (Artificial Neural Networks).
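
As a small worked example of Bayes' theorem, the first topic of this unit, the sketch below computes a posterior probability for a binary hypothesis. The function name `posterior`, its parameter names, and the numbers used are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) by Bayes' theorem for a binary hypothesis H and evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, P(E).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Illustrative numbers: a rare class (1%) and a fairly accurate detector.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

Even with a 90% detection rate, the posterior stays low because the class is rare. A Naïve Bayes classifier applies the same rule with the added assumption that features are conditionally independent given the class.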

6.00
Unit V: 
Accuracy Measures:

Precision, recall, F-measure, confusion matrix, cross-validation, bootstrap. Clustering: k-means, Expectation Maximization (EM) algorithm, Hierarchical clustering, Correlation clustering, DBSCAN.
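
The accuracy measures listed in this unit all derive from the four cells of a binary confusion matrix. The following is a minimal sketch under that assumption; the function names and the toy label vectors are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (the balanced F-measure); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical labels and predictions (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn)         # 3 1 1 5
print(precision, recall, f1)  # 0.75 0.75 0.75
print(accuracy)               # 0.8
```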

ESSENTIAL READINGS: 
  1. Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers, Third Edition.
  2. Soumendra Mohanty, “Data Warehousing: Design, Development and Best Practices”, Tata McGraw Hill, 2006.
REFERENCES: 

Suggested Readings:

  1. W. H. Inmon, “Building the Data Warehouse”, Wiley Dreamtech India Pvt. Ltd., 4th Edition, 2005.

e-Resources:

  1. https://www.slideshare.net/
  2. https://nptel.ac.in/courses/106106222
  3. https://spoken-tutorial.org/??/
  4. www.kaggle.com

Journals:

1.   Journal of the Brazilian Computer Society, SpringerOpen

2.   Journal of Internet Services and Applications, SpringerOpen

 

Academic Year: