Introduction to Data Mining

Paper Code: 
DAC 331
Credits: 
02
Periods/week: 
1
Max. Marks: 
100.00
Objective: 
Learning outcome (at course level) | Learning and teaching strategies | Assessment Strategies
   

Students will be able to:

  1. Explain basic applications, concepts, and techniques of data mining.
  2. Apply data mining approaches to solve practical problems in a variety of disciplines.
  3. Analyze large sets of data to gain useful business understanding.
  4. Describe and demonstrate basic data mining algorithms, methods, and tools.
  5. Generate quantitative analysis reports and perform comparative analysis of algorithms for decision making.
     

Approach in teaching:

Interactive Lectures, Discussion, Reading assignments, Demonstrations, Group activities, Teaching using advanced IT audio-video tools.

 

Learning activities for the students:

Self-learning assignments, Effective questions, Seminar presentation, Giving tasks.

 

Assessment Strategies

Class test, Semester end examinations, Quiz, Solving problems in tutorials, Assignments, Presentation

 

6.00
Unit I: 

Introduction to Data Warehousing: Architecture of Data Warehouse, Data Preprocessing – Need, Data Cleaning, Data Integration & Transformation, Data Reduction, Machine Learning, Pattern Matching. Introduction to Data Mining: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Query Language, Applications of Data Mining.

 

6.00
Unit II: 

Data Mining Techniques: Frequent item-sets and Association rule mining: Apriori algorithm, Use of sampling for frequent item-sets, FP-tree algorithm, Graph Mining, Frequent sub-graph mining. Market Basket Analysis and Association Analysis: Market Basket Data, Stores, Customers, Orders, Items, Order Characteristics, Product Popularity, Tracking Marketing Interventions.
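
As an illustration of the Apriori algorithm listed in this unit, the following is a minimal sketch in Python, not a prescribed course implementation. The function name `apriori`, the `min_support` parameter (an absolute transaction count), and the toy `baskets` data are assumptions made for this example only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    min_support is an absolute count of transactions (an assumption of
    this sketch; many texts use a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market basket data, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

With these five baskets and a minimum support of 3, the sketch keeps four single items and four pairs; the triple {bread, milk, diapers} survives the prune step but appears in only two baskets, so it is discarded at the counting step.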

 

 

6.00
Unit III: 

Classification & Prediction: Decision tree learning: Construction, performance, attribute selection. Issues: Over-fitting, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0.
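
The attribute selection measures named in this unit (Information Gain, Gain Ratio, Gini Index) can be sketched in a few lines of Python. This is an illustrative sketch using the standard textbook definitions; the function names and the four-row toy dataset are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy([row[attribute] for row in rows])
    gain = information_gain(rows, labels, attribute)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: "outlook" separates the classes perfectly,
# while "windy" carries no information about the label.
rows = [
    {"outlook": "sunny", "windy": True},
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "rain", "windy": False},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

On this toy data a tree builder that maximizes information gain would split on `outlook` first, since it removes all label uncertainty while `windy` removes none.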

 

6.00
Unit IV: 

Bayesian Classification: Bayes Theorem, Naïve Bayes classifier, Bayesian Networks: Inference, parameter and structure learning. Linear classifiers: Least squares, logistic, perceptron and SVM classifiers. Prediction: Linear regression, Non-linear regression (Artificial Neural Networks).

 

6.00
Unit V: 

Accuracy Measures: Precision, recall, F-measure, confusion matrix, cross-validation, bootstrap. Clustering: k-means, Expectation Maximization (EM) algorithm, Hierarchical clustering, Correlation clustering, DBSCAN.
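
To make the accuracy measures in this unit concrete, the sketch below derives precision, recall, and the F-measure (here the balanced F1 score) from the cells of a binary confusion matrix. The function names and the example label vectors are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1; each is 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Here the classifier makes one false positive and one false negative against three true positives, so precision, recall, and F1 all come out to 0.75.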

 

ESSENTIAL READINGS: 
  • Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers, Third Edition.
  • Soumendra Mohanty, “Data Warehousing: Design, Development and Best Practices”, Tata McGraw Hill, 2006.
  • W. H. Inmon, “Building the Data Warehouse”, Wiley Dreamtech India Pvt. Ltd., 4th Edition, 2005.

 

Academic Year: