INTRODUCTION TO DATA MINING

Paper Code: DAC 331
Credits: 2
Periods/week: 30
Max. Marks: 100.00
Objective: 

Students will learn basic data mining concepts. This will help them understand the analytical procedures used in Business Analytics through a data mining approach.

Course Outcomes (COs):

Course outcome (at course level)

Students will be able to:

CO1. Identify basic applications, concepts, and techniques of data mining, and differentiate between supervised and unsupervised techniques in data mining.

CO2. Create association rules and develop tree/graph structures for a market basket dataset using the Apriori and FP-tree algorithms.

CO3. Analyze large datasets to gain business understanding, and apply classification, prediction, and clustering algorithms.

CO4. Evaluate classification/prediction models using metrics such as accuracy, ROC, RMSE, and the confusion matrix.

CO5. Generate quantitative analysis reports and perform comparative analysis of algorithms for decision-making.

Learning and teaching strategies

Approach in teaching:

Interactive lectures, discussions, reading assignments, demonstrations, group activities, and teaching using advanced IT audio-video tools.

Learning activities for the students:

Self-learning assignments, effective questions, seminar presentations, and assigned tasks.

Assessment Strategies

Class tests, semester-end examinations, quizzes, problem solving in tutorials, assignments, and presentations.


 

6.00
Unit I: 

Introduction to Data Warehousing: Architecture of a Data Warehouse, Data Preprocessing – Need, Data Cleaning, Data Integration & Transformation, Data Reduction, Machine Learning, Pattern Matching. Introduction to Data Mining: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Query Language, Applications of Data Mining.
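To make the Data Cleaning and Data Transformation topics of Unit I concrete, here is a minimal Python sketch. It assumes mean imputation for missing values and min-max normalization to the range [0, 1]; both are common textbook choices rather than techniques prescribed by this syllabus, and the income figures are made up for illustration.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Data transformation: linearly rescale values into [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - low) / (high - low) * (new_max - new_min) + new_min for v in values]

# Hypothetical income column with one missing entry.
incomes = [12000, None, 73600, 98000, 54000]
cleaned = fill_missing_with_mean(incomes)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

Mean imputation is used here only because it is the simplest option; a median or a class-conditional mean are equally valid cleaning strategies.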

 

6.00
Unit II: 

Data Mining Techniques: Frequent item-sets and Association rule mining: Apriori algorithm, Use of sampling for frequent item-sets, FP-tree algorithm, Graph Mining, Frequent sub-graph mining. Market Basket Analysis and Association Analysis, Market Basket Data, Stores, Customers, Orders, Items, Order Characteristics, Product Popularity, Tracking Marketing Interventions.
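As an illustration of the Apriori algorithm listed in Unit II, the following is a minimal, unoptimized Python sketch. It assumes `min_support` is an absolute transaction count (support is often expressed as a fraction instead), and the five baskets are a toy example. It shows the join step, the prune step based on the Apriori property, and the confidence of a rule computed from support counts.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Scan the database once and keep only candidates meeting min_support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Frequent 1-itemsets.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

def confidence(freq, antecedent, consequent):
    """confidence(A -> B) = support(A union B) / support(A)."""
    a = frozenset(antecedent)
    return freq[a | frozenset(consequent)] / freq[a]

baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
freq = apriori(baskets, min_support=3)
for itemset, n in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
print(confidence(freq, {"diaper"}, {"beer"}))
```

With this data, {bread, milk, diaper} survives pruning but appears in only two baskets, so no frequent 3-itemset is found.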

 

6.00
Unit III: 

Classification & Prediction: Decision tree learning: Construction, performance, attribute selection. Issues: Over-fitting, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0.
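The attribute selection measures in Unit III can be computed directly from class counts. This minimal Python sketch scores one candidate split by information gain (entropy-based) and by the weighted Gini index; the attribute `student`, the target `buys`, and the four rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(D) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(rows, attr, target):
    """Return (information gain, weighted Gini index) for splitting rows on attr."""
    labels = [r[target] for r in rows]
    n = len(rows)
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    weighted_entropy = sum(len(p) / n * entropy(p) for p in partitions.values())
    weighted_gini = sum(len(p) / n * gini(p) for p in partitions.values())
    return entropy(labels) - weighted_entropy, weighted_gini

rows = [
    {"student": "yes", "buys": "yes"},
    {"student": "yes", "buys": "yes"},
    {"student": "no", "buys": "yes"},
    {"student": "no", "buys": "no"},
]
gain, weighted_gini = split_scores(rows, "student", "buys")
print(gain, weighted_gini)
```

A tree-construction step would evaluate every candidate attribute this way and pick the one with the highest gain (or the lowest weighted Gini index).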

 

6.00
Unit IV: 

Bayesian Classification: Bayes' Theorem, Naïve Bayes classifier, Bayesian Networks: Inference, Parameter and structure learning. Linear classifiers: Least squares, logistic, perceptron and SVM classifiers. Prediction: Linear regression, Non-linear regression (Artificial Neural Networks).
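For the Naïve Bayes classifier in Unit IV, the sketch below applies Bayes' theorem under the conditional-independence assumption to categorical attributes. It is a minimal illustration, assuming Laplace (add-one) smoothing to avoid zero probabilities; the training rows and attribute names are hypothetical. The returned scores are proportional to the posterior P(C | x), not normalized probabilities.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    # feature_counts[cls][attr][value] = rows of class cls having attr == value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each attribute
    for r in rows:
        for attr, v in r.items():
            if attr != target:
                feature_counts[r[target]][attr][v] += 1
                values[attr].add(v)
    return class_counts, feature_counts, values

def predict(model, x):
    """Return (best class, {class: P(C) * prod P(x_k | C)})."""
    class_counts, feature_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for cls, cc in class_counts.items():
        p = cc / n  # prior P(C)
        for attr, v in x.items():
            # Laplace-smoothed likelihood P(attr = v | C)
            p *= (feature_counts[cls][attr][v] + 1) / (cc + len(values[attr]))
        scores[cls] = p
    return max(scores, key=scores.get), scores

training = [
    {"age": "youth", "student": "no", "buys": "no"},
    {"age": "youth", "student": "yes", "buys": "yes"},
    {"age": "senior", "student": "no", "buys": "yes"},
    {"age": "senior", "student": "yes", "buys": "yes"},
    {"age": "youth", "student": "no", "buys": "no"},
]
model = train_naive_bayes(training, "buys")
label, scores = predict(model, {"age": "youth", "student": "yes"})
print(label, scores)
```

Without smoothing, the unseen combination (student = yes, buys = no) would force the "no" score to zero regardless of the other attributes.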

 

6.00
Unit V: 

Accuracy Measures: Precision, recall, F-measure, confusion matrix, cross-validation, bootstrap. Clustering: k-means, Expectation Maximization (EM) algorithm, Hierarchical clustering, Correlation clustering, DBSCAN.
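The accuracy measures in Unit V all derive from the four cells of a binary confusion matrix. This minimal Python sketch computes them, assuming a single designated positive class and defining a ratio as 0.0 when its denominator is zero (a common convention, not the only one). The label vectors are made up.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FP, FN, TN) for a binary problem with the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of the two."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f1, accuracy

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred, positive=1))
print(precision_recall_f1(y_true, y_pred))
```

Here accuracy (0.7) looks better than the F-measure (4/7) because the negatives dominate; this is why precision and recall are reported alongside accuracy on imbalanced data.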

 

ESSENTIAL READINGS: 
  1. Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers, Third Edition.
  2. Soumendra Mohanty, “Data Warehousing: Design, Development and Best Practices”, Tata McGraw Hill, 2006.

 

REFERENCES: 

SUGGESTED READINGS:

  1. W. H. Inmon, “Building the Data Warehouse”, Wiley Dreamtech India Pvt. Ltd., 4th Edition, 2005.

 

JOURNALS:

  1. Journal of the Brazilian Computer Society, SpringerOpen (https://journal-bcs.springeropen.com/)
  2. Journal of Internet Services and Applications, SpringerOpen
  3. International Journal of Information Management Data Insights, Elsevier (https://www.journals.elsevier.com/international-journal-of-information-management-data-insights)

 

E-RESOURCES: 

  1. https://www.slideshare.net/
  2. https://nptel.ac.in/courses/106106222
  3. https://spoken-tutorial.org/??/
  4. https://www.kaggle.com/

 

 

Academic Year: