DATA WAREHOUSE & DATA MINING

Paper Code: 24DCAI703A
Credits: 03
Periods/week: 03
Max. Marks: 100

Course Objectives:

The course will enable the students to:

  1. Understand basic data warehouse and data mining concepts.
  2. Learn and describe various data mining techniques.
  3. Develop the skills to implement data mining algorithms on real-world problems and evaluate their performance.


Course Outcomes:

Course Code: 24DCAI703A
Course Title: DATA WAREHOUSE & DATA MINING (Theory)

Learning outcomes (at course level):

CO115. Analyse the significance of data warehouses and data mining in information management.

CO116. Elaborate on the concepts and architecture of data warehouses.

CO117. Determine the fundamentals of data mining and machine learning, including evaluation metrics.

CO118. Inspect unbalanced data and apply unsupervised learning using association rules and clustering algorithms.

CO119. Construct classification and prediction models, evaluate their performance, and address attribute selection.

CO120. Contribute effectively to course-specific interactions.

Learning and teaching strategies:

Approach in teaching: Interactive lectures, discussions, PowerPoint presentations, informative videos.

Learning activities for the students: Self-learning assignments, effective questions, presentations.

Assessment strategies:

Assessment tasks will include class tests on the topics, semester-end examinations, quizzes, student presentations and assignments.


Unit I: Need for Strategic Information (9.00)

Decision support systems, operational versus decision-support systems, data warehousing as the only solution, definitions of data warehousing and data mining, features of a data warehouse, data marts, metadata. Planning the data warehouse: project team, project management considerations, information packages, requirements-gathering methods, and requirements definition (scope and content).

 

Unit II: Architectural Components (9.00)

Objectives, data warehouse architecture, distinguishing characteristics, architectural framework. Infrastructure: operational and physical. Implementation of a data warehouse; ETL (extract, transform and load) in the data warehouse. Physical design: steps, considerations, physical storage, indexing. Data lake vs. data warehouse.
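The following is a minimal Python sketch of the ETL process listed above. It is not a production pipeline. The source records, the `sales_fact` table and the `extract`/`transform`/`load` function names are hypothetical. An in-memory SQLite database from Python's standard `sqlite3` module stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might be extracted from an operational system.
SOURCE_ROWS = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80"},
    {"order_id": "3", "region": "north", "amount": ""},  # missing amount
]


def extract():
    """Extract: pull raw records from the (hypothetical) source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: trim and standardise text, cast types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # reject records with a missing measure
        cleaned.append((int(row["order_id"]),
                        row["region"].strip().title(),
                        float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write cleaned rows into a warehouse fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(transform(extract()), conn)
totals = conn.execute("SELECT region, SUM(amount) FROM sales_fact "
                      "GROUP BY region ORDER BY region").fetchall()
print(totals)  # [('North', 120.5), ('South', 80.0)]
```

Each stage is kept in its own function, so cleaning rules can change without touching extraction or loading. A real warehouse load would typically add staging tables, incremental loads and error logging.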

 

Unit III: Introduction to Data Mining and Machine Learning (9.00)

Basic data mining tasks, data mining versus knowledge discovery in databases (KDD), applications of machine learning, machine learning vs. AI, types of machine learning. Metrics and accuracy measures: precision, recall, F-measure, confusion matrix, cross-validation.
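The following is a minimal pure-Python sketch of the accuracy measures listed above for a binary classifier. The label vectors are made-up examples, and the function names are illustrative rather than taken from any library. The sketch uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-measure as the harmonic mean of precision and recall. The fold splitter is a simple unshuffled k-fold scheme, assumed here for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure (harmonic mean of the two)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))     # (3, 1, 1, 3)
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
for train, test in k_fold_indices(len(y_true), 4):
    print("test fold:", test)
```

scikit-learn provides library equivalents: `sklearn.metrics.confusion_matrix`, `sklearn.metrics.precision_recall_fscore_support` and `sklearn.model_selection.KFold`. Note that its confusion matrix is laid out as a 2x2 array rather than the tuple returned here.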

 

Unit IV: Understand the Problem by Understanding the Data (9.00)

Unbalanced data. Unsupervised learning: association rules, Apriori algorithm, FP-tree algorithm, market basket analysis and association analysis. Clustering: k-means and its implementation in Python; concepts of other clustering algorithms: hierarchical clustering and DBSCAN.
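The k-means implementation in Python mentioned above can be sketched as Lloyd's iteration. Each point is assigned to its nearest centroid, then each centroid moves to the mean of its assigned points, repeating until no centroid moves. This is a teaching sketch in pure Python under simple assumptions: Euclidean distance, initial centroids drawn at random from the data, and a fixed seed. The sample points are made up.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups; return (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(data, k=2)
for c, members in zip(centroids, clusters):
    print(c, members)
```

Plain k-means depends on the initial centroids, so results can vary with the seed on less well-separated data. scikit-learn's `sklearn.cluster.KMeans` implements the same algorithm with k-means++ initialisation and supports multiple restarts through `n_init`.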

 

Unit V: Classification & Prediction (9.00)

Model construction, performance and attribute selection. Issues: underfitting and overfitting, cross-validation, tree pruning methods, missing values, information gain, gain ratio, Gini index, continuous classes. Classification and Regression Trees (CART) and C5.0. Linear regression, multiple linear regression, logistic regression, Naïve Bayes, support vector machines (SVM) and simple neural networks.
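The attribute selection measures named above (information gain, gain ratio and Gini index) can each be computed in a few lines. The sketch below is a minimal pure-Python version for a single categorical attribute, using the standard definitions:

- Entropy: H = -Σ p_i log2 p_i.
- Gini index: 1 - Σ p_i².
- Information gain: H(parent) minus the weighted entropy of the partitions.
- Gain ratio: information gain divided by the split information.

The small weather-style dataset is made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the value of one categorical attribute."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by split information (penalises many-valued attributes)."""
    n = len(labels)
    split_info = -sum(len(g) / n * math.log2(len(g) / n)
                      for g in partition(values, labels))
    return information_gain(values, labels) / split_info if split_info else 0.0


def gini_split(values, labels):
    """Weighted Gini index of the partitions; lower means a purer split."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(values, labels))


outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes"]
print(round(information_gain(outlook, play), 3))  # 0.667
print(round(gain_ratio(outlook, play), 3))        # 0.421
print(round(gini_split(outlook, play), 3))        # 0.167
```

CART selects splits by the Gini index, while C4.5-family trees use gain ratio. Continuous attributes and missing values, also listed in this unit, need extra handling that this sketch omits.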

 

ESSENTIAL READINGS: 

SUGGESTED TEXT BOOKS:

 

  1. Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley.
  2. Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Third Edition, Morgan Kaufmann Publishers.
  3. Sebastian Raschka & Vahid Mirjalili, “Python Machine Learning”, Second Edition, Packt.

 

REFERENCES: 

SUGGESTED REFERENCE BOOKS:

  1. Sima Yazdani & Shirley S. Wong, “Data Warehousing with Oracle”.
  2. Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers.
  3. Ralph Kimball, “The Data Warehouse Lifecycle Toolkit”, John Wiley.

 

 

REFERENCE JOURNALS:

  1. International Journal of Data Mining, Modelling and Management, Inderscience: https://www.inderscience.com/jhome.php?jcode=ijdmmm
  2. Data Mining and Knowledge Discovery, Springer: https://www.springer.com/journal/10618

 

 

e-RESOURCES INCLUDING LINKS:

 

  1. Introduction to Data Mining, Akannsha Totewar, SlideShare: https://www.slideshare.net/akannshat/data-mining-15329899
  2. Data Mining Techniques, GeeksforGeeks: https://www.geeksforgeeks.org/data-mining-techniques/
  3. NPTEL: NOC: Data Mining, IIT Kharagpur: https://nptel.ac.in/courses/106105174

 

Academic Year: