DATA WAREHOUSING & DATA MINING

Paper Code: DCAI 703A
Credits: 03
Periods/week: 03
Max. Marks: 100.00
Objective: 

This course will help students to:

  1. Understand the basic data warehouse and data mining concepts.
  2. Learn and describe various data mining techniques.
  3. Develop the skills to implement data mining algorithms on real-world problems and evaluate their performance.

 

Learning Outcome

The students will:

CO96. Differentiate between operational and decision support systems.

CO97. Explain the concepts and architecture of data warehouses.

CO98. Identify applications of data mining in different domains.

CO99. Develop a machine learning model for a practical problem.

CO100. Evaluate and compare the performance of machine learning models.

Learning and Teaching Strategies

Approach in teaching:

Interactive Lectures, Tutorials, Demonstrations

Learning activities for the students:

Self-learning assignments, Quizzes, Presentations, Discussions

Assessment Strategies

  • Assignment
  • Classroom activity
  • Multiple choice questions
  • Semester End Examination
 

 

9.00
Unit I: 

Need for strategic information, Decision support system, Operational versus Decision-Support Systems, Data Warehousing: the only solution, definitions of Data warehousing and data mining, features of Data warehouse, Data Marts, Metadata. Planning Data warehouse, project team, project management considerations, information packages & requirements gathering methods, and Requirements definition: Scope and Content.

 

9.00
Unit II: 

Architectural components: Objectives, Data Warehouse Architecture, Distinguishing Characteristics, Architectural Framework. Infrastructure: Operational & Physical. Implementation of Data warehouse, ETL (Extract, Transform and Load in Data warehouse). Physical design: steps, considerations, physical storage, indexing. Data lake vs. Data warehouse.
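As an illustration of the ETL topic in this unit, the following is a minimal sketch of an extract, transform and load pass using only the Python standard library. The source records, the `fact_orders` table name, and the cleaning rules (skip rows with a missing amount, normalise customer names) are hypothetical and chosen only for the example; a real warehouse load would involve far more validation.

```python
import sqlite3

# Hypothetical operational records, as they might arrive from a source system.
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "BOB", "amount": "80"},
    {"order_id": "3", "customer": "carol", "amount": ""},  # missing amount
]

def extract():
    """Extract: read rows from the (hypothetical) operational source."""
    return list(raw_orders)

def transform(rows):
    """Transform: drop incomplete rows, cast types, normalise names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed rule: skip rows with a missing amount
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(transform(extract()), conn)
summary = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone()
print(summary)
```

Keeping the three stages as separate functions mirrors the way ETL is usually described: each stage can be tested or replaced on its own.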

 

9.00
Unit III: 

Introduction to Data Mining and machine learning: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Applications of Machine Learning, Machine Learning vs AI, Types of Machine Learning, Metrics, Accuracy Measures: Precision, recall, F-measure, confusion matrix, cross-validation.
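The accuracy measures listed in this unit can be illustrated with a short sketch that derives precision, recall and F-measure from the four cells of a binary confusion matrix. The function names and the toy labels below are assumptions made for the example, not part of any prescribed library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy labels, invented for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = precision_recall_f1(y_true, y_pred)
print(counts, scores)
```

Here the classifier makes one false positive and one false negative out of eight predictions, so precision and recall come out equal.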

 

9.00
Unit IV: 

Understand the Problem by Understanding the Data, unbalanced data. Unsupervised Learning: Association rules, Apriori algorithm, FP tree algorithm, Market Basket Analysis and Association Analysis. Clustering: k-means and implementation of k-means using Python. Concept of other clustering algorithms: Hierarchical clustering and DBSCAN.
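Since this unit calls for an implementation of k-means in Python, here is one possible minimal sketch in pure Python (it needs Python 3.8+ for `math.dist`). It assumes Euclidean distance and, to keep the run deterministic, seeds the centroids with the first k points; practical implementations usually use random or k-means++ initialisation instead. The sample points are invented.

```python
import math

def kmeans(points, k, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    # Assumption for the sketch: the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels

# Two visually separate groups of 2-D points, invented for illustration.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the initial centroids, which is one reason the algorithm is commonly run several times with different seeds.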

 

 

9.00
Unit V: 

Classification & Prediction: model construction, performance, attribute selection. Issues: under-fitting, over-fitting, cross-validation, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0. Linear Regression, Multiple Linear Regression, Logistic Regression, Naïve Bayes, Support Vector Machines (SVM) and Simple neural network.
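The attribute selection measures named in this unit can be sketched in a few lines. The example below computes entropy, the Gini index, and the information gain of one candidate split; the function names and the yes/no labels are assumptions for illustration, and non-empty label lists are assumed throughout.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy node with 4 "yes" and 4 "no" records, split by a hypothetical attribute.
parent = ["yes"] * 4 + ["no"] * 4
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]

gain = information_gain(parent, [left, right])
print(entropy(parent), gini(parent), gini(left), gain)
```

A decision tree learner would evaluate such a measure for every candidate attribute and split on the one with the highest gain (or, for CART, the lowest weighted Gini index).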

 

ESSENTIAL READINGS: 
  • Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley.
  • Jiawei Han & Micheline Kamber, “Data Mining: Concepts & Techniques”, Morgan Kaufmann Publishers, Third Edition.
  • Sebastian Raschka & Vahid Mirjalili, “Python Machine Learning”, Second Edition, Packt.

 

REFERENCES: 

SUGGESTED REFERENCE BOOKS:

  • Sima Yazdani, Shirley S. Wong, “Data Warehousing with Oracle”
  • Jiawei Han & Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
  • Ralph Kimball, “The Data Warehouse Lifecycle Toolkit”, John Wiley.

 

E-RESOURCES INCLUDING LINKS:

 

REFERENCE JOURNALS:

 

Academic Year: