This course will help students to:
Learning Outcome | Learning and Teaching Strategies | Assessment Strategies |
The students will: CO96. Differentiate between operational and decision support systems. CO97. Explain the concepts and architecture of data warehouses. CO98. Identify applications of data mining in different domains. CO99. Develop a machine learning model for a practical problem. CO100. Evaluate and compare the performance of machine learning models. | Approach in teaching: Interactive Lectures, Tutorials, Demonstrations. Learning activities for the students: Self-learning assignments, Quizzes, Presentations, Discussions. | |
Need for strategic information, Decision support systems, Operational versus Decision-Support Systems, Data Warehousing: the only solution, definitions of data warehousing and data mining, features of a data warehouse, Data Marts, Metadata. Planning a data warehouse: project team, project management considerations, information packages and requirements gathering methods. Requirements definition: Scope and Content.
Architectural components: Objectives, Data Warehouse Architecture, Distinguishing Characteristics, Architectural Framework. Infrastructure: Operational and Physical. Implementation of a data warehouse: ETL (Extract, Transform and Load). Physical design: steps, considerations, physical storage, indexing. Data lake vs. data warehouse.
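As an illustration of the ETL topic above, the following is a minimal sketch of an extract-transform-load pass using only the Python standard library. The sample data, the `fact_orders` table, and the cleaning rules (dropping rows with a missing amount, lower-casing customer names) are hypothetical choices made for this example, not part of the prescribed syllabus.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an operational system export.
RAW_CSV = """order_id,customer,amount,order_date
1,alice,120.50,2024-01-05
2,BOB,80.00,2024-01-06
3,alice,,2024-01-07
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalise types and names."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumption: incomplete orders are excluded
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().lower(),
                float(row["amount"]),
                row["order_date"],
            )
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-style fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A simple decision-support style query over the loaded data.
total_by_customer = conn.execute(
    "SELECT customer, SUM(amount) FROM fact_orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(total_by_customer)
```

Keeping the three stages as separate functions mirrors the way ETL is usually described and makes each stage easy to test in isolation.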
Introduction to Data Mining and Machine Learning: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Applications of Machine Learning, Machine Learning vs. AI, Types of Machine Learning, Metrics, Accuracy Measures: Precision, recall, F-measure, confusion matrix, cross-validation.
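The accuracy measures listed above can all be derived from the four cells of a binary confusion matrix. The sketch below assumes a binary problem with label `1` as the positive class; the function names and the sample labels are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive and actual != positive:
            fp += 1
        elif predicted != positive and actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels (not real data).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)                    # 3 1 1 3
print(accuracy, precision, recall, f1)   # 0.75 0.75 0.75 0.75
```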
Understand the Problem by Understanding the Data, unbalanced data. Unsupervised Learning: Association rules, Apriori algorithm, FP-tree algorithm, Market Basket Analysis and Association Analysis. Clustering: k-means and implementation of k-means using Python. Concept of other clustering algorithms: Hierarchical clustering and DBSCAN.
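For the k-means implementation topic above, a minimal pure-Python sketch of Lloyd's algorithm might look like the following. It assumes Euclidean distance and, to stay deterministic, takes the first `k` points as initial centroids; practical implementations usually use random or k-means++ initialisation instead. The sample points are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    # Assumption: first k points seed the centroids (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # [0, 0, 1, 0, 1, 1]
print(centroids)
```

Because k-means converges only to a local optimum, the result depends on the initial centroids; running it with several initialisations and keeping the lowest within-cluster sum of squares is a common safeguard.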
Classification and Prediction: model construction, performance, attribute selection. Issues: underfitting, overfitting, cross-validation, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0. Linear Regression, Multiple Linear Regression, Logistic Regression, Naïve Bayes, Support Vector Machines (SVM) and simple neural networks.
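The attribute selection measures named above can be illustrated with a short sketch. It computes entropy, the Gini index, and the information gain of one candidate split; the class labels and the split itself are invented for the example, and the helper names are not taken from any particular library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Illustrative node with 5 "yes" and 5 "no" examples, split into two branches.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(round(entropy(parent), 3))                          # 1.0
print(round(gini(parent), 3))                             # 0.5
print(round(information_gain(parent, [left, right]), 3))  # 0.278
```

A decision tree learner would evaluate such a measure for every candidate attribute and split on the one with the highest gain (or the lowest weighted Gini index, as in CART).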
SUGGESTED REFERENCE BOOKS:
E-RESOURCES INCLUDING LINKS:
REFERENCE JOURNALS: