This course enables students to learn basic data mining concepts and to understand the analytical procedures used in analytics through a data mining approach.
Course Code | Course Title | Course outcomes (at course level) | Learning and teaching strategies | Assessment strategies
24DAC331 | Introduction to Data Mining (Theory) | CO1. Identify basic applications, concepts, and techniques of data mining, and differentiate supervised and unsupervised techniques in data mining. CO2. Create association rules and develop trees/graphs for market basket datasets using the Apriori and FP-tree algorithms. CO3. Analyze large datasets to gain business understanding and apply classification, prediction, and clustering algorithms. CO4. Evaluate classification/prediction models using metrics such as accuracy, ROC, RMSE, and the confusion matrix. CO5. Create quantitative analysis reports and perform comparative analysis of algorithms for decision making. CO6. Contribute effectively in course-specific interaction. | Approach in teaching: Interactive Lectures, Discussion, Demonstrations, Group activities, Teaching using advanced IT audio-video tools. Learning activities for the students: Self-learning assignments, Effective questions, Seminar presentation, Giving tasks. | Class test, Semester end examinations, Quiz, Solving problems in tutorials, Assignments, Presentation
Architecture of Data Warehouse, Data Preprocessing – Need, Data Cleaning, Data Integration and Transformation, Data Reduction, Machine Learning, Pattern Matching. Introduction to Data Mining: Basic Data Mining Tasks, Data Mining versus Knowledge Discovery in Databases, Data Mining Metrics, Data Mining Query Language, Applications of Data Mining.
Frequent item-sets and Association rule mining: Apriori algorithm, Use of sampling for frequent item-sets, FP-tree algorithm, Graph Mining, Frequent sub-graph mining. Market Basket Analysis and Association Analysis, Market Basket Data, Stores, Customers, Orders, Items, Order Characteristics, Product Popularity, Tracking Marketing Interventions.
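As an illustration of the Apriori algorithm named in this unit, the following is a minimal sketch in plain Python, not a reference implementation. The function name `apriori`, the toy `baskets` data, and the minimum support count of 3 are assumptions made for this example only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent item-set.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-item-sets into k-item-sets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market basket data, one set of items per order.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)

# Confidence of the rule {diapers} -> {beer}:
# support({diapers, beer}) / support({diapers}).
confidence = (frequent[frozenset({"diapers", "beer"})]
              / frequent[frozenset({"diapers"})])
```

With this toy data the rule {diapers} -> {beer} has confidence 3/4, since the pair appears in three baskets and diapers in four.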
Decision tree learning: Construction, performance, attribute selection. Issues: Over-fitting, tree pruning methods, missing values, Information Gain, Gain Ratio, Gini Index, continuous classes. Classification and Regression Trees (CART) and C5.0.
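The attribute selection measures listed in this unit can be sketched as follows. This is an illustrative example under stated assumptions: the function names, the four-row toy dataset, and the attributes `outlook` and `windy` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain normalised by the split information."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so no useful split
    return information_gain(rows, labels, attribute) / split_info


# Hypothetical training data: each row is a dict of attribute values.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

ig_outlook = information_gain(rows, labels, "outlook")
ig_windy = information_gain(rows, labels, "windy")
gr_outlook = gain_ratio(rows, labels, "outlook")
gini_root = gini(labels)
```

In this toy data `outlook` separates the classes perfectly, so its information gain equals the full entropy of the labels (1 bit), while `windy` gives no gain. The gain ratio for `outlook` is lower than its information gain because the attribute has three distinct values.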
Bayes Theorem, Naïve Bayes classifier, Bayesian Networks: Inference, parameter and structure learning. Linear classifiers: Least squares, logistic, perceptron and SVM classifiers. Prediction: Linear regression, Non-linear regression (Artificial Neural Networks).
Precision, recall, F-measure, confusion matrix, cross-validation, bootstrap. Clustering: k-means, Expectation Maximization (EM) algorithm, Hierarchical clustering, Correlation clustering, DBSCAN.
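The evaluation measures in this unit can be computed directly from the cells of a binary confusion matrix. The sketch below is illustrative only: the function names and the `actual` and `predicted` label lists are assumptions, and the positive class is taken to be `1`.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure (F1), with 0.0 for empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for ten test records.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(actual)
```

Here the model finds three of the four positive records (recall 3/4) but also raises two false alarms (precision 3/5), which shows why accuracy alone can hide the difference between the two kinds of error.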
Suggested Readings:
e-Resources:
Journals:
1. Journal of the Brazilian Computer Society, SpringerOpen
2. Journal of Internet Services and Applications, SpringerOpen