Big Data & Text Mining

Paper Code: DBDA 511 B
Credits: 3
Periods/week: 3
Max. Marks: 100.00
Objective: 

This course enables the students to

  1. Understand the basic concepts of big data.
  2. Handle and process big data.
  3. Understand text mining concepts.
  4. Comprehend the concepts and applications of clustering and classification in text mining.

 

Course Outcomes (COs):

Course Outcome (at course level)

On completion of this course, the students will:

CO256. Discuss the concepts and applications of big data and analyze various sources of text-related big data.

CO257. Identify the suitable tool for handling big data in a real-time scenario.

CO258. Extract and prepare text data for mining.

CO259. Build and evaluate machine learning models using appropriate metrics.

CO260. Interpret the results, gain insights, and recommend possible actions from analytics performed on text data.

Learning and teaching strategies

Approach in teaching: Interactive Lectures, Tutorials, Demonstrations

Learning activities for the students: Self-learning assignments, Quizzes, Presentations, Discussions

Assessment Strategies

  • Assignment
  • Classroom activity
  • Multiple choice questions
  • Semester End Examination
 
 

 

9.00
Unit I: 

 

Introduction: What is Big Data? Handling and processing big data, methodological challenges and problems faced in handling big data, big data applications. Text-based big data: sources of text data, issues and handling of big data.

 

 

9.00
Unit II: 

 

Big Data Overview, Drivers of Big Data, Big Data Attributes, Examples of Big Data Analytics, Introduction to Big Data Tools, Techniques, and Systems. Relationship between Apache Spark and the Hadoop Ecosystem, Components of Spark.

 

 

9.00
Unit III: 

 

Introduction to Text Mining, Data preprocessing (Tokenization, Normalization, Stemming), Data cleaning, Applications of text mining.
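
As an illustration of the preprocessing steps named above, the following is a minimal sketch in plain Python. The suffix list and the function names (tokenize, normalize, stem, preprocess) are assumptions made for this example; the crude suffix stripping only stands in for a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical suffix list, for illustration only; a real stemmer
# applies many more rules than this.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split raw text into word tokens on non-alphanumeric characters."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(tokens):
    """Lowercase every token so that 'Text' and 'text' match."""
    return [t.lower() for t in tokens]


def stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, normalize, then stem, in that order."""
    return [stem(t) for t in normalize(tokenize(text))]


print(preprocess("Processing texts, clustered!"))
```

The order matters in this sketch: normalization runs before stemming so that the lowercase suffix list also applies to capitalized words.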

Text clustering. Feature Selection and Transformation Methods for Text Clustering, Word and Phrase-based Clustering (K-Means and K-Medoids).
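
To make the clustering topic concrete, here is a minimal K-Means sketch over bag-of-words vectors in plain Python. The sample documents, the choice of initial centroids, and all function names are assumptions for illustration; it is not a prescribed implementation. K-Medoids differs in that each cluster center must be one of the actual documents rather than a mean vector.

```python
from collections import Counter


def bag_of_words(docs):
    """Turn whitespace-tokenized documents into term-count vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_centroids, max_iter=100):
    """Alternate assignment and mean-update steps until labels stop changing."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: two documents per topic.
docs = [
    "spark hadoop cluster",
    "hadoop spark data",
    "sentiment opinion review",
    "opinion review text",
]
vocab, vectors = bag_of_words(docs)
# Assumption: seed the two clusters with the first and third documents.
labels, centroids = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)
```

Fixed initial centroids are used here only to keep the run deterministic; random initialization is the more common choice in practice.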

 

9.00
Unit IV: 

 

Text Representation (Sequence of words, Syntactic structure, Entities and relations, Logic predicates), Word association mining and analysis. Basic word relations: Paradigmatic, Syntagmatic; Applications in text mining. Topic mining and analysis: Motivation. Opinion mining and sentiment analysis. Text-based prediction.
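
The two basic word relations can be illustrated with a small sketch in plain Python. It assumes sentence-level co-occurrence counts as a signal for syntagmatic relations (words that appear together) and Jaccard overlap of contexts as a signal for paradigmatic relations (words that appear in similar contexts). The sample sentences and function names are hypothetical, and other measures, such as mutual information, are also commonly used.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(sentences):
    """Count the sentences shared by each unordered word pair (syntagmatic signal)."""
    counts = defaultdict(int)
    for s in sentences:
        words = sorted(set(s.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts


def contexts(sentences, word):
    """Collect the words seen in the same sentences as `word`."""
    ctx = set()
    for s in sentences:
        words = set(s.lower().split())
        if word in words:
            ctx |= words - {word}
    return ctx


def context_similarity(sentences, w1, w2):
    """Jaccard overlap of the two words' contexts (paradigmatic signal)."""
    c1 = contexts(sentences, w1) - {w2}
    c2 = contexts(sentences, w2) - {w1}
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


# Hypothetical toy sentences.
sentences = ["the cat eats fish", "the dog eats meat", "the cat drinks milk"]
pairs = cooccurrence(sentences)
print(pairs[("cat", "eats")], context_similarity(sentences, "cat", "dog"))
```

In this toy data "cat" and "dog" never share a sentence, yet they share the contexts "the" and "eats", which is the kind of evidence a paradigmatic measure relies on.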

 

 


9.00
Unit V: 

 

Text Classification. Commonly used text classification methods: Decision Trees, SVM Classifiers. Feature Selection for Text Classification.
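
Feature selection for text classification can be sketched with the chi-square statistic computed from a 2x2 table of term presence against class membership. The code below is a minimal illustration in plain Python; the labeled toy documents and the function names are assumptions, and chi-square is only one of several selection criteria (information gain and document frequency are others).

```python
def chi_square(docs, labels, term, target):
    """Chi-square score between presence of `term` and membership in `target`."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc.lower().split()
        if has_term and label == target:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif label == target:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, target, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    scored = [(chi_square(docs, labels, w, target), w) for w in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:k]]


# Hypothetical labeled toy corpus.
docs = ["great movie great cast", "great plot", "boring movie", "boring plot bad cast"]
labels = ["pos", "pos", "neg", "neg"]
print(top_terms(docs, labels, "pos", 2))
```

The statistic measures strength of association in either direction, so a term that appears only in the other class scores as high as one that appears only in the target class. The selected terms would then serve as input features for a classifier such as a decision tree or an SVM.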

 

ESSENTIAL READINGS: 
  1. Big Data, Black Book, DT Editorial Services, DreamTech Press, 2015.
  2. Text Mining with Machine Learning: Principles and Techniques, 1st Edition, CRC Press, 2019.

 

 

REFERENCES: 
  1. Text Data Mining, 1st Edition, Springer, 2021.

 

E RESOURCES

  • NOC: Business Analytics & Text Mining Modeling Using Python, IIT Roorkee: https://nptel.ac.in/courses/110107129
