Big Data and Text Mining (Theory)

Paper Code: 24DBDA511B
Credits: 03
Periods/week: 03
Max. Marks: 100.00
Objective: 

This course enables the students to:

  1. Understand the basic concepts of big data.
  2. Handle and process big data.
  3. Understand text mining concepts.
  4. Comprehend the concepts and applications of clustering and classification in text mining.

 

Course Outcomes: 

Course Code: 24DBDA 511 B

Course Title: Big Data and Text Mining (Theory)

Learning outcomes (at course level):

CO307. Identify the applications of big data and analyze various sources of text data.

CO308. Compare tools for handling big data in a real time scenario.

CO309. Apply text mining techniques to analyse textual data.

CO310. Evaluate methods for text representation and text mining in textual analysis.

CO311. Build and evaluate machine learning models for text classification using appropriate metrics.

CO312. Contribute effectively in course-specific interaction.

Learning and teaching strategies:

Approach in teaching: Interactive Lectures, Demonstrations

Learning activities for the students: Self-learning assignments, Quizzes, Presentations, Discussions

Assessment Strategies:

  • Assignment
  • Classroom activity
  • Multiple choice questions
  • Semester End Examination
 

 

9.00
Unit I: 
Introduction to Big Data:

What is Big Data? Handling and Processing Big Data, Methodological Challenges and Problems faced in handling big data, Big data applications, Text-based big data, Sources of text data, Issues and handling of big data.

9.00
Unit II: 
Big Data Tools:

Big Data Overview, Drivers of Big Data, Big Data Attributes, Examples of Big Data Analytics, Introduction to Big Data Tools, Techniques, and Systems. The relationship between Apache Spark and Hadoop Ecosystem, Components of Spark.

9.00
Unit III: 
Introduction to Text Mining:

Text mining concept, Data preprocessing (Tokenization, Normalization, Stemming), Data cleaning, Applications of text mining.
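As an illustration of the preprocessing steps named above, the following is a minimal sketch in plain Python. The function names (tokenize, normalize, naive_stem, preprocess) are hypothetical, and the suffix-stripping rule is a toy assumption standing in for a real stemming algorithm such as Porter's; it is not part of the prescribed course material.

```python
import re

def tokenize(text):
    """Tokenization: split raw text into runs of letters, digits, or apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)

def normalize(tokens):
    """Normalization: lowercase every token so 'Mining' and 'mining' match."""
    return [t.lower() for t in tokens]

def naive_stem(token):
    """Stemming (toy version): strip one common English suffix,
    keeping at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Run the three steps in order: tokenize, normalize, stem."""
    return [naive_stem(t) for t in normalize(tokenize(text))]

print(preprocess("Mining texts: the miners mined documents"))
```

Note that the toy stemmer maps "mining" and "mined" to the same stem but leaves "miners" as "miner", which shows why stemming rules need care in practice.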

Text clustering, Feature Selection and Transformation Methods for Text Clustering, Word and Phrase-based Clustering (K-means and K-medoids).
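A small sketch of K-means applied to text may help make the clustering topic concrete. It assumes a bag-of-words representation, squared Euclidean distance, and fixed starting documents as initial centroids; the helper names and the four sample documents are illustrative only. K-medoids differs in that each cluster centre must be an actual document and is not shown here.

```python
from collections import Counter

def bag_of_words(docs):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    vectors = []
    for d in docs:
        counts = Counter(d.split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, iterations=10):
    """Plain K-means; init_indices picks which documents seed the centroids."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "spark hadoop cluster",
    "hadoop spark data",
    "sentiment opinion review",
    "opinion review text",
]
vocab, vectors = bag_of_words(docs)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)
```

With these seeds the two big-data documents end up in one cluster and the two opinion documents in the other.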

 

9.00
Unit IV: 
Text Representation and Applications:

Text Representation (Sequence of words, Syntactic structure, Entities and relations, Logic predicates), Word association mining and analysis, Basic word relations (paradigmatic, syntagmatic), Applications in text mining, Topic mining and analysis: motivation, Opinion mining and sentiment analysis, Text-based prediction.
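The two basic word relations can be illustrated with a short sketch. It assumes that syntagmatic relatedness is approximated by how often two words occur in the same sentence, and paradigmatic relatedness by the Jaccard overlap of their immediate left and right neighbours. These are simplified stand-ins for the measures usually taught, and the sentences and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

sentences = [
    "my cat eats fish",
    "my dog eats meat",
    "the cat drinks milk",
    "the dog drinks water",
]

cooccur = defaultdict(int)   # (word_a, word_b) -> number of shared sentences
contexts = defaultdict(set)  # word -> set of (side, neighbour) pairs

for s in sentences:
    words = s.split()
    # Count each unordered word pair once per sentence.
    for a, b in combinations(sorted(set(words)), 2):
        cooccur[(a, b)] += 1
    # Record the immediate left and right neighbour of every word.
    for i, w in enumerate(words):
        if i > 0:
            contexts[w].add(("L", words[i - 1]))
        if i < len(words) - 1:
            contexts[w].add(("R", words[i + 1]))

def syntagmatic(a, b):
    """Words that tend to appear together (e.g. 'cat' and 'eats')."""
    return cooccur.get(tuple(sorted((a, b))), 0)

def paradigmatic(a, b):
    """Words that appear in similar contexts (e.g. 'cat' and 'dog')."""
    ca, cb = contexts[a], contexts[b]
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0

print(syntagmatic("cat", "eats"), paradigmatic("cat", "dog"))
```

In this toy corpus "cat" and "dog" never share a sentence, yet they have identical neighbours, which is the pattern a paradigmatic relation describes.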

 

9.00
Unit V: 
Text Classification:

Classification, Commonly used text classification methods (Decision Trees, SVM Classifiers), Feature Selection for Text Classification.
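Feature selection for text classification can be sketched with the chi-square statistic, one commonly used scoring method (the syllabus does not name a specific one, so this choice is an assumption). For a term and a target class, with A = documents in the class containing the term, B = documents outside the class containing it, C = documents in the class without it, D = documents outside the class without it, and N = A + B + C + D, the score is N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)). The sample documents and labels below are invented for illustration.

```python
def chi_square(term, docs, labels, target):
    """Chi-square score of `term` with respect to class `target`."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc.split()
        if has_term and label == target:
            a += 1          # in class, term present
        elif has_term:
            b += 1          # not in class, term present
        elif label == target:
            c += 1          # in class, term absent
        else:
            d += 1          # not in class, term absent
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

docs = [
    "great movie great cast",
    "great plot",
    "boring movie",
    "boring plot bad cast",
]
labels = ["pos", "pos", "neg", "neg"]

vocab = sorted({w for d in docs for w in d.split()})
scores = {w: chi_square(w, docs, labels, "pos") for w in vocab}

# Keep the two highest-scoring terms as features.
top = sorted(vocab, key=lambda w: -scores[w])[:2]
print(top)
```

Terms such as "movie" that are spread evenly across both classes score zero and are dropped, while "great" and "boring" are kept; the selected terms could then be fed to a decision tree or SVM classifier.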

 

ESSENTIAL READINGS: 

SUGGESTED TEXT BOOKS

  1. Big Data, Black Book, DT Editorial Services, DreamTech Press 2015
  2. Text Mining with Machine Learning: Principles and Techniques, 1st Edition, CRC Press, 2019

 

REFERENCES: 

SUGGESTED READINGS:

  1. Text Data Mining, 1st Edition, Springer, 2021

e-RESOURCES

  1. NOC:Business Analytics & Text Mining Modeling Using Python, IIT Roorkee: https://nptel.ac.in/courses/110107129
  2. Datascience.com, text mining: https://towardsdatascience.com/text-representation-for-data-science-and-text-mining-719ce81f3c84

JOURNALS

  1. Text and Data Mining, Elsevier: https://www.elsevier.com/open-science/research-data/text-and-data-mining

 
