Big Data and Text Mining

Paper Code: 25DBDA511B
Credits: 03
Periods/week: 03
Max. Marks: 100.00
Objective: 

This course enables the students to:

1. Understand the basic concepts of big data.

2. Handle and process big data.

3. Understand text mining concepts.

4. Comprehend the concepts and applications of clustering and classification in text mining.

 

Course Outcomes: 

Course Code: 25DBDA 511 B

Course Title: Big Data and Text Mining (Theory)

Learning outcome (at course level):

CO307. Identify the applications of big data and analyze various sources of text data.

CO308. Compare tools for handling big data in a real-time scenario.

CO309. Apply text mining techniques to analyse textual data.

CO310. Evaluate methods for text representation and text mining in textual analysis.

CO311. Build and evaluate machine learning models for text classification using appropriate metrics.

CO312. Contribute effectively in course-specific interaction.

Learning and teaching strategies:

Approach in teaching: Interactive Lectures, Demonstrations

Learning activities for the students: Self-learning assignments, Quizzes, Presentations, Discussions

Assessment Strategies:

Assignment

Classroom activity

Multiple choice questions

Semester End Examination

9.00
Unit I: 

Introduction to Big Data: What is Big Data? Handling and processing big data, methodological challenges and problems faced in handling big data, big data applications, text-based big data, sources of text data, issues and handling of big data.

9.00
Unit II: 

Big Data Tools: Big Data overview, drivers of Big Data, Big Data attributes, examples of Big Data analytics, introduction to Big Data tools, techniques, and systems. The relationship between Apache Spark and the Hadoop ecosystem, components of Spark.

9.00
Unit III: 
Introduction to Text Mining: Text mining concept, data preprocessing (tokenization, normalization, stemming), data cleaning, applications of text mining, text clustering, feature selection and transformation methods for text clustering, word and phrase-based clustering (K-means and K-medoids).
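As an illustration of the preprocessing steps named in this unit (tokenization, normalization, stemming), the following is a minimal sketch in plain Python. It is not part of the prescribed material. The function names and the short suffix list are hypothetical, chosen only for this example; a real stemmer such as the Porter stemmer applies many more rules.

```python
import re

# Hypothetical suffix list for illustration only; real stemmers use far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Tokenization: split raw text into alphanumeric word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(tokens):
    """Normalization: here, simply lowercase every token."""
    return [token.lower() for token in tokens]


def naive_stem(token):
    """Stemming (naive): strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, normalization, and naive stemming in sequence."""
    return [naive_stem(token) for token in normalize(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("Mining texts, clustered!"))
```

Note that a crude suffix rule can over-stem (for example, "mining" becomes "min"), which is one reason rule-based stemmers need carefully designed conditions.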
 
9.00
Unit IV: 

Text Representation and Applications: Text representation (sequence of words, syntactic structure, entities and relations, logic predicates), word association mining and analysis, basic word relations (paradigmatic, syntagmatic), applications in text mining, topic mining and analysis, motivation, opinion mining and sentiment analysis, text-based prediction.
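To make the idea of word association mining concrete, the sketch below counts how often two words appear in the same sentence, which is one simple signal of a syntagmatic relation (words that tend to co-occur). It is an illustrative example only, not prescribed material. The function name and the toy corpus are assumptions made for this example, and practical methods usually normalize raw counts, for instance with mutual information.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the number of sentences containing both words."""
    counts = Counter()
    for sentence in sentences:
        # Use the set of distinct lowercase words so each pair counts once per sentence.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Toy corpus, invented for illustration.
sentences = [
    "the cat eats fish",
    "the dog eats meat",
    "the cat drinks milk",
]
counts = cooccurrence_counts(sentences)

if __name__ == "__main__":
    for pair, n in counts.most_common(5):
        print(pair, n)
```

Pairs are stored in alphabetical order, so the count for "cat" and "eats" is looked up as `counts[("cat", "eats")]`.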

 

9.00
Unit V: 

Text Classification: Classification, commonly used text classification methods, decision trees, SVM classifiers, feature selection for text classification.

 

ESSENTIAL READINGS: 

1. Big Data, Black Book, DT Editorial Services, DreamTech Press, 2015.

2. Text Mining with Machine Learning: Principles and Techniques, 1st Edition, CRC Press (November 11, 2019).

 

REFERENCES: 

SUGGESTED READINGS:

1. Text Data Mining, Springer, 1st ed. (May 23, 2021).

e-RESOURCES

1. NOC: Business Analytics & Text Mining Modeling Using Python, IIT Roorkee: https://nptel.ac.in/courses/110107129

2. Datascience.com, text mining: https://towardsdatascience.com/text-representation-for-data-science-and-text-mining-719ce81f3c84

JOURNALS

1. Text and Data Mining, Elsevier: https://www.elsevier.com/open-science/research-data/text-and-data-mining

 

Academic Year: