BIG DATA TECHNOLOGIES

Paper Code:

24MCA322

Credits:

Periods/week:

Max. Marks:

100.00

Objective:

This course enables students to:

1. Define the basic concepts of big data.

2. Understand the concepts of big data technologies.

3. Introduce the tools required to manage and analyze big data.

4. Relate data management using RDBMS and NoSQL.

5. Generate applications using MapReduce.

6. Develop skills to solve complex real world problems.

Course Outcomes:

Course Code: 24MCA 322

Course Title: Big Data Technologies (Theory)

Learning Outcome (at course level):

Identify the basic concepts of big data.
Describe the concepts of big data technologies.
Choose how to use tools to manage big data.
Compare different tools used in Big Data Analytics.
Solve data management problems using NoSQL and develop new applications using MapReduce.
Contribute effectively in course-specific interaction.

Learning and teaching strategies:

Approach in teaching:

Interactive lectures, discussion, demonstration with real-world examples, role plays, tool-based experiments

Learning activities for the students:

Self-learning assignments, quiz activity, effective questions, case-study-based learning approach, presentation, flipped classroom

Assessment Strategies:

Assignment
Written test in classroom
Classroom activity
Multiple choice questions
Semester End Examination

10.00

Unit I:

Understanding Big Data

Introduction, need, convergence of key trends, structured data vs. unstructured data, industry examples of big data, web analytics – big data and marketing, fraud and big data, risk and big data, credit risk management, big data and algorithmic trading, big data and its applications in healthcare, medicine, advertising, etc.

14.00

Unit II:

Big Data Technologies: Hadoop

Open source technologies, cloud and big data, Crowd Sourcing Analytics, inter and trans firewall analytics

Introduction to Hadoop, Data format, analyzing data with Hadoop, scaling out, Hadoop streaming, Hadoop pipes. Design of Hadoop distributed file system (HDFS), HDFS concepts – Java interface, data flow, Data Ingest with Flume and Sqoop. Hadoop I/O – data integrity, compression, serialization, Avro – file-based data structures.

14.00

Unit III:

Hadoop Related Tools:

Introduction to HBase: the dawn of big data, the problem with relational database systems. Introduction to Cassandra. Introduction to Pig. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.

10.00

Unit IV:

NoSQL Data Management:

Introduction to NoSQL, aggregate data models, key-value and document data models, relationships, graph databases, schemaless databases, materialized views, distribution models, sharding, master-slave replication, peer-to-peer replication. Consistency: relaxing consistency, version stamps.

12.00

Unit V:

Map Reduce Applications:

MapReduce workflows, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, classic MapReduce – YARN, failures in classic MapReduce and YARN – job scheduling, shuffle and sort, task execution, MapReduce types – input formats – output formats, MapReduce – partitioning and combining, composing MapReduce calculations.

ESSENTIAL READINGS:

Big Data, Black Book, DT Editorial Services, DreamTech Press 2016.
Professional NoSQL, Shashank Tiwari, Wrox, September 2011.
Big Data and Analytics, 2ed, Subhashini Chellappan, Seema Acharya, Wiley, 2019.

REFERENCES:

Suggested Readings:

HBase: The Definitive Guide, 2e, Lars George, O'Reilly, 2014.
Programming Pig, Alan Gates, O'Reilly, 2017.
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, P. J. Sadalage and M. Fowler, Pearson Education, Inc., 2012.
Programming Hive, 2e, E. Capriolo, D. Wampler, and J. Rutherglen, O'Reilly, 2017.

E-Resources:

NPTEL MOOC Course on Big Data Computing, Prof. Rajiv Misra, Department of Computer science and Engineering, IIT Patna, https://nptel.ac.in/courses/106104189.
MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, Google, Inc., http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduce-osdi04.pdf
Map Reduce (hands on), Zbigniew Baranowski, https://cds.cern.ch/record/2041327; CERN Document Server, NDLI
Big Data - Tim Smith, TEDEd Animation, https://ed.ted.com/lessons/exploration-on-the-big-data-frontier-tim-smith.
Introduction to Hadoop, Matthias Braeger, CERN school on Grid and Advanced Information Systems, NDLI, https://edms.cern.ch/document/1283414/1

Journals (International / National):

Journal of Big Data (Springer) (Open Access), CERN Document Server, NDLI, https://link.springer.com/journal/40537/volumes-and-issues

Big Data Research, Elsevier, https://www.journals.elsevier.com/big-data-research
Frontiers in Big Data (Open Access), https://www.frontiersin.org/journals/big-data
International Journal of Big Data Management, Inderscience Publishers, https://www.inderscience.com/jhome.php?jcode=ijbdm (The journal also publishes some open access papers)

Academic Year:

2024-25


IIS (Deemed to be University)
