Big Data Technologies

Paper Code: 
MCA 522
Credits: 
04
Periods/week: 
04
Max. Marks: 
100.00
Objective: 
  • Understand Big Data and Hadoop ecosystem
  • Work with Hadoop Distributed File System (HDFS)
  • Understand the concept of NoSQL
  • Write MapReduce programs and implement HBase
10.00
Unit I: 
Understanding Big Data

Introduction, need, convergence of key trends, structured vs. unstructured data, industry examples of big data, web analytics – big data and marketing, fraud and big data, risk and big data, credit risk management, big data and algorithmic trading, big data and its applications in healthcare, medicine, advertising

14.00
Unit II: 
Big Data Technologies: Hadoop

Open source technologies, cloud and big data, mobile business intelligence, crowdsourcing analytics, inter- and trans-firewall analytics

 

Introduction to Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop Streaming, Hadoop Pipes. Design of the Hadoop Distributed File System (HDFS), HDFS concepts – Java interface, data flow, data ingest with Flume and Sqoop. Hadoop I/O – data integrity, compression, serialization, Avro – file-based data structures.

14.00
Unit III: 
Hadoop Related Tools

Introduction to HBase: The Dawn of Big Data, the Problem with Relational Database Systems. Introduction to Cassandra: The Cassandra Elevator Pitch. Introduction to Pig, Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.

10.00
Unit IV: 
NoSQL Data Management

Introduction to NoSQL, aggregate data models, key-value and document data models, relationships, graph databases, schemaless databases, materialized views, distribution models, sharding, master-slave replication, peer-to-peer replication. Consistency: relaxing consistency, version stamps.

12.00
Unit V: 
MapReduce Applications

MapReduce workflows, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, classic MapReduce – YARN, failures in classic MapReduce and YARN – job scheduling, shuffle and sort, task execution, MapReduce types – input formats – output formats, MapReduce – partitioning and combining, composing MapReduce calculations.

ESSENTIAL READINGS: 
  • Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses, Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, Wiley, 2013.
  • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, P. J. Sadalage and M. Fowler, Pearson Education, Inc., 2012.
  • Hadoop: The Definitive Guide, Tom White, Third Edition, O'Reilly, 2012.
  • Programming Hive, E. Capriolo, D. Wampler, and J. Rutherglen, O'Reilly, 2012.

 

REFERENCES: 
  • HBase: The Definitive Guide, Lars George, O'Reilly, 2011.
  • Cassandra: The Definitive Guide, Eben Hewitt, O'Reilly, 2010.
  • Programming Pig, Alan Gates, O'Reilly, 2011.
Academic Year: