BIG DATA LAB

Paper Code: MCA 326
Credits: 02
Periods/week: 04
Max. Marks: 100
Objective: 

This course enables the students to:

  1. Define Hadoop and explain how it can help process large data sets.
  2. Write MapReduce programs using the Hadoop API.
  3. Use HDFS (the Hadoop Distributed File System), from both the command line and the API, to load and process data in Hadoop effectively.
  4. Ingest data from an RDBMS or a data warehouse into Hadoop.
  5. Inculcate best practices for building, debugging, and optimizing Hadoop solutions.
  6. Get acquainted with tools such as Pig, Hive, HBase, and Elastic MapReduce, and understand how they can help in Big Data projects.

 

Course Outcomes (COs):

Learning Outcomes (at course level):

CO188. Understand the Sqoop architecture and its uses; load real-time data from an RDBMS table or query onto HDFS; write Sqoop scripts to export data from HDFS to RDBMS tables.

CO189. Understand Apache Pig and the Pig data-flow engine; understand its data types, data model, and modes of execution.

CO190. Store the data from a Pig relation onto HDFS.

CO191. Load data into a Pig relation with or without a schema.

CO192. Split, join, filter, and transform data using Pig operators; write Pig scripts and work with UDFs (a minimal Pig sketch follows this section).

CO193. Understand the importance of Hive and the Hive architecture; create managed, external, partitioned, and bucketed tables; query the data and perform joins between tables; understand Hive storage formats and vectorization in Hive.

Learning and teaching strategies:

Approach in teaching: Demonstrations, enquiry-based learning, and application-based examples.

Learning activities for the students: Discussions, lab assignments, and exercises based on real-world problems.

Assessment Strategies:

·  Lab Assignments

·  Practical Record

·  Continuous Assessment

·  Semester End Examination

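CO190–CO192 revolve around loading, transforming, and storing Pig relations. The sketch below shows that flow through Pig's Java embedding API (PigServer) in local mode; the input file, schema, and salary threshold are illustrative, not prescribed by the syllabus.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigFlowDemo {
    public static void main(String[] args) throws Exception {
        // Local mode: reads from the local file system instead of HDFS.
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Load with a schema, filter, and transform -- the CO190-CO192 flow.
        pig.registerQuery("emp = LOAD 'emp.csv' USING PigStorage(',') "
            + "AS (id:int, name:chararray, salary:double);");
        pig.registerQuery("high = FILTER emp BY salary > 50000.0;");
        pig.registerQuery("names = FOREACH high GENERATE id, name;");
        // STORE the relation (to a local path here; onto HDFS in mapreduce mode).
        pig.store("names", "high_earners");
        pig.shutdown();
    }
}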

LIST OF EXPERIMENTS:

1.     Implementation of an aggregate data model using a NoSQL store.
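A minimal sketch of experiment 1, using MongoDB's Java driver as one possible NoSQL store (the syllabus does not prescribe one); the database, collection, and field names are illustrative. The point is that the whole order, with its embedded line items, is one aggregate read and written as a unit.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Arrays;

public class AggregateModelDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                client.getDatabase("shop").getCollection("orders");
            // The order and its line items form a single aggregate.
            Document order = new Document("orderId", 1001)
                .append("customer", new Document("name", "A. Kumar").append("city", "Jaipur"))
                .append("lineItems", Arrays.asList(
                    new Document("product", "HDD").append("qty", 2).append("price", 3500),
                    new Document("product", "RAM").append("qty", 1).append("price", 2200)));
            orders.insertOne(order);
            System.out.println(orders.find(new Document("orderId", 1001)).first().toJson());
        }
    }
}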

 

2.     Implementation of a file system for performing data analytics using Hadoop/Cassandra.
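For the Cassandra side of experiment 2, a minimal sketch against the DataStax Java driver (3.x API), assuming a single local node; the keyspace, table, and query are illustrative.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class CassandraAnalyticsDemo {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS lab WITH replication = "
                + "{'class':'SimpleStrategy','replication_factor':1}");
            session.execute("CREATE TABLE IF NOT EXISTS lab.page_views "
                + "(page text, ts timestamp, PRIMARY KEY (page, ts))");
            session.execute("INSERT INTO lab.page_views (page, ts) "
                + "VALUES ('/home', toTimestamp(now()))");
            // A simple analytic query: view count for one page.
            ResultSet rs = session.execute(
                "SELECT count(*) FROM lab.page_views WHERE page = '/home'");
            System.out.println("views: " + rs.one().getLong(0));
        }
    }
}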

 

3.     Implementation of a data model and clients using HBase.
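A minimal sketch of the HBase client API for experiment 3, assuming a running HBase with a pre-created table 'users' having column family 'info' (both names are illustrative).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Column-family data model: row key -> family -> qualifier -> value.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
            table.put(put);

            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}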

 

4.     Application Development using Hive
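One way to approach experiment 4 is through HiveServer2's JDBC interface; the sketch below assumes HiveServer2 at localhost:10000 and the hive-jdbc driver on the classpath, with illustrative table and credential values.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS emp (id INT, name STRING, dept STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            // A query with a group-by, executed by Hive on the cluster.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT dept, count(*) FROM emp GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}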

 

5.     Manipulating files in HDFS programmatically using the FileSystem API.
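A minimal sketch of experiment 5, assuming fs.defaultFS points at a reachable cluster (or local mode); the paths are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            Path dir = new Path("/user/student/lab");
            fs.mkdirs(dir);
            // Write a file directly into HDFS.
            try (FSDataOutputStream out = fs.create(new Path(dir, "hello.txt"))) {
                out.writeUTF("hello, hdfs");
            }
            // List directory contents and basic metadata.
            for (FileStatus st : fs.listStatus(dir)) {
                System.out.println(st.getPath() + "\t" + st.getLen() + " bytes");
            }
        }
    }
}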

 

6.     Inverted-index MapReduce application: custom Partitioner and Combiner; custom types and composite keys; custom Comparators; InputFormats and OutputFormats; Distributed Cache; MapReduce design patterns; sorting; joins.
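A skeleton for the core of experiment 6: an inverted index mapping each word to the files it occurs in. The reducer is written so it is also safe to reuse as a combiner; the custom partitioners, comparators, input/output formats, and joins listed above would be layered onto this skeleton.

import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndex {

    public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // Emit (word, source file) for every token in this split.
            String file = ((FileSplit) ctx.getInputSplit()).getPath().getName();
            for (String tok : value.toString().toLowerCase().split("\\W+")) {
                if (!tok.isEmpty()) ctx.write(new Text(tok), new Text(file));
            }
        }
    }

    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text word, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            // Values are file names, or comma-joined lists when this class has
            // already run as a combiner; re-splitting keeps the merge correct.
            Set<String> files = new TreeSet<>();
            for (Text v : values)
                for (String f : v.toString().split(","))
                    files.add(f);
            ctx.write(word, new Text(String.join(",", files)));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "inverted index");
        job.setJarByClass(InvertedIndex.class);
        job.setMapperClass(IndexMapper.class);
        job.setCombinerClass(IndexReducer.class);   // safe: reducer re-splits lists
        job.setReducerClass(IndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}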

7.     Running a MapReduce job on Hadoop/YARN.
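For experiment 7, an idiomatic driver uses Tool/ToolRunner so that YARN and MapReduce options can be passed with -D when the jar is launched via yarn jar; this sketch reuses the InvertedIndex classes from experiment 6, with illustrative class and jar names.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class IndexDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D options parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "inverted index on YARN");
        job.setJarByClass(IndexDriver.class);
        job.setMapperClass(InvertedIndex.IndexMapper.class);
        job.setCombinerClass(InvertedIndex.IndexReducer.class);
        job.setReducerClass(InvertedIndex.IndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // e.g. yarn jar lab.jar IndexDriver -D mapreduce.job.reduces=2 in out
        System.exit(ToolRunner.run(new IndexDriver(), args));
    }
}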

8.     Importing data from an RDBMS to HDFS using Sqoop.
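Sqoop imports are normally launched from the shell (sqoop import ...); the sketch below drives the same import through Sqoop 1.x's Java entry point, assuming the Sqoop and MySQL JDBC jars are on the classpath, with an illustrative connection string, credentials, and table.

import org.apache.sqoop.Sqoop;

public class SqoopImportDemo {
    public static void main(String[] args) {
        // Equivalent to: sqoop import --connect ... --table orders ...
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://localhost:3306/shop",   // illustrative DB
            "--username", "lab",
            "--password", "labpass",
            "--table", "orders",
            "--target-dir", "/user/student/orders",
            "--num-mappers", "1"
        };
        System.exit(Sqoop.runTool(importArgs));
    }
}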

9.     Exporting data from HDFS to an RDBMS using Sqoop.

10.   Other data integration tools: Flume, Kafka, Informatica, Talend, etc.
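The export direction of experiment 9 mirrors the import; again a sketch through Sqoop 1.x's Java entry point, under the same classpath assumptions (note the target RDBMS table must already exist).

import org.apache.sqoop.Sqoop;

public class SqoopExportDemo {
    public static void main(String[] args) {
        // Equivalent to: sqoop export --connect ... --table orders_out ...
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:mysql://localhost:3306/shop",   // illustrative DB
            "--username", "lab",
            "--password", "labpass",
            "--table", "orders_out",            // must already exist in the RDBMS
            "--export-dir", "/user/student/orders"
        };
        System.exit(Sqoop.runTool(exportArgs));
    }
}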

ESSENTIAL READINGS: 

  • Big Data (Black Book), DT Editorial Services, DreamTech Press, 2016.
  • Professional NoSQL, Shashank Tiwari, Wrox, September 2011.
  • Big Data and Analytics, 2nd ed., Subhashini Chellappan and Seema Acharya, Wiley, 2019.

REFERENCES: 

  • HBase: The Definitive Guide, 2nd ed., Lars George, O'Reilly, 2014.
  • Programming Pig, Alan Gates, O'Reilly, 2017.
  • NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, P. J. Sadalage and M. Fowler, Pearson Education, 2012.
  • Programming Hive, 2nd ed., E. Capriolo, D. Wampler, and J. Rutherglen, O'Reilly, 2017.

Academic Year: