Big Data Lab

Paper Code: MCA 326
Credits: 02
Periods/week: 04
Max. Marks: 100
Course Objectives:

This course enables the students to:

  1. Define Hadoop and explain how it can help process large data sets.
  2. Understand how to write MapReduce programs using Hadoop API.
  3. Use HDFS (the Hadoop Distributed Filesystem), from the command line and the API, to load and process data in Hadoop effectively.
  4. Ingest data from an RDBMS or a data warehouse into Hadoop.
  5. Inculcate best practices for building, debugging and optimizing Hadoop solutions.
  6. Get acquainted with tools such as Pig, Hive, HBase, and Elastic MapReduce, and understand how they can help in Big Data projects.

 

 

Course Outcomes (COs):

Learning outcomes (at course level):

CO171. Understand Sqoop architecture and its uses; load real-time data from an RDBMS table or query onto HDFS; write Sqoop scripts to export data from HDFS into RDBMS tables.

CO172. Understand Apache Pig and the Pig data flow engine; understand its data types, data model, and modes of execution.

CO173. Store the data from a Pig relation onto HDFS.

CO174. Load data into a Pig relation with or without a schema.

CO175. Split, join, filter, and transform data using Pig operators; write Pig scripts and work with UDFs.

CO176. Understand the importance of Hive and the Hive architecture; create managed, external, partitioned, and bucketed tables; query data and perform joins between tables; understand Hive storage formats and vectorization in Hive.

Learning and teaching strategies:

Approach in teaching: demonstrations, enquiry-based learning, application-based examples.

Learning activities for the students: discussions, lab assignments, exercises based on real-world problems.

Assessment strategies:

Lab Assignments

Practical Record

Continuous Assessment

Semester End Examination

 

Contents

Implementation of an aggregate data model using NoSQL
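The aggregate-data-model exercise can be sketched in plain Python, with a dict standing in for a key-value/document store; the function names and the sample order are illustrative, not tied to any particular NoSQL product.

```python
# An "aggregate" groups an order and its line items into one unit that is
# stored and retrieved together by key -- the core idea behind
# aggregate-oriented NoSQL stores (key-value, document, column-family).
store = {}  # stand-in for a key-value/document store

def put(key, aggregate):
    """Store a whole aggregate under one key."""
    store[key] = aggregate

def get(key):
    """Retrieve the whole aggregate in a single lookup."""
    return store[key]

order = {
    "order_id": "ORD-1",
    "customer": {"name": "A. Student"},
    "line_items": [
        {"sku": "HDD-2TB", "qty": 1, "price": 5000},
        {"sku": "RAM-8GB", "qty": 2, "price": 2500},
    ],
}
put(order["order_id"], order)

def order_total(key):
    # Analytics stay within one aggregate: no cross-table join is needed.
    return sum(i["qty"] * i["price"] for i in get(key)["line_items"])
```

Because the line items live inside the order aggregate rather than in a separate normalized table, a single `get` returns everything needed to compute the total.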

Implementation of a file system for performing data analytics using Hadoop/Cassandra

Implementation of data model and clients using HBase

Application Development using Hive

Manipulating files in HDFS programmatically using the FileSystem API.

Inverted index MapReduce application: custom Partitioner and Combiner; custom types and composite keys; custom Comparators; InputFormats and OutputFormats; Distributed Cache; MapReduce design patterns; sorting; joins.
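The inverted-index exercise can be rehearsed outside Hadoop as a minimal pure-Python simulation of the map, combine, partition, and reduce phases; all names here are illustrative and do not belong to the Hadoop API.

```python
from collections import defaultdict

def mapper(doc_id, text):
    """Map phase: emit one (term, doc_id) pair per word occurrence."""
    for word in text.lower().split():
        yield word, doc_id

def combiner(pairs):
    """Combiner: deduplicate doc ids per term locally, before the shuffle."""
    seen = defaultdict(set)
    for term, doc_id in pairs:
        seen[term].add(doc_id)
    for term, ids in seen.items():
        for doc_id in ids:
            yield term, doc_id

def partitioner(term, num_reducers):
    """Custom partitioner: route a term to a reducer by its first letter."""
    return hash(term[0]) % num_reducers

def reducer(term, doc_ids):
    """Reduce phase: build the sorted posting list for each term."""
    return term, sorted(set(doc_ids))

def run_job(docs, num_reducers=2):
    # Shuffle: group combiner output by key into per-reducer buckets.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for doc_id, text in docs.items():
        for term, d in combiner(mapper(doc_id, text)):
            buckets[partitioner(term, num_reducers)][term].append(d)
    index = {}
    for bucket in buckets:
        for term, ids in bucket.items():
            k, v = reducer(term, ids)
            index[k] = v
    return index
```

Running `run_job({"d1": "big data lab", "d2": "big data tools"})` yields a posting list per term, e.g. `"big"` maps to both documents; the same structure translates directly into a Hadoop `Mapper`/`Reducer` pair with a custom `Partitioner`.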

MapReduce job in YARN and Hadoop

Importing data from an RDBMS to HDFS using Sqoop.

Exporting data from HDFS to an RDBMS using Sqoop. Other data integration tools: Flume, Kafka, Informatica, Talend, etc.
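The Sqoop import and export steps above might look like the following commands; the JDBC URL, credentials, table names, and HDFS paths are placeholders to be replaced for the lab environment.

```shell
# Import an RDBMS table into HDFS (connection details are placeholders)
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username student \
  --password-file /user/student/.db_password \
  --table orders \
  --target-dir /user/student/orders \
  --num-mappers 4

# Export results from HDFS back into an existing RDBMS table
sqoop export \
  --connect jdbc:mysql://dbhost/salesdb \
  --username student \
  --password-file /user/student/.db_password \
  --table order_totals \
  --export-dir /user/student/output
```

`--num-mappers` controls how many parallel map tasks split the import; the export target table must already exist in the database.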

Academic Year: