Course Objectives:
This course enables the students to:
Course Outcomes (COs):
Learning Outcomes (at course level) | Learning and Teaching Strategies | Assessment Strategies |
CO171. Understand the Sqoop architecture and its uses; load real-time data from an RDBMS table or query onto HDFS; write Sqoop scripts to export data from HDFS to RDBMS tables.
CO172. Understand Apache Pig and the Pig Data Flow Engine; understand its data types, data model, and modes of execution.
CO173. Store the data from a Pig relation onto HDFS.
CO174. Load data into a Pig relation with or without a schema.
CO175. Split, join, filter, and transform data using Pig operators; write Pig scripts and work with UDFs (a sample Pig workflow follows this table).
CO176. Understand the importance of Hive and the Hive architecture; create managed, external, partitioned, and bucketed tables; query data and perform joins between tables; understand Hive storage formats and vectorization in Hive. | Approach in teaching: demonstrations, enquiry-based learning, and application-based examples.
Learning activities for the students: discussions, lab assignments, and exercises based on real-world problems.
| Lab assignments, practical record, continuous assessment, semester-end examination |
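As a pointer for the Pig outcomes above (CO172–CO175), the following is a minimal sketch of the load–transform–store workflow using Pig's embedded Java API, PigServer. The input file, schema, and filter condition are illustrative assumptions, not part of the syllabus.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWorkflowSketch {
    public static void main(String[] args) throws Exception {
        // Local mode for experimentation; use ExecType.MAPREDUCE on a cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Load data into a relation with an explicit schema (CO174).
        pig.registerQuery("emp = LOAD 'emp.csv' USING PigStorage(',') "
                + "AS (id:int, name:chararray, salary:double);");

        // Filter and transform using Pig operators and a built-in UDF (CO175).
        pig.registerQuery("high = FILTER emp BY salary > 50000.0;");
        pig.registerQuery("names = FOREACH high GENERATE id, UPPER(name);");

        // Store the resulting relation onto HDFS (CO173).
        pig.store("names", "/user/student/high_earners");
    }
}
```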
Contents
Implementation of an aggregate data model using NoSQL
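A minimal, store-agnostic sketch of what "aggregate" means here: an order and its line items travel together as one unit of storage and retrieval. All names and values are hypothetical; the same shape maps onto a single document in a document store or a single row in a column-family store.

```java
import java.util.List;
import java.util.Map;

public class OrderAggregate {
    public static void main(String[] args) {
        // The whole order, including nested line items, forms one aggregate.
        Map<String, Object> order = Map.of(
            "orderId", "ORD-1001",
            "customer", Map.of("id", "C42", "name", "Asha"),
            "lines", List.of(
                Map.of("sku", "BK-101", "qty", 2, "price", 350.0),
                Map.of("sku", "PN-007", "qty", 1, "price", 45.0)
            )
        );
        // A document store would persist this map as one document keyed by
        // orderId; a column-family store would flatten it into columns.
        System.out.println(order);
    }
}
```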
Implementation of a file system for performing data analytics using Hadoop/Cassandra
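For the Cassandra half of this exercise, a minimal sketch using the DataStax Java driver; the keyspace, table, and data are assumptions, and a node is assumed to be listening on the driver's default of localhost:9042. The Hadoop half is sketched under the FileSystem API item below.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class CassandraAnalyticsSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            session.execute("CREATE KEYSPACE IF NOT EXISTS lab WITH replication = "
                    + "{'class':'SimpleStrategy','replication_factor':1}");
            session.execute("CREATE TABLE IF NOT EXISTS lab.page_views "
                    + "(page text, day date, views int, PRIMARY KEY (page, day))");
            session.execute("INSERT INTO lab.page_views (page, day, views) "
                    + "VALUES ('home', '2024-01-01', 120)");

            // Query back one partition and print the clustering rows.
            ResultSet rs = session.execute(
                    "SELECT day, views FROM lab.page_views WHERE page = 'home'");
            for (Row row : rs) {
                System.out.println(row.getLocalDate("day") + " -> " + row.getInt("views"));
            }
        }
    }
}
```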
Implementation of a data model and clients using HBase
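A minimal client sketch using the HBase Java API: one Put and one Get against a table. It assumes a table named "students" with column family "info" has already been created (for example from the HBase shell); those names are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("students"))) {
            // Write one cell: row key, column family, qualifier, value.
            Put put = new Put(Bytes.toBytes("s001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Asha"));
            table.put(put);

            // Read the row back and extract the same cell.
            Result result = table.get(new Get(Bytes.toBytes("s001")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```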
Application Development using Hive
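One common way to develop against Hive from Java is the JDBC driver talking to HiveServer2; a minimal sketch follows. The connection URL, credentials, and table are assumptions, and the hive-jdbc jar must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Explicit driver load; JDBC 4 drivers usually self-register.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "student", "");
             Statement stmt = conn.createStatement()) {
            // Create a managed table in a columnar storage format.
            stmt.execute("CREATE TABLE IF NOT EXISTS emp "
                    + "(id INT, name STRING, salary DOUBLE) STORED AS ORC");

            // Query the data back.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT name, salary FROM emp WHERE salary > 50000")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```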
Manipulating files in HDFS programmatically using the FileSystem API.
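A minimal FileSystem API sketch covering create, list, read, and delete. The path is an assumption; the Configuration picks up fs.defaultFS from core-site.xml on the classpath, so the same code runs against a local or remote HDFS.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/student/notes.txt");

        // Create (overwrite) a file and write a line to it.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }
        // List the parent directory.
        for (FileStatus status : fs.listStatus(file.getParent())) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        // Read the file back, then delete it (non-recursively).
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
        fs.delete(file, false);
    }
}
```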
Inverted index MapReduce application with a custom Partitioner and Combiner; custom types and composite keys; custom Comparators; InputFormats and OutputFormats; Distributed Cache; MapReduce design patterns: sorting and joins.
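A minimal sketch of the inverted index part of this exercise, showing the mapper, a reducer that doubles as the combiner, and a custom Partitioner; composite keys, comparators, and the other listed topics are left to the lab itself. The input format, "docId<TAB>text" per line with comma-free doc ids, is an assumption.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

public class InvertedIndex {
    // Emits (word, docId) for every word in the document body.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            if (parts.length < 2) return;
            Text docId = new Text(parts[0]);
            for (String word : parts[1].toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) ctx.write(new Text(word), docId);
            }
        }
    }

    // Used as both combiner and reducer: collapses duplicate doc ids into
    // one comma-separated posting list. Splitting on commas makes it safe
    // to run on already-combined values.
    public static class PostingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text word, Iterable<Text> docIds, Context ctx)
                throws IOException, InterruptedException {
            Set<String> unique = new LinkedHashSet<>();
            for (Text ids : docIds)
                for (String id : ids.toString().split(","))
                    unique.add(id);
            ctx.write(word, new Text(String.join(",", unique)));
        }
    }

    // Custom Partitioner: routes words by first character so each reducer
    // receives a character range (an illustrative policy, not a requirement).
    public static class AlphaPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text word, Text docId, int numPartitions) {
            return word.toString().charAt(0) % numPartitions;
        }
    }
}
```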
Running a MapReduce job on YARN in Hadoop
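A driver that wires the inverted-index classes above into a Job and submits it; with mapreduce.framework.name=yarn in mapred-site.xml, the job runs on YARN. Input and output paths come from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndexDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.setJarByClass(InvertedIndexDriver.class);
        job.setMapperClass(InvertedIndex.TokenMapper.class);
        job.setCombinerClass(InvertedIndex.PostingReducer.class);
        job.setPartitionerClass(InvertedIndex.AlphaPartitioner.class);
        job.setReducerClass(InvertedIndex.PostingReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```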
Importing data from an RDBMS to HDFS using Sqoop.
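Sqoop imports are usually run from the CLI; Sqoop 1 also exposes the same tool programmatically through Sqoop.runTool, sketched below with the equivalent command in the comment. The JDBC URL, credentials, table, and target directory are assumptions.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportSketch {
    public static void main(String[] args) {
        // Equivalent CLI: sqoop import --connect jdbc:mysql://... --table emp ...
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://localhost:3306/labdb",
            "--username", "student", "--password", "secret",
            "--table", "emp",
            "--target-dir", "/user/student/emp",
            "--num-mappers", "1"
        };
        System.exit(Sqoop.runTool(importArgs));
    }
}
```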
Exporting data from HDFS to an RDBMS using Sqoop. Other data integration tools: Flume, Kafka, Informatica, Talend, etc.
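The export counterpart of the import sketch above, again via Sqoop 1's Sqoop.runTool; the target table must already exist in the RDBMS, and all names, credentials, and the field delimiter are assumptions.

```java
import org.apache.sqoop.Sqoop;

public class SqoopExportSketch {
    public static void main(String[] args) {
        // Equivalent CLI: sqoop export --connect ... --table emp_backup
        //                 --export-dir /user/student/emp ...
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:mysql://localhost:3306/labdb",
            "--username", "student", "--password", "secret",
            "--table", "emp_backup",
            "--export-dir", "/user/student/emp",
            "--input-fields-terminated-by", ","
        };
        System.exit(Sqoop.runTool(exportArgs));
    }
}
```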