This course enables the students to achieve the following Course Outcomes (COs).

Learning Outcomes (at course level):

CO188. Understand Sqoop architecture and its uses; load real-time data from an RDBMS table or query onto HDFS; write Sqoop scripts for exporting data from HDFS onto RDBMS tables.
CO189. Understand Apache Pig and the Pig Data Flow Engine; understand its data types, data model, and modes of execution.
CO190. Store the data from a Pig relation onto HDFS.
CO191. Load data into a Pig relation with or without a schema.
CO192. Split, join, filter, and transform data using Pig operators; write Pig scripts and work with UDFs.
CO193. Understand the importance of Hive and the Hive architecture; create managed, external, partitioned, and bucketed tables; query the data and perform joins between tables; understand Hive storage formats and vectorization in Hive.

Learning and teaching strategies:

Approach in teaching: demonstrations, enquiry-based learning, application-based examples.
Learning activities for the students: discussions, lab assignments, exercises based on real-world problems.

Assessment strategies:
· Lab Assignments
· Practical Record
· Continuous Assessment
· Semester End Examination
1. Implementation of an aggregate data model using NoSQL.
2. Implementation of a file system for performing data analytics using Hadoop/Cassandra.
3. Implementation of a data model and clients using HBase.
4. Application development using Hive.
5. Manipulating files in HDFS programmatically using the FileSystem API (see the first sketch after this list).
6. Inverted Index MapReduce application covering:
· custom Partitioner and Combiner
· custom types and composite keys
· custom Comparators
· InputFormats and OutputFormats
· Distributed Cache
· MapReduce design patterns
· sorting and joins
(A custom-Partitioner sketch follows this list.)
7. Running a MapReduce job on YARN and Hadoop.
8. Importing data from an RDBMS to HDFS using Sqoop.
9. Exporting data from HDFS to an RDBMS using Sqoop.

Other data integration tools: Flume, Kafka, Informatica, Talend, etc.
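For experiment 5, the sketch below shows the kind of HDFS manipulation the Hadoop FileSystem API supports: creating a directory, writing a file, and listing its contents. The paths and class name are hypothetical placeholders, not part of the syllabus; the sketch assumes a core-site.xml with fs.defaultFS is on the classpath.

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsDemo {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from the core-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical directory, used purely for illustration.
            Path dir = new Path("/user/student/lab5");
            fs.mkdirs(dir);

            // Write a small text file (overwriting any previous copy).
            Path file = new Path(dir, "sample.txt");
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // List the directory contents with file sizes.
            for (FileStatus status : fs.listStatus(dir)) {
                System.out.println(status.getPath() + " " + status.getLen() + " bytes");
            }

            fs.close();
        }
    }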
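For experiment 6, this is a minimal sketch of a custom Partitioner, assuming the inverted-index job emits word keys and document-id values as Text; the first-letter routing rule and the class name are illustrative choices, not prescribed by the syllabus.

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Illustrative partitioner: routes terms to reducers by first letter,
    // so each reducer receives a contiguous alphabetic slice of the index.
    public class FirstLetterPartitioner extends Partitioner<Text, Text> {
        @Override
        public int getPartition(Text key, Text value, int numPartitions) {
            String term = key.toString();
            if (term.isEmpty()) {
                return 0;
            }
            char first = Character.toLowerCase(term.charAt(0));
            if (first < 'a' || first > 'z') {
                return 0; // digits and symbols go to the first reducer
            }
            // Spread 'a'..'z' evenly over the available reducers.
            return (first - 'a') % numPartitions;
        }
    }

It would be registered on the job with job.setPartitionerClass(FirstLetterPartitioner.class) before submission.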