The course will enable students to:
1 Learn the basic models of parallel machines and the associated tools.
2 Learn how to parallelize programs using basic tools such as MPI and POSIX threads.
3 Learn the core ideas behind parallel and distributed computing.
4 Explore the methodologies adopted for concurrent and distributed environments.
5 Understand the networking aspects of parallel and distributed computing.
6 Gain an overview of the computational aspects of parallel and distributed computing.
7 Learn parallel and distributed computing models.
Course Learning Outcomes (CLOs):
Learning Outcome (at course level): Students will be able to | Learning and teaching strategies | Assessment strategies |
| Approach in teaching: interactive lectures, tutorials, demonstrations, flipped classes. Learning activities for the students: self-learning assignments, quizzes, presentations, discussions. | |
Introduction, need, convergence of key trends, structured vs. unstructured data, industry examples of big data, web analytics – big data and marketing, fraud and big data, risk and big data, credit risk management, big data and algorithmic trading, big data and its applications in healthcare, medicine, advertising, etc.
Open-source technologies, cloud and big data, crowdsourcing analytics, inter- and trans-firewall analytics.
Introduction to Hadoop, data format, analyzing data with Hadoop, scaling out, Hadoop Streaming, Hadoop Pipes. Design of the Hadoop Distributed File System (HDFS), HDFS concepts – Java interface, data flow, data ingest with Flume and Sqoop. Hadoop I/O – data integrity, compression, serialization, Avro, file-based data structures.
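As an illustration of the data integrity topic in this unit, the sketch below computes one checksum per fixed-size chunk of data and reports which chunks no longer match, which is the general idea behind checksummed storage in HDFS. It is a simplified stand-in, not HDFS code: the function names, the 512-byte chunk size, and the use of `zlib.crc32` are assumptions chosen for the example.

```python
import zlib

CHUNK_SIZE = 512  # assumed chunk size for this sketch


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Return one CRC-32 value per fixed-size chunk of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data, expected, chunk_size=CHUNK_SIZE):
    """Recompute checksums and return the indexes of chunks that differ."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = bytes(range(256)) * 5           # 1280 bytes, i.e. 3 chunks
    stored = chunk_checksums(original)         # checksums saved at write time
    damaged = bytearray(original)
    damaged[600] ^= 0xFF                       # flip one byte inside chunk 1
    print(find_corrupt_chunks(bytes(damaged), stored))
```

Storing checksums per chunk rather than per file lets a reader locate the damaged region and re-fetch only that part from another replica.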
Introduction to HBase: the dawn of big data, the problem with relational database systems. Introduction to Cassandra. Introduction to Pig. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.
Introduction to NoSQL, aggregate data models, key-value and document data models, relationships, graph databases, schemaless databases, materialized views, distribution models, sharding, master-slave replication, peer-to-peer replication. Consistency: relaxing consistency, version stamps.
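Two of the topics in this unit, sharding and version stamps, can be sketched together in a few lines. The toy store below spreads keys across shards with a stable hash and rejects a write when the caller's version stamp is stale. The class `ShardedStore` and its methods are hypothetical names made up for this example; real NoSQL systems differ in many details.

```python
import zlib


class ShardedStore:
    """Toy key-value store: keys are hash-sharded, values carry a version stamp."""

    def __init__(self, num_shards=3):
        # Each shard is a dict mapping key -> (version, value).
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash (not Python's per-process randomized hash()) picks the shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self.shards[self.shard_for(key)].get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if the caller read the latest version (optimistic check)."""
        shard = self.shards[self.shard_for(key)]
        current_version, _ = shard.get(key, (0, None))
        if current_version != expected_version:
            return False  # stale write: another client updated the key first
        shard[key] = (current_version + 1, value)
        return True


if __name__ == "__main__":
    store = ShardedStore()
    version, _ = store.get("user:1")
    print(store.put("user:1", "first", version))   # accepted
    print(store.put("user:1", "second", version))  # rejected, stamp is stale
```

The second write fails because it still carries the version stamp read before the first write, which is how a version stamp exposes a conflicting update.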
MapReduce workflows, unit tests with MRUnit, test data and local tests, anatomy of a MapReduce job run, classic MapReduce and YARN, failures in classic MapReduce and YARN, job scheduling, shuffle and sort, task execution, MapReduce types, input formats, output formats, partitioning and combining, composing MapReduce calculations.
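The map, combine, partition, shuffle-and-sort, and reduce stages listed in this unit can be traced in a small in-process word count. This is a single-machine sketch meant only to show how the stages fit together; it does not use the Hadoop API, and the function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`) are assumptions made for the example.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combiner: pre-sum counts on the map side to shrink shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: a stable hash decides which reducer receives a key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all counts that arrived for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Map and combine: each input line stands in for one input split.
    buckets = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in combine(map_phase(line)):
            buckets[partition(key, num_reducers)].append((key, value))
    # Shuffle and sort: each reducer sees its pairs sorted and grouped by key.
    result = {}
    for bucket in buckets:
        bucket.sort(key=itemgetter(0))
        for key, group in groupby(bucket, key=itemgetter(0)):
            word, total = reduce_phase(key, (v for _, v in group))
            result[word] = total
    return result


if __name__ == "__main__":
    print(run_job(["big data big ideas", "data moves fast"]))
```

Because the partitioner sends every occurrence of a key to the same reducer, the final counts do not depend on the number of reducers chosen.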