Course Syllabus
Introduction: Parallel and distributed architectures, models, complexity measures, communication aspects, a taxonomy of distributed systems. Models of computation: shared memory and message passing systems, synchronous and asynchronous systems, global state and snapshot algorithms.
Distributed and Parallel Databases: Centralized versus distributed systems, parallel versus distributed systems, distributed database architectures (shared disk, shared nothing). Distributed database design: fragmentation and allocation, optimization.
Query Processing and Optimization: Parallel/distributed sorting, parallel/distributed join, parallel/distributed aggregates, network partitions, replication, publish/subscribe systems. Case study: Apache Kafka distributed publish/subscribe messaging.
Hadoop and MapReduce: Data storage and analysis, design and concepts of HDFS, YARN, MapReduce workflows and features, setting up a Hadoop cluster.