Syllabus
Cloud computing overview: Definitions, benefits, and challenges – Service models: IaaS, PaaS, SaaS – Deployment models: Public, Private, Hybrid, Community – Virtualization: Hypervisors, VM management, Containers (Docker, Kubernetes) – Cloud storage systems: S3, Blob storage, HDFS – Case studies: AWS, Microsoft Azure, Google Cloud Platform
Big Data characteristics: Volume, Velocity, Variety, Veracity, Value – Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig – Apache Spark: RDDs, DataFrames, MLlib, Streaming – NoSQL Databases: HBase, Cassandra, MongoDB – Data ingestion tools: Flume, Sqoop, Kafka – Hands-on Labs: Basic Hadoop and Spark jobs using sample datasets
Big Data analytics pipeline in the cloud – Data Lake architecture and storage options – Scalable machine learning in the cloud (ML on AWS, Azure ML, Google AI Platform) – Serverless computing and Lambda functions – Real-time analytics using Spark Streaming / Apache Flink – Security, privacy, and compliance in cloud-based big data systems – Case studies: Recommendation engines, IoT analytics, social media mining