Syllabus
Unit 1
Introduction to Big Data: Types of Digital Data – Characteristics of Data – Evolution of Big Data – Definition of Big Data – Challenges with Big Data – 3Vs of Big Data – Non-definitional traits of Big Data – Business Intelligence vs. Big Data – Data warehouse and Hadoop environment – Coexistence. Big Data Analytics: Classification of analytics – Data Science – Terminologies in Big Data – CAP Theorem – BASE Concept. NoSQL: Types of Databases – Advantages – NewSQL – SQL vs. NoSQL vs. NewSQL. Introduction to Hadoop: Features – Advantages – Versions – Overview of Hadoop ecosystem – Hadoop distributions – Hadoop vs. SQL – RDBMS vs. Hadoop – Hadoop Components – Architecture – HDFS – MapReduce: Mapper – Reducer – Combiner – Partitioner – Searching – Sorting – Compression. Hadoop 2 (YARN): Architecture – Interacting with Hadoop ecosystem.
Unit 2
NoSQL databases: MongoDB: Introduction – Features – Data types – MongoDB Query Language – CRUD operations – Arrays – Functions: Count – Sort – Limit – Skip – Aggregate – MapReduce. Cursors – Indexes – Mongo Import – Mongo Export. Cassandra: Introduction – Features – Data types – CQLSH – Keyspaces – CRUD operations – Collections – Counter – TTL – Alter commands – Import and Export – Querying System tables.
Unit 3
Hadoop ecosystem. Hive: Architecture – Data types – File format – HQL – SerDe – User-defined functions. Pig: Features – Anatomy – Pig on Hadoop – Pig Philosophy – Pig Latin overview – Data types – Running Pig – Execution modes of Pig – HDFS commands – Relational operators – Eval functions – Complex data types – Piggy Bank – User-defined functions – Parameter substitution – Diagnostic operators. Jasper Report: Introduction – Connecting to MongoDB – Connecting to Cassandra. Introduction to Machine Learning: Linear Regression – Clustering – Collaborative filtering – Association rule mining – Decision tree.