Course Title: Foundations of Data Science
Course Code: 
Year Taught: 
Postgraduate (PG)
School of Engineering

'Foundations of Data Science' is a Soft Core course offered in the M. Tech. in Computer Science and Engineering program at the School of Engineering, Amrita Vishwa Vidyapeetham.

Introduction: What is Data Science? - Big Data and Data Science - Datafication - Current landscape of perspectives - Skill sets needed; Matrices: Matrices to represent relations between data, and necessary linear algebraic operations on matrices - Approximately representing matrices by decompositions (SVD and PCA); Statistics: Descriptive Statistics: distributions and probability - Statistical Inference: Populations and samples - Statistical modeling - Probability distributions - Fitting a model - Hypothesis Testing - Introduction to R/Python.
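
As an illustration of approximating a matrix by a decomposition, the sketch below computes a rank-1 approximation from the leading singular triplet. It is a minimal example and not course material: it uses power iteration on A^T A instead of a full SVD routine, and the function names (`top_singular_triplet`, `rank_one_approximation`) and the sample matrix are assumptions made for this example.

```python
import math


def top_singular_triplet(A, iters=200):
    """Estimate (sigma, u, v) for the largest singular value of A.

    Uses power iteration on A^T A. Assumes the starting vector is not
    orthogonal to the leading right singular vector.
    """
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # w = A v (length m), then z = A^T w (length n)
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        z = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            break
        v = [x / norm for x in z]
    w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w] if sigma > 0 else [0.0] * m
    return sigma, u, v


def rank_one_approximation(A):
    """Return sigma * u * v^T, the best rank-1 approximation of A."""
    sigma, u, v = top_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]


# Hypothetical sample data: every row is a multiple of (1, 2), so A has rank 1.
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
sigma, u, v = top_singular_triplet(A)
approx = rank_one_approximation(A)
```

Because the sample matrix already has rank 1, the approximation reproduces it up to floating-point error. For a general matrix the same call returns only the dominant component.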

Data preprocessing: Data cleaning - Data integration - Data reduction - Data transformation and data discretization. Evaluation of classification methods - Confusion matrix, Student's t-tests and ROC curves. Exploratory Data Analysis - Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA - The Data Science Process.
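
To make the confusion matrix concrete, the sketch below counts the four cells for a binary classifier and derives the true-positive and false-positive rates, which together give one point on an ROC curve. The function names and the label vectors are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def roc_point(counts):
    """Return (false positive rate, true positive rate) for one threshold."""
    tpr = counts["tp"] / (counts["tp"] + counts["fn"])
    fpr = counts["fp"] / (counts["fp"] + counts["tn"])
    return fpr, tpr


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
fpr, tpr = roc_point(counts)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
```

Repeating `roc_point` over a range of decision thresholds and plotting the resulting pairs traces out the full ROC curve.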

Basic Machine Learning Algorithms: Association Rule mining - Linear Regression - Logistic Regression - Classifiers - k-Nearest Neighbors (k-NN), k-means - Decision tree - Naive Bayes - Ensemble Methods - Random Forest. Feature Generation and Feature Selection - Feature Selection algorithms - Filters; Wrappers; Decision Trees; Random Forests.

Clustering: Choosing distance metrics - Different clustering approaches - Hierarchical agglomerative clustering, k-means (Lloyd's algorithm), DBSCAN - Relative merits of each method - Clustering tendency and quality.
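
Lloyd's algorithm for k-means alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a minimal version, assuming squared Euclidean distance and caller-supplied initial centroids. The function names and sample points are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, centroids, max_iters=100):
    """Run Lloyd's algorithm from the given initial centroids.

    Returns (centroids, labels). Assumes max_iters >= 1.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = lloyd_kmeans(points, centroids=[(0, 0), (10, 10)])
```

The result depends on the initial centroids. In practice the algorithm is usually run from several random initializations and the clustering with the lowest within-cluster sum of squares is kept.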

Data Visualization: Basic principles, ideas and tools for data visualization.



  1. Cathy O'Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O'Reilly, 2014.
  2. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, ISBN 0123814790, 2011.
  3. Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.
  4. Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization”, O'Reilly, 2016.
  5. Joel Grus, “Data Science from Scratch: First Principles with Python”, O'Reilly Media, 2015.
  6. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O'Reilly Media, 2012.