Course Detail

Course Name: Foundations of Data Science
Course Code: 18CS621
Semester: 1
Campus: Coimbatore
Year Taught: 2018


Course Syllabus

Introduction: What is Data Science? Big Data and Data Science – Datafication – Current landscape of perspectives – Skill sets needed; Matrices – Matrices to represent relations between data, and the necessary linear algebraic operations on matrices – Approximately representing matrices by decompositions (SVD and PCA); Statistics: Descriptive statistics – distributions and probability – Statistical Inference: Populations and samples – Statistical modeling – probability distributions – fitting a model – Hypothesis Testing – Introduction to R/Python.
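To make the PCA topic concrete, here is a minimal pure-Python sketch, assuming the data arrives as a list of equal-length numeric rows. The helper names (`mean_center`, `covariance`, `first_principal_component`) are hypothetical. Rather than calling a full SVD routine, it finds the first principal component as the leading eigenvector of the sample covariance matrix using power iteration; for mean-centred data this is the same direction as the top right singular vector in an SVD.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred data, dividing by n - 1."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iters=200, seed=0):
    """Leading eigenvector and eigenvalue of the covariance matrix via power iteration.

    Assumes the largest eigenvalue is strictly larger than the others;
    otherwise power iteration may not settle on a single direction.
    """
    cov = covariance(mean_center(rows))
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval
```

For points lying on the line y = 2x, the component returned points along (1, 2)/√5. For real workloads, `numpy.linalg.svd` applied to the centred data matrix returns all components at once; power iteration is used here only because each step is easy to follow.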

Data Preprocessing: Data cleaning – Data integration – Data reduction – Data transformation and data discretization. Evaluation of classification methods – Confusion matrix, Student’s t-tests and ROC curves. Exploratory Data Analysis – Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA – The Data Science Process.
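The confusion matrix and ROC curve can be illustrated with a small sketch, assuming binary labels coded as 1 (positive) and 0 (negative), one classifier score per example, and both classes present in the data. The function names are illustrative and do not come from any particular library. Each distinct score is tried as a decision threshold, giving one (false positive rate, true positive rate) point, and the area under the resulting curve (AUC) is computed with the trapezoidal rule.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 1 / 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_curve(y_true, scores):
    """(FPR, TPR) points from lowering the threshold through every distinct score.

    Assumes at least one positive and one negative example.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(y_true, y_pred)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

With labels [1, 1, 0, 1, 0, 0] and scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.2], the AUC is 8/9: eight of the nine positive/negative pairs have the positive example scored higher.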

Basic Machine Learning Algorithms: Association Rule mining – Linear Regression – Logistic Regression – Classifiers – k-Nearest Neighbors (k-NN), k-means – Decision tree – Naive Bayes – Ensemble Methods – Random Forest. Feature Generation and Feature Selection – Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests.
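The k-NN classifier from this unit fits in a few lines. The sketch below assumes numeric feature tuples, Euclidean distance (`math.dist`, available from Python 3.8) and a plain majority vote; `knn_predict` is a hypothetical name, and practical work would normally rely on an optimised library implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query; keep the k closest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the label of the closer neighbour.
    return votes.most_common(1)[0][0]
```

Because k-NN keeps the whole training set and does all its work at query time, each prediction here sorts every training point; spatial index structures such as k-d trees are the usual way to speed this up.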

Clustering: Choosing distance metrics – Different clustering approaches – hierarchical agglomerative clustering, k-means (Lloyd’s algorithm), DBSCAN – Relative merits of each method – clustering tendency and quality.
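Lloyd’s algorithm for k-means alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping when no centroid moves. A minimal sketch, assuming numeric tuples and, purely for simplicity, the first k points as the initial centroids (`kmeans` is a hypothetical helper name):

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    # Simplistic initialisation (an assumption): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

Lloyd’s algorithm converges only to a local optimum, so the clusters depend on the starting centroids; random restarts or k-means++ seeding are common remedies.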

Data Visualization: Basic principles, ideas and tools for data visualization.

Text Books

  1. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
  2. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, Morgan Kaufmann, 2011. ISBN 0123814790.
  3. Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.

Reference Books

  1. Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization”, O’Reilly, 2016.
  2. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.
  3. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.

‘Foundations of Data Science’ is a Soft Core course offered for the M. Tech. in Computer Science and Engineering program at the School of Engineering, Amrita Vishwa Vidyapeetham.