Introduction: What is Data Science? Big Data and Data Science – Datafication – Current landscape of perspectives – Skill sets needed; Matrices – Matrices to represent relations between data, and necessary linear algebraic operations on matrices – Approximately representing matrices by decompositions (SVD and PCA); Statistics: Descriptive Statistics: distributions and probability – Statistical Inference: Populations and samples – Statistical modeling – probability distributions – fitting a model – Hypothesis Testing.
Data pre-processing: Data cleaning – Data integration – Data Reduction – Data Transformation and Data Discretization. Evaluation of classification methods – Confusion matrix, Student's t-tests and ROC curves. Exploratory Data Analysis – Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA – The Data Science Process.
Basic Machine Learning Algorithms: Association Rule mining – Linear Regression – Logistic Regression – Classifiers – k-Nearest Neighbors (k-NN), k-means – Decision tree – Naive Bayes – Ensemble Methods – Random Forest. Feature Generation and Feature Selection – Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests.
Data Visualization: Basic principles, ideas and tools for data visualization.
Textbook / References
- Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
- Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition. ISBN 0123814790, 2011.
- Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.
- Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization”, O’Reilly, 2016.
- Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.
- Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.
- Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel and Kenneth C. Lichtendahl Jr., “Data Mining for Business Analytics: Concepts, Techniques, and Applications in R”, Wiley. ISBN: 978-1-118-87936-8.
Evaluation Pattern 50:50 (Internal: External)
- Periodical 1 (P1)
- Periodical 2 (P2)
- *Continuous Assessment (CA)
*CA can be quizzes, assignments, projects, and reports.