## Course Detail

| Course Name | Course Code | Program | Semester | Year Taught |
| --- | --- | --- | --- | --- |
| Data Science | 19EAC313 | B. Tech. in Electronics and Computer Engineering | 6 | 2019 |

### Syllabus

##### Module I

Introduction: What is Data Science? Big Data and Data Science – Datafication – Current landscape of perspectives – Skill sets needed; Matrices – Matrices to represent relations between data, and necessary linear algebraic operations on matrices – Approximately representing matrices by decompositions (SVD and PCA); Statistics: Descriptive Statistics: distributions and probability – Statistical Inference: Populations and samples – Statistical modeling – Probability distributions – Fitting a model – Hypothesis Testing.

##### Module II

Data pre-processing: Data cleaning – Data integration – Data Reduction – Data Transformation and Data Discretization. Evaluation of classification methods – Confusion matrix, Student's t-tests and ROC curves. Exploratory Data Analysis – Basic tools (plots, graphs and summary statistics) of EDA, Philosophy of EDA – The Data Science Process.

##### Module III

Basic Machine Learning Algorithms: Association Rule mining – Linear Regression – Logistic Regression – Classifiers – k-Nearest Neighbors (k-NN), k-means – Decision tree – Naive Bayes – Ensemble Methods – Random Forest. Feature Generation and Feature Selection – Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests.

Data Visualization: Basic principles, ideas and tools for data visualization.

### Objectives and Outcomes

Course Objectives

• To gain useful conclusions from large and diverse data sets through exploration, prediction, and inference.

Course Outcomes

• CO1: Ability to understand the statistical foundations of data science.
• CO2: Ability to apply pre-processing techniques over raw data so as to enable further analysis.
• CO3: Ability to conduct exploratory data analysis and create insightful visualizations to identify patterns.
• CO4: Ability to identify machine learning algorithms for predictions and classification.
• CO5: Ability to analyze the degree of certainty of predictions using statistical tests and models.

CO – PO Mapping

 PO/PSO/ CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 CO1 1 3 2 CO2 1 1 1 3 3 2 CO3 3 1 1 2 3 3 2 CO4 3 1 1 2 2 3 2 CO5 3 3 1 3 2 3 2

### Textbook / References

Textbook(s)

• Cathy O’Neil and Rachel Schutt, “Doing Data Science, Straight Talk From The Frontline”, O’Reilly, 2014.
• Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition. ISBN 0123814790, 2011.
• Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.
• Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis, and Visualization”, O’Reilly, 2016.
• Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.
• Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.
• Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel and Kenneth C. Lichtendahl Jr., “Data Mining for Business Analytics: Concepts, Techniques, and Applications in R”, ISBN: 978-1-118-87936-8, Wiley.

Evaluation Pattern 50:50 (Internal: External)

| Assessment | Internal | External |
| --- | --- | --- |
| Periodical 1 (P1) | 15 | – |
| Periodical 2 (P2) | 15 | – |
| *Continuous Assessment (CA) | 20 | – |
| End Semester | – | 50 |

*CA – Can be Quizzes, Assignments, Projects, and Reports.
