Syllabus
Unit 1
Introduction, Causality and Experiments, Data Preprocessing: Data cleaning, Data reduction, Data transformation, Data discretization. Visualization and Graphing: Visualizing Categorical Distributions, Visualizing Numerical Distributions, Overlaid Graphs, plots, and summary statistics of exploratory data analysis. (15 hrs)
Unit 2
Randomness, Probability, Introduction to Statistics, Sampling, Sample Means and Sample Sizes, Descriptive statistics – Central tendency, dispersion, variance, covariance, kurtosis, five-point summary, Distributions, Bayes' Theorem, Error Probabilities, Permutation Testing, Statistical Inference. (15 hrs)
Unit 3
Hypothesis Testing, Decisions and Uncertainty, Comparing Samples, A/B Testing, P-Values, Causality, Frequency Analysis, Assessing Models, Estimation, Prediction, Confidence Intervals, Inference for Regression, Classification, Graphical Models, Updating Predictions. (15 hrs)
Objectives and Outcomes
Course Outcomes:
CO1: Understand the various data visualization methods.
CO2: Understand the basics of descriptive statistics.
CO3: Understand and apply the basic concepts of correlation and regression to the given data.
CO4: Understand and apply the basic concepts of sampling techniques and simple hypothesis testing to the given data.
CO-PO Mapping
| CO \ PO/PSO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PSO1 | PSO2 | PSO3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CO1 | 2 | 2 |  |  |  |  |  |  |  |  |  |  | 1 |  |
| CO2 | 2 | 2 |  |  |  |  |  |  |  |  |  |  | 1 |  |
| CO3 | 2 | 3 |  |  |  |  |  |  |  |  |  |  | 2 |  |
| CO4 | 2 | 3 |  |  |  |  |  |  |  |  |  |  | 2 |  |
Text Books / References
Textbook(s)
Ani Adhikari and John DeNero, “Computational and Inferential Thinking: The Foundations of Data Science”, e-book.
Reference(s)
- Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel and Kenneth C. Lichtendahl Jr., “Data Mining for Business Analytics: Concepts, Techniques and Applications in R”, Wiley India, 2018.
- Rachel Schutt and Cathy O’Neil, “Doing Data Science”, O’Reilly, First Edition, 2013.