Introduction to Data Science, Causality and Experiments, Data Pre-processing – Data cleaning – Data reduction – Data transformation, Visualization and Graphing: Visualizing Categorical Distributions – Visualizing Numerical Distributions – Overlaid Graphs and Plots – Summary statistics of exploratory data analysis, Randomness, Probability, Introduction to Statistics, Sampling, Sample Means and Sample Sizes.
Probability distributions and density functions (univariate and multivariate), Error Probabilities; Expectations and moments; Covariance and correlation; Sampling and Empirical distributions; Permutation Testing, Statistical Inference; Hypothesis testing of means, proportions, variances and correlations – Assessing Models – Decisions and Uncertainty, Comparing Samples – A/B Testing, P-Values, Causality.
Estimation – Resampling and Bootstrap – Confidence Intervals, Properties of the Mean – Central Limit Theorem – Variability of the mean – Choosing Sample Size, Prediction – Regression – Method of Least Squares – Visual and Numerical Diagnostics – Inference for the true slope – Prediction intervals, Classification – Nearest neighbors – Accuracy of a classifier, Updating Predictions – Making Decisions – Bayes' Theorem, Graphical Models.