Course Contents
Challenges in data mining – Data pre-processing – An overview of data cleaning methods – Data integration – Data reduction and data transformation – Dimensionality reduction – Linear regression – Regularisation.
Introduction to classification and clustering – Decision trees and random forests – Bayesian classifier – Support vector machines – Neural networks – Metrics for evaluating classifier performance – Model selection using statistical tests of significance – Comparing classifiers based on cost-benefit and ROC curves – Techniques to improve classification accuracy – Cluster analysis – Distance measures – k-means and k-medoids – Agglomerative versus divisive hierarchical clustering – Detecting outliers.
Data visualisation – Bar plots – Histograms – Box plots – Violin plots – Pairplots – Distplots – Scatter plots – Pie charts – Bubble plots – Regression plots – Quantile plots – Heatmaps – Plotting covariance matrices – Waffle charts – Word clouds – PCA – LDA – Manifold learning for data visualisation – t-SNE – UMAP.