COURSE NAME: Statistical Methods In Genetics & Bioinformatics
PROGRAM: MSc Bioinformatics


Describing Data: Introduction to the practice of data analysis. Topics include descriptive statistics such as the mean, median, range, inter-quartile range and standard deviation, as well as basic descriptive plots such as the histogram, barplot, dotplot, scatterplot and pairwise scatterplot.
Basic Probability: Introduction to probability concepts such as experiments, trials, outcomes, events and sample spaces. The initial focus is on simple probability spaces (coin flips and dice rolls) to give students a concrete understanding of the concepts.
Probability Theory: Concept of probability: sample space and events, independent events, mutually exclusive events, axioms of probability, conditional probability, the addition and multiplication theorems of probability, and Bayes' theorem. Introduction to the Markov chain model.
Sampling: Meaning and objective of sampling, sampling error, types of sampling, sampling distributions, the sampling distributions of the sample mean and sample proportion, and standard error.
Probability Distributions & The Central Limit Theorem: Bernoulli trials, the binomial distribution, the normal distribution and the Poisson distribution. Starting from the simple notion of probability distributions developed in UNIT 2, the ideas are generalized to continuous distributions, with the primary focus on the normal distribution, culminating in a thorough discussion of the Central Limit Theorem and its utility to data analysts.
Introduction To Inference: Basic introduction to statistical inference, focusing on one- and two-sample problems. This unit covers the z-test, t-test, F-test and chi-square test, as well as nonparametric methods including the Wilcoxon rank-sum and Kolmogorov-Smirnov tests.
ANOVA: Introduction to the analysis of grouped data. The focus is on one- and two-way designs and their analysis, with additional topics in repeated-measures designs for advanced students.
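
As an illustration of the Describing Data unit (not part of the course materials, which use S-PLUS), the sketch below computes the five summary statistics named above with Python's standard statistics module. It assumes the sample standard deviation (n - 1 denominator) and the default 'exclusive' quartile method of statistics.quantiles; other quartile conventions give slightly different inter-quartile ranges. The expression values are hypothetical.

```python
import statistics

def describe(values):
    """Return mean, median, range, IQR and sample standard deviation.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; the standard deviation uses the n - 1 (sample) denominator.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)  # three cut points: Q1, median, Q3
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "sd": statistics.stdev(data),
    }

# Hypothetical expression values for one gene across seven samples.
expression = [2.1, 3.4, 2.8, 5.0, 3.9, 4.2, 3.1]
for name, value in describe(expression).items():
    print(f"{name:>6}: {value:.3f}")
```
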
Correlation and Regression: Principle of least squares, scatter diagram, correlation, covariance, the correlation coefficient and its properties, regression and the properties of linear regression, rank correlation, multiple correlation.
Regression: Introduction to regression analysis. Topics include multiple regression, model selection, and special fitting techniques such as robust estimation, local regression and regression splines.
Multivariate Methods: Introduction to multivariate methods useful in bioinformatics, including k-means clustering and principal components analysis.
Bioinformatics Applications: Introduction to the analysis of bioinformatics data. Topics include end-to-end analysis of microarray gene expression data, including data quality considerations, RNA degradation in Affymetrix chips, two-color cDNA arrays, data normalization and summarization, and differential expression testing and annotation.
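
To make the Correlation and Regression topics concrete, here is a minimal Python sketch (an illustration only, not the course's S-PLUS material) of the Pearson correlation coefficient, the least-squares line y = a + b*x, and rank correlation. It assumes rank correlation means Spearman's rho computed as the Pearson correlation of the ranks, with tied values given their average rank; the helper names and the data are hypothetical.

```python
import math

def _centered_sums(x, y):
    """Return Sxy, Sxx, Syy (sums of cross-products/squares about the means) and the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy, mx, my

def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    sxy, sxx, syy, _, _ = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Intercept a and slope b minimizing sum((y - a - b*x)**2)."""
    sxy, sxx, _, mx, my = _centered_sums(x, y)
    slope = sxy / sxx
    return my - slope * mx, slope

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical dose (x) and response (y) measurements.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print("r =", pearson_r(x, y))
print("a, b =", least_squares_line(x, y))
print("rho =", spearman_rho(x, y))
```

Because Spearman's rho only depends on ranks, it equals 1 for any strictly increasing relationship, even a non-linear one, whereas Pearson's r measures linear association only.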


  1. Introduction to the Practice of Statistics by Moore and McCabe
  2. Course Manuals: S-PLUS Command Line Essentials, The Analysis of Microarrays