
Course Detail

Course Name: Machine Learning for Social Data Science
Course Code: 24SDS602
Program: M.Sc. in Social Data Science & Policy
Semester: III
Credits: 4
Campus: Faridabad


Unit I

Introduction to machine learning. Machine learning, data science and artificial intelligence. Building models for policy analysis. Case studies in social data science. Review of relevant concepts in mathematics, statistics and regression analysis.

Unit II

Supervised learning. Classification. Regression. Fine-tuning a model. Preprocessing and pipelines.

Unit III

Unsupervised learning. Clustering. Visualization with hierarchical clustering. Principal component analysis. Discovering interpretable features.

Unit IV

Machine learning with tree-based models. The bias-variance tradeoff. Decision trees and ensembles (random forests, bagging, boosting). Model tuning.

Unit V

Deep learning. Basics of deep learning and neural networks. Deep learning models for regression and classification. Natural language processing for social data science. Qualitative data and AI. Text preprocessing, classification and annotation, information extraction, opinion mining, text summarization. Text translation (using Whisper AI). Large language models and related tools (ChatGPT).


Prerequisites: Programming for Social Data Science I & II, Research Methods for Policy Studies I & II.

Summary: This course offers an introduction to machine learning tailored for research in the social sciences. In a world where the volume of social data is rapidly expanding, mastering machine learning techniques is essential for extracting actionable insights that inform policy-making. Machine learning integrates insights from artificial intelligence, probability theory, and statistical inference to automate tasks like pattern recognition and prediction. We’ll explore supervised and unsupervised learning techniques, focusing on the variety of their applications in social research. Ethical considerations surrounding automated analysis and decision-making will be discussed, including their potential to mitigate or exacerbate human biases. Key topics include the bias-variance tradeoff, model selection, cross-validation, regularization, and dimension reduction. Techniques covered range from variations of linear regression to tree-based methods and introductory neural networks. Unsupervised methods such as principal component analysis and clustering will also be examined. By the end of the course, students will be well acquainted with some of the state-of-the-art machine learning toolkits and be able to apply them in their own projects.

Course Objectives and Outcomes

Course Objectives:

  1. Understand the fundamentals of machine learning methods.
  2. Describe the statistical theory behind widely used supervised and unsupervised machine learning methods.
  3. Explain the variety of machine learning methods available for social science research.
  4. Identify appropriate machine learning methods to address a variety of research questions.
  5. Learn how to design, train, and deploy machine-learning models to produce insights relevant for addressing societal challenges.

Course Outcomes:

  • CO1: Students will develop a thorough understanding of machine learning principles as they relate to social sciences, enabling them to effectively extract insights from large social datasets.
  • CO2: Students will gain proficiency in selecting and applying appropriate machine learning techniques to address specific research questions and challenges within social data analysis.
  • CO3: Students will be able to critically evaluate the ethical implications associated with the use of machine learning algorithms in social research, including considerations of bias mitigation and fairness.
  • CO4: Students will acquire practical skills in implementing various machine learning algorithms, including supervised and unsupervised learning methods, to analyze social datasets effectively.
  • CO5: Students will be introduced to the foundational concepts of deep learning and natural language processing, gaining familiarity with key principles and applications of these methodologies within social sciences.


  • Data-driven decision-making: through practical application of machine learning techniques, students will acquire the skill to leverage data effectively for evidence-based decision-making in social research and policy formulation, enhancing their capacity to address complex societal challenges.
  • Ethical reasoning: students will develop ethical reasoning skills, enabling them to navigate and address ethical dilemmas inherent in the use of machine learning algorithms within social research, thus promoting responsible and ethical use of data-driven methodologies for societal benefit.

Program outcome PO – Course Outcomes CO Mapping


Program Specific Outcomes PSO – Course Objectives – Mapping


Evaluation Pattern:

Assessment                Internal   External
Midterm Exam              30
*Continuous Assessment
End Semester                         40

*CA – can be quizzes, assignments, projects, reports, and seminars.

Textbooks and Papers

  • Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, Mass: The MIT Press.
  • Géron, A. (2017). Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.
  • Müller, A. C., & Guido, S. (2016). Introduction to Machine Learning with Python: A Guide for Data Scientists. O’Reilly Media, Inc.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.
  • D’Orazio, V., Landis, S. T., Palmer, G., & Schrodt, P. (2014). Separating the wheat from the chaff: Applications of automated document classification using support vector machines. Political Analysis, 22(2), 224-242.
  • Jones, Z. M., & Lupu, Y. (2018). Is There More Violence in the Middle? American Journal of Political Science, 62(3), 652-667.
  • Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802-5805.

Reference Books

  1. Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O’Reilly Media, Inc.
  2. Bouckaert, R. R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., & Scuse, D. (2016). WEKA manual for version 3-9-1. University of Waikato: Hamilton, New Zealand, 1-341.
  3. Hapke, H., Howard, C., & Lane, H. (2019). Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Simon and Schuster.

