
Course Detail

Course Name: Foundations of AI & Data Science
Course Code: 25SDS514
Program: M.Sc. in Social Data Science & Policy
Semester: 2
Credits: 4
Campus: Faridabad

Syllabus

Unit I: Exploratory Data Analysis (EDA) and Statistical Thinking

Central tendency, dispersion, distributions; Sampling, hypothesis testing, correlation; Missing data handling and data cleaning principles; Correlations and heat maps; Missing data: mechanisms and strategies [10 hrs]
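As a minimal illustration of the Unit I topics, the following Python sketch (standard library only) summarizes a variable's central tendency and dispersion, fills gaps by mean imputation, and computes a Pearson correlation after listwise deletion of incomplete pairs. The column names and values are hypothetical toy data, and mean imputation is shown only as the simplest strategy, not a recommended one.

```python
import math
import statistics

# Hypothetical toy survey columns; None marks a missing response.
schooling_years = [8, 10, None, 12, 16, 5]
monthly_income_k = [12, 15, 14, None, 30, 9]  # made-up values, in thousands


def summarize(values):
    """Central tendency and dispersion over the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "stdev": statistics.stdev(observed),  # sample standard deviation (n - 1)
    }


def mean_impute(values):
    """Replace missing entries with the observed mean.

    Simple, but it shrinks the variance and is only defensible when values
    are missing completely at random (MCAR).
    """
    fill = statistics.mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation after listwise deletion of incomplete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    print(summarize(schooling_years))
    print(mean_impute(schooling_years))
    print(round(pearson(schooling_years, monthly_income_k), 3))
```

Listwise deletion and mean imputation give different effective samples, which is one reason the choice of missing-data strategy should follow from the assumed missingness mechanism.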

Unit II: Introductory Modeling Paradigms

What is a model? Understanding abstraction and representation; Supervised, unsupervised, and reinforcement learning (conceptual overview); Components of a learning system: input features, labels, and loss functions; Overfitting vs. underfitting; Bias-variance tradeoff; The role of modeling in decision-making (prediction vs. inference); Curse of dimensionality and dimensionality reduction (PCA) [15 hrs]
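PCA can be sketched in a few lines when there are only two features, because the eigenvalues of a 2x2 covariance matrix have a closed form. This is an illustrative sketch under that two-feature assumption; for real datasets with many features one would use a linear-algebra routine such as `numpy.linalg.eigh` or scikit-learn's `PCA` instead. The function name `pca_2d` is hypothetical.

```python
import math


def pca_2d(points):
    """First principal component of 2-D data via the closed-form 2x2 eigenproblem.

    Returns (unit direction of PC1, share of variance it explains, PC1 scores).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: mean of diagonal +/- radius.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector for the larger eigenvalue; (lam1 - c, b) solves (A - lam1*I)v = 0.
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    explained = lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 0.0
    # Project each centered point onto PC1 (the reduced, 1-D representation).
    scores = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    return direction, explained, scores


if __name__ == "__main__":
    print(pca_2d([(1, 2), (2, 4), (3, 6)]))
```

For perfectly collinear points such as `(1, 2), (2, 4), (3, 6)`, all variance lies along PC1, so the explained share is 1.0 and nothing is lost by keeping a single dimension.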

Unit III: Model Evaluation

Use of models in policy cases: binary classification (e.g., loan eligibility), grouping (e.g., livelihood clusters); Model evaluation metrics: accuracy, precision, recall, F1-score; Regression metrics; Interpreting models for policy impact; Ethics: transparency, accountability, and reproducibility [6 hrs]
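The Unit III classification metrics can all be computed from a confusion matrix. The sketch below assumes hypothetical loan-eligibility labels (1 = eligible, 0 = not eligible) and also includes two common regression metrics, MAE and RMSE; the helper names are illustrative and not taken from any particular library.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # approved but not actually eligible
        elif t == positive:
            fn += 1  # eligible but rejected
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Hypothetical loan-eligibility labels and model predictions.
actual = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]

if __name__ == "__main__":
    print(classification_metrics(actual, predicted))
```

In this framing, precision is the share of approved applicants who are actually eligible, and recall is the share of eligible applicants the model approves; which one matters more depends on the policy cost of wrongly excluding versus wrongly including someone.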

Unit IV: Data Modeling and Storage Models

Entity-Relationship (ER) modeling: entities, attributes, relationships; Cardinality, integrity constraints, composite and derived attributes; Mapping ER diagrams to logical schemas; Basics of normalization (1NF, 2NF, 3NF); Graph models; Network models [15 hrs]
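The Unit IV ideas can be made concrete with Python's built-in `sqlite3` module. The sketch assumes a hypothetical one-to-many relationship (one district has many households) mapped to two tables. Keeping `state` in the `district` table, rather than repeating it on every household row, avoids a transitive dependency (household → district → state), which is what 3NF rules out. Integrity constraints appear as `PRIMARY KEY`, `NOT NULL`, `CHECK`, and `FOREIGN KEY` clauses, and the derived attributes (household count, total members) are computed by a query rather than stored. All table names and rows are illustrative.

```python
import sqlite3

# Hypothetical logical schema mapped from a District 1:N Household ER diagram.
SCHEMA = """
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    state       TEXT NOT NULL
);
CREATE TABLE household (
    household_id INTEGER PRIMARY KEY,
    district_id  INTEGER NOT NULL REFERENCES district(district_id),
    head_name    TEXT NOT NULL,
    size         INTEGER NOT NULL CHECK (size > 0)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO district VALUES (?, ?, ?)",
                     [(1, "North", "State X"), (2, "South", "State X")])
    conn.executemany("INSERT INTO household VALUES (?, ?, ?, ?)",
                     [(1, 1, "A", 4), (2, 1, "B", 3), (3, 2, "C", 5)])
    conn.commit()
    return conn


def district_summary(conn):
    """Derived attributes computed on demand: household count and total members."""
    return conn.execute(
        """
        SELECT d.name, COUNT(h.household_id), COALESCE(SUM(h.size), 0)
        FROM district AS d
        LEFT JOIN household AS h ON h.district_id = d.district_id
        GROUP BY d.district_id
        ORDER BY d.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(district_summary(build_db()))
```

Inserting a household that points to a non-existent district raises `sqlite3.IntegrityError`, which is the referential-integrity constraint doing its job.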

Unit V: Foundations of NLP

Text preprocessing: tokenization, stopword removal, stemming; Representing text: Bag-of-Words, TF-IDF; Introduction to word embeddings (Word2Vec, conceptual); Applied tasks: sentiment analysis, basic topic modeling (LDA); Concepts of large language models and prompt engineering [10 hrs]
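A minimal sketch of the Unit V preprocessing and representation steps in plain Python: regex tokenization, removal of a small hand-picked stopword list (an assumption; real stopword lists are much longer), bag-of-words counts, and TF-IDF with tf = count / document length and idf = log(N / df). Stemming is omitted; a library stemmer such as NLTK's `PorterStemmer` would slot in after stopword removal. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their numbers will differ from this unsmoothed variant. The documents are made-up examples.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on"}

DOCS = [
    "The water supply failed in the village",
    "Water tariffs rose in the city",
    "The school reopened in the village",
]


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


def bag_of_words(tokens):
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed idf)."""
    token_lists = [preprocess(d) for d in docs]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per doc
    weights = []
    for tokens in token_lists:
        counts = bag_of_words(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    for w in tf_idf(DOCS):
        print(sorted(w.items(), key=lambda kv: -kv[1]))
```

A term that appears in only one document ("supply") receives a higher weight than one shared across documents ("water"), which is why TF-IDF highlights distinctive vocabulary better than raw counts.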

Text Books / References

Reference Books:

  1. Python for Social Scientists https://gawron.sdsu.edu/python_for_ss/
  2. Core Python Programming https://www.udemy.com/course/core-python-3-and-oop-course-for-absolute-beginners/
  3. Cioffi-Revilla, C. (2014). Introduction to computational social science. Springer Verlag London Limited. https://link.springer.com/content/pdf/10.1007/978-3-319-50131-4.pdf
  4. VanderPlas, J. (2016). Python data science handbook: Essential tools for working with data. O’Reilly Media, Inc.
  5. Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.

References:

  1. Burke, Moira, and Robert Kraut. “Using Facebook after losing a job: Differential benefits of strong and weak ties.” In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 1419-1430. 2013.
  2. Jakesch, Maurice, Advait Bhat, Daniel Buschek, Lior Zalmanson, and Mor Naaman. “Co-writing with opinionated language models affects users’ views.” In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1-15. 2023.
  3. Ziems, Caleb, et al. “Can large language models transform computational social science?” Computational Linguistics 50.1 (2024): 237-291.
  4. Chang, Ray M., Robert J. Kauffman, and YoungOk Kwon. “Understanding the paradigm shift to computational social science in the presence of big data.” Decision Support Systems 63 (2014): 67-80.
  5. Chandrasekharan, Eshwar, Umashanthi Pavalanathan, Anirudh Srinivasan, Adam Glynn, Jacob Eisenstein, and Eric Gilbert. “You can’t stay here: The efficacy of Reddit’s 2015 ban examined through hate speech.” Proceedings of the ACM on Human-Computer Interaction 1, no. CSCW (2017): 1-22.

Introduction

Prerequisite: Programming for Social Data Science I

This course is a continuation of Programming for Social Data Science I. It introduces students to the core principles, tools, and reasoning approaches of data science, with a specific focus on social and policy-relevant applications. It emphasizes exploratory data analysis (EDA), statistical thinking, introductory modeling paradigms, and the data modeling practices essential for understanding and working with real-world datasets.

Because much social and policy data is text (reports, tweets, interviews, news, etc.), the course also includes foundational exposure to natural language processing (NLP), enabling students to interpret unstructured data as they begin learning data science. A strong ethical framework runs throughout the course so that students engage critically with issues of fairness, transparency, and accountability in data-driven decision-making. By integrating conceptual theory with applied analysis, the course prepares students to contribute meaningfully to data-informed governance, policy evaluation, and social research.

Objectives and Outcomes

Course Objectives:

  1. Understand approaches to using qualitative data to shape social science theories and hypotheses
  2. Understand the application of qualitative programming in Social Data Science
  3. Define and understand basic procedures for preparing, cleaning, and analyzing qualitative data
  4. Implement and use functions to read and operate on qualitative data files

Course Outcomes:

CO1: Perform exploratory data analysis and apply statistical reasoning to summarize and interpret social or administrative datasets.

CO2: Explain foundational modeling paradigms, including supervised, unsupervised, and reinforcement learning.

CO3: Analyze classical models such as regression and clustering to support decision-making in policy contexts.

CO4: Design data storage models or schemas like ER diagrams and apply standardization principles (like normalization) to structure social data effectively.

CO5: Learn and apply basic NLP tasks, including preprocessing, to extract insights from unstructured text.

CO6: Evaluate the ethical implications of data-driven systems in relation to bias, privacy, and responsible use.

Skills:

  • Structured thinking: students will learn to structure sets of qualitative data concerning a social problem in an empirical, reproducible way that supports reliable conclusions.
  • Scientific communication: students will enhance their ability to summarize large quantities of written or audio data, combining broader patterns with specific illustrative excerpts to communicate meaningful conclusions.

Program outcome PO – Course Outcomes CO Mapping

PO1

PO2

PO3

PO4

PO5

PO6

PO7

PO8

CO1

X

CO2

X

CO3

X

CO4

X

Program Specific Outcomes PSO – Course Objectives – Mapping

PSO1

PSO2

PSO3

PSO4

PSO5

CO1

X

CO2

X

CO3

X

CO4

X

Evaluation Pattern

Assessment

Internal

External

Midterm Evaluation

25

Continuous Assessments (theory + lab)

15

Capstone Project

20

End Semester

40

*CA: can be quizzes, assignments, projects, reports, and seminars.

DISCLAIMER: The appearance of external links on this web site does not constitute endorsement by the School of Biotechnology/Amrita Vishwa Vidyapeetham or the information, products or services contained therein. For other than authorized activities, the Amrita Vishwa Vidyapeetham does not exercise any editorial control over the information you may find at these locations. These links are provided consistent with the stated purpose of this web site.
