Course Detail

Course Name: Reinforcement Learning
Course Code: 25CSC437
Program: 5-Year Integrated M.Sc. in Data Science; Integrated M.Sc. in Mathematics and Computing
Credits: 3
Campus: Coimbatore

Syllabus

Unit 1

Introduction to Reinforcement Learning, Markov Decision Process (MDP) – Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs, Planning by Dynamic Programming (DP) – Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions, Model-free Prediction and Control.
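
Value iteration, listed under planning by dynamic programming, repeatedly applies the Bellman optimality backup V(s) ← max_a Σ p(s' | s, a) [r + γ V(s')] until the values stop changing, then acts greedily with respect to them. The Python sketch below is illustrative only and assumes a small tabular MDP stored as nested dictionaries of (probability, next_state, reward) outcomes; the function name `value_iteration`, the discount γ = 0.9, and the two-state `toy_mdp` are hypothetical and not taken from the course materials.

```python
def value_iteration(transitions, gamma=0.9, theta=1e-8):
    """Value iteration for a small finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward) tuples;
    a state with no actions is treated as terminal (value 0).
    """
    V = {s: 0.0 for s in transitions}

    def q_value(s, a):
        # Expected one-step return of action a in state s under the current V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state: value stays 0
            # Bellman optimality backup, applied in place during the sweep.
            best = max(q_value(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Greedy policy with respect to the converged value function.
    policy = {s: max(actions, key=lambda a, s=s: q_value(s, a))
              for s, actions in transitions.items() if actions}
    return V, policy


# Hypothetical example: from "A" you can stay or move to "B";
# from "B" you can go back or exit to terminal state "T" with reward 1.
toy_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"back": [(1.0, "A", 0.0)], "exit": [(1.0, "T", 1.0)]},
    "T": {},
}

V, policy = value_iteration(toy_mdp)
print(V)       # {'A': 0.9, 'B': 1.0, 'T': 0.0}
print(policy)  # {'A': 'go', 'B': 'exit'}
```

Policy iteration, also listed in this unit, reaches the same optimal policy by alternating full policy evaluation with greedy improvement instead of folding the max into every sweep.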

Unit 2

Integrating Planning with Learning – Model-based RL, Integrated Architecture and Simulation-based Search, Monte-Carlo (MC) Learning, Exploration and Exploitation – Multi-armed Bandits, Contextual Bandits and MDP Extensions, Integrating AI Search and Learning – Classical Games: Combining Minimax Search and RL.
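
A common first example of the exploration/exploitation trade-off is the ε-greedy agent on a k-armed bandit: with probability ε it pulls a random arm, otherwise the arm with the highest estimated value, and it updates each estimate as an incremental sample average (the approach Sutton and Barto use in their bandit chapter). The sketch below assumes Gaussian rewards with unit variance; the arm means, step count, seed, and the function name `epsilon_greedy_bandit` are hypothetical choices for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a k-armed bandit.

    Rewards for arm i are drawn from a Gaussian with mean true_means[i] and
    unit variance (an assumed reward model for illustration).
    Returns the final value estimates and the pull count of each arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q_est = [0.0] * k  # sample-average estimate of each arm's value
    counts = [0] * k   # how many times each arm has been pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            best = max(q_est)
            # exploit: greedy arm, ties broken at random
            arm = rng.choice([i for i in range(k) if q_est[i] == best])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # incremental mean: Q_new = Q_old + (R - Q_old) / N
        q_est[arm] += (reward - q_est[arm]) / counts[arm]

    return q_est, counts


estimates, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print(pulls)  # the third arm (mean 0.9) should receive most of the pulls
```

Keeping ε fixed means the agent never stops exploring; decaying ε or switching to an upper-confidence-bound rule are the usual alternatives.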

Unit 3

Hierarchical RL – Semi-Markov Decision Process, Learning with Options, Deep RL – Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Double Q-Learning, Multi-agent RL – Cooperative vs. Competitive Settings, Mixed Setting.
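
Double Q-learning keeps two action-value tables and, on each update, uses one to pick the greedy next action and the other to evaluate it, which counters the overestimation that comes from taking a max over noisy estimates. The unit lists it under deep RL, where the same idea underlies Double DQN; the sketch below shows only the tabular update rule, assuming dictionary-based Q-tables keyed by (state, action). The function `double_q_update` and the usage transition are hypothetical, and a complete agent would also need a behavior policy (for example, ε-greedy on the sum of the two tables).

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.99, done=False, rng=random):
    """Apply one tabular Double Q-learning update in place.

    q_a and q_b map (state, action) -> value; `actions` lists the actions
    available in next_state.
    """
    # Flip a fair coin to decide which table learns from this transition.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a

    if done:
        target = reward  # no bootstrapping from a terminal state
    else:
        # The learning table selects the greedy next action ...
        best_next = max(actions, key=lambda a: learner[(next_state, a)])
        # ... and the other table evaluates it, reducing maximization bias.
        target = reward + gamma * evaluator[(next_state, best_next)]

    learner[(state, action)] += alpha * (target - learner[(state, action)])


# Usage: repeatedly learn from a hypothetical terminal transition with reward 1.
q_a, q_b = defaultdict(float), defaultdict(float)
rng = random.Random(0)
for _ in range(200):
    double_q_update(q_a, q_b, "s", "a", 1.0, None, [], done=True, rng=rng)
print(round(q_a[("s", "a")], 3), round(q_b[("s", "a")], 3))  # both close to 1.0
```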

Objectives and Outcomes

Course Objectives

  • This course trains students to frame problems as reinforcement learning tasks and to apply algorithms from dynamic programming, Monte Carlo, and temporal-difference learning.
  • It extends these methods to environments with large state spaces through function approximation, deep Q-networks, and state-of-the-art policy gradient algorithms.

Course Outcomes

CO1: Understand Markov decision processes and reinforcement learning.

CO2: Apply AI search, planning, and learning.

CO3: Apply Hierarchical learning techniques.

CO4: Analyze Q-learning and multi-agent systems.

CO-PO Mapping

CO    PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PO12  PSO1  PSO2
CO1   3     2     3     3     2     0     0     2     2     2     0     0     3     3
CO2   3     2     3     3     3     0     0     2     2     2     0     0     3     3
CO3   3     2     3     3     3     0     0     2     2     2     0     0     3     3
CO4   3     2     3     3     3     0     0     2     2     2     0     0     3     3

Evaluation Pattern

Evaluation Pattern: 70:30 (Internal : End Semester)

Assessment                              Internal  End Semester
Midterm                                 20
Continuous Assessment – Theory (*CAT)   10
Continuous Assessment – Lab (*CAL)      40
**End Semester                                    30 (50 marks; 2-hour exam)

*CAT – May include quizzes, assignments, and reports.

*CAL – May include lab assessments, a project, and a report.

**End Semester – May be a theory examination, a lab-based examination, or a project presentation.

Text Books / References

Textbook(s) 

Richard S. Sutton and Andrew G. Barto; “Reinforcement Learning: An Introduction”; 2nd Edition, MIT Press, 2018.

Reference(s)

Dimitri P. Bertsekas; “Reinforcement Learning and Optimal Control”; 1st Edition, Athena Scientific, 2019.

Dimitri P. Bertsekas; “Dynamic Programming and Optimal Control (Vol. I and Vol. II)”; 4th Edition, Athena Scientific, 2017.

Csaba Szepesvári; “Algorithms for Reinforcement Learning”; Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2010.