Syllabus
Unit 1
Introduction to Reinforcement Learning, Markov Decision Process (MDP) – Markov Process, Markov Reward Process, Markov Decision Process and Bellman Equations, Partially Observable MDPs, Planning by Dynamic Programming (DP) – Policy Evaluation, Value Iteration, Policy Iteration, DP Extensions, Model-free Prediction and Control.
Unit 2
Integrating Planning with Learning – Model-based RL, Integrated Architecture and Simulation-based Search, Monte-Carlo (MC) Learning, Exploration and Exploitation – Multi-armed Bandits, Contextual Bandits and MDP Extensions, Integrating AI Search and Learning – Classical Games: Combining Minimax Search and RL.
Unit 3
Hierarchical RL – Semi-Markov Decision Process, Learning with Options, Deep RL – Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Double Q-Learning, Multi-agent RL – Cooperative vs. Competitive Settings, Mixed Setting.
Objectives and Outcomes
Course Objectives
- This course trains students to frame reinforcement learning problems and to apply algorithms from dynamic programming, Monte Carlo, and temporal-difference learning.
- It also covers environments with larger state spaces, using function approximation, deep Q-networks, and state-of-the-art policy gradient algorithms.
Course Outcomes
CO1: Understand Markov decision processes and reinforcement learning.
CO2: Apply AI search, planning, and learning.
CO3: Apply hierarchical learning techniques.
CO4: Analyze Q-learning and multi-agent systems.
CO-PO Mapping
CO \ PO/PSO | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2
CO1 | 3 | 2 | 3 | 3 | 2 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 3 | 3
CO2 | 3 | 2 | 3 | 3 | 3 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 3 | 3
CO3 | 3 | 2 | 3 | 3 | 3 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 3 | 3
CO4 | 3 | 2 | 3 | 3 | 3 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 3 | 3
Evaluation Pattern
Evaluation Pattern: 70:30 (Internal : End Semester)
Assessment | Internal | End Semester
Midterm | 20 |
Continuous Assessment – Theory (*CAT) | 10 |
Continuous Assessment – Lab (*CAL) | 40 |
**End Semester | | 30 (50 Marks; 2-hour exam)
*CAT – Can be Quizzes, Assignments, and Reports
*CAL – Can be Lab Assessments, Project, and Report
**End Semester – Can be a theory examination, lab-based examination, or project presentation
Text Books / References
Textbook(s)
Richard S. Sutton and Andrew G. Barto; “Reinforcement Learning: An Introduction”; 2nd Edition, MIT Press, 2018.
Reference(s)
Dimitri P. Bertsekas; “Reinforcement Learning and Optimal Control”; 1st Edition, Athena Scientific, 2019.
Dimitri P. Bertsekas; “Dynamic Programming and Optimal Control (Vol. I and Vol. II)”; 4th Edition, Athena Scientific, 2017.
Csaba Szepesvári; “Algorithms for Reinforcement Learning (Synthesis Lectures on Artificial Intelligence and Machine Learning)”; Morgan & Claypool Publishers, 2010.