ELL729 Stochastic Control and Reinforcement Learning
TAs for this course:
Md. Taha Shah (bsz218183@dbst.iitd.ac.in)
Songita Das (bsz228102@dbst.iitd.ac.in)
This is a first course in reinforcement learning (RL) for students looking for an introduction to the subject. In the initial stages, the lectures will largely follow the book by Sutton and Barto; selected topics will later be covered in greater depth from the books of Bertsekas.
The prerequisites include familiarity with stochastic processes, probability, and calculus.
If you have already taken a course in RL, this course may not be useful for you.
This course will not cover deep RL in detail, nor will it involve discussion of coding (apart from a few illustrative examples), implementation, PyTorch, OpenAI Gym, etc.
References:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.
Reinforcement Learning and Optimal Control by Dimitri P. Bertsekas.
Dynamic Programming and Optimal Control: Volumes 1 and 2 by Dimitri P. Bertsekas.
Evaluation Components:
Minors - 30%
Majors - 30%
Quiz 1 - 10%
Quiz 2 - 10%
Demonstration - 20% (in two separate exercises).
Demo Task I:
Consider the Blackjack problem discussed in class, with a fixed dealer policy.
Write code to search for the optimal policy using epsilon-greedy Monte Carlo control (see the first sketch after this list).
Write code to search for the optimal policy using an off-policy Monte Carlo method (see the second sketch after this list).
Display:
The optimal policy determined by each method, in tabular form as discussed in class.
The performance of the optimal policy (win %).
The performance of the policy when played against a different dealer policy.
The impact of epsilon on performance (win % vs. epsilon).
For the main part of the exercise, assume the house hits on 16 or below and stands on 17 or above.
For the changed policy, assume the house also hits on a soft 17 (a 17 that counts an Ace as 11); see the dealer-turn sketch after this list.
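
The following is a minimal sketch of epsilon-greedy (every-visit) Monte Carlo control, assuming Gymnasium's Blackjack-v1 environment with sab=True, whose built-in dealer hits below 17 and stands on 17 or above, matching the main exercise. The epsilon value and episode count are illustrative, not prescribed.

    import random
    from collections import defaultdict

    import gymnasium as gym

    env = gym.make("Blackjack-v1", sab=True)  # Sutton-and-Barto rules
    EPSILON = 0.1          # vary this for the "win % vs epsilon" study
    N_EPISODES = 500_000   # illustrative; Monte Carlo convergence is slow

    Q = defaultdict(lambda: [0.0, 0.0])  # state -> [value of stick, value of hit]
    N = defaultdict(lambda: [0, 0])      # visit counts for incremental averaging

    def epsilon_greedy(state, eps):
        if random.random() < eps:
            return random.randrange(2)   # explore: 0 = stick, 1 = hit
        return max((0, 1), key=lambda a: Q[state][a])

    for _ in range(N_EPISODES):
        state, _ = env.reset()
        episode, done = [], False
        while not done:
            action = epsilon_greedy(state, EPSILON)
            next_state, reward, terminated, truncated, _ = env.step(action)
            episode.append((state, action, reward))
            state, done = next_state, terminated or truncated

        # Backward pass: accumulate the return (gamma = 1) and average it
        # into Q with an every-visit incremental update.
        G = 0.0
        for s, a, r in reversed(episode):
            G += r
            N[s][a] += 1
            Q[s][a] += (G - Q[s][a]) / N[s][a]

    # The learned (greedy) policy, e.g. for the tabular display.
    policy = {s: max((0, 1), key=lambda a: Q[s][a]) for s in Q}

Running the learned greedy policy (with epsilon set to zero) for a large number of evaluation episodes and counting the fraction of positive terminal rewards gives the win % requested above.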
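For the off-policy exercise, here is a sketch of off-policy Monte Carlo control with weighted importance sampling, in the style of Sutton and Barto (Chapter 5), on the same environment; the uniformly random behaviour policy and the episode count are assumptions made for illustration.

    import random
    from collections import defaultdict

    import gymnasium as gym

    env = gym.make("Blackjack-v1", sab=True)
    N_EPISODES = 1_000_000  # illustrative; off-policy MC typically needs more data

    Q = defaultdict(lambda: [0.0, 0.0])
    C = defaultdict(lambda: [0.0, 0.0])  # cumulative importance-sampling weights

    def greedy(state):
        return max((0, 1), key=lambda a: Q[state][a])  # the target policy

    for _ in range(N_EPISODES):
        # Generate an episode under the behaviour policy (uniform random).
        state, _ = env.reset()
        episode, done = [], False
        while not done:
            action = random.randrange(2)
            next_state, reward, terminated, truncated, _ = env.step(action)
            episode.append((state, action, reward))
            state, done = next_state, terminated or truncated

        # Backward pass with weighted importance sampling (gamma = 1).
        G, W = 0.0, 1.0
        for s, a, r in reversed(episode):
            G += r
            C[s][a] += W
            Q[s][a] += (W / C[s][a]) * (G - Q[s][a])
            if a != greedy(s):
                break         # greedy target puts zero probability on this action
            W *= 1.0 / 0.5    # pi(a|s) / b(a|s) with b uniform over two actions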
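Gymnasium's environment hard-codes the dealer rule, so evaluating the learned policy against the changed house policy calls for a hand-rolled dealer turn. Below is a hypothetical sketch (the helper names draw_card, hand_value, and dealer_turn are illustrative, not course-provided), using the infinite-deck card model of Sutton and Barto.

    import random

    def draw_card():
        # Infinite deck: ranks 1-9 each with probability 1/13, 10 with 4/13
        # (ten, jack, queen, king all count as 10).
        return min(random.randint(1, 13), 10)

    def hand_value(cards):
        """Return (best total, True if an ace is being counted as 11)."""
        total = sum(cards)
        if 1 in cards and total + 10 <= 21:
            return total + 10, True  # soft hand
        return total, False

    def dealer_turn(cards, hit_soft_17=False):
        """Play out the dealer's hand and return the final total (>21 = bust).

        hit_soft_17=False: the house hits on 16 or below and stands on 17+
                           (the main exercise).
        hit_soft_17=True:  the house also hits a soft 17 (the changed policy).
        """
        while True:
            total, soft = hand_value(cards)
            if total < 17 or (hit_soft_17 and soft and total == 17):
                cards.append(draw_card())
            else:
                return total

Plugging dealer_turn into a hand-rolled episode loop lets you evaluate the same learned policy under both house rules.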