Publication Type:

Journal Article


Discrete Event Dynamic Systems, Volume 26, Number 3, pp. 477–509 (2016)



We present in this article a two-timescale variant of Q-learning with linear function approximation. Both the Q-values and the policy are parameterized, with the policy parameter updated on a faster timescale than the Q-value parameter. This timescale separation is seen to yield significantly improved numerical performance over standard Q-learning. We show that the proposed algorithm converges almost surely to a closed, connected, internally chain transitive, invariant set of an associated differential inclusion.
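To make the two-timescale idea concrete, the following is a minimal illustrative sketch, not the paper's exact algorithm: Q-values are linearly parameterized as theta . phi(s, a), a softmax policy with parameter w is updated with a larger (faster) step size, and theta follows a semi-gradient Q-learning step with a smaller (slower) step size, so that the ratio of slow to fast step sizes tends to zero. The MDP, features, step-size schedules, and policy-update direction below are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, d = 3, 2, 6
gamma = 0.9

# Fixed random features phi(s, a) in R^d (hypothetical).
phi = rng.normal(size=(n_states, n_actions, d))

# Toy MDP: random transition kernel and reward table (hypothetical).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

theta = np.zeros(d)  # Q-value parameter: SLOW timescale
w = np.zeros(d)      # policy parameter:  FAST timescale

def q(s, a):
    """Linear Q-value approximation."""
    return phi[s, a] @ theta

def policy(s):
    """Softmax policy with score w . phi(s, a)."""
    prefs = np.array([phi[s, a] @ w for a in range(n_actions)])
    prefs -= prefs.max()  # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

s = 0
for n in range(1, 20001):
    alpha = 1.0 / n        # slow step size for theta
    beta = 1.0 / n ** 0.6  # faster step size for w (alpha/beta -> 0)

    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Q-learning TD error with a max over next-state actions.
    delta = r + gamma * max(q(s_next, b) for b in range(n_actions)) - q(s, a)

    # Slow update: semi-gradient TD step on the Q-value parameter.
    theta += alpha * delta * phi[s, a]

    # Fast update: move w in a policy-improvement direction weighted by
    # the current Q-value estimate (an illustrative choice of direction).
    grad_log = phi[s, a] - sum(p[b] * phi[s, b] for b in range(n_actions))
    w += beta * q(s, a) * grad_log

    s = s_next

print("theta:", np.round(theta, 3))
print("policy at state 0:", np.round(policy(0), 3))
```

The key structural point is only the step-size separation: beta dominates alpha as n grows, so the policy parameter tracks a moving equilibrium of the slower Q-value iteration, which is what the differential-inclusion analysis in the paper formalizes.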

Cite this Research Publication

S. Bhatnagar and K. Lakshmanan, “Multiscale Q-learning with linear function approximation”, Discrete Event Dynamic Systems, vol. 26, no. 3, pp. 477–509, 2016.