ICML2025

Linear Q-Learning Does Not Diverge in L2: Convergence Rates to a Bounded Set

Xinyu Liu, Zixuan Xie, Shangtong Zhang

摘要

Q-learning is one of the most fundamental reinforcement learning algorithms. It is widely believed that Q-learning with linear function approximation (i.e., linear Q-learning) suffers from possible divergence until the recent work Meyn (2024) which establishes the ultimate almost sure boundedness of the iterates of linear Q-learning. Building on this success, this paper further establishes the first L 2 convergence rate of linear Q-learning iterates (to a bounded set). Similar to Meyn (2024), we do not make any modification to the original linear Q-learning algorithm, do not make any Bellman completeness assumption, and do not make any near-optimality assumption on the behavior policy. All we need is an ϵ-softmax behavior policy with an adaptive temperature. The key to our analysis is the general result of stochastic approximations under Markovian noise with fast-changing transition functions. As a side product, we also use this general result to establish the L 2 convergence rate of tabular Q-learning with an ϵ-softmax behavior policy, for which we rely on a novel pseudo-contraction property of the weighted Bellman optimality operator.