NeurIPS 2024

Regularized Q-Learning

Han-Dong Lim, Donghwan Lee

Abstract

We consider a single-loop algorithm for regularized Q-learning with linear function approximation. The proposed algorithm is motivated by a bilevel optimization formulation of regularized Q-learning, in which the lower-level problem aims to identify a value function approximation that satisfies Bellman’s recursive optimality condition, and the upper-level problem aims to find the projection onto the span of the basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
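
For orientation, the following is a minimal sketch of one generic Q-learning update with linear function approximation and an L2 penalty on the weights. It is not the single-loop algorithm proposed here, which the abstract does not specify. The function name `regularized_q_step`, the step size `alpha`, the regularization weight `eta`, and the discount `gamma` are all illustrative assumptions.

```python
import numpy as np

def regularized_q_step(theta, phi_sa, reward, phi_next,
                       gamma=0.9, alpha=0.1, eta=0.1):
    """One semi-gradient Q-learning step with an L2 penalty on theta.

    Hypothetical helper for illustration only (not the paper's algorithm).
    theta:    (d,) weight vector, so that Q(s, a) = phi(s, a) @ theta
    phi_sa:   (d,) feature vector of the visited pair (s, a)
    phi_next: (num_actions, d) features of (s', a') for every action a'
    """
    q_next = phi_next @ theta                      # Q(s', a') for all a'
    td_error = reward + gamma * np.max(q_next) - phi_sa @ theta
    # The -eta * theta term is the assumed L2 regularizer; it shrinks the weights.
    return theta + alpha * (td_error * phi_sa - eta * theta)

# Usage: two features, two next-state actions, starting from zero weights.
theta = np.zeros(2)
phi_sa = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])
theta = regularized_q_step(theta, phi_sa, reward=1.0, phi_next=phi_next)
```

In this sketch the penalty acts directly on the weight vector. How the regularization enters the bilevel formulation described above may differ.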