ICML 2025

ADDQ: Adaptive distributional double Q-learning

Leif Döring, Benedikt Wille, Maximilian Birr, Mihail Bîrsan, Martin Slowik

Abstract

Bias in the estimation of maxima of random variables is a well-known obstacle that drastically slows down Q-learning algorithms. We propose to use additional insight gained from distributional reinforcement learning to deal with the overestimation in a locally adaptive way. This helps to balance the strengths and weaknesses of the different Q-learning variants in a unified framework. Our framework ADDQ is simple to implement: existing RL algorithms can be improved with a few lines of additional code. We provide experimental results in tabular, Atari, and MuJoCo environments for discrete and continuous control problems, comparisons with state-of-the-art methods, and a proof of convergence.
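
To make the bias in estimating maxima concrete, the following minimal sketch compares two estimators of the maximum of several means by Monte Carlo simulation. It is an illustration of the general phenomenon only, not the ADDQ method. The function name max_bias_demo, the Gaussian rewards, and all parameter values are assumptions chosen for the example. Every action has true mean 0, so the true maximum is 0 and any nonzero average is bias.

```python
import numpy as np

def max_bias_demo(n_actions=10, n_samples=20, n_trials=20000, seed=0):
    """Hypothetical demo: bias of single vs. double estimators of max_a E[X_a]."""
    rng = np.random.default_rng(seed)
    # Assumption: all actions have true mean 0, so the true maximum is 0.
    x = rng.normal(0.0, 1.0, size=(n_trials, n_actions, n_samples))

    # Single estimator: take the maximum over the sample means.
    single = x.mean(axis=2).max(axis=1)

    # Double estimator: select the action on one half of the samples,
    # then evaluate that action on the other, independent half.
    half = n_samples // 2
    mean_a = x[:, :, :half].mean(axis=2)
    mean_b = x[:, :, half:].mean(axis=2)
    best = mean_a.argmax(axis=1)
    double = mean_b[np.arange(n_trials), best]

    # Averages over trials approximate the bias, since the true value is 0.
    return single.mean(), double.mean()

if __name__ == "__main__":
    single_bias, double_bias = max_bias_demo()
    print(f"single estimator bias: {single_bias:.3f}")
    print(f"double estimator bias: {double_bias:.3f}")
```

In this setting the single estimator is biased upward, while the double estimator is close to unbiased because all means are equal. When the means differ, the double estimator tends to underestimate instead, which is the kind of trade-off between Q-learning variants that the abstract refers to.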