ICML2025

Concurrent Reinforcement Learning with Aggregated States via Randomized Least Squares Value Iteration

Yan Chen, Qinxun Bai, Yiteng Zhang, Maria Dimakopoulou, Shi Dong, Qi Sun, Zhengyuan Zhou

摘要

Designing learning agents that explore efficiently in a complex environment has been widely recognized as a fundamental challenge in reinforcement learning. While a number of works have demonstrated the effectiveness of techniques based on randomized value functions on a single agent, it remains unclear, from a theoretical point of view, whether injecting randomization can help a society of agents concurently explore an environment. The theoretical results established in this work tender an affirmative answer to this question. We adapt the concurrent learning framework to randomized least-squares value iteration (RLSVI) with aggregated state representation. We demonstrate polynomial worst-case regret bounds in both finite-and infinite-horizon environments. In both setups the per-agent regret decreases at an optimal rate of Θ 1 √ N , highlighting the advantage of concurent learning. Our algorithm exhibits significantly lower space complexity compared to (Russo, 2019) and (Agrawal et al., 2021) . We reduce the space complexity by a factor of K while incurring only a √ K increase in the worstcase regret bound, compared to (Agrawal et al., 2021; Russo, 2019) . Interestingly, our algorithm improves the worst-case regret bound of (Russo, 2019) by a factor of H 1/2 , matching the improvement in (Agrawal et al., 2021) . However, this result is achieved through a fundamentally different algorithmic enhancement and proof technique. Additionally, we conduct numerical experiments to demonstrate our theoretical findings.