ICML 2024

Risk-Sensitive Reward-Free Reinforcement Learning with CVaR

Xinyi Ni, Guanlin Liu, Lifeng Lai

Cited by 8

Abstract

Exploration is a crucial phase in reinforcement learning (RL). The reward-free RL paradigm proposed by Jin et al. (2020) offers an efficient way to design exploration algorithms for risk-neutral RL across various reward functions with a single exploration phase. However, as RL is increasingly applied in safety-critical settings, there is a growing need for risk-sensitive RL, which takes potential risks into account in decision-making. Yet efficient exploration strategies for risk-sensitive RL remain underdeveloped. This study presents a novel risk-sensitive reward-free framework based on Conditional Value-at-Risk (CVaR), designed to effectively address CVaR RL for any given reward function through a single exploration phase. We introduce an efficient algorithm named CVaR-RF-UCRL, which is shown to be $(\epsilon, p)$-PAC with a sample complexity upper bounded by $\tilde{O}\left(\frac{S^2 A H^4}{\epsilon^2 \tau^2}\right)$, where $\tau$ is the risk tolerance parameter. We also prove an $\Omega\left(\frac{S^2 A H^2}{\epsilon^2 \tau}\right)$ lower bound for any CVaR-RF exploration algorithm, demonstrating the near-optimality of our algorithm. Additionally, we propose two planning algorithms: CVaR-VI and its more practical variant, CVaR-VI-DISC. The effectiveness and practicality of our CVaR reward-free approach are further validated through numerical experiments.
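To make the role of the risk tolerance parameter $\tau$ concrete, the sketch below computes CVaR at level $\tau$ for a discrete return distribution. It assumes the common lower-tail convention for rewards, which this abstract does not spell out: $\mathrm{CVaR}_\tau(X)$ is the mean of the worst $\tau$-fraction of outcomes, equivalently the Rockafellar-Uryasev form $\sup_b \{ b - \tfrac{1}{\tau}\mathbb{E}[(b - X)^+] \}$. The function names are hypothetical and this is only an illustration of the risk measure, not the paper's CVaR-VI or CVaR-VI-DISC planners.

```python
def cvar_tail_average(values, probs, tau):
    """Hypothetical helper: mean of the worst tau-fraction of outcomes (lower tail)."""
    assert 0.0 < tau <= 1.0, "risk tolerance tau must lie in (0, 1]"
    remaining = tau  # probability mass still to be collected from the lower tail
    total = 0.0
    # Visit outcomes from worst (smallest return) to best.
    for v, p in sorted(zip(values, probs)):
        take = min(p, remaining)
        total += take * v
        remaining -= take
        if remaining <= 0.0:
            break
    return total / tau


def cvar_rockafellar_uryasev(values, probs, tau):
    """Hypothetical helper: sup_b { b - E[(b - X)^+] / tau } for a discrete X.

    The objective is concave and piecewise linear in b with kinks at the
    support points, so for tau in (0, 1] the supremum is attained at one of them.
    """
    assert 0.0 < tau <= 1.0, "risk tolerance tau must lie in (0, 1]"
    best = float("-inf")
    for b in values:
        shortfall = sum(p * max(b - v, 0.0) for v, p in zip(values, probs))
        best = max(best, b - shortfall / tau)
    return best


if __name__ == "__main__":
    returns = [0.0, 1.0, 10.0]
    probs = [0.1, 0.2, 0.7]
    for tau in (0.25, 1.0):
        print(tau, cvar_tail_average(returns, probs, tau),
              cvar_rockafellar_uryasev(returns, probs, tau))
```

With $\tau = 1$ both forms reduce to the ordinary (risk-neutral) expected return, while smaller $\tau$ concentrates on rarer bad outcomes; this is consistent with the upper and lower bounds above both growing as $\tau$ shrinks.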