AAAI 2024

Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration

Honghao Wei, Xin Liu, Lei Ying

Cited by 7

Abstract

This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions: (i) the RL agent knows a safe action set for every state or knows a safe graph in which all the state-action-state triples are safe, and (ii) the constraint/cost functions are linear. In this paper, we consider safe RL with instantaneous hard constraints without assumption (i) and generalize (ii) to Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves $\tilde{O}(\sqrt{d^3 H^4 K})$ regret and $\tilde{O}(H\sqrt{dK})$ hard constraint violation when the cost function is linear and $O(H\gamma_K\sqrt{K})$ hard constraint violation when the cost function belongs to RKHS. Here $K$ is the learning horizon, $H$ is the length of each episode, and $\gamma_K$ is the information gain w.r.t. the kernel used to approximate cost functions. Our results achieve the optimal dependency on the learning horizon $K$, matching the lower bound we provide in this paper and demonstrating the efficiency of LSVI-AE. Notably, the design of our approach encourages aggressive policy exploration, providing a unique perspective on safe RL with general cost functions and no prior knowledge of safe actions, which may be of independent interest.
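As a rough reading of these rates (a sketch that ignores constants and logarithmic factors, and is not a statement taken from the paper), dividing each cumulative bound by the number of episodes K gives a per-episode average. The symbols are those of the abstract; the growth condition on the information gain in the last line is an added assumption needed for that average to vanish.

```latex
% Per-episode averages implied by the cumulative bounds (illustrative only).
\begin{align*}
  \frac{1}{K}\,\tilde{O}\!\left(\sqrt{d^{3} H^{4} K}\right)
    &= \tilde{O}\!\left(\sqrt{\frac{d^{3} H^{4}}{K}}\right)
    && \text{(average regret)} \\
  \frac{1}{K}\,\tilde{O}\!\left(H\sqrt{dK}\right)
    &= \tilde{O}\!\left(H\sqrt{\frac{d}{K}}\right)
    && \text{(average violation, linear cost)} \\
  \frac{1}{K}\,O\!\left(H\gamma_K\sqrt{K}\right)
    &= O\!\left(\frac{H\gamma_K}{\sqrt{K}}\right)
    && \text{(average violation, RKHS cost)}
\end{align*}
% The first two averages tend to 0 as K grows.
% The third tends to 0 only under the added assumption \gamma_K = o(\sqrt{K}).
```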