NeurIPS 2023

Train Hard, Fight Easy: Robust Meta Reinforcement Learning

Ido Greenberg, Shie Mannor, Gal Chechik, Eli A. Meirom

Cited 12 times

Abstract

A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Meta-RL (MRL) addresses this issue by learning a meta-policy that adapts to new tasks. Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty. This limits system reliability since test tasks are not known in advance. In this work, we define a robust MRL objective with a controlled robustness level. Optimization of analogous robust objectives in RL is known to lead to both biased gradients and data inefficiency. We prove that the gradient bias disappears in our proposed MRL framework. The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML). RoML is a meta-algorithm that generates a robust version of any given MRL algorithm, by identifying and oversampling harder tasks throughout training. We demonstrate that RoML achieves robust returns on multiple navigation and continuous control benchmarks.

We test our algorithms on several domains. Section 6.1 considers a navigation problem, where both CVaR-ML and RoML obtain better CVaR (conditional value at risk) returns than their risk-neutral baseline. Furthermore, they learn substantially different navigation policies. Section 6.2 considers several continuous control environments with varying tasks. These environments are challenging for CVaR-ML, which entirely fails to learn. Yet, RoML preserves its effectiveness and consistently improves the robustness of the returns. In addition, Section 6.3 demonstrates that under certain conditions, RoML can be applied to supervised settings as well, providing robust supervised meta-learning.
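The text above reports CVaR returns without spelling out how such a quantity is computed. As a minimal sketch, assuming CVaR here means the mean return over the worst α-fraction of tasks (the standard lower-tail definition), the following Python snippet estimates it from a batch of per-task returns and picks out the tasks in that tail. The function names `empirical_cvar` and `tail_task_indices`, and the choice of α, are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def _tail_size(n, alpha):
    """Number of tasks in the worst alpha-fraction (at least one)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return max(1, int(np.ceil(alpha * n)))


def empirical_cvar(returns, alpha):
    """Mean of the lowest alpha-fraction of returns (assumed CVaR definition)."""
    returns = np.asarray(returns, dtype=float)
    k = _tail_size(returns.size, alpha)
    # Sort ascending so the first k entries are the worst-performing tasks.
    return float(np.sort(returns)[:k].mean())


def tail_task_indices(returns, alpha):
    """Indices of the tasks whose returns fall in the worst alpha-fraction."""
    returns = np.asarray(returns, dtype=float)
    k = _tail_size(returns.size, alpha)
    return np.argsort(returns)[:k]


if __name__ == "__main__":
    per_task_returns = [3.0, 1.0, 4.0, 2.0]  # hypothetical returns, one per task
    print(empirical_cvar(per_task_returns, alpha=0.5))
    print(tail_task_indices(per_task_returns, alpha=0.5))
```

With α = 1 the estimate reduces to the ordinary mean return, which corresponds to the risk-neutral baseline; smaller α places more weight on the hardest tasks. This sketch only measures tail performance and does not implement the oversampling of harder tasks that RoML performs during training.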