NeurIPS 2021

Compositional Reinforcement Learning from Logical Specifications

Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur

102 citations

Abstract

We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DIRL, that interleaves high-level planning and reinforcement learning. First, DIRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn a neural network policy for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.

However, $G_{ex}$ by itself is insufficient to determine the optimal path; e.g., it does not know that there is no path leading directly from $S_2$ to $S_3$, which is a property of the environment. These differences
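To make the interleaving of planning and learning concrete, the following is a minimal Python sketch of a Dijkstra-style search over an abstract graph in which edge costs are not known in advance but are obtained by "learning" each edge when its source vertex is expanded. It is an illustration under stated assumptions, not the authors' implementation: the function `plan_with_learning`, the callback `learn_edge`, the four-vertex graph, and the success probabilities are all hypothetical, and using the negative log of an estimated success probability as the edge cost is an assumption made here so that the shortest path maximizes the product of per-edge success probabilities. In a real system, `learn_edge` would train a neural network policy for the sub-task and estimate how often it succeeds.

```python
import heapq
import math


def plan_with_learning(edges, source, targets, learn_edge):
    """Dijkstra-style search where edge costs are discovered by learning.

    edges:      dict mapping a vertex to a list of successor vertices
    targets:    set of goal vertices
    learn_edge: hypothetical callback (u, v) -> estimated success
                probability in [0, 1] of the policy learned for edge u -> v
    Returns (path, estimated success probability), or (None, 0.0).
    """
    dist = {source: 0.0}
    parent = {}
    done = set()
    frontier = [(0.0, source)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if u in targets:
            # Reconstruct the path by following parent pointers back.
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return path[::-1], math.exp(-d)
        for v in edges.get(u, []):
            if v in done:
                continue
            p = learn_edge(u, v)  # stands in for RL on this sub-task
            if p <= 0.0:
                continue  # the environment does not allow this edge
            cost = d - math.log(p)  # assumed cost: -log(success probability)
            if cost < dist.get(v, math.inf):
                dist[v] = cost
                parent[v] = u
                heapq.heappush(frontier, (cost, v))
    return None, 0.0


# Hypothetical abstract graph: the specification allows S2 -> S3,
# but in this made-up environment that sub-task can never succeed.
edges = {"S0": ["S1", "S2"], "S1": ["S3"], "S2": ["S3"]}
success = {
    ("S0", "S1"): 0.9,
    ("S0", "S2"): 0.95,
    ("S1", "S3"): 0.8,
    ("S2", "S3"): 0.0,
}

path, prob = plan_with_learning(
    edges, "S0", {"S3"}, lambda u, v: success[(u, v)]
)
print(path, round(prob, 2))
```

The graph alone cannot rule out the route through S2; only the learned success estimate for the edge from S2 to S3 reveals that it is unusable, so the search falls back to the route through S1.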