NeurIPS 2022

Near-Optimal Multi-Agent Learning for Safe Coverage Control

Manish Prajapat, Matteo Turchetta, Melanie N. Zeilinger, Andreas Krause

Cited by 21

Abstract

In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density. In practice, the density is rarely known a priori, further complicating the original NP-hard problem. Moreover, in many applications, agents cannot visit arbitrary locations due to a priori unknown safety constraints. In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety. We first propose a conditionally linear submodular coverage function that facilitates theoretical analysis. Utilizing this structure, we develop MACOPT, a novel algorithm that efficiently handles the exploration-exploitation trade-off arising from partial observability, and show that it achieves sublinear regret. Next, we extend results on single-agent safe exploration to our multi-agent setting and propose SAFEMAC for safe coverage and exploration. We analyze SAFEMAC and give first-of-its-kind results: near-optimal coverage in finite time while provably guaranteeing safety. We extensively evaluate our algorithms on synthetic and real problems, including a biodiversity monitoring task under safety constraints, where SAFEMAC outperforms competing methods.

Introduction

In multi-agent coverage control (MAC) problems, multiple agents coordinate to maximize coverage over some spatially distributed events. Applications abound, from collaborative mapping [1], environmental monitoring [2], and inspection robotics [3] to sensor networks [4]. In addition, the coverage formulation can address core challenges in cooperative multi-agent RL [5, 6], e.g., exploration [7], by providing high-level goals. In these applications, agents often encounter safety constraints that may lead to critical accidents when ignored, e.g., obstacles [8] or extreme weather conditions [9, 10].
Deploying coverage control solutions in the real world presents many challenges: (i) even for a given density of relevant events, the coverage problem is NP-hard [11]; (ii) the density is rarely known in practice [2] and must be learned from data, which presents a complex active learning problem because the quantity we measure (the density) differs from the one we want to optimize (its coverage); (iii) agents often operate under safety-critical conditions [8-10] that may be unknown a priori, which requires cautious exploration of the environment to prevent catastrophic outcomes. While prior work addresses subsets of these challenges (see Section 7), we are not aware of methods that address them jointly.
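To make challenge (i) concrete, the following is a minimal sketch of the known-density coverage problem on a grid. It is not MACOPT or SAFEMAC: it ignores both learning and safety, and it uses the standard greedy heuristic for monotone submodular maximization, which is known to achieve a (1 - 1/e) approximation under a cardinality constraint. The grid discretization, the disk-shaped sensing region, and the names `disk_mask`, `coverage`, and `greedy_coverage` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def disk_mask(shape, center, radius):
    """Boolean mask of grid cells within Euclidean `radius` of `center`."""
    rows, cols = np.indices(shape)
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2


def coverage(density, locations, radius):
    """Total density inside the union of the agents' sensing disks."""
    covered = np.zeros(density.shape, dtype=bool)
    for loc in locations:
        covered |= disk_mask(density.shape, loc, radius)
    return float(density[covered].sum())


def greedy_coverage(density, n_agents, radius):
    """Place agents one at a time at the cell with the largest marginal gain."""
    covered = np.zeros(density.shape, dtype=bool)
    candidates = [(r, c)
                  for r in range(density.shape[0])
                  for c in range(density.shape[1])]
    chosen = []
    for _ in range(n_agents):
        best_gain, best_loc = -1.0, None
        for loc in candidates:
            mask = disk_mask(density.shape, loc, radius)
            # Marginal gain: density newly covered by placing an agent here.
            gain = density[mask & ~covered].sum()
            if gain > best_gain:
                best_gain, best_loc = gain, loc
        chosen.append(best_loc)
        covered |= disk_mask(density.shape, best_loc, radius)
    return chosen


# Toy density (assumed for illustration): three isolated hotspots.
density = np.zeros((7, 7))
density[1, 1] = 5.0
density[5, 5] = 3.0
density[1, 5] = 1.0

locations = greedy_coverage(density, n_agents=2, radius=1)
total = coverage(density, locations, radius=1)
```

With two agents and a sensing radius of one cell, no single disk can reach two hotspots, so the greedy placement covers the two heaviest ones. In the setting the paper targets, `density` would be unknown and would have to be estimated from measurements, and the candidate set would be restricted to locations known to be safe.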