NeurIPS2024

On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games

Awni Altabaa, Zhuoran Yang

摘要

In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other. Classical models of reinforcement learning (e.g., MDPs, POMDPs, Dec-POMDPs, and POMGs) assume a simple and highly regular information structure, while more general models like predictive state representations do not explicitly model the information structure. By contrast, real-world sequential decision-making problems typically involve a complex and time-varying interdependence of system variables, requiring a rich and flexible representation of information structure. In this paper, we argue for the perspective that explicit representation of information structures is an important component of analyzing and solving reinforcement learning problems. Taking inspiration from the control literature, we propose partially-observable sequential teams and partially-observable sequential games as reinforcement learning models with an explicit representation of information structure as part of the problem specification, capturing classical models of reinforcement learning as special cases. We then use these models to carry out an information-structural analysis of the statistical hardness of general sequential decision-making problems, obtaining a characterization via a graph-theoretic analysis of the DAG representation of the information structure. The central quantity in this analysis is the minimal set of variables, observable or latent, that d-separates the past observations from future observations. We prove an upper bound on the sample complexity of learning a general sequential decision-making problem in terms of its information structure by exhibiting an algorithm achieving the upper bound. This recovers known tractability results and gives a novel perspective on reinforcement learning in general sequential decision-making problems, providing a systematic way of identifying new tractable classes of problems.