NeurIPS 2021

The Value of Information When Deciding What to Learn

Dilip Arumugam, Benjamin Van Roy

Cited 17 times

Abstract

All sequential decision-making agents explore so as to acquire knowledge about a particular target. It is often the responsibility of the agent designer to construct this target which, in rich and complex environments, constitutes an onerous burden; without full knowledge of the environment itself, a designer may forge a suboptimal learning target that poorly balances the amount of information an agent must acquire to identify the target against the target's associated performance shortfall. While recent work has developed a connection between learning targets and rate-distortion theory to address this challenge and empower agents that decide what to learn in an automated fashion, the proposed algorithm does not optimally tackle the equally important challenge of efficient information acquisition. In this work, building upon the seminal design principle of information-directed sampling [Russo and Van Roy, 2014], we address this shortcoming directly to couple optimal information acquisition with the optimal design of learning targets. Along the way, we offer new insights into learning targets from the literature on rate-distortion theory before turning to empirical results that confirm the value of information when deciding what to learn.

Recent work has developed a connection between rate-distortion theory [Shannon, 1959, Berger, 1971] and the problem of deciding what an agent should learn [Arumugam and Van Roy, 2021], optimally balancing between the requisite information needed by any agent to identify a learning target and the corresponding performance shortfall associated with this target. Intuitively, the complexity of a learning problem can be quantified by the bits of information an agent must acquire from the

35th Conference on Neural Information Processing Systems (NeurIPS 2021).