NeurIPS2020

Reward-rational (implicit) choice: A unifying formalism for reward learning

Hong Jun Jeon, Smitha Milli, Anca D. Dragan

201 citations

Abstract

It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback. The types of behavior interpreted as evidence of the reward function have expanded greatly in recent years. We've gone from demonstrations, to comparisons, to reading into the information leaked when the human is pushing the robot away or turning it off. And surely, there is more to come. How will a robot make sense of all these diverse types of behavior? Our key observation is that different types of behavior can be interpreted in a single unifying formalism -as a reward-rational choice that the human is making, often implicitly. We use this formalism to survey prior work through a unifying lens, and discuss its potential use as a recipe for interpreting new sources of information that are yet to be uncovered. Overall, there is much information out there, some purposefully communicated, other leaked. While existing papers are instructing us how to tap into some of it, one can only imagine that there is much more that is yet untapped. There are probably new yet-to-be-invented ways for people to purposefully provide feedback to robots -e.g. guiding them on which part of a trajectory was particularly good or bad. And, there will probably be new realizations about ways in which human behavior already leaks information, beyond the state of the world or turning the robot off. How will robots make sense of all these diverse sources of information? 34th Conference on Neural Information Processing Systems (NeurIPS 2020),