NeurIPS 2023
Continual Learning for Instruction Following from Realtime Feedback
Alane Suhr, Yoav Artzi
Abstract
We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to immediate reward. We evaluate through thousands of human-agent interactions, demonstrating 15.4% absolute improvement in instruction execution accuracy over time. We also show our approach is robust to several design variations, and that the feedback signal is roughly equivalent to the learning signal of supervised demonstration data.

* Work done while at Cornell University.
2 We use the term continual learning to refer to a learning setting where an agent continually improves its task performance [14], in our case following instructions. The term is used at times to describe the domain adaptation challenge of continually learning to handle new tasks. Our work does not address this problem.
3 Throughout the paper, we use agent to refer to an automated instruction-following system.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
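
To make the contextual bandit idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear softmax policy, a REINFORCE-style update, a reward of +1 for positive and -1 for negative feedback, and a simulated user; all names (`BanditPolicy`, `feedback_to_reward`, `simulate`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedback_to_reward(positive):
    """Map binary user feedback to an immediate reward (assumed +1 / -1)."""
    return 1.0 if positive else -1.0


class BanditPolicy:
    """Linear softmax policy trained with a bandit policy-gradient update."""

    def __init__(self, n_features, n_actions, lr=0.5):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return softmax(scores)

    def act(self, x, rng):
        p = self.probs(x)
        return rng.choices(range(len(p)), weights=p)[0]

    def update(self, x, action, reward):
        # Gradient of log pi(action | x) for a linear softmax policy is
        # (1[a == action] - pi(a | x)) * x for each action row a.
        p = self.probs(x)
        for a, row in enumerate(self.w):
            indicator = 1.0 if a == action else 0.0
            coeff = self.lr * reward * (indicator - p[a])
            for i, xi in enumerate(x):
                row[i] += coeff * xi


def simulate(steps=2000, seed=0, n=3):
    """Train against a simulated user who approves action c in context c."""
    rng = random.Random(seed)
    policy = BanditPolicy(n_features=n, n_actions=n)
    for _ in range(steps):
        c = rng.randrange(n)
        x = [1.0 if i == c else 0.0 for i in range(n)]  # one-hot context
        action = policy.act(x, rng)
        reward = feedback_to_reward(action == c)  # feedback on this action only
        policy.update(x, action, reward)
    return policy
```

The key property of the bandit setting shown here is that the learner only observes feedback for the action it actually took, never for the alternatives, which is what distinguishes it from learning on supervised demonstration data.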