ICLR 2026
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
Sanjana Srivastava, Kangrui Wang, Yung-Chieh Chan, Tianyuan Dai, Manling Li, Ruohan Zhang, Mengdi Xu, Jiajun Wu, Li Fei-Fei
Abstract
Intelligent embodied agents need not only to accomplish preset tasks but also to learn to align with individual human needs and preferences. Extracting reward signals from human language preferences allows an embodied agent to adapt through reinforcement learning. However, human language preferences are unconstrained, diverse, and dynamic, which makes constructing learnable rewards from them a major challenge. We present ROSETTA, a framework that uses foundation models to ground and disambiguate unconstrained natural language preferences, construct multi-stage reward functions, and implement them through code generation. Unlike prior work, which requires either extensive offline training to obtain general reward models or fine-grained corrections on a single task, ROSETTA allows agents to adapt online to preferences that evolve over time and vary in both language and content. We test ROSETTA on both short-horizon and long-horizon manipulation tasks and conduct an extensive human evaluation. ROSETTA outperforms state-of-the-art baselines, achieving an 87% average success rate and 86% human satisfaction across 116 preferences.

We validate ROSETTA iteratively, over repeated rounds of taking a language preference, generating a reward, training an agent, evaluating it, and then taking a new preference. We evaluate on 35 sequences of two to four preferences each, 116 preferences in total, in five task-agnostic manipulation environments (Fig. 1). ROSETTA successfully interprets ambiguous language, adapts to unseen preferences even after four interaction steps, and produces semantically matched, optimizable rewards that result in aligned