ICLR 2025
Discriminator-Guided Embodied Planning for LLM Agent
Haofu Qian, Chenjia Bai, Jiatao Zhang, Fei Wu, Wei Song, Xuelong Li
Abstract
Large Language Models (LLMs) have shown remarkable reasoning capabilities across various domains, yet they struggle with complex embodied tasks, which demand a coherent long-term policy and context-sensitive understanding of the environment. Previous work refined LLMs using outcome-supervised feedback, which can be costly and ineffective. In this work, we introduce a novel framework, Discriminator-Guided Action OPtimization (DGAP), which optimizes LLM action plans via step-wise signals. Specifically, we use a limited set of demonstrations to train a discriminator that learns a score function assessing, at every step, how well the LLM-generated action aligns with the underlying optimal action. Guided by this discriminator, the LLM is prompted to generate actions that maximize the score, conditioning on the trajectory of historical action-score pairs. Under mild conditions, DGAP resembles critic-regularized optimization and is shown to achieve a stronger policy than the LLM planner alone. In experiments with different LLMs (GPT-4, Llama3-70B) on ScienceWorld and VirtualHome, our method achieves superior performance and better efficiency than previous methods.
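The step-wise, discriminator-guided refinement described in the abstract can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` is a hypothetical stand-in for prompting the LLM with the action-score history, `score` is a hypothetical stand-in for the trained discriminator (the environment observation is omitted for brevity), and the stopping `threshold` and `max_iters` are illustrative parameters.

```python
from typing import Callable, List, Tuple

# Each history entry pairs a proposed action with its discriminator score.
History = List[Tuple[str, float]]


def guided_step(
    propose: Callable[[History], str],
    score: Callable[[str], float],
    threshold: float,
    max_iters: int = 5,
) -> Tuple[str, History]:
    """Refine a single step's action until its score clears `threshold`.

    Hypothetical sketch: `propose` plays the role of the LLM prompted with
    past (action, score) pairs; `score` plays the role of the discriminator.
    """
    history: History = []
    for _ in range(max_iters):
        action = propose(history)      # LLM sees earlier attempts and scores
        s = score(action)              # discriminator rates alignment with the optimal action
        history.append((action, s))
        if s >= threshold:             # good enough: stop refining this step
            break
    # Fall back to the highest-scoring attempt if the threshold was never met.
    best_action, _ = max(history, key=lambda pair: pair[1])
    return best_action, history


if __name__ == "__main__":
    # Toy stand-ins: a fixed candidate list and a lookup-table "discriminator".
    candidates = ["open door", "walk to kitchen", "grab cup"]
    toy_scores = {"open door": 0.2, "walk to kitchen": 0.9, "grab cup": 0.5}

    def toy_propose(history: History) -> str:
        # Propose the first candidate not yet tried.
        tried = {a for a, _ in history}
        return next(c for c in candidates if c not in tried)

    action, hist = guided_step(toy_propose, toy_scores.__getitem__, threshold=0.8)
    print(action, hist)
```

In this toy run the loop stops after the second proposal, because "walk to kitchen" is the first action whose score reaches the threshold. Returning the best-scoring attempt when the budget runs out is one simple design choice; an actual planner might instead defer to the LLM's last proposal.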