NeurIPS 2023
Describe, Explain, Plan and Select: Interactive Planning with LLMs Enables Open-World Multi-Task Agents
Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang
Cited by 141
Abstract
We investigate the challenge of task planning for multi-task embodied agents in open-world environments. Two main difficulties are identified: 1) executing plans in an open-world environment (e.g., Minecraft) necessitates accurate and multi-step reasoning due to the long-term nature of tasks, and 2) as vanilla planners do not consider how easily the current agent can achieve a given sub-task when ordering parallel sub-goals within a complicated plan, the resulting plan could be inefficient or even infeasible. To this end, we propose "Describe, Explain, Plan and Select" (DEPS), an interactive planning approach based on Large Language Models (LLMs). DEPS facilitates better error correction of the initial LLM-generated plan by integrating a description of the plan execution process and providing self-explanation of feedback when encountering failures during the extended planning phases. Furthermore, it includes a goal selector, a trainable module that ranks parallel candidate sub-goals by their estimated steps to completion, thereby refining the initial plan. Our experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks and nearly double the overall performance. Further testing reveals our method's general effectiveness in popularly adopted non-open-ended domains as well (i.e., ALFWorld and tabletop manipulation). The ablation and exploratory studies detail how our design beats the counterparts and provide a promising update on the ObtainDiamond grand challenge with our approach. The code is released at https://github.com/CraftJarvis/MC-Planner.

Note: We borrow the term "open world" from the game community. It highlights that the agent can navigate inside a diverse environment and accomplish open-ended tasks freely.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
[Figure: Overview of DEPS. Components: (Re-)Planner (LLM), Selector (HPM), Controller (goal-conditioned policy), Descriptor (VLM), Explainer (LLM), and the Environment. Signals between them: instruction, plan P_t, goal g_t, action, observation, feedback, description d_t, and explanation. Example task instruction: "Obtain a diamond in Minecraft survival mode step-by-step?" Example selector context: "The agent locates in the birch forest, which only has birch wood." Example description d_t: "I succeed on goal 1-5. I fail on goal 6, mining 3 [icon] with [icon]. Now my inventory has 5 planks, …" Panels for the initial plan P_0, the candidate goals, and the selected goal g_1 use item icons that did not survive extraction.]
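
The interaction described in the abstract (plan, select a sub-goal, execute, then describe and explain a failure before re-planning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`run_deps`, `select_goal`, `RECIPES`, and the toy planner, controller, descriptor and explainer) is hypothetical, the LLM and VLM modules are replaced by simple rule-based stand-ins, and the trained selector is replaced by a lookup table of assumed step estimates.

```python
from typing import Callable, List, Optional, Set, Tuple

# Assumption: a plan is an ordered list of groups; goals inside one group
# are treated as parallel sub-goals that may be attempted in any order.
Plan = List[List[str]]


def select_goal(group: List[str], estimate_steps: Callable[[str], float]) -> str:
    """Selector stand-in: pick the parallel sub-goal with the fewest estimated steps."""
    return min(group, key=estimate_steps)


def run_deps(task, planner, controller, descriptor, explainer,
             estimate_steps, max_replans: int = 3) -> Tuple[bool, List[str]]:
    """Run plan -> select -> execute, re-planning from a description and explanation on failure."""
    plan: Plan = [list(g) for g in planner(task, None)]
    done: List[str] = []
    replans = 0
    while plan:
        goal = select_goal(plan[0], estimate_steps)
        if controller(goal):                      # sub-goal achieved
            done.append(goal)
            plan[0].remove(goal)
            if not plan[0]:
                plan.pop(0)
            continue
        if replans >= max_replans:                # give up after too many failures
            return False, done
        description = descriptor(done, goal)      # "Describe"
        explanation = explainer(description)      # "Explain"
        plan = [list(g) for g in planner(task, explanation)]  # "Plan" again
        replans += 1
    return True, done


# Toy world (hypothetical): an item can be obtained once its prerequisites are held.
RECIPES = {
    "log": [], "planks": ["log"], "stick": ["planks"],
    "crafting_table": ["planks"], "wooden_pickaxe": ["stick", "crafting_table"],
}


def demo() -> Tuple[bool, List[str]]:
    inventory: Set[str] = set()

    def controller(goal: str) -> bool:
        if all(item in inventory for item in RECIPES[goal]):
            inventory.add(goal)
            return True
        return False

    def descriptor(done: List[str], failed: str) -> dict:
        return {"succeeded": list(done), "failed": failed,
                "inventory": sorted(inventory)}

    def explainer(description: dict) -> List[str]:
        # Explain the failure as the list of prerequisites that are missing.
        return [item for item in RECIPES[description["failed"]]
                if item not in description["inventory"]]

    def planner(task: str, missing: Optional[List[str]]) -> Plan:
        if missing is None:
            return [["log"], ["planks"], [task]]  # deliberately flawed first plan
        return [list(missing), [task]]            # missing items become parallel goals

    steps = {"stick": 2, "crafting_table": 5}     # assumed step estimates
    return run_deps("wooden_pickaxe", planner, controller, descriptor,
                    explainer, lambda g: steps.get(g, 1))
```

In this toy run the first plan fails at `wooden_pickaxe`, the explanation names the two missing prerequisites, and the selector stand-in orders them so that the one with the lower step estimate is attempted first. Prerequisites are not consumed when used, which is a simplification of actual crafting.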