ACL 2024
Open Grounded Planning: Challenges and Benchmark Construction
Shiguang Guo, Ziliang Deng, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
Cited by 2
Abstract
The emergence of large language models (LLMs) has drawn increasing attention to their use for human-like planning. Existing work on LLM-based planning either leverages the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning to learn decision-making over a limited set of actions within restricted environments. However, both approaches fall significantly short of the openness and executability that real-world planning requires. In this paper, we propose a new planning task: open grounded planning. The primary objective of open grounded planning is to have the model generate an executable plan based on a variable action set, thereby ensuring the executability of the produced plan. To this end, we establish a benchmark for open grounded planning spanning a wide range of domains. We then test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains. This paper defines the task, establishes a foundational dataset for open grounded planning, and sheds light on the potential challenges and future directions of LLM-based planning. Our code and datasets are at https://github.com/Shiguang-Guo/Open-Grounded-Planning

* Equal contribution
† Corresponding author

[Figure: Restricted Grounded Planning (Closed Action Set, e.g. verifyUSAddress, standardizeUSAddress) contrasted with Open Grounded Planning (Open Action Libraries). Example queries: "How to Make Fried Chicken with Tarragon and Buttermilk?", "Can you check if this address is valid and deliverable? Here's the address: xxx", "Detect the soft hed boundary of the cake in the image.", "How to Recover from Workout Soreness?". Example plan steps: "verifyUSAddress"; "Hed Detection On Image"; "Place chicken pieces in a glass or stoneware bowl. Melt Crisco on medium setting. Add chicken to the hot Crisco."]
Task: How to Activate the Dark Theme on YouTube
Method: Using the YouTube App for Android
Action Candidate Set:
* Close the Tool Options window.
* Double click the file.
* Do price forecasting.
* Click on the blue coloured YOUTUBE STUDIO BETA button.
* Open the YouTube app on your iPhone or iPad.
* Launch the YouTube app on your Android device.
* <other steps>...
Steps:
1. Launch the YouTube app on your Android device.
2. Tap on your profile picture.
3. Tap on Settings.
4. Select the General option.
5. Tap on the grey switch, right across Dark theme text.
6. Enjoy YouTube in dark mode

<REWRITE PROMPT>
INSTRUCTION: You will be given a task, a method to complete the task, a current plan and several candidate actions. Candidate actions are called <Actions in Library>. If no method is specified it will be set to "None". If the current plan is empty, the plan will also be set to "None". Use the actions listed below to refine your current steps to complete your task. Actions marked with <TO BE REPLACED> indicate that the content was not found in the action library, and actions marked with <IN LIB> indicate that they are in the action library. You need to analyze which actions in the provided action library can be added to the action list and replace some or all of the actions marked with <TO BE REPLACED>. We encourage you to add more <TO BE REPLACED> content to complete these steps. You can do the following:
* Replace any number of <TO BE REPLACED>-like operations with any number of <IN LIB> operations.
* Replace any number of <IN LIB> operations with any number of <IN LIB> operations as the latter are better suited to the task.
* Replace any number of <IN LIB> operations with more <IN LIB> operations.
* Insert any number of <IN LIB> operations that differ from existing steps.
* Insert any number of <TO BE REPLACED> operations to fill in missing content between steps.
* Remove any number of redundant <IN LIB> operations.
* Remove any number of redundant <TO BE REPLACED> operations.
* Remove any number of overly verbose <IN LIB> operations.
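
The <IN LIB> / <TO BE REPLACED> marking that the rewrite prompt relies on can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`normalize`, `tag_plan`, `is_executable`, `render_plan`), the exact-match-after-normalization rule, and the numbered rendering format are all assumptions made for illustration, and the toy library and the draft step "Open the account menu." are hypothetical.

```python
IN_LIB = "<IN LIB>"
TO_BE_REPLACED = "<TO BE REPLACED>"


def normalize(action: str) -> str:
    """Lowercase, collapse whitespace, drop trailing periods (an assumed matching rule)."""
    return " ".join(action.lower().split()).rstrip(".")


def tag_plan(steps, library):
    """Pair each step with a marker, depending on whether it matches a library action."""
    known = {normalize(a) for a in library}
    return [
        (IN_LIB if normalize(step) in known else TO_BE_REPLACED, step)
        for step in steps
    ]


def is_executable(tagged):
    """A plan counts as grounded here only if every step is in the library."""
    return all(tag == IN_LIB for tag, _ in tagged)


def render_plan(tagged):
    """Format the current plan for the prompt; an empty plan becomes "None"."""
    if not tagged:
        return "None"
    return "\n".join(
        f"{i}. {tag} {step}" for i, (tag, step) in enumerate(tagged, start=1)
    )


# Hypothetical toy library and draft plan, loosely based on the example above.
library = [
    "Launch the YouTube app on your Android device.",
    "Open the YouTube app on your iPhone or iPad.",
    "Tap on Settings.",
]
draft = [
    "Launch the YouTube app on your Android device.",
    "Open the account menu.",  # invented step, not in the library
    "Tap on Settings.",
]

tagged = tag_plan(draft, library)
print(render_plan(tagged))
print("executable:", is_executable(tagged))
```

Exact matching is the simplest possible grounding rule. A system working with a large action library would more plausibly retrieve candidates by similarity, which this sketch does not attempt.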