ICLR 2026

Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI

Xiaotian Liu, Armin Toroghi, Jiazhou Liang, David Courtis, Ruiwen Li, Ali Pesaranghader, Jaehong Kim, Tanmana Sadhu, Hyejeong Jeon, Scott Sanner

Abstract

Planning in open-world environments, where agents must act with partially observed states and incomplete knowledge, is a central challenge in embodied AI. Open-world planning involves not only sequencing actions but also determining what information the agent needs to sense to enable those actions. Existing approaches using Large Language Models (LLMs) and Vision-Language Models (VLMs) cannot reliably plan over long horizons and complex goals, where they often hallucinate and fail to reason causally over agent-environment interactions. Alternatively, classical PDDL planners offer correct and principled reasoning, but fail in open-world settings: they presuppose complete models and depend on exhaustive grounding over all objects, states, and actions; they cannot address misalignment between goal specifications (e.g., "heat the bread") and action specifications (e.g., "toast the bread"); and they do not generalize across modalities (e.g., text, vision). To address these core challenges: (i) we extend symbolic PDDL into a flexible natural language representation that we term NL-PDDL, improving accessibility for non-expert users as well as generalization over modalities; (ii) we generalize regression-style planning to NL-PDDL with commonsense entailment reasoning to determine what needs to be observed for goal achievement in partially observed environments with potential goal-action specification misalignment; and (iii) we leverage the lifted specification of NL-PDDL to facilitate open-world planning that avoids exhaustive grounding and yields time and space complexity independent of the number of ground objects, states, and actions. Our experiments in three diverse domains (classical Blocksworld and the embodied ALFWorld environment with both textual and visual states) show that NL-PDDL substantially outperforms existing baselines, is more robust to longer horizons and more complex goals, and generalizes across modalities.
* Equal contribution

Published as a conference paper at ICLR 2026

[Figure: regression planning example for the instruction "Please heat the bread and leave it on a plate for me." Text recovered from the figure follows.]

Conjunctions shown:
"(Bread) is heated" ∧ "(?y) is a plate" ∧ "(Bread) is on (?y)"
"(Bread) is heated" ∧ "(?y) is a plate" ∧ "The agent holds (Bread)"
"(?r) can toast (Bread)" ∧ "(?y) is a plate" ∧ "The agent holds (Bread)"
"(?r) can toast (Bread)" ∧ "The agent is near (Bread)" ∧ "(?y) is a plate"

Actions shown:
Action: "pick up (?o)" ⊬ any fluent (No Op.)
Action: "boil (?o) using (?r)" ⊬ any fluent (No Op.)
Action: "toast (?o) using (?r)", with "(Bread) is toasted" ⊢ "(Bread) is heated" under the substitution ?o/Bread

Other labels: Initial State
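The bread-toasting example can be made concrete with a small sketch of a single lifted regression step. This is an illustration under stated assumptions, not the authors' implementation: the `Action` class, the `match`, `substitute`, `entails`, and `regress` helpers, the string-template matching, and the preconditions given to each action are all hypothetical. In particular, `ENTAILMENT_RULES` is a hand-written lookup table standing in for the commonsense entailment reasoning described in the abstract.

```python
import re
from dataclasses import dataclass

VAR = re.compile(r"\(\?(\w+)\)")  # lifted variables are written "(?name)"

# Hypothetical stand-in for commonsense entailment: (premise, conclusion) templates.
ENTAILMENT_RULES = [("(?x) is toasted", "(?x) is heated")]


@dataclass
class Action:
    name: str
    preconditions: list
    effects: list


def match(template, fluent):
    """Return {var: term} such that template instantiated equals fluent, else None."""
    parts = VAR.split(template)  # [text, var, text, var, ..., text]
    pattern = "".join(
        r"\(([^()]+)\)" if i % 2 else re.escape(part) for i, part in enumerate(parts)
    )
    m = re.fullmatch(pattern, fluent)
    if m is None:
        return None
    theta = {}
    for name, term in zip(parts[1::2], m.groups()):
        if theta.setdefault(name, term) != term:  # repeated variable must bind consistently
            return None
    return theta


def substitute(template, theta):
    """Replace bound variables in template; unbound variables stay lifted."""
    return VAR.sub(
        lambda m: "(" + theta.get(m.group(1), "?" + m.group(1)) + ")", template
    )


def entails(effect, goal_fluent):
    """Return a substitution under which effect entails goal_fluent, else None."""
    theta = match(effect, goal_fluent)  # an identical fluent trivially entails itself
    if theta is not None:
        return theta
    for premise, conclusion in ENTAILMENT_RULES:
        sigma = match(conclusion, goal_fluent)
        if sigma is not None:
            theta = match(effect, substitute(premise, sigma))
            if theta is not None:
                return theta
    return None


def regress(goal, action):
    """Regress goal through action; return (new_goal, substitution) or None (no-op)."""
    for g in goal:
        for e in action.effects:
            theta = entails(e, g)
            if theta is None:
                continue
            pre = [substitute(p, theta) for p in action.preconditions]
            rest = [f for f in goal if f != g]
            return list(dict.fromkeys(pre + rest)), theta  # drop duplicates, keep order
    return None


# Preconditions below are assumptions chosen to be consistent with the figure text.
toast = Action(
    "toast (?o) using (?r)",
    ["(?r) can toast (?o)", "The agent holds (?o)"],
    ["(?o) is toasted"],
)
pick_up = Action("pick up (?o)", ["The agent is near (?o)"], ["The agent holds (?o)"])

final_goal = ['(Bread) is heated', '(?y) is a plate', '(Bread) is on (?y)']
subgoal = ['(Bread) is heated', '(?y) is a plate', 'The agent holds (Bread)']

print(regress(subgoal, toast))
print(regress(final_goal, pick_up))
```

Under these assumptions, regressing the subgoal through the toast action binds `?o` to `Bread` and replaces "(Bread) is heated" with the action's preconditions, while "pick up (?o)" has no effect that entails any fluent of the final goal and so returns `None`. Variables such as `?r` and `?y` are never grounded, which is the sense in which the step is lifted. A fuller sketch would also rename variables apart between the goal and the action, which is omitted here.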