ICLR 2026

Model Predictive Adversarial Imitation Learning for Planning from Observation

Tyler Han, Yanda Bao, Bhaumik Mehta, Gabriel Guo, Sanghun Jung, Anubhav Vishwakarma, Emily Kang, Rosario Scalise, Jason Liren Zhou, Bryan Xu, Byron Boots

Cited by 2

Abstract

Humans can often perform a new task after observing a few demonstrations by inferring the underlying intent. For robots, recovering the demonstrator's intent through a learned reward function can enable more efficient, interpretable, and robust imitation through planning. A common paradigm for learning to plan from demonstration first solves for a reward via Inverse Reinforcement Learning (IRL) and then deploys it via Model Predictive Control (MPC). In this work, we unify these two procedures by introducing planning-based Adversarial Imitation Learning (AIL), which simultaneously learns a reward and improves a planning-based agent through experience while using observation-only demonstrations. We study the advantages of planning-based AIL in generalization, interpretability, robustness, and sample efficiency through experiments on simulated control tasks and real-world navigation from a few or even a single observation-only demonstration.
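As a rough illustration of the loop described above, the following sketch alternates between (i) rolling out an MPC agent that replans against the current learned reward and (ii) updating a discriminator that separates demonstrator transitions from agent transitions. It is a minimal toy under loudly stated assumptions, not the authors' method: the 2-D point-mass `dynamics`, the linear logistic discriminator over state-transition `features` (s, s'), the logit reward log D - log(1 - D), and the MPPI-style `mppi_plan` planner are all hypothetical choices made here for illustration. Only states appear in the demonstration, matching the observation-only setting.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): a 2-D point mass with known dynamics.
DT = 0.1      # integration step
U_MAX = 1.0   # per-axis action bound


def dynamics(s, u):
    """Known model used by the planner: s' = s + DT * clip(u)."""
    return s + DT * np.clip(u, -U_MAX, U_MAX)


def features(s, s_next):
    """Observation-only transition features [s, s' - s, 1]; actions are never used."""
    s, s_next = np.atleast_2d(s), np.atleast_2d(s_next)
    return np.concatenate([s, s_next - s, np.ones((len(s), 1))], axis=1)


def reward(w, s, s_next):
    """Learned reward = discriminator logit, i.e. log D - log(1 - D)."""
    return features(s, s_next) @ w


def train_discriminator(w, exp_feats, agent_feats, lr=0.1, steps=1):
    """Logistic-regression discriminator: expert transitions -> 1, agent -> 0."""
    X = np.vstack([exp_feats, agent_feats])
    y = np.concatenate([np.ones(len(exp_feats)), np.zeros(len(agent_feats))])
    for _ in range(steps):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))  # numerically stable sigmoid
        w = w - lr * X.T @ (p - y) / len(y)       # gradient step on mean cross-entropy
    return w


def mppi_plan(w, s0, nominal, rng, n_samples=64, sigma=0.5, lam=1.0):
    """One MPPI-style update of a nominal action sequence under the learned reward."""
    horizon, act_dim = nominal.shape
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon, act_dim))
    candidates = np.clip(nominal + noise, -U_MAX, U_MAX)
    s = np.repeat(np.atleast_2d(s0), n_samples, axis=0)  # one copy of s0 per sample
    returns = np.zeros(n_samples)
    for t in range(horizon):                  # roll every candidate through the model
        s_next = dynamics(s, candidates[:, t])
        returns += reward(w, s, s_next)
        s = s_next
    weights = np.exp((returns - returns.max()) / lam)  # softmax of returns / lam
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)  # weighted-average plan, (H, A)


def run(n_iters=20, horizon=10, seed=0):
    """Alternate closed-loop MPC rollouts with discriminator (reward) updates."""
    rng = np.random.default_rng(seed)
    goal = np.array([1.0, 1.0])
    demo = np.linspace(np.zeros(2), goal, 11)  # single state-only demonstration
    exp_feats = features(demo[:-1], demo[1:])
    w = np.zeros(exp_feats.shape[1])
    for _ in range(n_iters):
        s, plan, agent = np.zeros(2), np.zeros((horizon, 2)), []
        for _ in range(len(demo) - 1):
            plan = mppi_plan(w, s, plan, rng)
            s_next = dynamics(s, plan[0])        # execute only the first action
            agent.append((s, s_next))
            s = s_next
            plan = np.roll(plan, -1, axis=0)     # warm-start the next replan
            plan[-1] = 0.0
        agent_s, agent_next = map(np.array, zip(*agent))
        agent_feats = features(agent_s, agent_next)
        w = train_discriminator(w, exp_feats, agent_feats, steps=10)
    return w, s
```

For example, `w, s_final = run()` returns the learned linear reward weights and the agent's state after the last rollout; whether the agent ends near the demonstrated goal depends on these hypothetical hyperparameters, so no particular outcome is claimed. In this sketch the planner itself is fixed and the agent improves only because the reward it plans against improves, and each replan executes just the first action, as in receding-horizon MPC.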