CVPR 2023

Imitation Learning as State Matching via Differentiable Physics

Siwei Chen, Xiao Ma, Zhongwen Xu

‡ This work was completed at the SEA AI Lab.

Abstract

Existing imitation learning (IL) methods, such as inverse reinforcement learning (IRL), usually rely on a double-loop training process that alternates between learning a reward function and learning a policy, and they tend to suffer from long training times and high variance. In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, Imitation Learning as State Matching via Differentiable Physics (ILD), which eliminates the double-loop design and achieves significant improvements in final performance, convergence speed, and stability. ILD incorporates the differentiable physics simulator into its computational graph for policy learning as a physics prior. It unrolls the dynamics by sampling actions from a parameterized policy and minimizes the distance between the expert trajectory and the agent trajectory. It backpropagates the gradient into the policy through temporal physics operators, which improves transferability to unseen environments and yields higher final performance. ILD has a single-loop structure that stabilizes and speeds up training. During optimization, it dynamically selects the learning objective for each state to simplify the complex optimization landscape. Experiments show that ILD outperforms state-of-the-art methods on continuous control tasks in Brax, and that it can be applied to deformable object manipulation tasks and generalizes to unseen configurations.
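The core loop described above (unroll a differentiable simulator under the policy, pick a target state for each agent state, and push the matching error back into the policy parameters through the dynamics) can be sketched in a few lines. This is a minimal, illustrative sketch under loudly stated assumptions, not the authors' implementation: a hand-written 1-D point mass stands in for Brax, the policy is a deterministic linear controller rather than a sampled stochastic one, the gradient is derived by hand (reverse mode through time) instead of by automatic differentiation, and the windowed nearest-expert-state rule in `match_targets` is a hypothetical stand-in for ILD's per-state objective selection. All function names are hypothetical.

```python
import numpy as np

# Toy differentiable "physics" (an assumption, standing in for Brax): a 1-D
# point mass integrated with semi-implicit Euler. State s = [position, velocity].
DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])  # s_{t+1} = A s_t + B a_t
B = np.array([[DT * DT], [DT]])


def rollout(K, b, s0, T):
    """Unroll T steps under the deterministic linear policy a_t = K s_t + b."""
    states = [s0]
    for _ in range(T):
        s = states[-1]
        states.append(A @ s + B @ (K @ s + b))
    return np.stack(states)  # shape (T + 1, 2)


def match_targets(states, expert, window):
    """Hypothetical per-state objective selection: for each agent state s_t
    (t >= 1), pick the nearest expert state among expert[t-window .. t+window]."""
    T = len(states) - 1
    targets = np.empty_like(states[1:])
    for t in range(1, T + 1):
        lo, hi = max(1, t - window), min(T, t + window)
        cand = expert[lo:hi + 1]
        d2 = np.sum((cand - states[t]) ** 2, axis=1)
        targets[t - 1] = cand[np.argmin(d2)]
    return targets


def state_matching_loss(K, b, s0, expert, window):
    """Sum of squared distances between agent states and their selected targets."""
    states = rollout(K, b, s0, len(expert) - 1)
    targets = match_targets(states, expert, window)
    return np.sum((states[1:] - targets) ** 2), states, targets


def loss_and_grad(K, b, s0, expert, window):
    """Loss plus dL/dK, dL/db via backpropagation through the unrolled dynamics.
    The selected targets are treated as constants (valid almost everywhere)."""
    loss, states, targets = state_matching_loss(K, b, s0, expert, window)
    M = A + B @ K  # Jacobian ds_{t+1}/ds_t with the policy in the loop
    dK, db, lam = np.zeros_like(K), np.zeros_like(b), np.zeros(2)
    for t in range(len(states) - 2, -1, -1):
        lam = lam + 2.0 * (states[t + 1] - targets[t])  # lam = dL/ds_{t+1}
        g_a = B.T @ lam  # dL/da_t
        dK += np.outer(g_a, states[t])
        db += g_a
        lam = M.T @ lam  # carry the gradient back to s_t
    return loss, dK, db


def train(s0, expert, window=2, steps=100, lr=1e-3):
    """Single loop: no reward model, just descend the state-matching loss."""
    K, b = np.zeros((1, 2)), np.zeros(1)
    for _ in range(steps):
        loss, dK, db = loss_and_grad(K, b, s0, expert, window)
        step = lr
        while step > 1e-12:  # backtracking: halve the step until the loss drops
            K_new, b_new = K - step * dK, b - step * db
            if state_matching_loss(K_new, b_new, s0, expert, window)[0] < loss:
                K, b = K_new, b_new
                break
            step *= 0.5
    return K, b


s0 = np.zeros(2)
# Hypothetical expert: a PD controller driving the mass to position 1.
expert = rollout(np.array([[-4.0, -3.0]]), np.array([4.0]), s0, T=30)
initial_loss = state_matching_loss(np.zeros((1, 2)), np.zeros(1), s0, expert, 2)[0]
K, b = train(s0, expert)
final_loss = state_matching_loss(K, b, s0, expert, 2)[0]
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

There is no reward model and no inner RL loop here: each iteration is one rollout, one backward pass through the dynamics, and one parameter update, which is the single-loop structure the abstract contrasts with IRL. The backtracking step-size rule is only there to keep this toy example stable.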