NeurIPS 2023
DiffVL: Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics
Zhiao Huang, Feng Chen, Yewen Pu, Chunru Lin, Hao Su, Chuang Gan
Cited by 6
Abstract
Combining gradient-based trajectory optimization with differentiable physics simulation is an accurate and efficient technique for solving soft-body manipulation problems. Given a well-crafted optimization objective, the solver can quickly converge to a valid trajectory. However, writing appropriate objective functions requires expert knowledge, making it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a framework that integrates the process from task collection to trajectory generation by leveraging a combination of visual and linguistic task descriptions. A DiffVL task represents a long-horizon soft-body manipulation problem as a sequence of 3D scenes (key frames) and natural language instructions connecting adjacent key frames. We built GUI tools and tasked non-expert users with transcribing 100 soft-body manipulation tasks inspired by real-life scenarios from online videos. We also developed a novel method that leverages large language models to translate task language descriptions into machine-interpretable optimization objectives, which in turn help differentiable physics solvers solve these long-horizon multistage tasks that are challenging for previous baselines. Experiments show that existing baselines cannot complete complex tasks, while our method can solve them well. Videos can be found on the website https://sites.google.com/view/diffvl/home
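
To make the first sentence of the abstract concrete, the following is a minimal sketch of gradient-based trajectory optimization through a differentiable simulator. It is an illustration only and not the DiffVL solver: the soft body is replaced by a 1D point mass, the objective is a hand-written squared distance to a target position, and the gradient is obtained by a hand-coded reverse pass. All names (`simulate`, `loss_and_grad`, `optimize`) and all parameter values are assumptions made for this example.

```python
def simulate(actions, dt=0.1):
    """Roll out a 1D point mass from rest and return its final position."""
    x, v = 0.0, 0.0
    for a in actions:
        v = v + dt * a  # semi-implicit Euler: update velocity first
        x = x + dt * v  # then position, using the new velocity
    return x


def loss_and_grad(actions, target, dt=0.1):
    """Return the squared final-position error and its gradient w.r.t. each action."""
    x_final = simulate(actions, dt)
    loss = (x_final - target) ** 2

    # Reverse pass through the simulator (hand-written backpropagation).
    gx = 2.0 * (x_final - target)  # d loss / d x at the final step
    gv = 0.0                       # d loss / d v at the final step
    grad = [0.0] * len(actions)
    for t in reversed(range(len(actions))):
        gv += dt * gx      # x_{t+1} = x_t + dt * v_{t+1}
        grad[t] = dt * gv  # v_{t+1} = v_t + dt * a_t
    return loss, grad


def optimize(target=1.0, horizon=20, lr=1.0, iters=50):
    """Plain gradient descent on the action sequence (hypothetical settings)."""
    actions = [0.0] * horizon
    for _ in range(iters):
        _, grad = loss_and_grad(actions, target)
        actions = [a - lr * g for a, g in zip(actions, grad)]
    final_loss, _ = loss_and_grad(actions, target)
    return actions, final_loss


actions, final_loss = optimize()
```

In this toy setting the objective is trivial to write by hand. The abstract's point is that for long-horizon soft-body tasks it is not, which is why DiffVL derives the objectives from key frames and language instructions instead.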