CVPR 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang

Abstract

[Figure 1 panels, labeled by instruction: "Sit down, bent torso, legs folded at knees"; "Raise two arms"; "Kick, left leg forward, right leg retreats"; "Waltz dance, left foot step backward, right hand extends"; "Kick the white ball" (side view); "Kick the white ball" (front view); "Raise arm, open the door" (rear view)]

Figure 1. Diverse motions generated by AnySkill conditioned on various instructions. When provided with an open-vocabulary text description of a motion, AnySkill is adept at learning natural and flexible motions that closely align with the description, facilitated by an image-based reward mechanism. Additionally, AnySkill demonstrates proficiency in learning interactions with dynamic objects, showcasing its versatile motion generation capabilities.
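The Figure 1 caption names an image-based reward mechanism but does not define it here. As a rough illustration only, the sketch below assumes the reward is the cosine similarity between an embedding of a rendered frame and an embedding of the text instruction. The function names, the three-dimensional placeholder vectors, and the choice of cosine similarity are all assumptions made for this example, not details taken from this section.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def image_text_reward(image_embedding: Sequence[float],
                      text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: higher when the rendered frame matches the text.

    Assumption: both embeddings come from the same (unspecified)
    vision-language encoder, so they live in a shared space.
    """
    return cosine_similarity(image_embedding, text_embedding)


# Placeholder embeddings standing in for real encoder outputs.
text_emb = [1.0, 0.0, 0.0]         # e.g. the instruction "raise two arms"
aligned_frame = [0.9, 0.1, 0.0]    # frame that roughly matches the text
unrelated_frame = [0.0, 1.0, 0.0]  # frame that does not match

reward_aligned = image_text_reward(aligned_frame, text_emb)
reward_unrelated = image_text_reward(unrelated_frame, text_emb)
```

Under these assumptions, a frame whose embedding points in nearly the same direction as the instruction's embedding earns a higher reward than an unrelated frame, which is the signal a policy would be trained to maximize.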