CVPR 2025

HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction

Yuan Wang, Yali Li, Xiang Li, Shengjin Wang

Abstract

‡ Corresponding Author

Figure 1. Illustration of our HSI-GPT's supported tasks: (a) Text-Conditioned HSI Generation, (b) Multiple-Modal Controlled HSI Generation, (c) Text-based Motion Generation, (d) Motion Captioning, (e) Generalized Motion Completion. Given different instruction prompts, the proposed HSI-GPT not only accommodates multiple control conditions but handles various HSI-related tasks as well as motion-centric understanding and generation tasks uniformly.