CVPR 2025

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Kun Liu, Qi Liu, Xinchen Liu, Jie Li, Yongdong Zhang, Jiebo Luo, Xiaodong He, Wu Liu

Abstract

Figure 1. Overview of HOIGen-1M. HOIGen-1M contains over one million video clips for HOI video generation with multiple types of HOI videos, diverse scenarios (15,000+ objects and 7,000+ interaction types), and expressive captions.

[Figure 1 panels: Close-up View of HOI; One-person HOI; Multi-person HOI. Caption legend: Human Descriptions; HOI Descriptions; Human Action Descriptions; Scene Descriptions.]

Example caption shown in Figure 1: "A man is seated at a white table with a white box in front of a gray background. He is wearing a burgundy hoodie, a black cap, and glasses. The man is actively engaged with a laptop computer, which he holds up in his right hand. With his left hand, he gestures towards the bottom of the laptop, seemingly explaining the details to the camera. Subsequently, he closes the laptop. With an interested and satisfied smile, he looks content. Finally, he gently places the computer back on the white table in front of him using both hands, taking care to position it neatly. The scene is a simple and plain room with a gray background and a technological atmosphere. The lighting is bright and even, highlighting the man and the objects in his hands."