CVPR 2025
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu
Abstract
Evidence: The <obj_start> man <obj_end> <box_start> [[575, 513, 544, 972]] ... with <obj_start> pink pills <obj_end> <box_start> [[355, 443, 33, 61]] <box_end> later in the images.
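The evidence string above interleaves free text with object labels and bounding boxes wrapped in special tokens. As a minimal sketch of how such a string could be read back into (label, boxes) pairs, the snippet below uses a regular expression. The function name `parse_grounded_evidence` and the pattern are illustrative assumptions, not part of the VideoEspresso release. The excerpt does not state what the four numbers in each box mean, so the sketch returns them uninterpreted. The closing `<box_end>` token is treated as optional because the sample elides part of the text.

```python
import ast
import re

# Hypothetical pattern: a label between <obj_start>/<obj_end>, followed by a
# <box_start> token and a nested list such as [[575, 513, 544, 972]].
# <box_end> is not required, since the sample text is partly elided.
GROUNDED_SPAN = re.compile(
    r"<obj_start>\s*(?P<label>.*?)\s*<obj_end>\s*"
    r"<box_start>\s*(?P<boxes>\[\[.*?\]\])"
)


def parse_grounded_evidence(text):
    """Return a list of (label, boxes) pairs found in an evidence string.

    `boxes` is the nested list of integers exactly as written; the coordinate
    convention is not interpreted here because the excerpt does not define it.
    """
    pairs = []
    for match in GROUNDED_SPAN.finditer(text):
        boxes = ast.literal_eval(match.group("boxes"))  # safe literal parsing
        pairs.append((match.group("label"), boxes))
    return pairs


evidence = (
    "The <obj_start> man <obj_end> <box_start> [[575, 513, 544, 972]] ... "
    "with <obj_start> pink pills <obj_end> <box_start> [[355, 443, 33, 61]] "
    "<box_end> later in the images."
)
print(parse_grounded_evidence(evidence))
```

Run on the sample, this yields two pairs: `man` with one box and `pink pills` with one box. Using `ast.literal_eval` instead of `eval` keeps the parsing restricted to plain literals.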