CVPR 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat
Abstract
Figure 1. Effectiveness of HierarQ in capturing task-relevant information. HierarQ adaptively focuses on task-relevant video segments to build a task-aware, comprehensive understanding of the video. The color-coded frames show how entity-focused information complements the broader prompt-relevant context, improving overall video understanding.