CVPR 2025

HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat

Abstract

Figure 1. Effectiveness of HierarQ in capturing task-relevant information. HierarQ adaptively focuses on task-relevant video segments, achieving a task-aware, comprehensive understanding of the video. Color-coded frames illustrate how entity-focused information complements the broader prompt-relevant context, improving both the relevance of the captured content and overall video understanding.