AAAI 2026
HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)
Jingcheng Li, Ye Qiao, Sitao Huang
Abstract
Current video understanding models struggle with temporal reasoning and with preserving fine-grained detail at a manageable computational cost. We propose HARK, a hierarchical memory system that segments videos into action and scene units, combined with question-aware agentic keyframe selection. Our method achieves 70.3% overall accuracy on the VideoMME short-video benchmark.
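To make the idea of a two-level memory with question-aware keyframe selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each scene and action unit already carries a text description, and it substitutes a plain word-overlap score for the agentic selection step. All names (`ActionUnit`, `SceneUnit`, `overlap`, `select_keyframes`) and the midpoint keyframe rule are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionUnit:
    """Fine-grained unit: a short time span with a (hypothetical) caption."""
    start: float
    end: float
    caption: str

    def keyframe_time(self) -> float:
        # Assumption: the midpoint of the span serves as the keyframe.
        return (self.start + self.end) / 2


@dataclass
class SceneUnit:
    """Coarse unit: a scene summary plus the action units it contains."""
    summary: str
    actions: List[ActionUnit] = field(default_factory=list)


def overlap(question: str, text: str) -> int:
    """Count distinct lowercase words shared by the question and the text."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def select_keyframes(scenes: List[SceneUnit], question: str,
                     top_scenes: int = 1, top_actions: int = 2) -> List[float]:
    """Rank scenes by relevance, then rank actions inside the kept scenes."""
    kept_scenes = sorted(scenes, key=lambda s: overlap(question, s.summary),
                         reverse=True)[:top_scenes]
    candidates = [a for s in kept_scenes for a in s.actions]
    kept_actions = sorted(candidates, key=lambda a: overlap(question, a.caption),
                          reverse=True)[:top_actions]
    # Return keyframe timestamps in temporal order.
    return sorted(a.keyframe_time() for a in kept_actions)


scenes = [
    SceneUnit("a man cooks in the kitchen",
              [ActionUnit(0, 4, "man chops onions"),
               ActionUnit(4, 10, "man stirs the pot")]),
    SceneUnit("a dog plays in the park",
              [ActionUnit(10, 16, "dog chases a ball"),
               ActionUnit(16, 20, "dog drinks water")]),
]
question = "which ball does the dog chase in the park"
print(select_keyframes(scenes, question, top_scenes=1, top_actions=1))
```

Filtering at the scene level first keeps the number of action units that must be scored small, which is the efficiency argument for a hierarchy. A real system would replace `overlap` with a learned or model-driven relevance judgment.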