ICLR 2025
F3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu, Kan Jiang, Murong Ma, Zhe Hou, Yun Lin, Jin Song Dong
Abstract
Analyzing Fast, Frequent, and Fine-grained (F3) events presents a significant challenge for video analytics and multimodal LLMs. Current methods struggle to identify events that satisfy all three F3 criteria with high accuracy, owing to challenges such as motion blur and subtle visual differences between events. To advance research in video understanding, we introduce F3Set, a benchmark of video datasets for precise F3 event detection. Datasets in F3Set are characterized by their large scale and comprehensive detail: they typically encompass over 1,000 event types with precise timestamps and support multiple levels of granularity. F3Set currently contains several sports datasets, and the framework may be extended to other applications as well. We evaluated popular temporal action understanding methods on F3Set, revealing substantial challenges for existing techniques. We also propose F3ED, a new method for F3 event detection that achieves superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.