CVPR 2022

RCL: Recurrent Continuous Localization for Temporal Action Detection

Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan

54 citations

Abstract

Temporal representation is the cornerstone of modern action detection techniques. State-of-the-art methods mostly rely on a dense anchoring scheme, in which anchors are sampled uniformly over the temporal domain on a discretized grid and accurate boundaries are then regressed from them. In this paper, we revisit this foundational stage and introduce Recurrent Continuous Localization (RCL), which learns a fully continuous anchoring representation. Specifically, the proposed representation builds upon an explicit model conditioned on video embeddings and temporal coordinates, which ensures the capability of detecting segments of arbitrary length. To optimize the continuous representation, we develop an effective scale-invariant sampling strategy and recurrently refine the prediction in subsequent iterations. Our continuous anchoring scheme is fully differentiable, allowing it to be seamlessly integrated into existing detectors, e.g., BMN [20] and G-TAD [41]. Extensive experiments on two benchmarks demonstrate that our continuous representation steadily surpasses its discretized counterparts by 2% mAP. As a result, RCL achieves 52.92% mAP@0.5 on THUMOS14 and 37.65% mAP on ActivityNet v1.3, outperforming all existing single-model detectors.
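
To illustrate the contrast the abstract draws, the following minimal sketch compares a discretized anchor grid with a scorer that can be queried at any real-valued (start, end) pair. It is not the authors' implementation: the function names, the two-layer MLP, the normalization of time to [0, 1], and the random untrained weights are all assumptions made for illustration. The scale-invariant sampling and recurrent refinement steps are not modeled here.

```python
import numpy as np

def grid_anchors(num_bins):
    """Discretized dense anchoring: all (start, end) pairs on a uniform grid over [0, 1]."""
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    return [(s, e) for i, s in enumerate(edges) for e in edges[i + 1:]]

def init_params(embed_dim, hidden_dim, seed=0):
    """Random weights for a hypothetical two-layer MLP (untrained, illustration only)."""
    rng = np.random.default_rng(seed)
    in_dim = embed_dim + 2  # video embedding plus the two temporal coordinates
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def continuous_confidence(params, video_embedding, start, end):
    """Score a segment given a video embedding and real-valued (start, end) coordinates."""
    x = np.concatenate([video_embedding, [start, end]])
    h = np.maximum(0.0, x @ params["w1"] + params["b1"])  # ReLU hidden layer
    logit = h @ params["w2"] + params["b2"]
    return float(1.0 / (1.0 + np.exp(-logit[0])))  # sigmoid confidence in (0, 1)

# A grid only offers segments whose boundaries fall on its bin edges.
anchors = grid_anchors(num_bins=4)

# The continuous scorer accepts coordinates that lie between bin edges.
params = init_params(embed_dim=8, hidden_dim=16)
embedding = np.ones(8)  # placeholder for a real video embedding
score = continuous_confidence(params, embedding, start=0.137, end=0.462)
```

With four bins the grid exposes only ten candidate segments, whereas the continuous scorer places no restriction on which (start, end) pairs can be evaluated, which is the property the abstract attributes to continuous anchoring.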