CVPR 2023

Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization

Mengyuan Chen, Junyu Gao, Changsheng Xu

Abstract

Weakly-supervised Temporal Action Localization (WTAL), which aims to recognize and localize action instances using only video-level labels during training, has achieved significant progress in recent years. However, in a dynamically changing open world where unknown actions constantly emerge, the closed-set assumption of existing WTAL methods does not hold. Compared with traditional open-set recognition tasks, Open-world WTAL (OWTAL) is challenging because not only are annotations of unknown samples unavailable, but the fine-grained annotations of known action instances can also only be inferred ambiguously from the video category labels. To address this problem, we propose a Cascade Evidential Learning framework that operates at the evidence level and targets OWTAL for the first time. Our method jointly leverages multi-scale temporal contexts and knowledge-guided prototype information to progressively collect cascade and enhanced evidence for separating known actions, unknown actions, and background. Extensive experiments verify the effectiveness of our method. In addition to the classification metrics adopted by previous open-set recognition methods, we also evaluate our method on localization metrics, which are more appropriate for OWTAL.