CVPR 2023

Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies

Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren

Abstract

Movie highlights stand out of the screenplay for efficient browsing and play a crucial role on social media platforms. Based on existing efforts, this work has two observations: (1) For different annotators, labeling highlights has uncertainty, which leads to inaccurate and time-consuming annotations. (2) Besides previous supervised or unsupervised settings, some existing video corpora can be useful, e.g., trailers, but they are often noisy and incomplete to cover the full highlights. In this work, we study a more practical and promising setting, i.e., reformulating highlight detection as "learning with noisy labels". This setting does not require time-consuming manual annotations and can fully utilize existing abundant video corpora. First, based on movie trailers, we leverage scene segmentation to obtain complete shots, which are regarded as noisy labels. Then, we propose a Collaborative noisy Label Cleaner (CLC) framework to learn from noisy highlight moments. CLC consists of two modules: augmented cross-propagation (ACP) and multi-modality cleaning (MMC). The former aims to exploit the closely related audio-visual signals and fuse them to learn unified multi-modal representations. The latter aims to achieve cleaner highlight labels by observing the changes in losses among different modalities. To verify the effectiveness of CLC, we further collect a large-scale highlight dataset named MovieLights. Comprehensive experiments on MovieLights and YouTube Highlights datasets demonstrate the effectiveness of our approach. Code has been made available at https://github.com/TencentYoutuResearch/HighlightDetection-CLC.
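The abstract does not spell out how MMC uses per-modality losses, so the following is only a minimal sketch of a generic loss-based label-cleaning heuristic, not the authors' method. It assumes that each shot has a recorded loss per epoch for an audio model and a visual model, and that positives with a high loss in both modalities are more likely to be mislabeled. The function name `clean_labels`, the `keep_ratio` parameter, and the averaging rule are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def clean_labels(audio_losses, visual_losses, labels, keep_ratio=0.5):
    """Keep only the noisy positives with the smallest cross-modal loss.

    audio_losses, visual_losses: one list of per-epoch losses per sample.
    labels: noisy binary highlight labels (1 = highlight, 0 = non-highlight).
    keep_ratio: hypothetical fraction of positive labels to trust.
    """
    if not (len(audio_losses) == len(visual_losses) == len(labels)):
        raise ValueError("all inputs must describe the same samples")

    # Assumption: score = mean loss over epochs, averaged across modalities.
    scores = [
        (mean(a) + mean(v)) / 2
        for a, v in zip(audio_losses, visual_losses)
    ]

    # Rank the positively labeled samples from lowest to highest score.
    positives = [i for i, y in enumerate(labels) if y == 1]
    positives.sort(key=lambda i: scores[i])

    # Trust the lowest-loss share of positives; reset the rest to 0.
    n_keep = math.ceil(len(positives) * keep_ratio)
    trusted = set(positives[:n_keep])
    return [1 if i in trusted else 0 for i in range(len(labels))]


# Toy example: five shots, two recorded epochs per modality.
audio = [[0.2, 0.1], [0.9, 1.0], [0.3, 0.2], [0.5, 0.5], [1.2, 1.1]]
visual = [[0.3, 0.2], [0.8, 0.9], [0.1, 0.1], [0.4, 0.4], [1.0, 0.9]]
noisy = [1, 1, 1, 0, 1]
cleaned = clean_labels(audio, visual, noisy, keep_ratio=0.5)
print(cleaned)
```

In this toy run, the two positives with consistently high loss in both modalities are reset to non-highlight. A real system would likely track how losses evolve during training rather than a plain average, which is the part this sketch simplifies.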