CVPR 2021

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

Shixing Chen, Xiaohan Nie, David Fan, Dongqing Zhang, Vimal Bhat, Raffay Hamid

Abstract

Approach Overview: Representative frames of 10 shots from 2 different scenes of the movie Stuart Little are shown. The story arc of each scene is distinguishable and semantically coherent. We consider similar nearby shots (e.g. 5 and 3) as augmented versions of each other. This augmentation approach capitalizes on the underlying film-production process and can encode the scene structure better than existing augmentation methods. Given a current shot (query), we find a similar shot (key) within its neighborhood and: (a) maximize the similarity between the query and the key, and (b) minimize the similarity of the query with randomly selected shots.
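The query/key procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_key` and `contrastive_loss`, the use of cosine similarity, the `radius` parameter defining the neighborhood, the softmax-style (InfoNCE-like) objective, and the temperature value are all assumptions made for illustration. Shot embeddings are taken to be plain lists of floats that some encoder has already produced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_key(embeddings, query_idx, radius):
    """Return the index of the shot most similar to the query within
    `radius` shots on either side (hypothetical neighborhood definition).
    Assumes at least one neighbor exists."""
    lo = max(0, query_idx - radius)
    hi = min(len(embeddings), query_idx + radius + 1)
    candidates = [i for i in range(lo, hi) if i != query_idx]
    query = embeddings[query_idx]
    return max(candidates, key=lambda i: cosine(query, embeddings[i]))


def contrastive_loss(query, key, negatives, temperature=0.07):
    """Softmax-style contrastive loss (assumed form): it decreases as the
    query-key similarity grows and as the query-negative similarities shrink."""
    pos = cosine(query, key) / temperature
    logits = [pos] + [cosine(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos


# Toy 2-D embeddings for five consecutive shots (made-up values).
shots = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
key_idx = select_key(shots, query_idx=0, radius=2)
# Negatives are fixed here for reproducibility; the text selects them randomly.
negatives = [shots[1], shots[4]]
loss = contrastive_loss(shots[0], shots[key_idx], negatives)
print(key_idx, loss)
```

In this sketch, minimizing the loss addresses both goals at once: the positive term pulls the query toward its key, while the denominator pushes it away from the other shots.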