CVPR 2021

Learning by Aligning Videos in Time

Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram Najam Syed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

Abstract

[Figure 1 panels: Encoder; Embedding Video 1; Embedding Video 2]

Figure 1: We propose a self-supervised method that learns video representations by aligning videos in time, despite differences between the videos in appearance, motion, and viewpoint. We optimize the embedding space using both a temporal alignment loss between the two videos and a temporal regularization term applied separately to each video. Our learned representations can be useful for many video-based temporal understanding tasks, such as temporal video alignment.
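The caption describes the objective only at a high level: an alignment loss computed across the two videos, plus a regularizer computed within each video. The NumPy sketch below shows one way such an objective could be assembled. It is not the authors' implementation. It assumes soft-DTW as the differentiable alignment loss over squared Euclidean frame distances. The per-video regularizer is a hypothetical contrastive-style hinge term that is included purely for illustration. All names and parameters (`soft_dtw`, `temporal_regularizer`, `total_loss`, `gamma`, `lam`, `window`, `margin`) are assumptions.

```python
import numpy as np

def pairwise_sq_dists(x, y):
    # Squared Euclidean distance between every embedding in x (n, d) and y (m, d).
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

def softmin(values, gamma):
    # Smooth minimum -gamma * log(sum(exp(-v / gamma))), stabilized by max-shifting.
    v = -np.asarray(values, dtype=float) / gamma
    vmax = v.max()
    return -gamma * (vmax + np.log(np.exp(v - vmax).sum()))

def soft_dtw(x, y, gamma=0.1):
    # Assumed alignment loss: soft-DTW between two embedding sequences.
    d = pairwise_sq_dists(x, y)
    n, m = d.shape
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell adds the local cost to a smooth min over its three predecessors.
            r[i, j] = d[i - 1, j - 1] + softmin(
                [r[i - 1, j - 1], r[i - 1, j], r[i, j - 1]], gamma)
    return r[n, m]

def temporal_regularizer(x, window=1, margin=1.0):
    # Hypothetical per-video term: pull frames at most `window` steps apart
    # together, and hinge-push more distant frames to a squared distance >= `margin`.
    d = pairwise_sq_dists(x, x)
    n = len(x)
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    near = (gap > 0) & (gap <= window)
    far = gap > window
    pull = d[near].sum()
    push = np.maximum(0.0, margin - d[far]).sum()
    return (pull + push) / (n * n)

def total_loss(x, y, lam=0.5, gamma=0.1):
    # Alignment across videos plus regularization applied separately to each video.
    return soft_dtw(x, y, gamma) + lam * (temporal_regularizer(x) + temporal_regularizer(y))

video1 = np.array([[0.0], [1.0], [2.0]])  # toy 1-D embeddings, 3 frames
video2 = np.array([[0.0], [2.0]])         # toy 1-D embeddings, 2 frames
print(soft_dtw(video1, video2, gamma=1e-3))  # close to the hard-DTW cost of 1.0
print(total_loss(video1, video2))
```

As `gamma` shrinks, soft-DTW approaches the hard dynamic-time-warping cost. A larger `gamma` gives a smoother loss, which is easier to optimize by gradient descent, and the resulting value is never larger than the hard DTW cost.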