CVPR 2021

DeepVideoMVS: Multi-View Stereo on Video With Recurrent Spatio-Temporal Fusion

Arda Düzçeker, Silvano Galliani, Christoph Vogel, Pablo Speciale, Mihai Dusmanu, Marc Pollefeys

Abstract

[Figure 1 panels: (a) Ground truth; (b) No spatio-temporal fusion (backbone network): runtime 30 ms, 34 FPS, memory 795 MB; (c) With spatio-temporal fusion: runtime 37 ms, 27 FPS, memory 1061 MB]

Figure 1: 3D reconstructions of a scene from ScanNet [11]. Extending our stereo backbone with our proposed spatio-temporal fusion module improves the temporal consistency and accuracy of the predicted depth maps, leading to better reconstructions with negligible computational overhead. Runtime is per forward pass on an NVIDIA GTX 1080Ti with image size 320 × 256.