CVPR2025

Video Depth without Video Models

Bingxin Ke, Dominik Narnhofer, Shengyu Huang, Lei Ke, Torben Peters, Katerina Fragkiadaki, Anton Obukhov, Konrad Schindler

DOI Publisher

Abstract

617AA@91832052()@'442&4%7& Figure 1. The RollingDepth model takes an unconstrained video and reconstructs a corresponding depth video. Unlike methods that rely on video diffusion models, it extends a single-image monodepth estimator such that it can process short snippets. To account for temporal context, snippets with varying frame rates are sampled from the video, processed, and reassembled through a global alignment algorithm to obtain long, temporally coherent depth videos. Depth is colour-coded near far.