ICLR 2025

Depth Any Video with Scalable Synthetic Data

Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He

Abstract

Figure 1: We present Depth Any Video, a versatile foundation model that supports both image depth estimation (top half) and video depth estimation (bottom half). Built on Stable Video Diffusion and fine-tuned with diverse, high-quality synthetic data, our model generalizes robustly to a wide range of unseen real and synthetic scenarios. It also faithfully captures intricate fine-grained details while maintaining temporal consistency throughout the video.