CVPR 2025

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

David Yifan Yao, Albert J. Zhai, Shenlong Wang

Abstract

[Figure 1 panels: Input Video (frames over time t); Output 4D Scene Models]

Figure 1. Given a casually captured video, Uni4D harnesses pretrained visual foundation models and multi-stage optimization to jointly estimate camera poses, dynamic geometry, and dense 3D motion. The resulting camera poses and geometry are accurate, consistent, and coherent both temporally and spatially.