CVPR 2025

ShowMak3r: Compositional TV Show Reconstruction

Sangmin Kim, Seunguk Do, Jaesik Park

Abstract

Reconstructing dynamic radiance fields from video clips is challenging, especially for entertainment videos such as TV shows. Actors occlude one another and show diverse facial expressions, stages are cluttered, and footage often has small-baseline views or sudden shot changes. General monocular web videos are harder still, since they are far less predictable and more complex than footage from controlled environments. To address these issues, we present ShowMak3r++, a unified reconstruction pipeline that targets both controlled settings like TV shows and uncontrolled scenarios like web videos. Once reconstruction is done, our pipeline allows scenes to be edited much as video clips are assembled in a production control room. In our pipeline, we propose a spatio-temporal positioning module that locates actors on the stage using a depth prior while maintaining 2D image alignment and natural 3D motion. A ShotMatcher module then tracks the actors across shot changes. Finally, a face-fitting network dynamically recovers the actors' expressions. Experiments on the Sitcoms3D and CMU Panoptic datasets show that our pipeline can reassemble TV show scenes with new cameras at different timestamps. We also demonstrate that our method can successfully reconstruct challenging web videos, including dynamic action clips, dance videos, and movie clips. Furthermore, our pipeline enables applications such as synthetic shot-making, actor relocation, insertion, deletion, and pose manipulation.