CVPR 2023

Instant Multi-View Head Capture through Learnable Registration

Timo Bolkart, Tianye Li, Michael J. Black

Abstract

Figure 1. Given calibrated multi-view images (top: 4 of 16 views; contrast-enhanced for visualization), TEMPEH directly infers 3D head meshes in dense semantic correspondence (bottom) in about 0.3 seconds. TEMPEH reconstructs heads with varying expressions (left) and head poses (right) for subjects unseen during training. Applied to multi-view video input, the meshes inferred frame by frame are temporally coherent, making them directly usable for full-head performance capture. See Sup. Mat. for the video output.