CVPR 2024

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

Lukas Höllein, Aljaz Bozic, Norman Müller, David Novotný, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner


Abstract

²Meta

https://lukashoel.github.io/ViewDiff/

Figure 1. Multi-view consistent image generation (example prompt: "a stuffed bear sitting on a wooden box"; panels: Input, Multi-view generated images). Our method takes as input a text description, or any number of posed input images, and generates high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings from any desired camera poses.