ICCV 2023

ATT3D: Amortized Text-to-3D Object Synthesis

Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

100 citations


Abstract

[Figure 1 schematic. Existing methods: expensive per-prompt optimization of a NeRF, requiring about 1 hour per prompt. ATT3D (amortized text-to-3D): a mapping network, trained offline, maps text to a NeRF in < 1 second per prompt. Example prompts: "A monkey sitting in a chair wearing a suit .. party hat", "A pig riding a motorbike wearing a backpack .. top hat", etc.]

Figure 1: Our method first trains one network to output 3D objects consistent with various text prompts. Afterward, when we receive an unseen prompt, we produce an accurate object in < 1 second on 1 GPU. Existing methods re-train the entire network for every prompt, requiring a long delay for the optimization to complete. Further, we can interpolate between prompts for user-guided asset generation (Fig. 3). We include a project webpage with an overview and videos.
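The contrast in Figure 1 between per-prompt optimization and a single amortized mapping can be illustrated with a toy sketch. This is not the ATT3D implementation: the quadratic loss, the 2-D "prompt embeddings", the linear mapping `W`, and the names `target_params`, `per_prompt_optimize`, `train_mapping`, and `mapping` are all assumptions made for illustration, standing in for the text encoder, the NeRF parameters, and the training objective, none of which are specified in this section.

```python
def target_params(c):
    """Hypothetical stand-in for the ideal object parameters of embedding c."""
    return [2.0 * c[0] - c[1], 0.5 * c[0] + 3.0 * c[1]]

def loss_grad(theta, c):
    """Gradient of the toy loss 0.5 * ||theta - target_params(c)||^2 in theta."""
    t = target_params(c)
    return [theta[i] - t[i] for i in range(2)]

def per_prompt_optimize(c, steps=200, lr=0.1):
    """Baseline: run a fresh gradient descent from scratch for one prompt."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = loss_grad(theta, c)
        theta = [theta[i] - lr * g[i] for i in range(2)]
    return theta

def mapping(W, c):
    """Amortized inference: one matrix-vector product, no optimization loop."""
    return [W[i][0] * c[0] + W[i][1] * c[1] for i in range(2)]

def train_mapping(prompts, steps=2000, lr=0.05):
    """Offline training: fit one shared map W by cycling over all prompts."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for step in range(steps):
        c = prompts[step % len(prompts)]
        g = loss_grad(mapping(W, c), c)
        # Chain rule: dL/dW[i][j] = g[i] * c[j]
        for i in range(2):
            for j in range(2):
                W[i][j] -= lr * g[i] * c[j]
    return W

train_prompts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
W = train_mapping(train_prompts)

unseen = [0.3, 0.7]                      # embedding not in the training set
print("amortized :", mapping(W, unseen))
print("per-prompt:", per_prompt_optimize(unseen))

# Interpolating between two prompt embeddings gives an in-between output.
a = 0.5
mix = [(1 - a) * train_prompts[0][k] + a * train_prompts[1][k] for k in range(2)]
print("interpolated:", mapping(W, mix))
```

In this toy setting the shared map generalizes to the unseen embedding only because the stand-in target is itself linear; the sketch is meant to show where the cost moves (from every query to a one-time offline training run), not to predict how well a real mapping network generalizes.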