NeurIPS 2023

Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase

Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Sida Peng, Yujun Shen


Abstract

Despite the rapid advance of 3D-aware image synthesis, existing studies usually adopt a mixture of techniques and tricks, leaving it unclear how each part contributes to the final performance in terms of generality. Following the most popular and effective paradigm in this field, which incorporates a neural radiance field (NeRF) into the generator of a generative adversarial network (GAN), we build a well-structured codebase, dubbed Carver, through modularizing the generation process. Such a design allows researchers to develop and replace each module independently, and hence offers an opportunity to fairly compare various approaches and recognize their contributions from the module perspective. The reproduction of a range of cutting-edge algorithms demonstrates the availability of our modularized codebase. We also perform a variety of in-depth analyses, such as the comparison across different types of point feature, the necessity of the tailing upsampler in the generator, the reliance on the camera pose prior, etc., which deepen our understanding of existing methods and point out some further directions of the research work. We release code and models to facilitate the development and evaluation of this field.

| Module | Analysis | Observation |
|---|---|---|
| Point Embedder | MLP vs. Volume vs. Tri-plane | Different point features exhibit competitive capacities. |
| Point Embedder | Combination of multiple types of point feature | The contribution is marginal compared to a single type of point feature. |
| Point Embedder | Number of planes | Bi-planes perform on par with tri-planes. |
| Feature Decoder | Decoder depth (i.e., number of layers) | The depth only matters for the MLP-based point embedder. |
| Feature Decoder | Activation function | SIREN is better than ReLU when the upsampler module is absent. |
| Volume Renderer | Density-based vs. SDF-based | SDF-based representation currently lags behind the density-based one. |
| Upsampler | Effects on the generation quality and consistency | Upsamplers benefit the quality but harm the multi-view consistency. |
| Pose Sampler | Effects of the pre-defined pose priors | The more accurate the poses are, the better the generation quality is. |
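
To make the module perspective concrete, the following is a minimal sketch of how a pose sampler, point embedder, feature decoder, and density-based volume renderer might be composed so that each part can be swapped independently. It is not Carver's actual API: every name here (`sample_pose`, `mlp_like_embedder`, `triplane_like_embedder`, `decode_feature`, `render_ray`, `Generator`) is hypothetical, the embedders and decoder are untrained toy functions, and the upsampler is omitted. The renderer uses the standard NeRF-style quadrature, in which each sample contributes its color weighted by its opacity and by the transmittance accumulated along the ray.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def sample_pose(rng: random.Random, std: float = 0.3) -> float:
    """Hypothetical pose sampler: draw a camera yaw (radians) from a Gaussian prior."""
    return rng.gauss(0.0, std)


def mlp_like_embedder(p: Point) -> List[float]:
    """Toy stand-in for an MLP-style point feature (coordinate-wise sinusoids)."""
    return [math.sin(c) for c in p]


def triplane_like_embedder(p: Point) -> List[float]:
    """Toy stand-in for a tri-plane feature: one value per axis-aligned plane."""
    x, y, z = p
    return [x * y, x * z, y * z]


def decode_feature(feature: Sequence[float]) -> Tuple[float, float]:
    """Toy feature decoder: map a point feature to (density, grayscale color)."""
    s = sum(feature)
    density = math.log1p(math.exp(s))   # softplus keeps density non-negative
    color = 1.0 / (1.0 + math.exp(-s))  # sigmoid keeps color in [0, 1]
    return density, color


def render_ray(densities: Sequence[float], colors: Sequence[float], delta: float) -> float:
    """Density-based volume rendering along one ray with uniform step size delta."""
    transmittance = 1.0
    pixel = 0.0
    for sigma, c in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        pixel += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return pixel


@dataclass
class Generator:
    """Composes the modules; each field can be replaced without touching the others."""
    embed: Callable[[Point], List[float]]
    decode: Callable[[Sequence[float]], Tuple[float, float]] = decode_feature
    render: Callable[[Sequence[float], Sequence[float], float], float] = render_ray
    num_samples: int = 8
    near: float = 0.5
    far: float = 2.5

    def synthesize_pixel(self, yaw: float) -> float:
        """Render the central pixel of a camera at the origin rotated by yaw."""
        direction = (math.sin(yaw), 0.0, math.cos(yaw))
        delta = (self.far - self.near) / self.num_samples
        densities, colors = [], []
        for i in range(self.num_samples):
            t = self.near + (i + 0.5) * delta  # midpoint of the i-th segment
            point = (direction[0] * t, direction[1] * t, direction[2] * t)
            sigma, c = self.decode(self.embed(point))
            densities.append(sigma)
            colors.append(c)
        return self.render(densities, colors, delta)


if __name__ == "__main__":
    rng = random.Random(0)
    yaw = sample_pose(rng)
    # Swapping the point embedder leaves every other module unchanged.
    for embed in (mlp_like_embedder, triplane_like_embedder):
        print(embed.__name__, Generator(embed=embed).synthesize_pixel(yaw))
```

Because the generator only depends on each module's input and output types, comparing two point embedders under otherwise identical settings amounts to changing a single constructor argument, which is the kind of controlled comparison the analyses above rely on.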