CVPR 2023

Towards Unbiased Volume Rendering of Neural Implicit Surfaces with Geometry Priors

Yongqiang Zhang, Zhipeng Hu, Haoqian Wu, Minda Zhao, Lincheng Li, Zhengxia Zou, Changjie Fan

Abstract

Learning surfaces by neural implicit rendering has been a promising approach to multi-view reconstruction in recent years. Existing neural surface reconstruction methods, such as NeuS [24] and VolSDF [32], can produce reliable meshes from posed multi-view images. Although they build a bridge between volume rendering and the Signed Distance Function (SDF), their accuracy is still limited. In this paper, we argue that this limited accuracy is due to the bias of their volume rendering strategies, especially when the viewing direction is close to tangent to the surface. We revise the conditions for unbiased volume rendering and provide an additional condition. Following this analysis, we propose a new rendering method that scales the SDF field according to the angle between the viewing direction and the surface normal vector. Experiments on simulated data indicate that our rendering method reduces the bias of SDF-based volume rendering. Moreover, a non-negligible bias still exists when the learnable standard deviation of the SDF is large at the early stage, which makes it hard to supervise the rendered depth with depth priors. As an alternative, we supervise the zero-level set with surface points obtained from a pre-trained Multi-View Stereo network. We evaluate our method on the DTU dataset and show that it outperforms state-of-the-art neural implicit surface methods without mask supervision.

* Corresponding author

certain multistage pipeline, including grouping the related views, depth prediction, filtering with photometric consistency and geometry consistency, fusion of points from different views, meshing the dense points by off-the-shelf methods such as screened Poisson Surface Reconstruction [8], and finally texture mapping. Later, MVS networks [5, 20, 27, 36] developed rapidly, benefiting from the available large-scale 3D datasets.
MVS networks of this kind use a Convolutional Neural Network (CNN) to predict depth maps effectively, then follow the traditional pipeline to fuse a global dense point cloud and mesh it. However, MVS networks struggle with texture-less regions and sudden depth changes, so the recovered meshes usually contain many holes. Recently, neural implicit surfaces and differentiable rendering methods have offered a promising way to improve and simplify the process of multi-view 3D reconstruction. The surfaces are represented as Signed Distance Functions (SDF) [18, 24, 32, 33] or occupancy fields [16, 17]. At the same time, neural radiance fields [13, 35] have been proposed with a different rendering scheme, volume rendering. Neural surface-based rendering methods can recover reliable and smooth surfaces, but they are hard to train without mask supervision. In contrast, volume rendering can achieve good 2D views without mask supervision, but the quality of the 3D geometry is rather coarse. Is there a connection between the SDF field and the occupancy field? NeuS [24] and VolSDF [32] point out that the connection can be made through a certain Cumulative Distribution Function (CDF). Thanks to this significant progress, 3D surfaces can be learned effectively from neural implicit surfaces with self-supervised volume rendering. The only necessary input is well-posed 2D images. Masks can be dropped, because accurate masks are hard to obtain for many complex objects in the real world. Although these methods have made great progress on 3D reconstruction from calibrated multi-view images,