NeurIPS 2024

From Chaos to Clarity: 3DGS in the Dark

Zhihao Li, Yufei Wang, Alex C. Kot, Bihan Wen

Abstract

Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shapes that overfit the noise, thereby significantly degrading reconstruction quality and reducing inference speed, especially in scenarios with limited views. To address these issues, we introduce a novel self-supervised learning framework designed to reconstruct HDR 3DGS from a limited number of noisy raw images. This framework enhances 3DGS by integrating a noise extractor and employing a noise-robust reconstruction loss that leverages a noise distribution prior. Experimental results show that our method outperforms LDR/HDR 3DGS and previous state-of-the-art (SOTA) self-supervised and supervised pre-trained models in both reconstruction quality and inference speed on the RawNeRF dataset across a broad range of training views. Code is available at https://lizhihao6.github.io/Raw3DGS.

Introduction

Novel view synthesis (NVS) is fundamental to 3D vision, with extensive applications in virtual and augmented reality (VR/AR) [15; 16], autonomous driving [17; 33], and 3D asset creation [21; 4; 19]. Neural Radiance Fields (NeRF) [23] have revolutionized this field by rendering colors through the accumulation of RGB values along sampling rays, employing an implicit multilayer perceptron (MLP) representation. Typically, this method uses low dynamic range (LDR) RGB images processed by image signal processing (ISP) modules, leading to a significant loss of crucial scene details, especially in high-contrast areas like highlights and shadows. This loss can degrade performance in high dynamic range (HDR) environments such as tunnels, sunsets, or dimly lit scenes.
Moreover, reliance on ISP-processed RGB images restricts post-capture color and tone adjustments, presenting significant challenges for photographers and modelers during post-production. In contrast, raw images captured before ISP processing offer a higher dynamic range and preserve more scene information. Recent research has indicated that utilizing raw images can significantly enhance the performance of downstream computer vision tasks in complex lighting conditions [18] and offer greater flexibility in post-production adjustments [36; 5]. Building on this advantage, RawNeRF [22] was the first to employ raw images as the optimization target in NeRF, achieving marked improvements over traditional RGB-based LDR NeRF approaches. However, RawNeRF's reliance on an implicit 3D representation is computationally demanding, requiring up to 48 hours to train a single scene and about one minute to render a single view, which limits its practicality for real-time applications.
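For readers unfamiliar with the ray-based rendering mentioned above, the following is a minimal sketch of the standard NeRF-style accumulation of colors along a single sampling ray. It is illustrative background only, not the implementation of RawNeRF or of our method. The function name `composite_ray` and the sample values are hypothetical, and the per-sample densities and colors, which an MLP would predict in NeRF, are hard-coded here.

```python
import math


def composite_ray(densities, colors, deltas):
    """Accumulate per-sample colors along one ray (standard NeRF quadrature).

    densities: volume density (sigma) at each sample, front to back
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light still unblocked at this sample
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample
        for ch in range(3):
            pixel[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha            # attenuate for later samples
    return pixel, transmittance


# Hypothetical ray: empty space, then a dense red sample, then a blue sample
# hidden behind it.
densities = [0.0, 50.0, 50.0]
colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]
pixel, remaining = composite_ray(densities, colors, deltas)
print(pixel, remaining)
```

In this example the red sample dominates the pixel because it absorbs almost all of the transmittance before the blue sample is reached. In an implicit representation, every sample on every ray requires a network query to obtain its density and color, which is one reason rendering a full view is slow.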