CVPR 2023

Unsupervised Continual Semantic Adaptation Through Neural Rendering

Zhizheng Liu, Francesco Milano, Jonas Frey, Roland Siegwart, Hermann Blum, Cesar Cadena

Abstract

Figure 1. (a) Pseudo-label formation. (b) Adaptation through joint training and long-term memory. We propose a method to continually adapt a semantic segmentation model $f$ in an unsupervised fashion across multiple scenes, using neural rendering. For each scene $S_i$: (a) RGB(-D) images $I_i$ from multiple viewpoints $P_i$ and their corresponding predictions $S_{\theta_{i-1}}(I_i)$ by the latest model $f_{\theta_{i-1}}$ are used to supervise a (Semantic-)NeRF model $N_{\phi_i}$; (b) adaptation on $S_i$ is performed through joint training, in which the segmentation network is supervised using the 3D-aware, view-consistent pseudo-labels $\hat{S}_{\phi_i}$ rendered from $N_{\phi_i}$, and the NeRF model is supervised through the smooth predictions of $f_{\theta_{i-1}}$. For each scene, the NeRF model can be compactly stored in a long-term memory, from which images and pseudo-labels from arbitrary viewpoints $P$ can be rendered into a fixed-size rendering buffer and mixed with the renderings from the current scene to reduce forgetting. Bold and dotted lines denote supervision signals and inputs/outputs, respectively.
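
The replay mechanism described in the caption (a fixed-size rendering buffer filled from stored NeRF models and mixed with current-scene renderings) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RenderingBuffer`, the helper `make_fake_renderer`, the round-robin refill over previous scenes, and the `replay_fraction` mixing ratio are all hypothetical choices made for illustration. Each stored NeRF model is stood in for by a plain callable that maps a viewpoint index to an (image, pseudo-label) pair, here represented as strings.

```python
import random
from typing import Callable, List, Tuple

# Placeholder types: a real system would hold image and label tensors.
Sample = Tuple[str, str]            # (rendered image, rendered pseudo-label)
Renderer = Callable[[int], Sample]  # hypothetical stand-in for a stored NeRF model


class RenderingBuffer:
    """Fixed-size buffer of renderings from previous scenes (illustrative only)."""

    def __init__(self, capacity: int, num_viewpoints: int = 100, seed: int = 0):
        self.capacity = capacity
        self.num_viewpoints = num_viewpoints  # assumed size of the viewpoint pool
        self.rng = random.Random(seed)
        self.samples: List[Sample] = []

    def refill(self, stored_renderers: List[Renderer]) -> None:
        """Render `capacity` samples, cycling round-robin over previous scenes."""
        self.samples = []
        if not stored_renderers:
            return
        for k in range(self.capacity):
            render = stored_renderers[k % len(stored_renderers)]
            viewpoint = self.rng.randrange(self.num_viewpoints)
            self.samples.append(render(viewpoint))

    def mixed_batch(self, current: List[Sample], batch_size: int,
                    replay_fraction: float = 0.5) -> List[Sample]:
        """Mix replayed renderings with renderings from the current scene."""
        n_replay = min(int(batch_size * replay_fraction), len(self.samples))
        n_current = min(batch_size - n_replay, len(current))
        batch = (self.rng.sample(self.samples, n_replay)
                 + self.rng.sample(current, n_current))
        self.rng.shuffle(batch)
        return batch


def make_fake_renderer(scene_id: int) -> Renderer:
    """Build a dummy renderer that only encodes scene and viewpoint in strings."""
    def render(viewpoint: int) -> Sample:
        tag = f"s{scene_id}_v{viewpoint}"
        return (f"img_{tag}", f"lbl_{tag}")
    return render


if __name__ == "__main__":
    buffer = RenderingBuffer(capacity=4)
    buffer.refill([make_fake_renderer(0), make_fake_renderer(1)])
    current_scene = [(f"img_cur_v{v}", f"lbl_cur_v{v}") for v in range(6)]
    print(buffer.mixed_batch(current_scene, batch_size=4))
```

Because the buffer holds a fixed number of renderings regardless of how many scenes have been seen, the replay cost per training step stays constant in this sketch, while the stored models themselves remain the only per-scene memory.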