CVPR 2024

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han


Abstract

In this paper, we address the challenging problem of visual SLAM with neural scene representations. Recently, neural scene representations have shown promise for SLAM, producing high-quality dense 3D scene reconstructions. However, existing methods require scene-specific optimization, which makes mapping time-consuming for every individual scene. To overcome this limitation, we propose IBD-SLAM, an Image-Based Depth fusion framework for generalizable SLAM. In particular, we adopt a Neural Radiance Field (NeRF) for scene representation. Inspired by multi-view image-based rendering, instead of learning a fixed-grid scene representation, we propose to learn an image-based depth fusion model that fuses the depth maps of multiple reference views into an xyz-map representation. Once trained, this model can be applied to new, uncalibrated monocular RGBD videos of unseen scenes without retraining, and it reconstructs full 3D scenes efficiently with a lightweight pose optimization procedure. We thoroughly evaluate IBD-SLAM on public visual SLAM benchmarks, where it outperforms the previous state of the art while being 10× faster in the mapping stage. Project page: https://visual-ai.github.io/ibd-slam
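The abstract names the xyz-map representation (each pixel of a view stores a 3D point) but does not describe the learned fusion model itself. As a rough, non-learned point of reference, the sketch below shows the geometric core that such a fusion builds on: reference depth maps are back-projected to world space, reprojected into a target view, and merged into a per-pixel xyz-map. It is not IBD-SLAM's method. The learned fusion is replaced by a hand-crafted z-buffer rule (the nearest surface wins). Shared pinhole intrinsics `K`, 4×4 camera-to-world poses, and z-depth maps are assumptions, and the function names `unproject` and `fuse_xyz_map` are hypothetical.

```python
import numpy as np

def unproject(depth, K, T_c2w):
    """Back-project a z-depth map (H, W) into world-space points (H*W, 3).

    Assumes a pinhole camera with intrinsics K and a 4x4 camera-to-world pose.
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T           # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)     # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ T_c2w.T)[:, :3]           # camera -> world

def fuse_xyz_map(depths, K, poses_c2w, T_target_c2w, H, W):
    """Hypothetical, non-learned fusion of reference depth maps into a
    target-view xyz-map. A z-buffer keeps the closest point per pixel;
    pixels that no reference view covers stay NaN."""
    xyz_map = np.full((H, W, 3), np.nan)
    zbuf = np.full((H, W), np.inf)
    T_w2c = np.linalg.inv(T_target_c2w)
    for depth, T in zip(depths, poses_c2w):
        valid = depth.reshape(-1) > 0                 # skip missing depth
        pts_w = unproject(depth, K, T)[valid]
        pts_h = np.concatenate([pts_w, np.ones((len(pts_w), 1))], axis=1)
        pts_c = (pts_h @ T_w2c.T)[:, :3]              # world -> target camera
        z = pts_c[:, 2]
        front = z > 1e-6                              # keep points in front
        pts_w, pts_c, z = pts_w[front], pts_c[front], z[front]
        proj = pts_c @ K.T                            # project to target pixels
        u = np.round(proj[:, 0] / z).astype(int)
        v = np.round(proj[:, 1] / z).astype(int)
        inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
        for ui, vi, zi, pw in zip(u[inside], v[inside], z[inside], pts_w[inside]):
            if zi < zbuf[vi, ui]:                     # nearest surface wins
                zbuf[vi, ui] = zi
                xyz_map[vi, ui] = pw
    return xyz_map
```

For example, fusing a single depth map whose pose equals the target pose reproduces its own back-projection. Two co-located views at depths 2 and 3 keep only the nearer surface. A learned model, as the abstract describes, would replace the fixed z-buffer rule with weights predicted from the reference views.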