CVPR 2024
SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang, Liu Ren, Xiaoming Liu
Abstract
(a) Improve KITTI-360 Val SoTA. (b) Improve nuScenes Val SoTA. (c) Theory Advancement.
Figure 1. Teaser. (a) SoTA frontal detectors struggle with large objects (low AP_Lrg) even on the nearly balanced KITTI-360 dataset (Skewness in Fig. 7). Our proposed SeaBird achieves significant Mono3D improvements, particularly for large objects. (b) SeaBird also improves two SoTA BEV detectors, BEVerse-S [116] and HoP [121], on the nuScenes dataset, particularly for large objects. (c) Plot of the convergence variance Var(ϵ) of the dice and regression losses against the noise σ in depth prediction. The y-axis denotes the deviation from the optimal weight, so lower is better. SeaBird leverages dice loss, which we prove is more noise-robust than regression losses for large objects.
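For readers unfamiliar with the dice loss named in the caption, the following is a minimal sketch of the generic soft dice formulation on a 1D occupancy strip. It is an illustration under stated assumptions, not the paper's implementation: the function name `dice_loss`, the smoothing term `eps`, and the toy 20-cell strip are all hypothetical choices made here.

```python
def dice_loss(pred, target, eps=1e-6):
    """Generic soft dice loss for two equal-length sequences with values in [0, 1].

    Assumption: this is the common 1 - 2|P∩T| / (|P| + |T|) form; the paper's
    exact variant may differ.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Toy example (hypothetical): a large object occupying 10 of 20 BEV cells.
target = [0] * 5 + [1] * 10 + [0] * 5
# The same object predicted 2 cells too far, e.g. due to a depth error.
shifted = [0] * 7 + [1] * 10 + [0] * 3

perfect = dice_loss(target, target)   # about 0.0: full overlap
offset = dice_loss(shifted, target)   # about 0.2: 8 of 10 cells still overlap
```

The loss depends only on how much of the predicted region overlaps the target, so for an object spanning many cells a small shift changes it only modestly.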