CVPR2021

Learning To Segment Rigid Motions From Two Frames

Gengshan Yang, Deva Ramanan

Abstract

[Figure 1 panels: reference frame; (a) PointRend (trained on MSCOCO); (b) Geometric (black: rigid background); (d) Our rigid motion (projected to 2D); (e) Our two-frame reconstruction]

Figure 1: (a) Many data-driven segmentation methods rely heavily on appearance cues and fail on novel test scenes. For instance, PointRend [25] trained on MSCOCO fails to detect coral reef fish even with a confidence threshold as low as 0.1. (b) Geometric motion segmentation [5, 58], by contrast, generalizes to novel appearance but fails under noisy flow inputs and degenerate motion configurations. (c)-(e) We propose a neural architecture powered by geometric reasoning that decomposes a scene into a rigid background and multiple moving rigid bodies, each parameterized by a 3D rigid transformation. It generalizes to novel scenes and is robust to noisy inputs and motion degeneracies. The inferred rigid motions significantly improve depth and scene flow accuracy.
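To make "rigid bodies parameterized by 3D rigid transformations" and "rigid motion (projected to 2D)" concrete, the sketch below computes the 2D flow that a rigid transform (R, t) induces at a pixel. It is a minimal illustration under stated assumptions, not the paper's implementation. It assumes a pinhole camera with intrinsics (fx, fy, cx, cy), a known depth for each pixel, and a segment label per pixel that selects one transform (label 0 standing in for the background, whose transform is the camera ego-motion). The function names `rigid_flow` and `segment_flow` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]


def rigid_flow(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float,
               R: Mat3, t: Vec3) -> Tuple[float, float]:
    """2D flow at pixel (u, v) induced by the rigid transform p' = R p + t.

    Hypothetical helper: assumes a pinhole camera and known depth.
    """
    # Back-project the pixel to a 3D point in the reference camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    p = (x, y, depth)
    # Apply the 3D rigid transformation.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        raise ValueError("transformed point lies behind the camera")
    # Re-project with the same intrinsics; flow is the pixel displacement.
    u2 = fx * q[0] / q[2] + cx
    v2 = fy * q[1] / q[2] + cy
    return u2 - u, v2 - v


def segment_flow(pixels: Sequence[Tuple[float, float]],
                 depths: Sequence[float],
                 labels: Sequence[int],
                 transforms: Dict[int, Tuple[Mat3, Vec3]],
                 intrinsics: Tuple[float, float, float, float]
                 ) -> List[Tuple[float, float]]:
    """Flow per pixel, using the rigid transform of each pixel's segment."""
    fx, fy, cx, cy = intrinsics
    flows = []
    for (u, v), z, k in zip(pixels, depths, labels):
        R, t = transforms[k]
        flows.append(rigid_flow(u, v, z, fx, fy, cx, cy, R, t))
    return flows
```

For example, with identity rotations, a static segment (zero translation) yields zero flow, while a segment translated 1 unit along x at depth 10 with fx = 100 shifts by 10 pixels horizontally. Because each segment's flow is fully determined by one (R, t) plus depth, a per-segment rigid model constrains flow and depth jointly, which is consistent with the caption's claim that inferred rigid motions can improve both.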