CVPR 2021

RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction

Yinyu Nie, Ji Hou, Xiaoguang Han, Matthias Nießner

Abstract

In this section, we provide all the parameters, layer specifications, and weights used in the loss functions. We uniformly denote fully-connected layers by MLP$[l_1, \ldots, l_d]$, where $l_i$ is the number of neurons in the $i$-th layer.

A.1. 3D Detector

In Section 3.1, we predict object proposals from $N$ input points with VoteNet [9] as the backbone. The backbone produces $N_p$ proposals with $D_p$-dim features (i.e., the proposal features $F_p \in \mathbb{R}^{N_p \times D_p}$ in our paper), from which we regress the $D_b$-dim box parameters with MLP[128, 128, 69] ($N = 80\text{K}$, $N_p = 256$, $D_p = 128$, $D_b = 69$). As in [9], the 69-dim box parameters encode the center $c \in \mathbb{R}^3$, scale $s \in \mathbb{R}^3$, heading angle $\theta \in \mathbb{R}$, semantic label $l$, and objectness score $s_{obj}$. Here $s_{obj}$ is a probability indicating whether the proposal is close to (< 0.3 m, positive) or far from (> 0.6 m, negative) any ground-truth object center.
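The distance rule behind the objectness score can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `objectness_labels`, the constant names, and the toy coordinates are hypothetical. The text only defines the positive (< 0.3 m) and negative (> 0.6 m) cases, so leaving proposals in the band between the two thresholds unlabeled (`None`) is an assumption of this sketch, as is treating a scene with no ground-truth objects as all-negative.

```python
import math

NEAR_THRESHOLD = 0.3  # metres: closer than this to a ground-truth center -> positive
FAR_THRESHOLD = 0.6   # metres: farther than this from every center -> negative


def objectness_labels(proposal_centers, gt_centers):
    """Return one label per proposal: 1 (positive), 0 (negative), or None."""
    labels = []
    for p in proposal_centers:
        # Distance from this proposal to its nearest ground-truth object center.
        # With no ground-truth centers the distance defaults to infinity.
        nearest = min((math.dist(p, g) for g in gt_centers), default=math.inf)
        if nearest < NEAR_THRESHOLD:
            labels.append(1)
        elif nearest > FAR_THRESHOLD:
            labels.append(0)
        else:
            # Assumption of this sketch: the ambiguous band gets no label.
            labels.append(None)
    return labels


# Toy data (hypothetical): two ground-truth centers and four proposals.
gt = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
proposals = [(0.1, 0.0, 0.0), (0.45, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.2, 0.0)]
print(objectness_labels(proposals, gt))  # prints [1, None, 0, 1]
```

In the toy data, the first and last proposals lie within 0.3 m of a ground-truth center, the third is 1 m from both centers, and the second falls between the two thresholds.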