CVPR 2021
RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction
Yinyu Nie, Ji Hou, Xiaoguang Han, Matthias Nießner
Abstract
In this section, we provide all the parameters, layer specifications, and weights used in the loss functions. We uniformly denote fully-connected layers by MLP[$l_1, ..., l_d$], where $l_i$ is the number of neurons in the $i$-th layer.

A.1. 3D Detector

In Section 3.1, we predict object proposals from $N$ input points with VoteNet [9] as the backbone. It produces $N_p$ proposals with $D_p$-dim features (i.e., the proposal features $F_p \in \mathbb{R}^{N_p \times D_p}$ in our paper), from which we regress the $D_b$-dim box parameters with MLP[128, 128, 69] ($N$ = 80K, $N_p$ = 256, $D_p$ = 128, $D_b$ = 69). As in [9], the 69-dim box parameters encode the center $c \in \mathbb{R}^3$, scale $s \in \mathbb{R}^3$, heading angle $\theta \in \mathbb{R}$, semantic label $l$, and objectness score $s_{obj}$. The objectness score $s_{obj}$ is a probability indicating whether the proposal is close to (<0.3 meter, positive) or far from (>0.6 meter, negative) any ground-truth object center.
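To make the objectness criterion and the MLP notation concrete, the following is a minimal sketch in plain Python, not the released implementation. The function names `objectness_label` and `mlp_layer_shapes` are hypothetical. Two points are assumptions rather than statements from the text above: proposals whose nearest ground-truth center lies between the two thresholds receive no label (we believe this matches the convention in [9]), and a scene without ground-truth objects labels every proposal negative.

```python
import math

NEAR_THRESHOLD = 0.3  # meters; strictly closer than this -> positive
FAR_THRESHOLD = 0.6   # meters; strictly farther than this -> negative


def objectness_label(proposal_center, gt_centers):
    """Return 1 (positive), 0 (negative), or None (unlabeled) for one proposal."""
    if not gt_centers:
        # Assumption: with no ground-truth objects, every proposal is negative.
        return 0
    # Euclidean distance to the nearest ground-truth object center.
    nearest = min(math.dist(proposal_center, c) for c in gt_centers)
    if nearest < NEAR_THRESHOLD:
        return 1
    if nearest > FAR_THRESHOLD:
        return 0
    # Assumption: proposals between the two thresholds are left unlabeled.
    return None


def mlp_layer_shapes(in_dim, widths):
    """Weight shapes (in, out) of MLP[widths] applied to in_dim-dim features."""
    dims = [in_dim] + list(widths)
    return list(zip(dims[:-1], dims[1:]))


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(objectness_label((0.1, 0.1, 0.0), gt))  # near the first center
    print(objectness_label((1.0, 0.0, 0.0), gt))  # 1 m from both centers
    print(objectness_label((0.5, 0.0, 0.0), gt))  # between the thresholds
    # Box regression head: D_p = 128 input features, MLP[128, 128, 69].
    print(mlp_layer_shapes(128, [128, 128, 69]))
```

Under these assumptions, the head MLP[128, 128, 69] applied to $D_p$ = 128 features consists of three linear layers whose final output width equals $D_b$ = 69.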