ICLR 2026
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
Jianping Zhong, Zhaobo Qi, Kaiwen Duan, Xinyan Liu, Beichen Zhang, Weigang Zhang, Qingming Huang
ABSTRACT
3D object detection from LiDAR point clouds is critical for autonomous driving systems. However, recent two-stage detectors still deliver unsatisfactory performance, primarily because of inadequate proposal quality. Two factors are responsible: the high sparsity and uneven distribution of point clouds severely degrade the geometric detail in the generated proposal features, and refining each proposal independently fails to exploit surrounding contextual cues, discarding complementary details from adjacent proposals. To address these issues, we propose a Proposal-centric Transformer Network (PTN), which comprises a Hierarchical Attentive Feature Alignment (HAFA) module and a Collaborative Proposal Refinement Module (CPRM). Specifically, HAFA employs a dual-stream architecture to extract multi-granularity proposal representations, namely coarse-grained multi-scale voxel features and fine-grained coordinate point features, which strengthen each proposal's representation of object geometry. CPRM first generates hybrid object queries for all objects and then establishes context-aware interactions among them through a 3D parameter-guided deformable attention mechanism, aggregating spatial location cues and category-specific information across proposals that are spatially adjacent and semantically correlated. Extensive experiments on the large-scale Waymo and KITTI benchmarks demonstrate the superiority of PTN. The code is available at https://github.com/ZhongJianPing1/ptnet.git.
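The abstract does not specify how CPRM's 3D parameter-guided deformable attention is parameterized. The NumPy sketch below illustrates one plausible reading under our own assumptions: proposals are 7-DoF boxes `[x, y, z, l, w, h, yaw]`, each query predicts K sampling offsets in its box's local frame, those offsets are scaled by the box length and width and rotated by its heading, and the query then aggregates a bird's-eye-view (BEV) feature map at the resulting points with softmax attention weights. Sampling a single shared BEV map (rather than attending directly to other proposals' features) and the `reach` factor that lets points extend past the box onto neighboring objects are simplifications we chose for illustration. All function names, weight matrices, and the grid convention are hypothetical and are not taken from the PTN code.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def box_guided_points(boxes, offsets):
    """Map box-frame offsets to world-frame BEV (x, y) points.

    boxes:   (N, 7) proposals laid out as [x, y, z, l, w, h, yaw] (assumed layout).
    offsets: (N, K, 2) offsets in units of box length and width.
    Returns: (N, K, 2) world-frame (x, y) sampling locations.
    """
    local = offsets * boxes[:, None, 3:5]            # scale by (l, w)
    cos = np.cos(boxes[:, 6])[:, None]               # (N, 1)
    sin = np.sin(boxes[:, 6])[:, None]
    x = local[..., 0] * cos - local[..., 1] * sin    # rotate by heading
    y = local[..., 0] * sin + local[..., 1] * cos
    return boxes[:, None, 0:2] + np.stack([x, y], axis=-1)  # shift to box center


def bilinear_sample(bev, xy, origin, cell):
    """Bilinearly sample a (C, H, W) BEV map at world points xy of shape (..., 2).

    Assumed grid convention: column u covers x in [origin_x + u*cell, origin_x + (u+1)*cell),
    rows likewise in y; points outside the map are clamped to the border.
    Returns features of shape (..., C).
    """
    _, h, w = bev.shape
    u = np.clip((xy[..., 0] - origin[0]) / cell - 0.5, 0, w - 1)
    v = np.clip((xy[..., 1] - origin[1]) / cell - 0.5, 0, h - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    du, dv = (u - u0)[..., None], (v - v0)[..., None]

    def gather(rows, cols):
        # bev[:, rows, cols] has shape (C, ...); move channels last.
        return np.moveaxis(bev[:, rows, cols], 0, -1)

    return (gather(v0, u0) * (1 - du) * (1 - dv) + gather(v0, u1) * du * (1 - dv)
            + gather(v1, u0) * (1 - du) * dv + gather(v1, u1) * du * dv)


def box_guided_deformable_attention(queries, boxes, bev, origin, cell,
                                    w_off, w_att, w_val, reach=1.5):
    """Single-head, single-level deformable attention guided by 3D box parameters.

    queries: (N, D) per-proposal queries; w_off: (D, 2K); w_att: (D, K); w_val: (C, D).
    `reach` bounds offsets to +/- reach/2 box extents, so with reach > 1 some
    points can fall outside the box onto neighboring objects (hypothetical choice).
    """
    n, k = queries.shape[0], w_att.shape[1]
    offsets = 0.5 * reach * np.tanh(queries @ w_off).reshape(n, k, 2)
    attn = softmax(queries @ w_att, axis=-1)                   # (N, K), rows sum to 1
    pts = box_guided_points(boxes, offsets)                    # (N, K, 2)
    values = bilinear_sample(bev, pts, origin, cell) @ w_val   # (N, K, D)
    return (attn[..., None] * values).sum(axis=1)              # (N, D)


# Toy usage with random weights: 3 proposals, 4 sampling points, 8-dim queries.
rng = np.random.default_rng(0)
N, K, D, C = 3, 4, 8, 5
queries = rng.normal(size=(N, D))
boxes = np.array([[10.0, 5.0, 0.0, 4.0, 2.0, 1.5, 0.0],
                  [14.0, 6.0, 0.0, 4.2, 1.9, 1.6, 0.3],
                  [6.0, 12.0, 0.0, 0.8, 0.8, 1.7, 1.2]])
bev = rng.normal(size=(C, 40, 40))                 # 20 m x 20 m map with 0.5 m cells
w_off = rng.normal(size=(D, 2 * K))
w_att = rng.normal(size=(D, K))
w_val = rng.normal(size=(C, D))
out = box_guided_deformable_attention(queries, boxes, bev, (0.0, 0.0), 0.5,
                                      w_off, w_att, w_val)
print(out.shape)  # (3, 8)
```

Because the offsets are expressed in units of each box's own length and width and then rotated by its heading, a large, rotated proposal samples a proportionally larger, correspondingly oriented region. This is the sense in which the 3D box parameters "guide" the attention in this sketch; the actual CPRM may condition the offsets and weights on the box parameters differently.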