ICLR 2026

PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION

Jianping Zhong, Zhaobo Qi, Kaiwen Duan, Xinyan Liu, Beichen Zhang, Weigang Zhang, Qingming Huang

ABSTRACT

3D object detection from LiDAR point clouds is critical for autonomous driving systems. However, recent two-stage detectors still fall short, primarily because of inadequate proposal quality. This has two causes. First, the high sparsity and uneven distribution of point clouds severely degrade the geometric detail in the generated proposal features. Second, refining each proposal independently ignores the surrounding contextual cues and loses the complementary details offered by adjacent proposals. To this end, we propose a Proposal-centric Transformer Network (PTN), which consists of a Hierarchical Attentive Feature Alignment (HAFA) module and a Collaborative Proposal Refinement Module (CPRM). Concretely, HAFA employs a dual-stream architecture to extract multi-granularity proposal representations, namely coarse-grained multi-scale voxel features and fine-grained coordinate point features, which strengthen the ability of proposals to represent object geometry. CPRM first generates hybrid object queries for all objects and then establishes context-aware interactions through a 3D parameter-guided deformable attention mechanism, aggregating spatial location cues and category-specific information across proposals that are spatially adjacent and semantically correlated. Extensive experiments on the large-scale Waymo and KITTI benchmarks demonstrate the superiority of PTN. The code is available at https://github.com/ZhongJianPing1/ptnet.git.
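The abstract does not specify how the 3D box parameters guide the deformable attention in CPRM. As a rough illustration only, the NumPy sketch below assumes one plausible reading. Each proposal's object query predicts K sampling offsets in the box's own frame. These offsets are scaled by the box length and width, rotated by its heading, and used to bilinearly sample a bird's-eye-view (BEV) feature map. Attention weights predicted from the same query then combine the samples. The function names, the BEV-grid coordinate convention, the `reach` parameter and the random weights are all hypothetical and are not taken from the PTN code.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Bilinearly sample feat (H, W, C) at continuous (x, y) points (P, 2).

    x indexes columns and y indexes rows; corners outside the map read zeros.
    """
    H, W, C = feat.shape
    x, y = xy[:, 0], xy[:, 1]
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    out = np.zeros((xy.shape[0], C))
    for dx in (0, 1):
        for dy in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            # Bilinear weight of this corner: 1 - |distance| along each axis.
            w = (1 - np.abs(x - xi)) * (1 - np.abs(y - yi))
            valid = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
            out[valid] += w[valid, None] * feat[yi[valid], xi[valid]]
    return out


def box_guided_deformable_attention(feat, boxes, queries, w_off, w_attn, w_out,
                                    reach=2.0):
    """Hypothetical single-head deformable attention guided by 3D boxes.

    feat:    (H, W, C) BEV feature map; box coordinates are in grid cells.
    boxes:   (N, 7) as (x, y, z, l, w, h, yaw); only x, y, l, w, yaw are used.
    queries: (N, C) one object query per proposal.
    w_off:   (C, 2K) projection to K normalized (u, v) offsets per query.
    w_attn:  (C, K) projection to K attention logits per query.
    w_out:   (C, C) output projection.
    reach:   max offset in box extents; values > 0.5 let samples leave the
             box and reach neighbouring proposals (an assumption).
    """
    N, C = queries.shape
    K = w_attn.shape[1]
    # Offsets in the box's local frame, bounded to [-reach, reach] extents.
    uv = reach * np.tanh(queries @ w_off).reshape(N, K, 2)
    local = uv * boxes[:, None, 3:5]            # scale by (length, width)
    cos, sin = np.cos(boxes[:, 6]), np.sin(boxes[:, 6])
    # Rotate local offsets by the heading angle, then shift to the box centre.
    gx = cos[:, None] * local[..., 0] - sin[:, None] * local[..., 1]
    gy = sin[:, None] * local[..., 0] + cos[:, None] * local[..., 1]
    points = np.stack([gx, gy], axis=-1) + boxes[:, None, 0:2]
    # Softmax over the K sampling points of each query.
    logits = queries @ w_attn
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    sampled = bilinear_sample(feat, points.reshape(-1, 2)).reshape(N, K, C)
    aggregated = (attn[..., None] * sampled).sum(axis=1)    # (N, C)
    return queries + aggregated @ w_out                      # residual update


rng = np.random.default_rng(0)
H, W, C, K, N = 32, 32, 16, 4, 3
feat = rng.standard_normal((H, W, C))
boxes = np.array([[8.0, 8.0, 0.0, 4.0, 2.0, 1.5, 0.0],
                  [12.0, 9.0, 0.0, 4.0, 2.0, 1.5, np.pi / 2],
                  [25.0, 20.0, 0.0, 3.0, 1.5, 1.5, 0.3]])
queries = rng.standard_normal((N, C))
w_off = 0.1 * rng.standard_normal((C, 2 * K))
w_attn = 0.1 * rng.standard_normal((C, K))
w_out = 0.1 * rng.standard_normal((C, C))
refined = box_guided_deformable_attention(feat, boxes, queries, w_off, w_attn, w_out)
print(refined.shape)  # (3, 16)
```

Scaling and rotating the offsets by each box's size and heading keeps the sampling pattern aligned with the object regardless of its orientation. Letting `reach` exceed half a box extent is one simple way for a query to pick up cues from adjacent proposals, but whether PTN does this is not stated in the abstract.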