ICLR 2026
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
Fengyuan Lu, Zixuan Duan, Xunzhi Xiang, Zhicheng Zhang, Wenbin Li, Yang Gao, Qi Fan
Abstract
Deploying semantic segmentation in driving and robotics requires both real-time inference and robustness to domain shifts. This setting, formalized as Real-Time Domain-Generalized Semantic Segmentation (RT-DGSS), has not been fully addressed. Existing methods treat real-time (RT) inference and domain generalization (DG) separately: DG methods improve robustness but do not reach real-time speed. To tackle the RT-DGSS problem, we identify that the bottleneck in DG is the prediction head, not the backbone. We introduce QPrompt-R1, a real-time Query-Prompt architecture built on a vision foundation model (VFM) backbone. QPrompt-R1 integrates reasoning by injecting learnable queries into the final transformer block, leveraging contextual learning to improve segmentation under domain shifts while maintaining real-time inference. To further optimize reasoning without extra inference cost, we introduce a Group Relative Query Alignment (GRQA) training objective, which strengthens the relationship between queries and image tokens through group-relative advantage supervision, unlocking the domain generalization potential of VFMs. QPrompt-R1 runs at 54 FPS and delivers strong performance in synthetic-to-real transfer, real-to-real generalization, and robustness under adverse conditions. GRQA also functions as a plug-and-play module, improving DGSS methods such as REIN (+1.2) and SoMA (+0.6) without introducing inference-time overhead. The code is available at QPrompt-R1.
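The abstract does not give the GRQA formula, so the following is only a minimal sketch of what a group-relative advantage could look like, assuming the common normalization in which each score is compared with the mean and standard deviation of its own group. The function name `group_relative_advantages`, the `eps` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Normalize each score against the mean and spread of its own group.

    Assumption: `scores` holds one scalar per group member, e.g. a
    hypothetical alignment score between a query and the image tokens.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the group
    # eps keeps the division defined when all scores in the group are equal
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical scores for a group of three candidates.
scores = [0.2, 0.5, 0.8]
advantages = group_relative_advantages(scores)
```

Under this assumption, members that score above the group mean receive a positive advantage and members below it a negative one, so the supervision signal depends only on relative standing within the group. Because such a term would only enter the training loss, it is consistent with the claim of no added inference-time cost.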