CVPR 2025
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
Abstract
Existing Weakly Supervised Semantic Segmentation (WSSS) methods rely on CNN-based Class Activation Maps (CAMs) and Transformer-based self-attention maps to generate class-specific masks for semantic segmentation. However, CAMs and self-attention maps usually yield incomplete segmentation because of classification bias. To address this issue, we propose a Multi-Label Prototype Visual Spatial Search (MuP-VSS) method with a spatial query mechanism. Specifically, MuP-VSS consists of two key components: multi-label prototype representation and multi-label prototype optimization. The former designs a global embedding to learn global tokens from the images, and then introduces a Prototype Embedding Module (PEM) that interacts with patch tokens to capture local semantic information. The latter uses the exclusivity and consistency principles of multi-label prototypes to design three prototype losses for optimizing them: a cross-class prototype (CCP) contrastive loss, a cross-image prototype (CIP) contrastive loss, and a patch-to-prototype (P2P) consistency loss. The CCP loss models the exclusivity of the multi-label prototypes learned from a single image to make each class more discriminative. The CIP loss learns the consistency of the same class-specific prototypes extracted from multiple images to enhance semantic consistency. The P2P loss controls the semantic response of each prototype to the image patches. Experimental results on Pascal VOC 2012 and MS COCO show that MuP-VSS significantly outperforms recent methods and achieves state-of-the-art performance.

* Corresponding author

[Figure: comparison of (a) CNN-based methods, (b) Transformer-based methods, and (c) our query-based MuP-VSS. Labels in the figure: encoder, conv, class-aware feature, channel aggregation, class weight W ∈ ℝ^(D×C), feature map F ∈ ℝ^(D×H×W), patch attention maps, refine, patch features F ∈ ℝ^(HW×D), multi-label tokens P ∈ ℝ^(C×D), search, similarity scores.]
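The spatial query idea can be illustrated with a minimal sketch: each class prototype in P ∈ ℝ^(C×D) is compared against every patch feature in F ∈ ℝ^(HW×D), and the similarity scores are reshaped into one map per class. The function names, the use of cosine similarity, the clipping of negative scores, and the per-class peak normalization are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_search(patch_features, prototypes, height, width):
    """Hypothetical helper: query patch features with class prototypes.

    patch_features: array of shape (H*W, D), one row per image patch.
    prototypes:     array of shape (C, D), one row per class.
    Returns an array of shape (C, H, W) with values in [0, 1].
    """
    f = l2_normalize(patch_features)
    p = l2_normalize(prototypes)
    # Cosine similarity between every prototype and every patch: (C, H*W).
    scores = p @ f.T
    # Assumption: negative similarities are treated as "no response".
    scores = np.maximum(scores, 0.0)
    maps = scores.reshape(-1, height, width)
    # Assumption: rescale each class map by its own peak response.
    peak = maps.max(axis=(1, 2), keepdims=True)
    return maps / (peak + 1e-8)
```

A pseudo-mask could then be read off by taking the highest-scoring class at each location, for example `prototype_search(F, P, H, W).argmax(axis=0)`, restricted to the classes present in the image-level labels.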