ICLR 2025

Learning Clustering-based Prototypes for Compositional Zero-Shot Learning

Hongyu Qu, Jianan Wei, Xiangbo Shu, Wenguan Wang

Abstract

Learning primitive (i.e., attribute and object) concepts from seen compositions is the primary challenge of Compositional Zero-Shot Learning (CZSL). Existing CZSL solutions typically rely on oversimplified data assumptions, e.g., modeling each primitive with a single centroid representation, ignoring the natural diversity of an attribute (resp. object) when coupled with different objects (resp. attributes). In this work, we develop CLUSPRO, a robust clustering-based prototype mining framework for CZSL that defines the conceptual boundaries of primitives through a set of diversified prototypes. Specifically, CLUSPRO conducts within-primitive clustering on the embedding space to automatically discover and dynamically update prototypes. These representative prototypes are subsequently used to repaint a well-structured and independent primitive embedding space, ensuring intra-primitive separation and inter-primitive decorrelation through prototype-based contrastive learning and decorrelation learning. Moreover, CLUSPRO efficiently performs prototype clustering in a nonparametric fashion, without introducing additional learnable parameters or computational cost during testing. Experiments on three benchmarks demonstrate that CLUSPRO outperforms various leading CZSL solutions under both closed-world and open-world settings. Our code is available at CLUSPRO.

* Equal contribution
† Corresponding author
1 Given that CLIP might be exposed to certain unseen compositions during pre-training, we provide a detailed data overlap discussion in §G of the Appendix.
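
To make the idea of within-primitive clustering concrete, the following is a minimal sketch, not the CLUSPRO implementation. It assumes plain k-means as a stand-in for the clustering step (the abstract does not name the algorithm) and a momentum rule for the dynamic update, which is likewise an assumption. The names `cluster_prototypes`, `build_prototype_bank`, and `momentum_update`, as well as the synthetic features, are hypothetical and used only for illustration.

```python
import numpy as np


def cluster_prototypes(features, k, n_iters=20, seed=0):
    """Run plain k-means on the features of ONE primitive.

    Returns a (k, d) array: k prototypes instead of a single centroid.
    k-means is an assumed stand-in for the paper's clustering step.
    """
    rng = np.random.default_rng(seed)
    # Initialise the prototypes from k distinct samples.
    idx = rng.choice(len(features), size=k, replace=False)
    centers = features[idx].copy()
    for _ in range(n_iters):
        # Squared Euclidean distance of every sample to every prototype: (n, k).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old prototype if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def build_prototype_bank(features, labels, k):
    """Cluster each primitive separately; no learnable parameters are involved."""
    return {
        int(c): cluster_prototypes(features[labels == c], k)
        for c in np.unique(labels)
    }


def momentum_update(old, new, momentum=0.9):
    """Assumed update rule: blend freshly clustered prototypes into the old ones."""
    return momentum * old + (1.0 - momentum) * new


# Synthetic example: 2 primitives, 50 samples each, 8-dimensional embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
labels = np.repeat(np.array([0, 1]), 50)
bank = build_prototype_bank(feats, labels, k=3)
```

Each primitive ends up with `k` prototypes rather than one centroid, which is the property the abstract emphasizes. Because the prototypes are computed from features rather than learned, nothing extra needs to be evaluated at test time in this sketch.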