ACL 2025

Modeling Uncertainty in Composed Image Retrieval via Probabilistic Embeddings

Haomiao Tang, Jinpeng Wang, Yuang Peng, Guanghao Meng, Ruisheng Luo, Bin Chen, Long Chen, Yaowei Wang, Shutao Xia

Cited by 8

Abstract

Composed Image Retrieval (CIR) enables users to search for images using multimodal queries that combine text and reference images. While metric learning methods have shown promise, they rely on deterministic point embeddings that fail to capture the inherent uncertainty in the input data, in which user intentions may be imprecisely specified or open to multiple interpretations. We address this challenge by reformulating CIR through our proposed Composed Probabilistic Embedding (CoPE) framework, which represents both queries and targets as Gaussian distributions in latent space rather than fixed points. Through careful design of probabilistic distance metrics and hierarchical learning objectives, CoPE explicitly captures uncertainty at both instance and feature levels, enabling more flexible, nuanced, and robust matching that can handle polysemy and ambiguity in search intentions. Extensive experiments across multiple benchmarks demonstrate that CoPE effectively quantifies both quality and semantic uncertainties within Composed Image Retrieval, achieving state-of-the-art performance on recall rate. Code: https://github.com/tanghme0w/ACL25-CoPE.
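To make the idea of matching Gaussian embeddings concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which probabilistic distance CoPE uses, so the squared 2-Wasserstein distance between diagonal Gaussians is assumed here purely for illustration. The names `DiagonalGaussian`, `wasserstein2_squared`, and `rank_targets` are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagonalGaussian:
    """Hypothetical Gaussian embedding with a diagonal covariance."""
    mean: List[float]
    std: List[float]  # per-dimension standard deviations


def wasserstein2_squared(p: DiagonalGaussian, q: DiagonalGaussian) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For diagonal covariances this has the closed form
    ||mu_p - mu_q||^2 + ||sigma_p - sigma_q||^2.
    This metric is an assumption, not necessarily the one used by CoPE.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(p.mean, q.mean))
    std_term = sum((a - b) ** 2 for a, b in zip(p.std, q.std))
    return mean_term + std_term


def rank_targets(query: DiagonalGaussian,
                 targets: Sequence[DiagonalGaussian]) -> List[int]:
    """Return target indices sorted from closest to farthest."""
    distances = [wasserstein2_squared(query, t) for t in targets]
    return sorted(range(len(targets)), key=lambda i: distances[i])


# Toy example: one query distribution and three candidate targets.
query = DiagonalGaussian(mean=[0.0, 1.0], std=[0.5, 0.5])
targets = [
    DiagonalGaussian(mean=[3.0, 1.0], std=[0.5, 0.5]),  # far in mean
    DiagonalGaussian(mean=[0.0, 1.0], std=[0.5, 1.5]),  # same mean, wider
    DiagonalGaussian(mean=[0.0, 1.0], std=[0.5, 0.5]),  # identical
]
ranking = rank_targets(query, targets)
```

In this toy setup the second target shares the query's mean but has a larger spread, so it ranks behind the identical third target. A point-embedding distance that compares only the means could not separate the two.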