CVPR 2024

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar González-Franco

Cited by 61

Abstract

Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot transfer segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations is still challenging. In this paper, we propose to utilize the self-attention layers in stable diffusion models to achieve this goal because the pre-trained stable diffusion model has learned inherent concepts of objects within its attention layers. Specifically, we introduce a simple yet effective iterative merging process based on measuring KL divergence among attention maps to merge them into valid segmentation masks. The proposed method does not require any training or language dependency to extract quality segmentation for any images. On COCO-Stuff-27, our method surpasses the prior unsupervised zero-shot transfer SOTA method by an absolute 26% in pixel accuracy and 17% in mean IoU. The project page is at https://sites.google.com/view/diffseg/home.

¹Georgia Institute of Technology
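To make the merging idea in the abstract concrete, the following is a minimal sketch of how attention maps might be grouped by KL divergence and turned into a label mask. It is not the authors' implementation. The function names (`symmetric_kl`, `merge_attention_maps`, `maps_to_mask`), the `threshold` and `max_iters` parameters, the greedy grouping rule, and the choice to average merged maps are all assumptions made for illustration. Extracting the self-attention maps from a Stable Diffusion model is omitted, so the input here is simply an array of non-negative maps.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-8):
    """Symmetrised KL divergence between two maps treated as distributions."""
    # Flatten, add eps to avoid log(0), and normalise so each map sums to 1.
    p = p.ravel().astype(np.float64) + eps
    q = q.ravel().astype(np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def merge_attention_maps(maps, threshold=0.5, max_iters=10):
    """Greedily merge maps whose symmetric KL divergence is below `threshold`.

    `maps` has shape (N, H, W). The grouping rule and the averaging step
    are illustrative assumptions, not the published algorithm.
    """
    proposals = [m.astype(np.float64) for m in maps]
    for _ in range(max_iters):
        merged = []
        used = [False] * len(proposals)
        for i, anchor in enumerate(proposals):
            if used[i]:
                continue
            group = [anchor]
            used[i] = True
            # Collect every unused map that is close to the current anchor.
            for j in range(i + 1, len(proposals)):
                if not used[j] and symmetric_kl(anchor, proposals[j]) < threshold:
                    group.append(proposals[j])
                    used[j] = True
            merged.append(np.mean(group, axis=0))
        if len(merged) == len(proposals):
            break  # nothing merged in this pass, so the result is stable
        proposals = merged
    return np.stack(proposals)


def maps_to_mask(proposals):
    """Assign each pixel the index of the proposal with the highest value."""
    return np.argmax(proposals, axis=0)
```

Calling `maps_to_mask(merge_attention_maps(maps))` on an `(N, H, W)` array yields an `(H, W)` integer mask. The number of segments is not fixed in advance; in this sketch it depends on the assumed `threshold`.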