CVPR 2024

Unsupervised Salient Instance Detection

Xin Tian, Ke Xu, Rynson W. H. Lau

Abstract

The significant manual effort required to annotate pixel-level labels has driven advances in unsupervised saliency learning. However, without supervision signals, state-of-the-art methods can only infer region-level saliency. In this paper, we explore the unsupervised salient instance detection (USID) problem for a more fine-grained visual understanding. Our key observation is that self-supervised transformer features may exhibit local similarities as well as different levels of contrast to other regions, which provide informative cues for identifying salient instances. Hence, we propose SCoCo, a novel network that models saliency coherence and contrast for USID. SCoCo includes two novel modules: (1) a global background adaptation (GBA) module with a scene-level contrastive loss to extract salient regions from the scene by searching for the adaptive “saliency threshold” in the self-supervised transformer features, and (2) a locality-aware similarity (LAS) module with an instance-level contrastive loss to group salient regions into instances by modeling the in-region saliency coherence and cross-region saliency contrasts. Extensive experiments show that SCoCo outperforms state-of-the-art weakly-supervised SID methods and carefully designed unsupervised baselines, and performs comparably to fully-supervised SID methods.
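
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch built from generic stand-ins, not the GBA or LAS modules themselves. It assumes per-patch saliency scores and per-patch feature vectors are already available. The adaptive threshold is illustrated with Otsu's method, and the coherence/contrast objective with a supervised-contrastive-style loss over hypothetical region labels. The function names `otsu_threshold` and `region_contrastive_loss`, and the parameters `bins` and `tau`, are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def otsu_threshold(scores: np.ndarray, bins: int = 64) -> float:
    """Data-dependent threshold via Otsu's method (a stand-in, not GBA).

    Picks the histogram bin edge that maximises the between-class variance
    of the two groups it separates.
    """
    hist, edges = np.histogram(scores, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()
    # Candidate splits sit after every bin except the last one.
    w0 = np.cumsum(weights)[:-1]            # mass of the lower group
    w1 = 1.0 - w0                           # mass of the upper group
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    cum_mean = cum_mean[:-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    between = np.zeros_like(w0)
    between[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / (
        w0[valid] * w1[valid]
    )
    # Return the upper edge of the best split bin.
    return float(edges[1:-1][np.argmax(between)])


def region_contrastive_loss(
    feats: np.ndarray, labels: np.ndarray, tau: float = 0.1
) -> float:
    """Generic contrastive loss: same-label features attract, others repel.

    feats:  (n, d) patch features; labels: (n,) hypothetical region ids.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    n = feats.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Cosine similarity scaled by temperature; a patch never matches itself.
    sim = np.where(self_mask, -np.inf, feats @ feats.T / tau)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(
        np.exp(sim - sim_max).sum(axis=1, keepdims=True)
    )
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    if not has_pos.any():
        return 0.0  # no anchor has a same-region partner
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())
```

In this sketch, patches whose score exceeds `otsu_threshold(scores)` would be treated as salient, and `region_contrastive_loss` is lower when features within the same region agree and features from different regions differ. How SCoCo actually searches for its threshold and defines its scene-level and instance-level losses is described in the paper body, not here.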