CVPR 2024

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li

49 citations

Abstract

In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and segment those objects in a target image. Notably, the VRP encoder supports a variety of annotation formats for reference images, including point, box, scribble, and mask. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to segment unseen objects and enabling cross-domain segmentation. The source code and models will be available at https://github.com/syp2ysy/VRP-SAM