CVPR2025

CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models

Kiet A. Nguyen, Adheesh Sunil Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou

Abstract

[Figure 1: example dialogue with CALICO. Asked to segment the common object, CALICO identifies a dog in both images. Under "Common Parts" it segments the head. Under "Unique Parts" it segments the neck (IMAGE1) and the leg, foot, body, and tail (IMAGE2).]

Figure 1. Multi-Image Part-Focused Object Comparison with CALICO. Our pixel-grounded Large Vision-Language Model, CALICO, performs part-focused semantic co-segmentation. This newly introduced task aims to identify, segment, and label common objects, as well as common and unique object parts, across multiple images.
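At the label level, the common/unique split in Figure 1 reduces to set operations over the parts found in each image. The sketch below is a minimal illustration of that bookkeeping only, not CALICO's method. It ignores segmentation masks entirely. The function name `compare_parts` is hypothetical, and the part sets are taken from the Figure 1 example. It assumes a part is "common" if it appears in every image and "unique" to an image if it appears there but is not common. With two images, this matches "present in one image but not the other."

```python
def compare_parts(
    parts_per_image: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Split per-image part labels into common and unique parts.

    Hypothetical helper (not from the paper). Assumption: a part is
    common if every image contains it, and unique to an image if that
    image contains it but it is not common.
    """
    if not parts_per_image:
        return set(), {}
    # Parts shared by all images: intersection of every image's label set.
    common = set.intersection(*parts_per_image.values())
    # Per-image leftovers: labels in that image that are not shared by all.
    unique = {name: parts - common for name, parts in parts_per_image.items()}
    return common, unique


# Part labels from the Figure 1 example (labels only, no masks).
parts = {
    "IMAGE1": {"head", "neck"},
    "IMAGE2": {"head", "leg", "foot", "body", "tail"},
}
common, unique = compare_parts(parts)
print(sorted(common))            # ['head']
print(sorted(unique["IMAGE1"]))  # ['neck']
print(sorted(unique["IMAGE2"]))  # ['body', 'foot', 'leg', 'tail']
```

In the actual task, each label would also be paired with a predicted segmentation mask. This sketch only shows how the labels are partitioned.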