CVPR 2025

DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution

Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

Abstract

Left: Illustration of our DynRefer approach, which dynamically determines the proper region views for each task through stochastic vision-language alignment and selective multimodal referring. Right: Performance comparison on region-level multimodal tasks.