CVPR 2025
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan
Abstract
Left: Illustration of our DynRefer approach, which dynamically selects suitable region views for each task through stochastic vision-language alignment and selective multimodal referring. Right: Performance comparison on region-level multimodal tasks.