ICLR 2025
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
Zhiyang Xu, Minqian Liu, Ying Shen, Joy Rimchala, Jiaxin Zhang, Qifan Wang, Yu Cheng, Lifu Huang
Abstract
Interleaved Text-and-Image Generation • Interleaved generation [1,2], which requires models to generate both text and images, is an increasingly important task in multimodal learning.