CVPR 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Jianfeng Gao
Abstract
Figure 1. We introduce Magma, the first foundation model capable of interpreting and grounding multimodal inputs and of taking actions toward a goal in both digital and physical environments. With our newly proposed pretraining techniques, Magma learns effectively from images, videos, and robotics data to bridge verbal and spatial intelligence, taking a step toward an intelligent multimodal AI agent.