CVPR 2025

Magma: A Foundation Model for Multimodal AI Agents

Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Jianfeng Gao

Abstract

Figure 1. We introduce Magma, the first foundation model capable of interpreting and grounding multimodal inputs and of taking goal-directed actions in both digital and physical environments. With our newly proposed pretraining techniques, Magma learns effectively from images, videos, and robotics data to bridge verbal and spatial intelligence, a step toward an intelligent multimodal AI agent.