CVPR 2025

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning

Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

Abstract

Figure 1. Visualization of our Multimodal Graph Benchmark (MM-GRAPH). All nodes in our benchmark have both visual and text features. (a) Amazon-Sports: The image and text of each node are the original image and title of the sports equipment, respectively. (b) Goodreads-LP: The images are book covers. We do not show the text features of Goodreads-LP because the book descriptions are very long. (c) Ele-fashion: The image and text of each node are the original image and title of the fashion product, respectively.