WWW2026

MCRec: Few-Shot Multimodal Cover Recommendation via User Interest Profiles

Weixin Zheng, Chunyao Song, Tingjian Ge

Abstract

Recommendation systems play a central role in modern services, yet they often treat item cover images as static attributes, overlooking their influence on user decisions. We introduce the task of cover recommendation and study few-shot, interaction-free cover selection using multimodal user interest profiles. To address the cold-start and sparsity challenges faced by traditional methods, we propose Multimodal Cover Recommendation (MCRec), a framework that leverages Vision-Language Models (VLMs) for multimodal feature extraction. Our approach includes: (1) a Text-Guided Visual Interest Aggregation network (TGVIA) that integrates visual and textual representations; (2) multimodal interest embeddings fused via templated prompts; and (3) a multimodal-driven textual inversion technique that enables training-free generalization to new scenarios. We further propose MCRec+, a fine-tuning variant that uses hybrid sampling. To support evaluation, we construct three benchmarks and propose two new metrics. Extensive experiments show that our methods significantly outperform baselines across datasets, with MCRec achieving average gains of 3.72% in Recall@1, 1.70% in APMS, and 1.25% in MPMS. Code and data are publicly available at https://github.com/WeixinZhengRec/MCRec.