WWW 2026

Multimodal-enhanced Federated Recommendation: A Group-wise Fusion Approach

Chunxu Zhang, Weipeng Zhang, Guodong Long, Zhiheng Xue, Riting Xia, Bo Yang

Abstract

Federated Recommendation (FR) has emerged as a promising paradigm for addressing the learn-to-rank problem in a privacy-preserving manner. However, effectively incorporating multimodal item features into FR remains an open challenge because of three obstacles: efficiency constraints, distribution heterogeneity, and the difficulty of aligning feature utilization with the recommendation objective. To tackle these issues, we propose GFMFR, a novel multimodal fusion framework for federated recommendation. First, multimodal representation learning is offloaded to the server, which stores item content and employs a high-capacity encoder to generate expressive representations, thereby reducing the computational burden on clients. Second, a group-aware multimodal aggregation mechanism learns shared representations for users with similar interests, enabling knowledge sharing while mitigating distribution heterogeneity. Third, GFMFR adopts a preference-guided distillation strategy that leverages multimodal information in a way directly aligned with the recommendation objective. The proposed framework can be seamlessly integrated into existing federated recommender systems, enhancing their effectiveness by incorporating multimodal features. Extensive experiments on five benchmark datasets demonstrate that GFMFR consistently outperforms state-of-the-art multimodal FR baselines. The implementation code is available at https://github.com/Zhangwp2420/GFMFR.
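To make the idea of group-aware aggregation concrete, the following is a minimal sketch and not the GFMFR implementation. The abstract does not specify how groups are formed or what is aggregated, so the sketch assumes that each client uploads a flat parameter vector and that a group assignment per client is already known. The names `group_wise_average`, `client_params`, and `client_groups` are hypothetical.

```python
from typing import Dict, List, Sequence


def group_wise_average(
    client_params: Dict[str, Sequence[float]],
    client_groups: Dict[str, int],
) -> Dict[int, List[float]]:
    """Average client parameter vectors separately within each group.

    Assumption: group membership is given. How GFMFR derives groups of
    users with similar interests is not described in the abstract.
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for client_id, params in client_params.items():
        group = client_groups[client_id]
        if group not in sums:
            # First client seen for this group: start a zero accumulator.
            sums[group] = [0.0] * len(params)
            counts[group] = 0
        if len(params) != len(sums[group]):
            raise ValueError("all parameter vectors must have the same length")
        for i, value in enumerate(params):
            sums[group][i] += value
        counts[group] += 1
    # One averaged vector per group instead of a single global average.
    return {g: [s / counts[g] for s in total] for g, total in sums.items()}


# Toy example: clients u1 and u2 share a group, u3 is in its own group.
client_params = {"u1": [1.0, 2.0], "u2": [3.0, 4.0], "u3": [10.0, 20.0]}
client_groups = {"u1": 0, "u2": 0, "u3": 1}
group_models = group_wise_average(client_params, client_groups)
```

In contrast to a single global average, each group keeps its own aggregate, so clients with dissimilar interests (here `u3`) do not pull the shared representation of the other group toward their own distribution.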