ICLR 2025

Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models

Jun Luo, Chen Chen, Shandong Wu

Abstract

Federated prompt learning brings the robust representation learning ability of CLIP-like Vision-Language Models (VLMs) to federated learning (FL) through prompt learning. However, current federated prompt learning methods are typically restricted to the traditional FL paradigm, in which participating clients are generally allowed to download only a single globally aggregated model from the server. While this restriction is justifiable for training full-sized models under federated settings, in this work we argue that the paradigm is ill-suited for lightweight prompts. By allowing clients to download multiple pre-aggregated prompts as fixed non-local experts, we propose Personalized Federated Mixture of Adaptive Prompts (pFedMoAP), a novel FL framework that personalizes the prompt learning process through the lens of Mixture of Experts (MoE). pFedMoAP implements a local attention-based gating network that learns to generate enhanced text features for better alignment with local image data, benefiting from both local and downloaded non-local adaptive prompt experts. Extensive experiments on 9 datasets under various federated settings demonstrate the efficacy of the proposed pFedMoAP algorithm. The code is available at https://github.com/ljaiverson/pFedMoAP.

How can we devise a personalized federated learning framework, tailored for prompt learning in CLIP-like VLMs, while fully exploiting the lightweight nature of the prompts?

In light of these challenges and opportunities, we propose a novel framework: Personalized Federated Mixture of Adaptive Prompts (pFedMoAP). Tailored specifically for prompt learning in CLIP-like VLMs, our proposed framework aims to unleash the potential of the lightweight prompt
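
To make the gating idea in the abstract concrete, the following is a minimal sketch of how an attention-based gate could mix text features produced by a local prompt expert and by downloaded non-local prompt experts. It is an illustration under stated assumptions, not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, uses the image feature as the query and each expert's per-class text feature as both key and value, and the names `gate_text_features` and `classify` are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_text_features(image_feature, expert_text_features):
    """Mix per-class text features from several prompt experts.

    Assumption (not from the paper): plain scaled dot-product attention,
    with the image feature as query and expert text features as keys/values.

    image_feature: d-dimensional vector.
    expert_text_features: one entry per expert (local first, then non-local);
        each entry holds one d-dimensional text feature per class.
    Returns (enhanced, weights), where enhanced[c] is the mixed text feature
    for class c and weights[c] are the attention weights over experts.
    """
    d = len(image_feature)
    num_classes = len(expert_text_features[0])
    enhanced, weights = [], []
    for c in range(num_classes):
        keys = [expert[c] for expert in expert_text_features]
        scores = [dot(image_feature, k) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        mixed = [sum(w[e] * keys[e][j] for e in range(len(keys)))
                 for j in range(d)]
        enhanced.append(mixed)
        weights.append(w)
    return enhanced, weights


def classify(image_feature, text_features):
    """CLIP-style prediction: argmax of cosine similarity over classes."""
    img = normalize(image_feature)
    sims = [dot(img, normalize(t)) for t in text_features]
    return max(range(len(sims)), key=sims.__getitem__), sims


# Toy usage: two classes, one local and one non-local expert, d = 2.
image = [1.0, 0.0]
local_expert = [[0.9, 0.1], [0.1, 0.9]]
nonlocal_expert = [[0.8, 0.2], [0.0, 1.0]]
enhanced, weights = gate_text_features(image, [local_expert, nonlocal_expert])
pred, sims = classify(image, enhanced)
```

In this sketch only the gate would be personalized per client; the non-local experts are treated as fixed inputs, matching the abstract's description of downloaded prompts as "fixed non-local experts". A learned gate would add trainable query, key, and value projections, which are omitted here for brevity.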