ACL 2025

Forward Knows Efficient Backward Path: Saliency-Guided Memory-Efficient Fine-tuning of Large Language Models

Yeachan Kim, SangKeun Lee

Abstract

Federated fine-tuning enables privacy-preserving Large Language Model (LLM) adaptation, but its high memory cost limits participation from resource-constrained devices. We propose FEDPRUNER, an innovative federated fine-tuning paradigm that tackles this via intelligent layer pruning. FEDPRUNER flexibly prunes the global model, creating personalized submodels based on device memory constraints. It employs a macro-micro synergistic pruning framework: a macro-level functionality-driven layer orchestration mechanism groups layers, while a micro-level importance-aware layer selection strategy prunes within groups to build device-specific submodels. We further introduce a fine-grained variant that independently prunes Multi-Head Attention and Feed-Forward Network components to precisely preserve critical architectural elements. Extensive experimental results demonstrate that FEDPRUNER significantly outperforms state-of-the-art approaches, achieving up to a 1.98% improvement in average model accuracy while reducing peak memory usage by 75%.
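
The two-level idea of grouping layers and then selecting within each group can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how layers are grouped or how importance is scored, so the group labels, the importance scores, the layer-count budget (standing in for a device memory constraint), and the round-robin allocation across groups are all assumptions made for illustration, and the function name `select_layers` is hypothetical.

```python
from collections import defaultdict


def select_layers(groups, importance, budget):
    """Choose `budget` layer indices to keep for one device.

    groups[i]     -- assumed functional group label of layer i (hypothetical input)
    importance[i] -- assumed importance score of layer i (hypothetical input)
    budget        -- number of layers the device can afford to keep (assumption:
                     memory is approximated by a layer count)
    """
    if len(groups) != len(importance):
        raise ValueError("groups and importance must have the same length")
    if not 0 < budget <= len(groups):
        raise ValueError("budget must be between 1 and the number of layers")

    # Macro level: collect the layer indices that belong to each group.
    by_group = defaultdict(list)
    for layer, group in enumerate(groups):
        by_group[group].append(layer)

    # Micro level: rank layers inside each group, most important first.
    for members in by_group.values():
        members.sort(key=lambda layer: importance[layer], reverse=True)

    # Round-robin over groups so every group keeps its best layers
    # before any group receives an additional one.
    kept = []
    rank = 0
    while len(kept) < budget:
        for group in sorted(by_group):
            members = by_group[group]
            if rank < len(members) and len(kept) < budget:
                kept.append(members[rank])
        rank += 1
    return sorted(kept)


# Eight layers in three assumed groups; the device can keep four of them.
groups = [0, 0, 0, 1, 1, 2, 2, 2]
importance = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2, 0.7, 0.4]
print(select_layers(groups, importance, budget=4))
```

A device with a smaller budget would call the same function with a lower `budget` and obtain a shallower submodel that still draws from the groups in order, which is the property the grouping step is meant to provide in this sketch.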