FSE 2025
Beyond PEFT: Layer-Wise Optimization for More Effective and Efficient Large Code Model Tuning
Chaozheng Wang, Jia Feng, Shuzheng Gao, Cuiyun Gao, Zongjie Li, Ting Peng, Hailiang Huang, Yuetang Deng, Michael R. Lyu
2 citations
Abstract
Large Code Models (LCMs) have demonstrated remarkable effectiveness across various code intelligence tasks, and supervised fine-tuning is essential to optimize their performance on specific downstream tasks. Compared with traditional full-parameter fine-tuning (FFT), Parameter-Efficient Fine-Tuning (PEFT) methods can train LCMs with substantially lower resource consumption and have gained widespread attention among researchers and practitioners. While existing studies have explored PEFT methods for code intelligence tasks, they have predominantly focused on a limited subset of scenarios, such as code generation on publicly available datasets, which limits the generalizability of their findings. To address this limitation, we conduct a comprehensive study of the effectiveness of PEFT methods covering five code intelligence tasks on both public and private data. Our extensive experiments reveal a considerable performance gap between PEFT methods and FFT, contrary to the findings of existing studies, and this gap is particularly pronounced in tasks involving private data. To improve the tuning performance of LCMs while reducing resource usage during training, we propose a Layer-Wise Optimization (LWO) strategy. LWO incrementally updates the parameters of each layer across the whole model architecture, without introducing any additional components or inference overhead. Experiments across five LCMs and five code intelligence tasks demonstrate that LWO trains LCMs more effectively and efficiently than previous PEFT methods, with significant improvements on tasks using private data. For instance, in line-level code completion on our private code repositories, LWO outperforms the state-of-the-art LoRA method by 22% in accuracy and 12% in BLEU score. Furthermore, LWO enables more efficient LCM tuning, reducing training time by an average of 42.7% compared with LoRA.
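The abstract describes LWO only as incrementally updating each layer's parameters in place, with no added components. The following is a minimal sketch of that idea under loudly stated assumptions: a hypothetical toy "model" whose layers are scalar weights, plain SGD, and a round-robin schedule in which exactly one layer is trainable per step. The schedule, the model, and the names `forward`, `layer_grad`, and `layerwise_train` are illustrative inventions, not the paper's algorithm or code.

```python
def forward(weights, x):
    """Hypothetical deep linear toy model: each 'layer' multiplies by its scalar weight."""
    out = x
    for w in weights:
        out *= w
    return out


def layer_grad(weights, i, x, y):
    """Gradient of the squared error (forward(weights, x) - y) ** 2 w.r.t. weights[i]."""
    pred = forward(weights, x)
    # d(pred)/d(w_i) is x times the product of all *other* layer weights.
    others = x
    for j, w in enumerate(weights):
        if j != i:
            others *= w
    return 2.0 * (pred - y) * others


def layerwise_train(weights, data, steps, lr=0.01):
    """Assumed layer-wise schedule: at step t only layer t % L is updated.

    All other layers stay frozen, so only one layer's gradient is computed per
    step, and the update is written directly into the existing weights (no
    adapter or extra parameters are created).
    """
    weights = list(weights)  # copy so the caller's initial weights are untouched
    n_layers = len(weights)
    for t in range(steps):
        active = t % n_layers          # round-robin layer selection (assumption)
        x, y = data[t % len(data)]     # cycle through the toy training examples
        weights[active] -= lr * layer_grad(weights, active, x, y)
    return weights


def mse(weights, data):
    """Mean squared error of the toy model over the dataset."""
    return sum((forward(weights, x) - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    # Toy target: y = 2x, starting from three layers whose product is 1.
    data = [(1.0, 2.0), (0.5, 1.0), (-1.0, -2.0)]
    init = [1.0, 1.0, 1.0]
    trained = layerwise_train(init, data, steps=600)
    print("loss before:", mse(init, data), "loss after:", mse(trained, data))
```

Because each step touches a single layer, only that layer needs a gradient (and, with a stateful optimizer such as Adam, optimizer state) at any moment, which is one plausible source of the resource savings; and since the trained model keeps exactly the original parameter list, nothing extra is carried into inference. How many layers the real method updates per step, and in what order, is not stated in the abstract.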