ASE 2025

Coding-Fuse: Efficient Fusion of Code Pre-Trained Models for Classification Tasks

Yu Zhao, Lina Gong, Zhiqiu Huang, Yuchen Jin, Mingqiang Wei

Abstract

Software engineering (SE) classification tasks play a vital role in improving software quality. Nevertheless, SE researchers and practitioners tend to rely on a single code pre-trained model (PTM) for downstream classification tasks. Previous studies have found that different code PTMs perform differently on SE classification tasks, which raises the question of whether integrating multiple code PTMs can improve classification performance. We therefore first conduct a preliminary exploratory study of the impact of fusing multiple PTMs on code classification tasks. The results show that, compared with a single code PTM, fusing multiple code PTMs can improve performance on SE classification tasks. However, this improvement comes at the cost of more fine-tuning resources and lower application efficiency, which does not meet greenness requirements. To address these issues, we propose Coding-Fuse, a framework for the efficient fusion of code PTMs for SE classification tasks. Coding-Fuse first uses evidence theory to evaluate how well the output features of each layer of a code PTM fit the data labels, and thereby locates the potentially best-performing layer of each code PTM. Coding-Fuse then uses a soft voting strategy to fuse the outputs of these layers into a new model. We evaluate its effectiveness by comparing Coding-Fuse with the full PTM fusion method and with the original single PTMs, using five different code PTMs on three different SE classification tasks and in two task scenarios. The results show that Coding-Fuse achieves better performance than the full PTM fusion method with higher efficiency and fewer hardware resources, and better performance than the original single PTMs at the same level of efficiency and hardware resources.
We encourage SE practitioners to use Coding-Fuse in practice: by exploiting the strengths of each code PTM in a PTM repository according to task requirements, they can easily create new intelligent SE PTMs that improve both performance and greenness.
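
As an illustration of the soft voting step mentioned in the abstract, the following minimal sketch averages the class probabilities produced by several classifiers and picks the class with the highest fused probability. It is not the authors' implementation: the function name `soft_vote`, the optional `weights` argument, and the toy probabilities are assumptions made for this example, and the sketch says nothing about how Coding-Fuse selects layers or obtains the per-layer predictions.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   optional per-model weights (hypothetical; uniform if omitted).
    Returns (fused_probs, predicted_labels).
    """
    # Stack into shape (n_models, n_samples, n_classes).
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Weighted average over the model axis gives shape (n_samples, n_classes).
    fused = np.tensordot(weights, probs, axes=1)
    return fused, fused.argmax(axis=1)


# Toy probabilities from three hypothetical classifiers (2 samples, 2 classes).
p_a = [[0.9, 0.1], [0.4, 0.6]]
p_b = [[0.6, 0.4], [0.3, 0.7]]
p_c = [[0.2, 0.8], [0.7, 0.3]]

fused, labels = soft_vote([p_a, p_b, p_c])
print(fused)
print(labels)
```

Unlike hard (majority) voting, soft voting keeps each classifier's confidence, so one confident prediction can outweigh two uncertain ones.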