ICSE 2023

An Empirical Comparison of Pre-Trained Models of Source Code

Changan Niu, Chuanyi Li, Vincent Ng, Dongxiao Chen, Jidong Ge, Bin Luo

71 citations

Abstract

While a large number of pre-trained models of source code have been successfully developed and applied to a variety of software engineering (SE) tasks in recent years, our understanding of these pre-trained models is arguably fairly limited. With the goal of advancing our understanding of these models, we perform the first systematic empirical comparison of 19 recently developed pre-trained models of source code on 13 SE tasks. To gain additional insights into these models, we adopt a recently developed 4-dimensional categorization of pre-trained models, and subsequently investigate whether there are correlations between different categories of pre-trained models and their performance on different SE tasks.

Index Terms: Pre-training of Source Code, AI for SE

Table I: Details of evaluation tasks, datasets and metrics.