ICSE 2022

Cross-Domain Deep Code Search with Meta Learning

Yitian Chai, Hongyu Zhang, Beijun Shen, Xiaodong Gu

42 citations

Abstract

Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their strong performance, they rely on the availability of large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality for domain-specific languages, where such data is relatively scarce and expensive. In this paper, we propose CroCS, a novel approach for domain-specific code search. CroCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is then adapted to domain-specific languages such as Solidity and SQL. Unlike cross-language CodeBERT, which is directly fine-tuned on the target language, CroCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be readily reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely Solidity and SQL, with models transferred from two widely used languages (Python and Java). Experimental results show that CroCS significantly outperforms conventional pre-trained code models that are directly fine-tuned on domain-specific languages, and that it is particularly effective when data is scarce.

CCS CONCEPTS

• Software and its engineering → Reusability; Automatic programming.
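To make the idea of learning a good initialization concrete, the following is a minimal sketch of MAML on a toy problem. It is not the CroCS implementation: the model is a single scalar weight fitted to one-dimensional linear regression tasks, and every name (`make_task`, `adapt`, `maml`, the learning rates `alpha` and `beta`) is a hypothetical choice made for illustration. Because the model is scalar, the second-order term of the meta-gradient can be written analytically.

```python
from typing import List, Tuple

Sample = Tuple[float, float]                  # (x, y) pair
Task = Tuple[List[Sample], List[Sample]]      # (support set, query set)


def grad(w: float, data: List[Sample]) -> float:
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return 2.0 * sum((w * x - y) * x for x, y in data) / len(data)


def hessian(data: List[Sample]) -> float:
    """Second derivative of the same loss; constant in w for a linear model."""
    return 2.0 * sum(x * x for x, _ in data) / len(data)


def adapt(w: float, support: List[Sample], alpha: float) -> float:
    """Inner loop: one gradient step on a task's support set."""
    return w - alpha * grad(w, support)


def maml(tasks: List[Task], w0: float, alpha: float, beta: float, steps: int) -> float:
    """Outer loop: update the shared initialization w so that one inner
    step from w gives a low loss on each task's query set."""
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for support, query in tasks:
            w_adapted = adapt(w, support, alpha)
            # Chain rule through the inner step: d(w_adapted)/dw = 1 - alpha * H.
            meta_grad += grad(w_adapted, query) * (1.0 - alpha * hessian(support))
        w -= beta * meta_grad / len(tasks)
    return w


def make_task(slope: float) -> Task:
    """Toy task (an assumption for this sketch): fit y = slope * x."""
    support = [(x, slope * x) for x in (-1.0, 1.0)]
    query = [(x, slope * x) for x in (-0.5, 0.5)]
    return support, query


tasks = [make_task(a) for a in (1.0, 2.0, 3.0)]
w_meta = maml(tasks, w0=0.0, alpha=0.1, beta=0.5, steps=100)

# Few-shot adaptation to one task, starting from the meta-learned initialization.
w_task = adapt(w_meta, make_task(3.0)[0], alpha=0.1)
```

In this toy setting the meta-learned initialization settles near the middle of the task slopes, from which a single inner step moves toward any individual task. In the setting the abstract describes, the scalar weight would be replaced by the parameters of a pre-trained code model, and the tasks by batches of query-code pairs.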