EMNLP 2023

SUT: Active Defects Probing for Transcompiler Models

Mengnan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin B. Clement, Neel Sundaresan

Abstract

Program translation, i.e., transcompilation, has attracted increasing attention from researchers because of its enormous practical value. However, we observe that current program translation models still make elementary syntax errors, particularly when the source language uses syntax elements that are not present in the target language. These errors are exactly what developers care about, yet frequently used metrics such as BLEU, CodeBLEU, and Computation Accuracy may not expose them well. In this paper, we focus on evaluating a model's ability to avoid these basic syntax errors. We develop a novel active defects probing suite, the Syntactic Unit Tests (SUT), together with a highly interpretable evaluation harness that includes the Syntax Unit Test Accuracy (SUT Acc) metric and the Syntax Element Test Score (SETS), to help diagnose and promote progress in this area. SUT fills a gap in the community: the lack of a fine-grained evaluation dataset for program translation. Experimental analysis shows that our evaluation harness is more accurate, more reliable, and better aligned with human judgments than previous metrics.