ACL2021

Analysis of Tree-Structured Architectures for Code Generation

Samip Dahal, Adyasha Maharana, Mohit Bansal

Abstract

Code generation is the task of generating code snippets from input user specifications in natural language. Leveraging the linguisticallymotivated hierarchical structure of the input can benefit code generation, especially since the specifications are complex sentences containing multiple variables and operations over various data structures. Moreover, recent advances in Transformer architectures have led to improved performance with tree-to-tree style generation for other seq2seq tasks e.g., machine translation. Hence, we present an empirical analysis of the significance of input parse trees for code generation. We run textto-tree, linearized tree-to-tree, and structured tree-to-tree models, using constituency-based parse trees as input, where the target is Abstract Syntax Tree (AST) of the code. We evaluate our models on the Python-based code generation dataset CoNaLa and a semantic parsing dataset ATIS. We find that constituency trees encoded using a structure-aware model improve performance for both datasets. We also provide an analysis of those aspects of the input parse trees which are most impactful. For instance, we find that structure-aware encodings are better at modelling inputs with multiple variables and capturing long-range dependencies for code generation. 1