ASE2021
Learning Highly Recursive Input Grammars
Neil Kulkarni, Caroline Lemieux, Koushik Sen
被引用 24 次
摘要
This paper presents ARVADA, an algorithm for learning context-free grammars from a set of positive examples and a Boolean-valued oracle. ARVADA learns a context-free grammar by building parse trees from the positive examples. Starting from initially flat trees, ARVADA builds structure to these trees with a key operation: it bubbles sequences of sibling nodes in the trees into a new node, adding a layer of indirection to the tree. Bubbling operations enable recursive generalization in the learned grammar. We evaluate ARVADA against GLADE and find it achieves on average increases of 4.98× in recall and 3.13× in F1 score, while incurring only a 1.27× slowdown and requiring only 0.87× as many calls to the oracle. ARVADA has a particularly marked improvement over GLADE on grammars with highly recursive structure, like those of programming languages. • We introduce ARVADA, which learns grammars from inputs strings and oracle via bubble-and-merge operations. • We distribute ARVADA's implementation as open source: https://github.com/neil-kulkarni/arvada . • We evaluate ARVADA on a variety of benchmarks against the state-of-the-art method GLADE. II. MOTIVATING EXAMPLE ARVADA takes as input a set of example strings S and an oracle O. The oracle returns True if its input string is valid and False otherwise. ARVADA's goal is to learn a grammar