ICLR2021

Mathematical Reasoning via Self-supervised Skip-tree Training

Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

65 citations

Abstract

We demonstrate that self-supervised language modeling applied to mathematical formulas enables logical reasoning. To measure the logical reasoning abilities of language models, we formulate several evaluation (downstream) tasks, such as inferring types, suggesting missing assumptions, and completing equalities. For training language models for formal mathematics, we propose a novel skip-tree task. We find that models trained on the skip-tree task show surprisingly strong mathematical reasoning abilities, and outperform models trained on standard skipsequence tasks. We also analyze the models' ability to formulate new conjectures by measuring how often the predictions are provable and useful in other proofs. Published as a conference paper at ICLR 2021 Reasoning can refer to a wide range of abilities, and thus we measure the mathematical reasoning abilities of language models on a variety of tasks, including mechanical derivations, such as type inference, and also creative tasks, such as predicting under which assumptions a statement is true. As we want to study what reasoning capabilities can be acquired just through self-supervised training, we do not employ fine-tuning on these tasks. Instead, we designed the tasks to be syntactically similar to the training task, such that the language model may produce correct answers. An advantage of formal language compared to natural language is that we can attempt to automatically evaluate statements. That is, we can let our language models produce conjectures, which we then try to prove using the DeepHOL theorem prover (Bansal et al., 2019; 2020). Besides evaluating the provability of the produced statements, we go one step further and evaluate their usefulness, by measuring how many times they are used as premises in proofs of other theorems. Our contributions are as follows: 1. We show that self-supervised training on mathematical formulas alone leads to logical reasoning capabilities. 2. We introduce a new skip-tree training task that outperforms the state-of-the-art skip-sequence training. We also introduce several evaluation tasks that are subsumed by skip-tree training (i.e. predict a missing subexpression), but test specific logical reasoning abilities to make the performance of the models interpretable. 3. We suggest a way to create and evaluate mathematical conjectures using existing neural theorem provers. The remainder of this paper is structured as follows: First, we review related work on language modeling and deep learning for mathematics in Section 2. Then, in Section 3 we discuss the source corpus of formal mathematical statements from which we generate our training data. In Section 4, we present the skip-tree training task, as well as several variations that we used in our ablation studies. We present the evaluation tasks in Section 5, discuss our experimental findings in Section 6, and conclude in Section 7. RELATED WORK Recently, we have seen a series of rapid improvements in language modeling stemming from better pretraining tasks (