ICLR2025

Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach

Qi Liu, Xinhao Zheng, Xudong Lu, Qinxiang Cao, Junchi Yan

摘要

As a central component in formal verification, statement autoformalization has been widely studied including the recent efforts from machine learning community, but still remains a widely-recognized difficult and open problem. In this paper, we delve into two critical yet under-explored gaps: 1) absence of faithful and universal automated evaluation for autoformalization results; 2) agnosia of contextual information, inducing severe hallucination of formal definitions and theorems. To address the first issue, we propose BEq (Bidirectional Extended Definitional Equivalence), an automated neuro-symbolic method to determine the equivalence between two formal statements, which is formal-grounded and wellaligned with human intuition. For the second, we propose RAutoformalizer (Retrieval-augmented Autoformalizer), augmenting statement autoformalization by Dependency Retrieval, retrieving potentially dependent objects from formal libraries. We parse the dependencies of libraries and propose to structurally informalise formal objects by the topological order of dependencies. To evaluate OOD generalization and research-level capabilities, we build a novel benchmark, Con-NF, consisting of 961 informal-formal statement pairs from frontier mathematical researches. Experiments validate the effectiveness of our approaches: BEq is evaluated on 200 diverse formal statement pairs with expert-annotated equivalence label, exhibiting significantly improved accuracy (82.50% → 90.50%) and precision (70.59% → 100.0%). For dependency retrieval, a strong baseline is devised. Our RAutoformalizer substantially outperforms SOTA baselines in both in-distribution ProofNet benchmark (12.83% → 18.18%, BEq@8) and OOD Con-NF scenario (4.58% → 16.86%, BEq@8).