ICLR 2023

Transformer-based model for symbolic regression via joint supervised learning

Wenqiang Li, Weijun Li, Linjun Sun, Min Wu, Lina Yu, Jingyi Liu, Yanjie Li, Songsong Tian

Abstract

Inferring the underlying mathematical expressions from real-world observed data is a central challenge in scientific discovery. Symbolic regression (SR) is a primary method for addressing this challenge because it searches a function space of interpretable analytical expressions. Recently, transformer-based approaches have become popular for solving symbolic regression problems. However, existing transformer-based models use the pre-order traversal of an expression as supervision, which compresses the information in a computation tree into a token sequence. This compression makes the derived formula highly sensitive to the order of the decoded tokens. To address this sensitivity, we introduce a novel model architecture, the Graph Transformer (GT), which is purpose-built to predict the tree structure of mathematical formulas directly. In empirical evaluations, our method significantly improves formula skeleton recovery rates and R² scores for data fitting compared with state-of-the-art transformer-based approaches.
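To make the sensitivity to token order concrete, the following is a minimal sketch, not the authors' implementation, of how an expression tree can be serialized by pre-order traversal and rebuilt from the resulting tokens. The vocabulary `ARITY`, the `Node` class, and the helper names `to_preorder`, `from_preorder`, and `to_infix` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Toy vocabulary (an assumption for this sketch): symbol -> number of operands.
ARITY = {"add": 2, "mul": 2, "sub": 2, "sin": 1, "x": 0, "y": 0, "c": 0}
INFIX = {"add": "+", "mul": "*", "sub": "-"}


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


def to_preorder(node):
    """Serialize a tree as a token list: the node first, then each subtree."""
    tokens = [node.symbol]
    for child in node.children:
        tokens.extend(to_preorder(child))
    return tokens


def from_preorder(tokens):
    """Rebuild a tree from pre-order tokens using each symbol's arity."""
    it = iter(tokens)

    def build():
        symbol = next(it)
        return Node(symbol, [build() for _ in range(ARITY[symbol])])

    root = build()
    if list(it):
        raise ValueError("tokens left over after a complete expression")
    return root


def to_infix(node):
    """Render a tree as a human-readable formula."""
    if not node.children:
        return node.symbol
    if len(node.children) == 1:
        return f"{node.symbol}({to_infix(node.children[0])})"
    left, right = node.children
    return f"({to_infix(left)} {INFIX[node.symbol]} {to_infix(right)})"


# c * x + sin(y) as a computation tree.
tree = Node("add", [Node("mul", [Node("c"), Node("x")]),
                    Node("sin", [Node("y")])])
tokens = to_preorder(tree)

# Swap two adjacent tokens ("x" and "sin"), as a decoder might by mistake.
swapped = tokens[:3] + [tokens[4], tokens[3]] + tokens[5:]

print(tokens, "->", to_infix(from_preorder(tokens)))
print(swapped, "->", to_infix(from_preorder(swapped)))
```

In this toy example the two token lists contain exactly the same symbols, yet the first decodes to `((c * x) + sin(y))` and the second to `((c * sin(x)) + y)`. A single transposition changes the structure of the formula, which is the kind of order sensitivity the abstract attributes to sequence-based supervision.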