NeurIPS2022

CoNSoLe: Convex Neural Symbolic Learning

Haoran Li, Yang Weng, Hanghang Tong

被引用 13 次

摘要

Learning the underlying equation from data is a fundamental problem in many disciplines. Recent advances rely on Neural Networks (NNs) but do not provide theoretical guarantees in obtaining the exact equations owing to the non-convexity of NNs. In this paper, we propose Convex Neural Symbolic Learning (CONSOLE) to seek convexity under mild conditions. The main idea is to decompose the recovering process into two steps and convexify each step. In the first step of searching for right symbols, we convexify the deep Q-learning. The key is to maintain double convexity for both the negative Q-function and the negative reward function in each iteration, leading to provable convexity of the negative optimal Q function to learn the true symbol connections. Conditioned on the exact searching result, we construct a Locally-Convex equation Learner (LOCAL) neural network to convexify the estimation of symbol coefficients. With such a design, we quantify a large region with strict convexity in the loss surface of LOCAL for commonly used physical functions. Finally, we demonstrate the superior performance of the CONSOLE framework over the state-of-the-art on a diverse set of datasets. Related Work Symbolic regression using neural networks. In addition to the review in the Introduction, there are studies treating NNs as a data augmentation tool to create high-quality data for SR [18, 2] . Neural architecture search. Searching the connections of LOCAL falls into the area of Neural Architecture Search (NAS). NAS tries to find an optimal architecture of a target NN with the best performance [19] . The searching algorithms can be divided into RL-based, evolutionary algorithmbased, sequential optimization-based, and gradient descent-based methods. For RL-based methods, [20] employs an RNN model to sample the architecture and utilizes the accuracy of the sampled network as the reward. [21] uses tabular Q-learning to find the connections of a target NN. However, tabular Q-learning can hardly be applicable when the state and action spaces are large, e.g., SR problem. The evolutionary algorithm employs methods such as GP [22] and tournament selection [23] . However, these methods may lack the scalability for SR problem [3] . The sequential optimizationbased method is more scalable as the model complexity increases in a sequential manner [24] . Finally, the gradient descent-based method [25] builds a large and over-parameterized network to search and train. Then, regularizations like dropout are added to find the best connections. However, for these methods, the theoretical guarantees remain opaque for SR problem. Global optimum in neural networks. Many studies have been conducted to seek global optimality in NNs, and they can be categorized into finding global optima for weights or input variables. The weight optimization is directly linked to finding symbol coefficients in LOCAL. Specifically, [26] investigates a single hidden layer with unbounded neurons and non-Euclidean regularization. The authors show the training can be done via convex optimization problems. [27] considers finite neurons and develops a novel duality theory to train two-layer NNs with convexity. [28] establishes a strong result that every local optimum is a global optimum for deep non-linear networks under several assumptions. [29] eliminates these assumptions and finds that with weight decay regularization (e.g., l 2 norm), the loss function of NN with ReLU activations is piece-wise strongly convex in local regions. However, LOCAL doesn't fit the above conditions. The second group of input variable optimization can help search for optimal inputs. [17] proposes ICNN such that the output of ICNN is convex in input variables. The key of ICNN is to restrict some weights and activation functions to preserve convexity. ICNN facilitates to design a convexified search algorithm.