ICLR2021
Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
Tolga Ergen, Mert Pilanci
4 citations
Abstract
We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two-and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an 2 norm regularized convex program. We then show that multi-layer circular CNN training problems with a single ReLU layer are equivalent to an 1 regularized convex program that encourages sparsity in the spectral domain. We also extend these results to three-layer CNNs with two ReLU layers. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers. INTRODUCTION Convolutional Neural Networks (CNNs) have shown a remarkable success across various machine learning problems (LeCun et al., 2015) . However, our theoretical understanding of CNNs still remains restricted, where the main challenge arises from the highly non-convex and nonlinear structure of CNNs with nonlinear activations such as ReLU. Hence, we study the training problem for various CNN architectures with ReLU activations and introduce equivalent finite dimensional convex formulations that can be used to globally optimize these architectures. Our results characterize the role of network architecture in terms of equivalent convex regularizers. Remarkably, we prove that the proposed methods are polynomial time with respect to all problem parameters. Convex neural network training was previously considered in Bengio et al. (2006) ; Bach (2017). However, these studies are restricted to two-layer fully connected networks with infinite width, thus, the optimization problem involves infinite dimensional variables. Moreover, it has been shown that even adding a single neuron to a neural network leads to a non-convex optimization problem which cannot be solved efficiently (Bach, 2017