NeurIPS2020

Learning Parities with Neural Networks

Amit Daniely, Eran Malach

被引用 92 次

摘要

In recent years we see a rapidly growing line of research which shows learnability of various models via common neural network algorithms. Yet, besides a very few outliers, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural-networks with gradientdescent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as neural networks are far more successful than linear methods. Furthermore, on the more conceptual level, linear models don't seem to capture the "deepness" of deep networks. In this paper we make a step towards showing leanability of models that are inherently non-linear. We show that under certain distributions, sparse parities are learnable via gradient decent on depth-two network. On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods. How far can neural network theory go beyond linear models? In this work we show a family of distributions on which neural-networks trained with gradient-descent achieve small error. On the other hand, approximating the same family using a linear classifier on top of an embedding of the input space in R N , requires N which grows exponentially, or otherwise requires a linear classifier with exponential norm. Specifically, we focus on a standard and notoriously difficult family of target functions: parities over small subsets of the input bits. We show that this family is learnable with neural-networks under some specific choice of distributions. This implies that neural-networks algorithms are strictly stronger than linear methods, as the same family cannot be approximated by any polynomial-size linear model. Preprint. Under review.