NeurIPS2020

Beyond Lazy Training for Over-parameterized Tensor Decomposition

Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, Rong Ge

15 citations

Abstract

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an l-th order tensor in (R d ) ⊗l of rank r (where r ≪ d), can variants of gradient descent find a rank m decomposition where m > r? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least m = Ω(d l-1 ), while a variant of gradient descent can find an approximate tensor when m = O * (r 2.5l log d). Our results show that gradient descent on over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data. * Equal contribution. Notations We use [n] as a shorthand for 1, 2, • • • , n. We use O(•), Ω(•) to hide constant factor dependencies. We use O * (•) to hide constant factors and also the dependency on accuracy ǫ. We use poly(•) to represent a polynomial on the relevant parameters with constant degree. Tensor notations: We use ⊗ as the tensor product (outer product). An l-th order d-dimensional tensor is defined as an element in space A tensor is symmetric if the entry values remain unchanged for any permutation of its indices. We define vec(•) to be the vectorize operator for tensors, mapping a tensor in (R d ) ⊗l to a vector in and the rank of a tensor is defined as the minimum integer k such that this tensor equals the sum of k rank-1 tensors. Norm and inner product: We use v to denote the ℓ 2 norm of a vector v. For l-th order tensors T, T ′ ∈ (R d ) ⊗l (vectors and matrices can be viewed as tensors with order 1 and 2, respectively), we define the inner product as T, T 3 Problem setup and challenges In this section we discuss the objective for over-parameterized tensor decomposition and explain the challenges in optimizing this objective.