NeurIPS2020

How many samples is a good initial point worth in Low-rank Matrix Recovery?

Jialun Zhang, Richard Y. Zhang

被引用 16 次

摘要

Given a sufficiently large amount of labeled data, the non-convex low-rank matrix recovery problem contains no spurious local minima, so a local optimization algorithm is guaranteed to converge to a global minimum starting from any initial guess. However, the actual amount of data needed by this theoretical guarantee is very pessimistic, as it must prevent spurious local minima from existing anywhere, including at adversarial locations. In contrast, prior work based on good initial guesses have more realistic data requirements, because they allow spurious local minima to exist outside of a neighborhood of the solution. In this paper, we quantify the relationship between the quality of the initial guess and the corresponding reduction in data requirements. Using the restricted isometry constant as a surrogate for sample complexity, we compute a sharp "threshold" number of samples needed to prevent each specific point on the optimization landscape from becoming a spurious local minimum. Optimizing the threshold over regions of the landscape, we see that for initial points around the ground truth, a linear improvement in the quality of the initial guess amounts to a constant factor improvement in the sample complexity. Introduction A perennial challenge in non-convex optimization is the possible existence of bad or spurious critical points and local minima, which can cause a local optimization algorithm like gradient descent to slow down or get stuck. Several recent lines of work showed that the effects of non-convexity can be tamed through a large amount of diverse and high quality training data [17, 1, 9, 3, 18, 12] . Concretely, these authors showed that, for classes of problems based on random sampling, spurious critical points and local minima become progressively less likely to exist with the addition of each new sample. After a sufficiently large number of samples, all spurious local minima are eliminated, so any local optimization algorithm is guaranteed to converge to the globally optimal solution starting from an arbitrary, possibly random initial guess. This notion of a global guarantee-one that is valid starting from any initial point-is considerably stronger than what is needed for empirical success to be observed [8] . For example, the existence of a spurious local minimum may not pose an issue if gradient descent does not converge towards it. 34th Conference on Neural Information Processing Systems (NeurIPS 2020),