NeurIPS2020

Random Reshuffling is Not Always Better

Christopher De Sa

被引用 24 次

摘要

Many learning algorithms, such as stochastic gradient descent, are affected by the order in which training examples are used. It is generally believed that sampling the training examples without-replacement, also known as random reshuffling, causes learning algorithms to converge faster. We give a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms [19] . We use this to give an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling. Conjecture 1 (Operator Inequality of Noncommutative Arithmetic and Geometric Means). Let A 1 , . . . , A n ∈ R d×d be a collection of (symmetric) positive semidefinite matrices. Then it is 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.