NeurIPS2020

Random Reshuffling is Not Always Better

Christopher De Sa

24 citations

Abstract

Many learning algorithms, such as stochastic gradient descent, are affected by the order in which training examples are used. It is generally believed that sampling the training examples without-replacement, also known as random reshuffling, causes learning algorithms to converge faster. We give a counterexample to the Operator Inequality of Noncommutative Arithmetic and Geometric Means, a longstanding conjecture that relates to the performance of random reshuffling in learning algorithms [19] . We use this to give an example of a learning task and algorithm for which with-replacement random sampling outperforms random reshuffling. Conjecture 1 (Operator Inequality of Noncommutative Arithmetic and Geometric Means). Let A 1 , . . . , A n ∈ R d×d be a collection of (symmetric) positive semidefinite matrices. Then it is 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.