ICML 2021

Nondeterminism and Instability in Neural Network Optimization

Cecilia Summers, Michael J. Dinneen

Abstract

Nondeterminism in neural network optimization produces uncertainty in performance, making small improvements difficult to discern from run-to-run variability. While uncertainty can be reduced by training multiple model copies, doing so is time-consuming, costly, and harms reproducibility. In this work, we establish an experimental protocol for understanding the effect of optimization nondeterminism on model diversity, allowing us to isolate the effects of a variety of sources of nondeterminism. Surprisingly, we find that all sources of nondeterminism have similar effects on measures of model diversity. To explain this intriguing fact, we identify the instability of model training, taken as an end-to-end procedure, as the key determinant. We show that even one-bit changes in initial parameters result in models converging to vastly different values. Last, we propose two approaches for reducing the effects of instability on run-to-run variability.

Introduction

Consider this common scenario: you have a baseline "current best" model, and are trying to improve it. One of your experiments has produced a model whose metrics are slightly better than the baseline. Yet you have your reservations: how do you know the improvement is "real" and not due to run-to-run variability? Similarly, consider hyperparameter optimization, in which many possible values exist for a set of hyperparameters, with minor differences in performance between them. How do you pick the best hyperparameters, and how can you be sure that you've actually picked wisely? In both scenarios, the standard practice is to train multiple independent copies of your model to understand its variability. While this helps address the problem, it is extremely