NeurIPS 2022

Deep Ensembles Work, But Are They Necessary?

Taiga Abe, Estefany Kelly Buchanan, Geoff Pleiss, Richard S. Zemel, John P. Cunningham


Abstract

Ensembling neural networks is an effective way to increase accuracy, and ensembles can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power, namely uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's uncertainty quantification on out-of-distribution (OOD) data, but is instead highly correlated with the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance and, in this sense, is not indicative of any "effective robustness." While deep ensembles are a practical way to achieve improvements to predictive power, uncertainty quantification, and robustness, our results show that these improvements can be replicated by a (larger) single model.

Recent research suggests that deep ensembles may be preferable to single models in safety-critical applications and in settings where data shifts significantly away from the training distribution. First, Lakshminarayanan et al. [45] demonstrate that deep ensembles provide well-calibrated estimates of

* Equal contribution.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
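The two quantities the abstract argues about, ensemble diversity and effective robustness, can be made concrete with a short sketch. It assumes that a deep ensemble predicts by averaging its members' softmax outputs, and it measures diversity as the Jensen gap in negative log-likelihood (NLL): the ensemble's NLL equals the members' average NLL minus a non-negative diversity term. Effective robustness is sketched as a model's OOD accuracy minus the accuracy predicted by a linear fit of OOD versus InD accuracy over a set of reference models. The function names `nll_decomposition` and `effective_robustness` are hypothetical, the fit here uses raw accuracies for simplicity (published analyses often fit on logit- or probit-transformed accuracies), and none of this is presented as the authors' implementation.

```python
import numpy as np


def nll_decomposition(member_probs: np.ndarray, labels: np.ndarray):
    """Split ensemble NLL into average member NLL minus a diversity term.

    member_probs: shape (M, N, K), softmax outputs of M members on N inputs.
    labels: shape (N,), integer class labels.
    Assumption: the ensemble predicts with the mean of member probabilities.
    """
    n = labels.shape[0]
    # Probability each member assigns to the true label, shape (M, N).
    p_true = member_probs[:, np.arange(n), labels]
    # Ensemble probability of the true label, shape (N,).
    ens_true = p_true.mean(axis=0)
    ensemble_nll = -np.log(ens_true).mean()
    avg_member_nll = -np.log(p_true).mean()
    # Jensen gap: log(mean p) - mean(log p) >= 0, averaged over inputs.
    diversity = avg_member_nll - ensemble_nll
    return ensemble_nll, avg_member_nll, diversity


def effective_robustness(ind_acc, ood_acc, ref_ind, ref_ood):
    """OOD accuracy above (or below) a linear InD-to-OOD trend.

    ref_ind, ref_ood: accuracies of reference models used to fit the trend.
    Assumption: the fit is on raw accuracies, not logit/probit-scaled ones.
    """
    slope, intercept = np.polyfit(ref_ind, ref_ood, deg=1)
    return ood_acc - (slope * ind_acc + intercept)


if __name__ == "__main__":
    # Synthetic ensemble of 5 members, 100 inputs, 10 classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 100, 10))
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    labels = rng.integers(0, 10, size=100)
    ens, avg, div = nll_decomposition(probs, labels)
    print(f"ensemble NLL={ens:.3f}, mean member NLL={avg:.3f}, diversity={div:.3f}")
```

Under this decomposition, a single model that already achieves the ensemble's average-member NLL plus the same improvement would match the ensemble without any "diversity"; the effective-robustness helper returns roughly zero for any model whose OOD accuracy lies on the reference trend.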