NeurIPS 2022

Deep Ensembles Work, But Are They Necessary?

Taiga Abe, Estefany Kelly Buchanan, Geoff Pleiss, Richard S. Zemel, John P. Cunningham

Cited by 88

Abstract

Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's uncertainty quantification on out-of-distribution (OOD) data, but is instead highly correlated with the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance, and, in this sense, is not indicative of any "effective robustness." While deep ensembles are a practical way to achieve improvements to predictive power, uncertainty quantification, and robustness, our results show that these improvements can be replicated by a (larger) single model.

Recent research suggests that deep ensembles may be preferable to single models in safety-critical applications and settings where data shifts significantly away from the training distribution. First, Lakshminarayanan et al. [45] demonstrate that deep ensembles provide well-calibrated estimates of

* Equal contribution.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
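To make the abstract's terms concrete, the following is a minimal sketch of how a deep ensemble forms its prediction and of one possible diversity measure. It assumes each ensemble member outputs a class-probability vector for a given input. The function names (`ensemble_predict`, `js_diversity`) are hypothetical, and the Jensen-Shannon-style measure is only one common choice of diversity metric, not necessarily the one used in the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def ensemble_predict(member_probs):
    """Average the members' class-probability vectors for one input."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]


def js_diversity(member_probs):
    """Entropy of the averaged prediction minus the average member entropy.

    This is zero when all members make the same prediction and positive
    when they disagree (illustrative metric; an assumption of this sketch).
    """
    mean_p = ensemble_predict(member_probs)
    avg_member_entropy = sum(entropy(m) for m in member_probs) / len(member_probs)
    return entropy(mean_p) - avg_member_entropy


# Two toy ensembles on a single binary-classification input.
agreeing = [[0.9, 0.1], [0.9, 0.1]]
disagreeing = [[0.9, 0.1], [0.1, 0.9]]

print(ensemble_predict(disagreeing))
print(js_diversity(agreeing), js_diversity(disagreeing))
```

In this decomposition, the ensemble's predictive entropy equals the average member entropy plus the diversity term, which is the sense in which diversity could contribute to an ensemble's uncertainty estimate.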