CVPR2020

Improving Confidence Estimates for Unfamiliar Examples

Zhizhong Li, Derek Hoiem

Abstract

In Sec. 1, we compare the entropy and cross-entropy (NLL) of three approaches to analyze overconfidence. In Sec. 2, we present experiments on a simple dataset that illustrate why ensembles perform well on unfamiliar samples and how Gdistill's use of unsupervised samples can lead it to mimic the ensemble's performance (at least in the ideal case where the unsupervised samples cover a superset of the unfamiliar samples). In Sec. 3, we give the complete table of results, mainly to simplify comparisons by later work. Note that in this supplemental material, methods without "T-scaling" in their name do not use calibration. The main table of the paper, for brevity, shows only results with calibration except where noted, so "Ensemble" in the main paper corresponds to "Ensemble of T-scaled models" here. In Sec. 4, we show results on one of the tasks with DenseNet-161, which support the same conclusions we drew from the experiments with ResNet-18. We leave a more complete exploration of network depth and architecture to future work.
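As a rough illustration of the quantities compared in Sec. 1, the sketch below computes the predictive entropy and the NLL for a single prediction and shows how they can diverge when a model is overconfident. It is a minimal sketch rather than the code used for the experiments. It assumes that "T-scaling" means dividing the logits by a scalar temperature before the softmax and that an ensemble averages its members' predicted probabilities. The logits, the label, the temperature value, and all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature (assumed form of T-scaling)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Entropy of one predictive distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def nll(probs, label, eps=1e-12):
    """Cross-entropy against a one-hot label (negative log-likelihood)."""
    return -math.log(max(probs[label], eps))

def ensemble_probs(member_logits, temperature=1.0):
    """Average the (optionally T-scaled) softmax outputs of several models."""
    member_probs = [softmax(l, temperature) for l in member_logits]
    n = len(member_probs)
    num_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n for k in range(num_classes)]

# Hypothetical logits for an unfamiliar sample: confident in class 0,
# while the true class is 1.
logits = [4.0, 0.0, 0.0]
label = 1

p_raw = softmax(logits)                      # uncalibrated prediction
p_scaled = softmax(logits, temperature=4.0)  # hypothetical temperature

# Two hypothetical ensemble members that disagree on this sample.
p_ens = ensemble_probs([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])

for name, p in [("raw", p_raw), ("T-scaled", p_scaled), ("ensemble", p_ens)]:
    print(f"{name:9s} entropy={entropy(p):.3f}  NLL={nll(p, label):.3f}")
```

In this toy case the raw prediction has a low entropy but a high NLL, which is the signature of overconfidence: the model reports little uncertainty while assigning low probability to the true class. Both softening the logits and averaging disagreeing members raise the entropy and lower the NLL here, although how closely this mirrors the behavior on real data depends on the models and the samples.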