NeurIPS 2021

Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification

Youngseog Chung, Willie Neiswanger, Ian Char, Jeff Schneider


Abstract

Among the many ways of quantifying uncertainty in a regression setting, specifying the full quantile function is attractive, as quantiles are amenable to interpretation and evaluation. A model that predicts the true conditional quantiles for each input, at all quantile levels, presents a correct and efficient representation of the underlying uncertainty. To achieve this, many current quantile-based methods focus on optimizing the pinball loss. However, this loss restricts the scope of applicable regression models, limits the ability to target many desirable properties (e.g., calibration, sharpness, centered intervals), and may produce poor conditional quantiles. In this work, we develop new quantile methods that address these shortcomings. In particular, we propose methods that can apply to any class of regression model, select an explicit balance between calibration and sharpness, optimize for calibration of centered intervals, and produce more accurate conditional quantiles. We provide a thorough experimental evaluation of our methods, which includes a high-dimensional uncertainty quantification task in nuclear fusion.

Preliminaries and Background

We first lay out the notation, terminology, and class of models considered in this paper. Then we provide an overview of evaluation metrics in uncertainty quantification (UQ) and demonstrate how the pinball loss may be inadequate both as an evaluation metric and as an optimization objective.

Notation

Bold upper-case letters $\mathbf{X}, \mathbf{Y}$ denote random variables, lower-case letters $x, y$ denote their values, and calligraphic upper-case letters $\mathcal{X}, \mathcal{Y}$ denote sets of possible values. We use $x \in \mathcal{X}$ to denote the input feature vector and $y \in \mathcal{Y}$ to denote the corresponding target. Additionally, we consider the regression setting where $\mathcal{Y} \subset \mathbb{R}$ and $\mathcal{X} \subset \mathbb{R}^n$. We use $F_{\mathbf{X}}, F_{\mathbf{Y}|x}, F_{\mathbf{Y}}$ to denote the true cumulative distribution functions of the subscripted random variables.
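The pinball loss mentioned above is, in its standard form, $\rho_p(y, q) = \max\{p\,(y - q),\ (p - 1)(y - q)\}$ for a prediction $q$ of a target $y$ at quantile level $p$. As a minimal sketch under that assumption (the paper's own definition and notation may differ), the standard-library Python below illustrates why minimizing this loss targets quantiles: fitting a single constant $q$ to $n$ samples of $\mathbf{Y}$ by minimizing the average pinball loss yields a $q$ whose empirical CDF value $\hat{F}_{\mathbf{Y}}(q)$ lies in $[p, p + 1/n]$. All function names are hypothetical, chosen for illustration.

```python
import random


def pinball_loss(y, q, p):
    """Pinball (quantile) loss of predicting q for target y at level p."""
    diff = y - q
    # Under-prediction (y > q) is weighted by p, over-prediction by (1 - p).
    return max(p * diff, (p - 1) * diff)


def mean_pinball_loss(ys, q, p):
    """Average pinball loss of a constant prediction q over samples ys."""
    return sum(pinball_loss(y, q, p) for y in ys) / len(ys)


def empirical_cdf(ys, q):
    """Fraction of samples at or below q: an estimate of F_Y(q)."""
    return sum(1 for y in ys if y <= q) / len(ys)


def best_constant_quantile(ys, p):
    """Constant prediction minimizing the mean pinball loss over ys.

    The average loss is convex and piecewise linear in q with kinks at the
    samples, so a minimizer is always attained at one of the sample points.
    """
    return min(ys, key=lambda q: mean_pinball_loss(ys, q, p))


if __name__ == "__main__":
    rng = random.Random(0)
    ys = [rng.gauss(0.0, 1.0) for _ in range(500)]
    for p in (0.1, 0.5, 0.9):
        q = best_constant_quantile(ys, p)
        # F_hat(q) should land in [p, p + 1/n].
        print(f"p={p}: q={q:.3f}, F_hat(q)={empirical_cdf(ys, q):.3f}")
```

Scanning only the sample points is enough because of the piecewise-linear structure noted in the docstring. The example covers only the marginal, constant-prediction case; it says nothing about the conditional quantiles $Q_p(x)$ that a flexible model would learn, which is where the abstract notes the pinball loss may produce poor results.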
For any $x \in \mathcal{X}$, we assume there exists a true conditional distribution $F_{\mathbf{Y}|x}$ over $\mathcal{Y}$, and we let $Q_p(x)$ denote the true $p$-th quantile of this distribution, i.e., $F_{\mathbf{Y}|x}(Q_p(x)) = p$. Any estimates of the true functions $F, Q_p$ will be denoted with a hat: $\hat{F}, \hat{Q}_p$. We will specifically refer to any family of estimates for $Q_p$, with $p \in (0, 1)$, as a "quantile model", denoted $\hat{Q}: \mathcal{X} \times (0, 1) \to \mathcal{Y}$. Unless otherwise noted, we will always consider the conditional problem of estimating quantities in the target space $\mathcal{Y}$, conditioned on a value $x \in \mathcal{X}$.

Assessing the Quality of Predictive UQ

While various metrics have been proposed to assess the quality of UQ, there has been a great deal of recent focus on the notions of calibration and sharpness [15, 13, 65, 60, 55, 35, 21, 20]. We introduce calibration here, but for a more thorough treatment, see Zhao et al. [65]. Broadly speaking, calibration in the regression setting requires that the probability of observing the target random variable below a predicted $p$-th quantile is equal to the expected probability $p$, for all $p \in (0, 1)$. We refer to the former quantity as the observed probability and denote it $p_{\text{obs}}(p)$, for an expected probability $p$, which we

[Figure: (a) Test Loss Curves: pinball loss (SQR, Val Ep: 350) and calibration loss (Cali, Val Ep: 410) versus train epoch. (b) Test Calibration: calibration error of SQR and Cali versus train epoch, with SQR Val Ep (350) and Cali Val Ep (410) marked.]