ICLR 2026

CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

Ilia Azizi, Juraj Bodik, Jakob Heiss, Bin Yu

Abstract

Accurate uncertainty quantification is critical for reliable predictive modeling. Existing methods typically address either aleatoric uncertainty due to measurement noise or epistemic uncertainty resulting from limited data, but not both in a balanced manner. We propose CLEAR, a calibration method with two distinct parameters, $\gamma_1$ and $\gamma_2$, to combine the two uncertainty components and improve the conditional coverage of predictive intervals for regression tasks. CLEAR is compatible with any pair of aleatoric and epistemic estimators; we show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability-Computability-Stability (PCS) framework for epistemic uncertainty. Across 17 diverse real-world datasets, CLEAR achieves an average improvement of 28.3% and 17.5% in the interval width compared to the two individually calibrated baselines while maintaining nominal coverage. Similar improvements are observed when applying CLEAR to Deep Ensembles (epistemic) and Simultaneous Quantile Regression (aleatoric). The benefits are especially evident in scenarios dominated by high aleatoric or epistemic uncertainty.

Project page: https://unco3892.github.io/clear/

Published as a conference paper at ICLR 2026

2 METHOD

2.1 PROBLEM SCENARIO

Consider a classical setting, where an i.i.d. sample $(X_i, Y_i)$, $i = 1, \dots, n$, is drawn from the distribution $P_X \times P_{Y|X}$. The goal of conformal inference is to construct a prediction set $C(X_{n+1}) \subseteq \mathrm{supp}(Y)$ for a new data point $(X_{n+1}, Y_{n+1})$ satisfying marginal coverage

$$P\big(Y_{n+1} \in C(X_{n+1})\big) \geq 1 - \alpha, \tag{2}$$

where $\alpha \in (0, 1)$ is, for instance, $\alpha = 0.05$. In order to construct $C$, the data $D = \{(X_i, Y_i)\}_{i=1}^{n}$ can be split into training and calibration subsets $D_{train}$, $D_{cal}$. On the training data, a first estimate of $C$ can be constructed, and then we can use the data from $D_{cal}$ to calibrate $C$ such that (2) is satisfied.
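The split-then-calibrate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the first estimate on $D_{train}$ is a plain least-squares line, the conformity score is the absolute residual, the data are synthetic, and the names `conformal_radius` and `split_conformal` are hypothetical.

```python
import numpy as np

def conformal_radius(residuals, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest absolute residual."""
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return np.inf
    return np.sort(residuals)[k - 1]

def split_conformal(X, y, X_new, alpha=0.05, cal_fraction=0.5, seed=0):
    """Split D into D_train / D_cal, fit on D_train, calibrate on D_cal."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_cal = int(cal_fraction * len(y))
    cal, train = idx[:n_cal], idx[n_cal:]

    # First estimate on D_train: least-squares line (placeholder model, an assumption).
    design = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)

    def predict(x):
        return coef[0] + coef[1] * x

    # Calibration on D_cal: widen the point prediction by a constant radius.
    gamma = conformal_radius(np.abs(y[cal] - predict(X[cal])), alpha)
    centre = predict(X_new)
    return centre - gamma, centre + gamma

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=2000)
y = 1.5 * X + rng.normal(scale=0.5, size=2000)
X_test = rng.uniform(-2, 2, size=5000)
y_test = 1.5 * X_test + rng.normal(scale=0.5, size=5000)

lower, upper = split_conformal(X, y, X_test, alpha=0.05)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical marginal coverage: {coverage:.3f}")
```

Because the radius is a single constant, the resulting intervals have the same width for every $x$. Marginal coverage (2) can hold this way even when the noise level varies with $x$, which is one reason marginal and conditional coverage need to be distinguished.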
In the case of CQR, we first estimate the conditional quantiles $\hat{q}_{\alpha/2}(x)$, $\hat{q}_{1-\alpha/2}(x)$ using $D_{train}$, and then construct

$$C(x) = \big[\hat{q}_{\alpha/2}(x) - \gamma,\ \hat{q}_{1-\alpha/2}(x) + \gamma\big],$$

where the calibration parameter $\gamma$ is chosen so that the prediction interval attains the target coverage on $D_{cal}$. While this procedure guarantees finite-sample distribution-free marginal coverage (Angelopoulos et al., 2024), conditional coverage need not hold. As pointed out in Lei & Wasserman (2014); Barber et al. (2020), any algorithm with finite-sample distribution-free conditional coverage guarantees for all $x$ must be trivial, $C(x) = (-\infty, \infty)$. However, we aim to design estimators such that conditional coverage holds approximately under reasonable real-world scenarios, even if exact finite-sample guarantees are impossible in general.

EPISTEMIC UNCERTAINTY

The traditional machine learning approach trains a predictive algorithm on a single version of the cleaned/preprocessed dataset and uses the best-performing algorithm (compared using the validation set) for future predictions. While theoretically sound in the infinite-sample limit, this approach ignores the uncertainty stemming from finite sample size and model choice (epistemic uncertainty). Various methods have been proposed to estimate this uncertainty, including Deep Ensembles (Laksh-