ICLR2022

An Information Fusion Approach to Learning with Instance-Dependent Label Noise

Zhimeng Jiang, Kaixiong Zhou, Zirui Liu, Li Li, Rui Chen, Soo-Hyun Choi, Xia Hu

35 citations

Abstract

The generation of label noise is often modeled as a process involving a probability transition matrix (also interpreted as the annotator confusion matrix) imposed onto the label distribution. Under this model, learning the "ground-truth classifier" (i.e., the classifier that can be learned if no noise were present) and the confusion matrix boils down to a model identification problem. Prior works along this line demonstrated appealing empirical performance, yet identifiability of the model was mostly established by assuming an instance-invariant confusion matrix. Having an (occasionally) instance-dependent confusion matrix across data samples is apparently more realistic, but inevitably introduces outliers to the model. Our interest lies in confusion matrix-based noisy label learning with such outliers taken into consideration. We begin by pointing out that, under the model of interest, using labels produced by only one annotator is fundamentally insufficient to detect the outliers or identify the ground-truth classifier. Then, we prove that by employing a crowdsourcing strategy involving multiple annotators, a carefully designed loss function can establish the desired model identifiability under reasonable conditions. Our development builds upon a link between the noisy label model and a column-corrupted matrix factorization model, based on which we show that crowdsourced annotations distinguish nominal data and instance-dependent outliers using a low-dimensional subspace. Experiments show that our learning scheme substantially improves outlier detection and the classifier's testing accuracy.

* Equal contribution. 38th Conference on Neural Information Processing Systems (NeurIPS 2024).

model the annotators' expertise level and the difficulty of annotating each class/sample, and thus is considered intuitive. Under this model, learning the "label noise-free" target neural classifier boils down to identifying the confusion matrix.
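The confusion matrix model described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sample_noisy_labels`, the column-stochastic convention `A[j, k] = Pr(noisy label = j | true label = k)`, and the two-class numbers are all assumptions made for illustration. A nominal matrix is applied to every sample, and a few samples may be given their own matrix to mimic instance-dependent outliers.

```python
import numpy as np


def sample_noisy_labels(clean_probs, nominal_confusion,
                        outlier_confusions=None, rng=None):
    """Draw one noisy label per sample under a confusion matrix model.

    clean_probs: (N, K) array; row n is the clean class posterior of sample n.
    nominal_confusion: (K, K) column-stochastic matrix A (assumed convention:
        A[j, k] = Pr(noisy label = j | true label = k)).
    outlier_confusions: optional dict {sample index: (K, K) matrix} replacing
        A for those samples (hypothetical instance-dependent outliers).
    """
    rng = np.random.default_rng() if rng is None else rng
    outlier_confusions = outlier_confusions or {}
    n_classes = clean_probs.shape[1]

    # Row n of the product equals A @ clean_probs[n].
    noisy_probs = clean_probs @ nominal_confusion.T

    # Outlier samples use their own confusion matrix instead of A.
    for n, confusion_n in outlier_confusions.items():
        noisy_probs[n] = confusion_n @ clean_probs[n]

    labels = np.array([rng.choice(n_classes, p=p) for p in noisy_probs])
    return labels, noisy_probs


# Hypothetical two-class example: sample 0 is an outlier whose labels are flipped.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
f = np.array([[1.0, 0.0],
              [0.3, 0.7]])
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
labels, noisy_probs = sample_noisy_labels(
    f, A, outlier_confusions={0: flip}, rng=np.random.default_rng(0))
```

In this sketch only the noisy labels would be observed in practice; recovering both the clean posterior and the matrix `A` from them is the identification problem the text refers to.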
The confusion matrix-based models have proven quite useful in practice: algorithms developed in this line of work often exhibit appealing empirical performance; see, e.g., [9, 14-17, 19-21, 23-31]. In addition, these models admit interesting statistical and algebraic structures, leading to plausible results on identifiability of the confusion matrix and/or the "ground-truth classifier", i.e., the classifier that can be learned if no noisy annotations were present. However, most of the aforementioned early works considered an instance-invariant confusion matrix (i.e., a confusion matrix that is affected only by classes, not by sample features) for analytical and computational simplicity. Considering instance-dependent confusion models is more realistic, as sample characteristics, e.g., the lighting and resolution of an image, affect the annotation accuracy [32]. The existence of such (at least occasionally occurring) instance-dependent noisy labels inevitably introduces outliers to the instance-invariant confusion models, leading to performance degradation. In general, learning under instance-dependent confusion matrices is heavily ill-posed. Hence, various problem-specific structures were exploited to add regularization terms and constraints; see, e.g., [27, 28, 33-43]. Nonetheless, unlike the instance-invariant confusion matrix case, identifiability guarantees of the target classifier have been largely under-studied. The lack of theoretical understanding also affects algorithm design: many approaches in this domain had to resort to somewhat ad hoc treatments with multi-stage training procedures, often involving nontrivial pre- and post-processing; see [27, 28, 33-35, 37, 39, 40].

Contributions.
To advance understanding, we consider a model where instance-dependent confusion matrices occur occasionally across the samples, and the rest of the data share a common nominal confusion matrix. This way, the instance-dependent noisy labels can be regarded as outliers. The model is motivated by the fact that only a proportion of all instances may have a labeling difficulty that significantly deviates from the general population [36, 44, 45]. Our contributions are as follows: