NeurIPS 2022

Learning from Label Proportions by Learning with Label Noise

Jianxin Zhang, Yutong Wang, Clayton Scott

Cited by 36

Abstract

Learning from label proportions (LLP) is a weakly supervised classification problem where data points are grouped into bags, and the label proportions within each bag are observed instead of the instance-level labels. The task is to learn a classifier to predict the labels of future individual instances. Prior work on LLP for multi-class data has yet to develop a theoretically grounded algorithm. In this work, we propose an approach to LLP based on a reduction to learning with label noise, using the forward correction (FC) loss of Patrini et al. [30]. We establish an excess risk bound and a generalization error analysis for our approach, and we also extend the theory of the FC loss, which may be of independent interest. Our approach demonstrates improved empirical performance in deep learning scenarios across multiple datasets and architectures, compared to the leading methods.

Recently, Scott and Zhang [39] demonstrated a principled approach to LLP with performance guarantees, based on a reduction to learning with label noise (LLN) in the binary setting. Their basic strategy was to pair bags and to view each pair of bags as an LLN problem, where the observed label proportions are related to the "label flipping" or "noise transition" probabilities. Using an existing technique for LLN based on loss correction, which allows the learner to train directly on the noisy data, they formulated an overall objective as a (weighted) sum of objectives for each pair of bags. They provided a generalization error analysis and established consistency for the method, and also showed that, in the context of kernel methods, their approach outperformed the leading kernel methods.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
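To make the forward correction (FC) loss mentioned above concrete, the following is a minimal NumPy sketch of the general idea behind forward correction: the model's predicted clean-class probabilities are pushed through a noise transition matrix before the cross-entropy against the noisy label is computed. The function names and the matrix `T` are illustrative assumptions. In particular, how `T` would be built from bag-level label proportions is not specified in this excerpt, so `T` is treated here as a given row-stochastic matrix with `T[i, j] = P(noisy label = j | clean label = i)`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_corrected_loss(logits, noisy_labels, T):
    """Mean forward-corrected cross-entropy (illustrative sketch).

    logits:       (n, C) array of model scores for the clean classes.
    noisy_labels: (n,) integer array of observed (noisy) labels.
    T:            (C, C) row-stochastic matrix, assumed known, with
                  T[i, j] = P(noisy = j | clean = i).
    """
    clean_probs = softmax(logits)      # estimated P(clean = i | x)
    noisy_probs = clean_probs @ T      # implied P(noisy = j | x)
    picked = noisy_probs[np.arange(len(noisy_labels)), noisy_labels]
    return float(-np.log(picked + 1e-12).mean())

# Hypothetical two-class example with an assumed transition matrix.
T_example = np.array([[0.8, 0.2],
                      [0.3, 0.7]])
logits_example = np.zeros((1, 2))      # uniform clean-class prediction
labels_example = np.array([0])
loss_example = forward_corrected_loss(logits_example, labels_example, T_example)
```

When `T` is the identity matrix, this sketch reduces to the ordinary cross-entropy loss, which is a quick sanity check on any implementation of the idea.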