NeurIPS2023
Reliable learning in challenging environments
Maria-Florina Balcan, Steve Hanneke, Rattana Pukdee, Dravyansh Sharma
被引用 9 次
摘要
The problem of designing learners that provide guarantees that their predictions are provably correct is of increasing importance in machine learning. However, learning theoretic guarantees have only been considered in very specific settings. In this work, we consider the design and analysis of reliable learners in challenging testtime environments as encountered in modern machine learning problems: namely 'adversarial' test-time attacks (in several variations) and 'natural' distribution shifts. In this work, we provide a reliable learner with provably optimal guarantees in such settings. We discuss computationally feasible implementations of the learner and further show that our algorithm achieves strong positive performance guarantees on several natural examples: for example, linear separators under log-concave distributions or smooth boundary classifiers under smooth probability distributions. provide correctness guarantees for individual predictions: i.e., reliability. In this work, we advance this line of work by developing a general understanding of how to learn reliably in the presence of corruptions or changes to the test set, specifically under adversarial test-time attacks as well as distribution shift between the training (source) and test (target) data. Our results. We consider algorithms that provide robustly-reliable predictions which are guaranteed to be correct under standard assumptions from statistical learning theory, for both test-time attacks and distribution shift. Our first main set of results tackles the challenging case of adversarial test-time perturbations. For this setting, we introduce a novel compelling reliability criterion on a learner that particularly captures the challenge of reliability under the test-time attacks. Given a test point z, a robustly-reliable classifier either abstains from prediction, or outputs both a prediction y and a reliability guarantee η with the guarantee that y is correct unless one of two bad events has occurred: 1) the true target function does not belong to the given hypothesis set H or, 2) a test-point z is perturbed from its original point by adversarial strength of at least η (measured in the relevant metric). In the case of distribution shift, we provide novel analysis and a complexity measure that extend the classical notion of reliable learning to the setting when the test distribution is allowed to be an arbitrary new distribution. Summary of contributions 1. We propose robustly-reliable learners for test-time attacks which guarantee reliable learning in the presence of test-time attacks, and characterize the region of instance space where they are simultaneously robust and reliable. Specifically, under the realizable setting, for adversarial perturbations within metric balls around the test points, we use the radius of the metric ball as a natural notion of adversarial strength. We output a reliability radius η with a guarantee that our prediction on a point is correct as long as it was perturbed with a distance less than η (under a given metric). We further show that our proposed robustly-reliable learner achieves pointwise optimal values for this reliability radius: that is, no robustly-reliable learner can output a reliability radius larger than our learner for any point in the instance space (Theorem B.1, B.2). 2. The pointwise optimal algorithm is easy to derive from our definition. We discuss a computationally efficient implementation of the optimal learners. (Section 4). 3. We discuss variants of these algorithms and guarantees appropriate for three different variants of adversarial losses studied in the literature: depending on whether the perturbed point must have the same label as the original point, or in lieu of this, whether the algorithm should predict the true label of the perturbed point, or the same label as the original point (Definition 1). 4. We further introduce a safely-reliable region, which captures the challenge caused by the adversary's ability to perturb a test point to cause a reduction in our reliability radius (Definition 6). As examples, we show that the safely-reliable region can be large for linear separators under log-concave distributions and for classifiers with smooth decision boundaries under nearly-uniform distributions and as a consequence, the robustly-reliable region is large as well (Theorem 3.3). 5. We extend this characterization to abstention-based reliable predictions for arbitrary adversarial perturbation sets, where we no longer restrict ourselves to metric balls. We again get a tight characterization of the robustly-reliable region (Theorem C.1). 6. We also consider reliability in the distribution shift setting where the test data points come from a different distribution. We introduce a novel refinement to the notion of disagreement coefficient [Han07], to measure the transferability of reliability guarantees across distributions. We provide bounds on the probability mass of the reliable