ICML 2024

H-Consistency Guarantees for Regression

Anqi Mao, Mehryar Mohri, Yutao Zhong


Abstract

We present a detailed study of H-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish H-consistency bounds. This generalization proves essential for analyzing H-consistency bounds specific to regression. Next, we prove a series of novel H-consistency bounds for surrogate loss functions of the squared loss, under the assumption of a symmetric distribution and a bounded hypothesis set. These include positive results for the Huber loss, all ℓp losses with p ≥ 1, and the squared ε-insensitive loss, as well as a negative result for the ε-insensitive loss used in Support Vector Regression (SVR). We further leverage our analysis of H-consistency for regression to derive principled surrogate losses for adversarial regression (Section 5). This readily establishes novel algorithms for adversarial regression, for which we report favorable experimental results in Section 6.
Recent seminal work by Awasthi, Mao, Mohri, and Zhong (2022a,b) and Mao, Mohri, and Zhong (2023f,c,e,b) has analyzed H-consistency bounds for broad families of surrogate losses in binary classification, multi-class classification, structured prediction, and abstention (Mao et al., 2023a). These bounds are more informative than Bayes-consistency, since they are specific to the hypothesis set and do not require it to be the entire family of measurable functions. Moreover, they offer finite-sample, non-asymptotic guarantees. In light of these recent guarantees, the following questions naturally arise: Can we derive a non-asymptotic analysis of regression that takes the hypothesis set into account? How can we benefit from that analysis?
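As a reference point, the H-consistency bounds of the cited prior work (Awasthi et al., 2022a,b) take roughly the following general form. The notation (E, M, Γ) is assumed for illustration and is not defined in this section. For a target loss ℓ (here, the squared loss), a surrogate loss L, and every hypothesis h ∈ H:

$$
\mathcal{E}_{\ell}(h) - \mathcal{E}^{*}_{\ell}(\mathcal{H}) + \mathcal{M}_{\ell}(\mathcal{H})
\;\le\;
\Gamma\!\left(\mathcal{E}_{L}(h) - \mathcal{E}^{*}_{L}(\mathcal{H}) + \mathcal{M}_{L}(\mathcal{H})\right),
\qquad
\mathcal{M}_{\ell}(\mathcal{H}) = \mathcal{E}^{*}_{\ell}(\mathcal{H}) - \mathbb{E}_{x}\Big[\inf_{h \in \mathcal{H}} \mathbb{E}_{y}\big[\ell(h, x, y) \mid x\big]\Big]
$$

Here E_ℓ(h) is the expected ℓ-loss of h, E*_ℓ(H) = inf_{h ∈ H} E_ℓ(h) is its best-in-class value, M_ℓ(H) is the minimizability gap, and Γ is a non-decreasing function. Because the inequality holds for each h ∈ H, controlling the surrogate estimation error E_L(h) − E*_L(H) directly controls that of the squared loss, up to the minimizability gaps. These gaps vanish when H is the family of all measurable functions.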
While some previous work has explored Bayes-consistency in regression (Caponnetto, 2005; Christmann and Steinwart, 2007; Steinwart, 2007), we are not aware of any prior H-consistency bounds or similar finite-sample guarantees for surrogate losses in regression, such as the Huber loss or the squared ε-insensitive loss. This paper presents the first in-depth study of H-consistency bounds in the context of regression. We first present new theorems that generalize the tools previously given by Awasthi et al. (2022a,b) and Mao et al. (2023f,c,e,b) to establish H-consistency bounds (Section 3). In regression, this generalization proves essential for analyzing H-consistency bounds for surrogate losses such as the Huber loss and the squared ε-insensitive loss. It also yields finer bounds for the ℓ1 loss. Next, we prove a series of H-consistency bounds for surrogate loss functions of the squared loss, under the assumption of a symmetric distribution and a bounded hypothesis set (Section 4). We prove the first H-consistency bound for the Huber loss, a surrogate loss commonly used to handle outliers. This bound is contingent upon a specific condition concerning the Huber loss parameter δ and the distribution mass around the mean. We further prove that this condition is necessary when H is realizable. We then extend our analysis to H-consistency bounds for ℓp losses, for all values of p ≥ 1. Remarkably, this includes guarantees for the ℓ1 loss and for ℓp losses with p ∈ (1, 2). We further analyze the ε-insensitive and the squared ε-insensitive losses, which are integral to the definitions of the SVR (Support Vector Regression) and quadratic SVR algorithms (Vapnik, 2000), respectively. These loss functions and the corresponding SVR algorithms have the benefit of yielding sparser solutions. We give the first H-consistency bound for the squared ε-insensitive loss.
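To make the losses above concrete, here is a minimal NumPy sketch. It assumes common textbook forms for each loss; in particular, the Huber scaling is an assumption and may differ from the paper's normalization. All helper names (`risk`, `empirical_minimizer`, and the loss functions) are hypothetical. On a toy sample that is symmetric about its mean, the sketch checks two things:

* The empirical minimizers of the squared, ℓ1, ℓ1.5, Huber, and squared ε-insensitive losses all coincide with the mean.
* The ε-insensitive risk is flat around the mean, so a prediction with minimal ε-insensitive risk can still have a larger squared-loss risk.

This only illustrates the role of symmetry and of flat regions. It is not the paper's proof.

```python
from functools import partial

import numpy as np

# Hypothetical helper names. Each loss is a function of the residual
# t = h(x) - y. These are common textbook forms; in particular the Huber
# scaling (0.5 * t**2 near zero) is an assumption, not the paper's definition.


def squared_loss(t):
    return t ** 2


def lp_loss(t, p):
    # ell_p loss |t|^p for p >= 1; p = 2 recovers the squared loss.
    return np.abs(t) ** p


def huber_loss(t, delta):
    # Quadratic for |t| <= delta, linear beyond it (less sensitive to outliers).
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t ** 2, delta * a - 0.5 * delta ** 2)


def eps_insensitive_loss(t, eps):
    # SVR loss: zero inside the tube |t| <= eps, linear outside it.
    return np.maximum(0.0, np.abs(t) - eps)


def squared_eps_insensitive_loss(t, eps):
    # Quadratic SVR loss: zero inside the tube, quadratic outside it.
    return np.maximum(0.0, np.abs(t) - eps) ** 2


def risk(loss, c, y):
    # Average loss of the constant prediction c over the sample y.
    return float(np.mean(loss(c - y)))


def empirical_minimizer(loss, y, grid):
    # Grid point with the smallest average loss.
    risks = [risk(loss, c, y) for c in grid]
    return float(grid[int(np.argmin(risks))])


# Toy sample, symmetric about 2.0, so its mean and median are both 2.0.
y = 2.0 + np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
grid = np.linspace(-5.0, 9.0, 1401)  # step 0.01, includes (approximately) 2.0

minimizers = {
    "squared": empirical_minimizer(squared_loss, y, grid),
    "l1": empirical_minimizer(partial(lp_loss, p=1.0), y, grid),
    "l1.5": empirical_minimizer(partial(lp_loss, p=1.5), y, grid),
    "huber": empirical_minimizer(partial(huber_loss, delta=1.0), y, grid),
    "sq_eps": empirical_minimizer(partial(squared_eps_insensitive_loss, eps=0.5), y, grid),
}
print(minimizers)

# The eps-insensitive risk is flat around the mean here: predicting 1.75
# costs the same as predicting 2.0, yet its squared-loss risk is larger.
eps_loss = partial(eps_insensitive_loss, eps=0.5)
print(risk(eps_loss, 1.75, y), risk(eps_loss, 2.0, y))  # 1.2 1.2
print(risk(squared_loss, 1.75, y), risk(squared_loss, 2.0, y))  # 4.0625 4.0
```

Symmetry is what aligns the minimizers here. For a sample symmetric about its mean, each convex, symmetric loss in the list is minimized at the center of symmetry, which is also the squared-loss minimizer. The squared ε-insensitive loss shares the tube, but on this sample the points outside the tube contribute quadratically and pin its minimizer to the mean. The grid search is deliberately crude and serves only to make that contrast visible.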
We also prove a negative result for the ε-insensitive loss: this loss function, used in the definition of SVR, does not admit H-consistency bounds with respect to the squared loss, even under some additional assumptions on the parameter ε and the distribution. Subsequently, leveraging our analysis of H-consistency for regression, we derive principled surrogate losses for adversarial regression (Section 5). This readily establishes a novel algorithm for adversarial regression, for which we report favorable experimental results in Section 6.
Previous work. Bayes-consistency has been extensively studied in various learning problems.