NeurIPS2023
On the Generalization Error of Stochastic Mirror Descent for Quadratically-Bounded Losses: an Improved Analysis
Ta Duy Nguyen, Alina Ene, Huy L. Nguyen
摘要
In this work, we revisit the generalization error of stochastic mirror descent for quadratically bounded losses studied in Telgarsky (2022). Quadratically bounded losses is a broad class of loss functions, capturing both Lipschitz and smooth functions, for both regression and classification problems. We study the high probability generalization for this class of losses on linear predictors in both realizable and non-realizable cases when the data are sampled IID or from a Markov chain. The prior work relies on an intricate coupling argument between the iterates of the original problem and those projected onto a bounded domain. This approach enables blackbox application of concentration inequalities, but also leads to suboptimal guarantees due in part to the use of a union bound across all iterations. In this work, we depart significantly from the prior work of Telgarsky (2022), and introduce a novel approach for establishing high probability generalization guarantees. In contrast to the prior work, our work directly analyzes the moment generating function of a novel supermartingale sequence and leverages the structure of stochastic mirror descent. As a result, we obtain improved bounds in all aforementioned settings. Specifically, in the realizable case and non-realizable case with light-tailed sub-Gaussian data, we improve the bounds by a log T factor, matching the correct rates of 1/T and 1/ √ T , respectively. In the more challenging case of heavy-tailed polynomial data, we improve the existing bound by a poly T factor. Recently, [35] proposes a new approach to analyze the generalization error with high probability of stochastic mirror descent for a broad class of quadratically bounded losses, beyond the realizable setting. This class of losses encapsulates both Lipschitz and smooth functions, for both regression and classification problems. The obtained bounds complement existing in-expectation bounds [12] and nearly match the counterpart of convergence rates in optimization. While this result pushes forward the state of the art, the obtained guarantees do not completely resolve the problem. The central piece of the proposed approach is a "coupling" technique between the iterates of the original problem and those projected onto a bounded domain. In this technique, one first constrains the problem in a bounded domain with a well chosen diameter. The bounded domain diameter allows to apply concentration inequalities as a blackbox and obtain bounds in high probability. Then using an inductive argument and a union bound across all iterations, one can show that the iterates in the original problem coincide with the ones in the constrained problem. Due to the union bound, the success probability decreases from 1 -δ to 1 -T δ, where T is the number of iterations in the algorithm. This loss translates to a milder log T factor loss in the guarantee in the case of realizable data , and a more significant poly T factor loss in the non-realizable setting when the data has polynomial tails. Thus a natural question arises of whether we can obtain a stronger analysis that closes these remaining gaps. In this paper, we revisit these generalization bounds for quadratically bounded losses by [35] . We introduce a novel approach to analyze the generalization errors of stochastic mirror descent in both realizable and non-realizable cases when the data are sampled IID or from a Markov chain. In all these cases, we remove the need to use the union bound argument, thus preventing the loss in the success probability. This translates to the following improvements: -In the realizable, and the non-realizable cases with sub-gaussian tailed data and Markovian data, we improve the bounds by a log T factor. This improvement comes from analyzing the moment generating function of a martingale difference sequence with well-chosen coefficients. In these cases, we also remove the necessity of using the coupling-based argument used in the same work by [35] . Instead, by solely making use of the problem structure, we arrive at the same conclusion that with high probability, the iterates of stochastic mirror descent for quadratically bounded losses behave as if the problem domain is bounded. -In the non-realizable case with polynomial tailed data, we improve the existing bound by a poly T factor. Due to the polynomial dependency on 1 δ , being able to maintain the same success probability through all iterations is crucial in this case. Unlike the previous work, we rely on a truncation technique. Using a more refined analysis of the truncated random variables, in combination with suitable concentration inequalities and the coupling technique, we improve the existing bounds significantly. Related Work Broadly speaking, there is a rich body of works in optimization and generalization that provide convergence guarantees and generalization bounds for stochastic methods. Earlier works often focus on in-expectation bounds [8, 24, 26, 18, 12, 3] , a