NeurIPS 2025

Certifying Deep Network Risks and Individual Predictions with PAC-Bayes Loss via Localized Priors

Wen Dong

Abstract

As machine learning increasingly relies on large, opaque foundation models powering generative and agentic AI, deploying these systems in safety-critical contexts demands rigorous generalization guarantees beyond training data. PAC-Bayes theory provides principled certificates linking training performance to generalization risk, yet existing approaches remain impractical: simple theoretical priors yield vacuous bounds, while data-dependent priors require costly second-stage training or introduce bias. To bridge this gap, we propose a localized PAC-Bayes prior: a structured, computationally efficient prior softly concentrated around parameters favored during standard training. By integrating this localized prior directly into the standard training objective, we deliver practically tight generalization certificates with minimal workflow disruption. Under standard neural tangent kernel assumptions, our bound shrinks as networks widen and datasets grow, becoming negligible in realistic regimes. Empirically, we demonstrate tight generalization certificates on tasks ranging from image classification (MNIST, CIFAR, ImageNet) and NLP fine-tuning (GLUE) to semantic segmentation (Cityscapes), typically within three percentage points of test error at ImageNet scale. Additionally, our approach provides rigorous guarantees for individual predictions, selective rejection of uncertain predictions, adversarial robustness, and accurate calibration, directly addressing key requirements for trustworthy AI deployment.

* Affiliation provided for identification only; work performed in personal capacity on personal time.
39th Conference on Neural Information Processing Systems (NeurIPS 2025).

priors produce vacuous bounds on large-scale architectures: the resulting KL divergence can reach thousands of nats, rendering the method practically unusable.
We address this fundamental limitation by employing a localized prior, an approach inspired by Catoni's original work [6] and subsequent extensions [7, 8]. Specifically, our localized prior is defined as $\pi_{\mathrm{loc}}(\theta) \propto \pi(\theta) \exp[-\xi \lambda r_S(\theta)]$, with $0 < \xi < 1$. Here, $\pi(\theta)$ is the original data-independent prior, $r_S(\theta)$ is the empirical loss, and the parameters $\lambda$ and $\xi$ control the strength of the shift toward promising parameters. This prior closely mirrors the ideal Gibbs posterior distribution $\rho(\theta) \propto \pi(\theta) \exp[-\lambda r_S(\theta)]$, which is recovered in the limiting case $\xi = 1$. The factor $\xi < 1$ thus introduces controlled softening, preventing overfitting by keeping the prior more dispersed than the fully empirical Gibbs posterior. Optimizing $\xi$ and $\lambda$ as part of standard SGD integrates PAC-Bayes regularization into training, yielding tight and practically meaningful generalization bounds for modern neural networks.

Empirically, our localized PAC-Bayes bound fits into standard training by directly replacing traditional loss objectives with PAC-Bayes-based counterparts. On benchmarks ranging from classical image classification (MNIST, CIFAR-10/100, ImageNet) to modern tasks such as Cityscapes semantic segmentation and GLUE NLP fine-tuning, our method consistently provides rigorous, tight, and meaningful certificates. The certification overhead is minimal, comparable to adding just one training epoch, while offering reliable individual-level guarantees, selective prediction strategies, and robust adversarial input detection.

Theoretically, we confirm that our bound converges favorably with increasing data and network size, establishing clear relationships between localization parameters, network width, and sample size. Informally stated, under standard scaling conditions, the KL term shrinks as the network width grows and scales inversely with the sample size, vanishing entirely in the infinite-width limit (Theorem 3.2).
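To make the role of the localized prior concrete, the following toy sketch evaluates the KL term on a finite hypothesis set. It is an illustration under stated assumptions, not the paper's training procedure: the eight loss values, the uniform base prior, and the choices $\lambda = 50$, $\xi = 0.5$ are hypothetical, and the helper names `gibbs` and `kl` are introduced here only for this example. It compares the KL divergence from the Gibbs posterior to the data-independent prior with the KL divergence to the localized prior.

```python
import math

def gibbs(prior, losses, temperature):
    """Distribution proportional to prior[i] * exp(-temperature * losses[i])."""
    log_w = [math.log(p) - temperature * r for p, r in zip(prior, losses)]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]

def kl(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical empirical losses r_S(theta) for eight candidate hypotheses.
losses = [0.05, 0.10, 0.40, 0.45, 0.50, 0.55, 0.60, 0.90]
prior = [1.0 / len(losses)] * len(losses)  # data-independent prior pi (assumed uniform)
lam, xi = 50.0, 0.5                        # hypothetical lambda and xi, with 0 < xi < 1

posterior = gibbs(prior, losses, lam)             # rho   ∝ pi * exp(-lambda * r_S)
localized_prior = gibbs(prior, losses, xi * lam)  # pi_loc ∝ pi * exp(-xi * lambda * r_S)

kl_flat = kl(posterior, prior)
kl_localized = kl(posterior, localized_prior)

print(f"KL(rho || pi)     = {kl_flat:.4f} nats")
print(f"KL(rho || pi_loc) = {kl_localized:.4f} nats")
```

In this toy setting the localized prior already places most of its mass on the low-loss hypotheses, so the divergence to the posterior is smaller than for the uniform prior, and it reaches zero when $\xi = 1$. The sketch covers only the KL term; it does not reproduce the full bound or any correction the paper applies for the prior's dependence on the data.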
In summary, we (i) introduce a localized, trajectory-aware PAC-Bayes bound, transforming PAC-Bayes from theoretical curiosity to practical diagnostic, (ii) demonstrate straightforward, effective integration into real-world deep-learning pipelines, and (iii) validate our approach across diverse vision, language, and control tasks, aligning closely with emerging regulatory requirements.

PAC-Bayes Preliminaries and Notation

Machine-learning papers typically report empirical training loss, yet practical deployment demands guarantees on unseen data. The difference between these defines a deviation event. Formally, given a dataset $S = \{(x_i, y_i)\}_{i=1}^{N}$ drawn from an unknown distribution $D$, a predictor parameterized by $\theta$, and a per-example loss $\ell(\theta; x_i, y_i) \in [0, 1]$, we denote the empirical loss by $r_S(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell(\theta; x_i, y_i)$ and the population (true) loss by $R(\theta) = \mathbb{E}_{(x,y) \sim D}[\ell(\theta; x, y)]$. Our central question: how improbable is the event $R(\theta) - r_S(\theta) \gg 0$? The simplest way to bound such deviations uses Markov's inequality, controlling the tail of a nonnegative random variable $X$ via its expectation: $\mathbb{P}[X > k\,\mathbb{E}X] \le 1/k$. To sharpen control, we transform deviat