NeurIPS 2025
Private Evolution Converges
Tomás González Lara, Giulia Fanti, Aaditya Ramdas
Abstract
Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to understand PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a convex and compact domain, we prove that under the right hyperparameter settings and given access to the Gaussian variation API proposed in [33], PE produces an $(\varepsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance $\tilde{O}(d(n\varepsilon)^{-1/d})$ from the original; this establishes worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in experiments.

Recently, [33] introduced Private Evolution (PE), a promising new framework for DP synthetic data generation that relies on public, pretrained data generators [48, 32, 27, 41, 50, 28]. PE is currently competitive with, and sometimes improves on, state-of-the-art models in terms of Fréchet inception distance (FID) and downstream task performance in settings such as images and text [33, 48, 32, 27]. In addition, PE is training-free, whereas current state-of-the-art approaches typically train (or finetune) a generative model on the sensitive dataset using DP-SGD [49, 35, 14, 11]. However, in some settings, including tabular data [41] and image data with mismatched distributions [21], PE has achieved limited success.
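To get a feel for the rate stated in the abstract, the following minimal sketch evaluates only the leading term $d(n\varepsilon)^{-1/d}$. The function name `pe_rate` and the chosen values of $n$, $\varepsilon$, and $d$ are illustrative assumptions and do not come from the paper. Constants and polylogarithmic factors hidden by the $\tilde{O}$ are ignored, so the numbers are meaningful only relative to each other.

```python
def pe_rate(n: int, eps: float, d: int) -> float:
    """Leading term d * (n * eps) ** (-1 / d) of the stated Wasserstein bound.

    Hypothetical helper for illustration: constants and polylog factors
    hidden by the O-tilde notation are dropped.
    """
    if n <= 0 or eps <= 0 or d <= 0:
        raise ValueError("n, eps and d must be positive")
    return d * (n * eps) ** (-1.0 / d)


if __name__ == "__main__":
    eps = 1.0  # assumed privacy parameter, chosen only for illustration
    for d in (1, 2, 10):
        # Compare the leading term at two dataset sizes for each dimension.
        small, large = pe_rate(10**3, eps, d), pe_rate(10**6, eps, d)
        print(f"d={d}: n=1e3 -> {small:.4g}, n=1e6 -> {large:.4g}")
```

For any fixed $d$ the term shrinks as $n$ grows, which matches the convergence claim, while the exponent $-1/d$ makes the decay much slower in higher dimensions.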
To better understand when PE works, it is crucial to improve our theoretical understanding of the algorithm. At a high level, PE works as follows (Figure 1). First, it creates a synthetic dataset $S_0$ with an API that is independent of the sensitive dataset $S$ (e.g., a foundation model trained on public data). Then, it iteratively refines the synthetic data, creating $S_1, S_2, \ldots$, where $S_t$ is obtained from $S_{t-1}$ by

39th Conference on Neural Information Processing Systems (NeurIPS 2025).