ICLR 2026

A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis

Forough Fazeli-Asl, Michael Minyi Zhang, Linglong Kong, Bei Jiang

Abstract

A fundamental challenge in data synthesis is protecting the fairness and privacy of the individual, particularly in data-scarce environments where underrepresented groups risk further marginalization when a generative model reproduces the biases inherent in the data modeling process. We introduce a privacy- and fairness-aware generative model that embeds a conditional generator within the framework of Bayesian nonparametric learning (BNPL). This conditional structure imposes fairness constraints on our generative model by minimizing the mutual information between generated outcomes and protected attributes. Unlike existing methods, which primarily focus on binary-valued sensitive attributes, our framework extends directly to non-binary attributes. Moreover, our method provides a systematic solution to class imbalance, ensuring adequate representation of underrepresented protected groups. Our proposed approach offers a scalable, privacy-preserving framework for ethical and equitable data generation, which we demonstrate through theoretical guarantees and extensive experiments on sensitive empirical examples.

Randomized Response Mechanism (RRM): Randomized response is a privacy-preserving mechanism used to privatize categorical data (Wang et al., 2016). Let $X$ be a categorical random variable taking values in a discrete set $[[K]]$, where $K$ is the number of categories. The RRM, denoted by $\mathcal{M}_{\mathrm{RRM}}(X; \epsilon)$, perturbs the original value $X$ according to a privacy budget $\epsilon$, which controls the trade-off between privacy and accuracy. The $\epsilon$-differential privacy mechanism is given as

PRIVACY AND FAIRNESS PRESERVATION WITH BAYESIAN NONPARAMETRIC LEARNING

Our proposed generative model uses BNPL (Fong et al., 2019) to protect privacy and fairness by resampling the data from a Dirichlet process (DirP) posterior, which we first introduce in this section.
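To make the randomized response mechanism described earlier concrete, the following is a minimal sketch of the standard K-ary randomized response, which may differ from the exact form used in this paper. It assumes the reported value equals the true category with probability $e^{\epsilon} / (e^{\epsilon} + K - 1)$ and is otherwise drawn uniformly from the remaining $K - 1$ categories, so that any two inputs yield output probabilities within a factor of $e^{\epsilon}$. The function names `keep_probability` and `randomized_response` are hypothetical, and categories are indexed $0, \dots, K-1$ purely for convenience.

```python
import math
import random


def keep_probability(K, epsilon):
    """Probability of reporting the true category under the assumed
    standard K-ary randomized response: e^eps / (e^eps + K - 1)."""
    if K < 2:
        raise ValueError("K must be at least 2")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return math.exp(epsilon) / (math.exp(epsilon) + K - 1)


def randomized_response(x, K, epsilon, rng=random):
    """Privatize one categorical value x in {0, ..., K-1}.

    Assumption: this is the textbook K-ary mechanism, not necessarily
    the exact definition intended by the source.
    """
    if not 0 <= x < K:
        raise ValueError("x must lie in {0, ..., K-1}")
    if rng.random() < keep_probability(K, epsilon):
        return x  # report the true category
    # Otherwise report one of the other K-1 categories uniformly:
    # draw an index in {0, ..., K-2} and skip over x.
    other = rng.randrange(K - 1)
    return other if other < x else other + 1


if __name__ == "__main__":
    rng = random.Random(0)
    K, epsilon = 4, 1.0
    reports = [randomized_response(2, K, epsilon, rng) for _ in range(10)]
    print("keep probability:", round(keep_probability(K, epsilon), 4))
    print("privatized reports:", reports)
```

Under this assumed form, a smaller $\epsilon$ pushes the keep probability toward $1/K$ (reports close to uniform noise), while a larger $\epsilon$ pushes it toward 1 (reports close to the raw data), which is the privacy and accuracy trade-off the budget controls.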
Corollary 1 (Global Perfect Privacy) Under the conditions of Proposition 1, as $a \to \infty$, we have (i) $\epsilon_{\mathrm{glo}} \to 0$; moreover, (ii) $\delta_{\mathrm{glo}} \xrightarrow{p} 0$ for fixed $|W| = N - 1$.