KDD2025

Synthetic Survey Data Generation and Evaluation

Yanru Jiang, Siyu Liang, Junwon Choi

Abstract

Survey data are widely used and invaluable in social science research for understanding population processes and supporting policymaking and planning. However, depending on their nature and scale, sharing survey data carries privacy risks, and data collectors and agencies are constrained by disclosure permissions, which limits use across research groups and institutions. Existing methods for synthetic data generation and de-identification may not fully prevent information disclosure, or they may sacrifice data quality and granularity.