KDD 2025
Synthetic Survey Data Generation and Evaluation
Yanru Jiang, Siyu Liang, Junwon Choi
Abstract
Survey data are common and invaluable in social science research for understanding population processes and for supporting policymaking and planning. Depending on the nature and scale of a survey, sharing its data carries privacy risks, and data collectors and agencies are constrained by disclosure permissions, which limits use of the data across research groups and institutes. Previous methods for synthetic data generation and de-identification may not entirely prevent information disclosure, or they may sacrifice data quality and granularity.