USENIX Security 2025

How Researchers De-Identify Data in Practice

Wentao Guo, Paige Pepitone, Adam J. Aviv, Michelle L. Mazurek

Abstract

Human-subjects researchers are increasingly expected to de-identify and publish data about research participants. However, de-identification is difficult: there is no objective solution for balancing privacy and utility, and the process requires significant time and expertise. To understand researchers' approaches, we interviewed 18 practitioners who have de-identified data for publication and 6 curators who review data submissions for repositories and funding organizations. We find that researchers account for the kinds of risks described by k-anonymity, but they address them through manual and social processes rather than through systematic assessments of risk across a dataset. This allows for nuance but may leave published data vulnerable to re-identification. We explore why researchers take this approach and highlight three main barriers to more rigorous de-identification: threats seem unrealistic, stronger standards are not incentivized or supported, and tools do not meet researchers' needs. We conclude with takeaways for repositories, funding agencies, and privacy experts.
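For readers unfamiliar with k-anonymity, the following is a minimal sketch of what a systematic, dataset-wide risk assessment could look like. It is an illustration only and not a method used by the study's participants or authors. The function names, the choice of quasi-identifiers, and the toy records are all hypothetical. A dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values is shared by at least k records.

```python
from collections import Counter


def _key(record, quasi_identifiers):
    """Tuple of a record's quasi-identifier values (its equivalence class)."""
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (0 if no records)."""
    if not records:
        return 0
    classes = Counter(_key(r, quasi_identifiers) for r in records)
    return min(classes.values())


def records_below_k(records, quasi_identifiers, k):
    """Return records whose equivalence class has fewer than k members."""
    classes = Counter(_key(r, quasi_identifiers) for r in records)
    return [r for r in records if classes[_key(r, quasi_identifiers)] < k]


# Hypothetical toy data; the column names are assumptions for illustration.
participants = [
    {"age_band": "20-29", "zip3": "207", "role": "student"},
    {"age_band": "20-29", "zip3": "207", "role": "student"},
    {"age_band": "30-39", "zip3": "207", "role": "faculty"},
    {"age_band": "30-39", "zip3": "207", "role": "faculty"},
    {"age_band": "60-69", "zip3": "200", "role": "faculty"},
]
quasi_ids = ["age_band", "zip3", "role"]

smallest_class = k_anonymity(participants, quasi_ids)
unique_records = records_below_k(participants, quasi_ids, k=2)
```

In this toy dataset the last record is the only one with its combination of values, so the dataset as a whole is only 1-anonymous. A check like this flags which records would need generalization or suppression, but it does not decide which columns count as quasi-identifiers. That judgment still rests with the researcher.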