WWW2024
Cardinality Counting in "Alcatraz": A Privacy-aware Federated Learning Approach
Nan Wu, Xin Yuan, Shuo Wang, Hongsheng Hu, Minhui Xue
8 citations
Abstract
The task of cardinality counting, pivotal for data analysis, endeavors to quantify unique elements within datasets and has significant applications across various sectors like healthcare, marketing, cybersecurity, and web analytics. Current methods, categorized into deterministic and probabilistic, often fail to prioritize data privacy. Given the fragmentation of datasets across various organizations, there is an elevated risk of inadvertently disclosing sensitive information during collaborative data studies using state-of-the-art cardinality counting techniques. This study introduces an innovative privacy-centric solution for the cardinality counting dilemma, leveraging a federated learning framework. Our approach involves employing a locally differentially private data encoding for initial processing, followed by a privacy-aware federated K-means clustering strategy, ensuring that cardinality counting occurs across distinct datasets without necessitating data amalgamation. The efficacy of our methodology is underscored by promising results from tests on both real-world and simulated datasets, pointing towards a transformative approach to privacy-sensitive cardinality counting in contemporary data science.