SIGMOD2025

A Fast, Mergeable, and LDP Compatible Sketch for Counting the Number of Distinct Values in Fully Dynamic Tables

Zhicheng Li, Pinghui Wang, Zeli Lin, Bichun Chen, Dongdong Xie

Abstract

Counting the number of distinct values (NDV) is a fundamental problem in web applications and databases, particularly under memory constraints. Sketch-based methods, such as the Flajolet-Martin sketch, construct compact data summaries to estimate NDV but primarily focus on insertion-only scenarios. However, supporting delete operations is crucial for maintaining accurate and up-to-date cardinality estimates in many real-world applications, such as databases. Existing methods for fully dynamic scenarios, involving both insertions and deletions, often incur considerable computational and memory overhead. Furthermore, collaborative computation often requires sharing sketches with external or untrusted parties, which introduces significant privacy risks. To address these challenges, we propose a novel sketch method, GMod, specifically designed for fully dynamic scenarios and compatible with local differential privacy (LDP) for both NDV estimation and privacy preservation. Our method supports efficient deletions with minimal additional overhead by utilizing a single discrete uniformly distributed random variable. Additionally, we introduce a lightweight probabilistic estimation model to compute NDV, achieving 3× faster performance compared to the state-of-the-art. By incorporating carefully designed sketch perturbation mechanisms, our model mitigates the impact of LDP noise. Experimental results demonstrate that our method uses 1/3 of the memory to achieve comparable estimation accuracy in local settings and provides 8× higher accuracy under LDP scenarios compared to state-of-the-art methods.