KDD2022
HyperLogLogLog: Cardinality Estimation With One Log More
Matti Karppa, Rasmus Pagh
24 citations
Abstract
We present HyperLogLogLog, a practical compression of the Hy-perLogLog sketch that compresses the sketch from ๐ (๐ log log ๐) bits down to ๐ log 2 log 2 log 2 ๐ + ๐ (๐ + log log ๐) bits for estimating the number of distinct elements ๐ using ๐ registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming ๐ is sufficiently larger than ๐. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to ๐ log 2 log 2 log 2 log 2 ๐ + ๐ (๐ log log log ๐/log log ๐ + log log ๐) bits. CCS CONCEPTS โข Information systems โ Data management systems; โข Theory of computation โ Design and analysis of algorithms.