KDD2025
DIPS: Optimal Dynamic Index for Poisson πps Sampling
Jinchao Huang, Sibo Wang
Abstract
This paper addresses the Poisson πps sampling problem, a topic of significant academic interest in various domains and with practical data mining applications, such as influence maximization. The problem includes a set S of n elements, where each element v is assigned a weight w(v) reflecting its importance. The goal is to generate a random subset X of S, where each element v ∈ S is included in X independently with probability c ⋅ w(v)over ∑v ∈ S w(v), where 0 < c≤ 1 is a constant. The subsets must be independent across different queries. While the Poisson πps sampling problem can be reduced to the well-studied subset sampling problem, updates in Poisson πps sampling, such as adding a new element or removing an element, would cause the probabilities of all n elements to change in the corresponding subset sampling problem, making this approach impractical for dynamic scenarios. To address this, we propose a dynamic index specifically tailored for the Poisson πps sampling problem, supporting optimal expected O (1) query time and O (1) index update time, with an optimal O (n) space cost. Our solution involves recursively partitioning the set by weights and ultimately using table lookup. The core of our solution lies in addressing the challenges posed by weight explosion and correlations between elements. Empirical evaluations demonstrate that our approach achieves significant speedups in update time while maintaining consistently competitive query time compared to the subset-sampling-based methods.