WWW2026

Weighted Reservoir Sampling with Replacement from Data Streams

Adriano Meligrana, Adriano Fazzone

Abstract

In this work, we present a new random sampling method for data streams in which the probability that an element is included in the sample is proportional to a weight associated with that element. Our method samples with replacement, whereas most of the literature on this topic has focused on sampling without replacement. Our algorithm generates a weighted random sample in a single pass over a population of unknown size. At any point in time, the sample is representative of the population seen so far and can be used directly by other modules without any post-processing. We formally prove the correctness and efficiency of our method. An experimental analysis evaluates the performance of our method in practice against state-of-the-art methods.
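
To make the setting concrete, the following is a minimal sketch of a naive baseline for weighted reservoir sampling with replacement. It is not the algorithm proposed in this paper, whose details the abstract does not give. It assumes strictly positive weights and keeps k independent slots; each arriving element overwrites each slot with probability equal to its weight divided by the total weight seen so far. The class name `NaiveWeightedReservoirWR` and its methods are hypothetical names chosen for illustration.

```python
import random
from typing import Any, List, Optional


class NaiveWeightedReservoirWR:
    """Naive baseline (NOT the paper's algorithm): k independent
    single-item weighted reservoirs, i.e. sampling with replacement."""

    def __init__(self, k: int, seed: Optional[int] = None) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self.total_weight = 0.0
        # Slots stay None until the first element arrives.
        self.sample: List[Any] = [None] * k
        self._rng = random.Random(seed)

    def add(self, item: Any, weight: float) -> None:
        """Process one stream element in O(k) time."""
        if weight <= 0:
            raise ValueError("weight must be strictly positive (assumption)")
        self.total_weight += weight
        # Probability that this element replaces the content of any given slot.
        p = weight / self.total_weight
        for i in range(self.k):
            # For the first element p == 1.0, and random() < 1.0 always holds,
            # so every slot is filled.
            if self._rng.random() < p:
                self.sample[i] = item


if __name__ == "__main__":
    reservoir = NaiveWeightedReservoirWR(k=5, seed=0)
    for item, weight in [("a", 1.0), ("b", 2.0), ("c", 7.0)]:
        reservoir.add(item, weight)
        # The sample is usable after every element, with no post-processing.
        print(reservoir.sample)
```

After each call to `add`, every slot holds an element drawn with probability proportional to its weight among the elements seen so far, and the slots are mutually independent, so the same element may appear more than once. This baseline costs O(k) work per stream element, which is the kind of per-element cost a more efficient method would aim to reduce.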