KDD2025
Measuring Item Freshness in Data Streams
Zirui Liu, Zihan Jiang, An Zhang, Zhouran Shi, Yuxuan Tian, Tong Yang
Abstract
This paper studies an unexplored attribute in data streams - item freshness. The freshness of an item refers to the time interval between its last arrival and the present moment. The information of item freshness is useful in various scenarios like cache, online advertising, computer network, etc. Currently, there is no algorithm tailored for estimating item freshness. We propose a theoretically guaranteed sketch algorithm called RingSketch, which integrates time-agnostic sketch algorithm with time-aware CLOCK algorithm for real-time freshness measurement. With the key idea of tracing the trajectory of the clock pointer, the estimation process of RingSketch is akin to observing the length of the growth rings in a tree trunk. We theoretically derive the average error of RingSketch and validate it with extensive experiments. The results show that RingSketch simultaneously achieves high accuracy (<10-3 average relative error) and fast update speed (>11.4 M/s), outperforming the baseline solutions by at least 13.3x and 1.5x respectively. All codes are open-sourced at GitHub.