KDD2025

Measuring Item Freshness in Data Streams

Zirui Liu, Zihan Jiang, An Zhang, Zhouran Shi, Yuxuan Tian, Tong Yang

摘要

This paper studies an unexplored attribute in data streams - item freshness. The freshness of an item refers to the time interval between its last arrival and the present moment. The information of item freshness is useful in various scenarios like cache, online advertising, computer network, etc. Currently, there is no algorithm tailored for estimating item freshness. We propose a theoretically guaranteed sketch algorithm called RingSketch, which integrates time-agnostic sketch algorithm with time-aware CLOCK algorithm for real-time freshness measurement. With the key idea of tracing the trajectory of the clock pointer, the estimation process of RingSketch is akin to observing the length of the growth rings in a tree trunk. We theoretically derive the average error of RingSketch and validate it with extensive experiments. The results show that RingSketch simultaneously achieves high accuracy (<10-3 average relative error) and fast update speed (>11.4 M/s), outperforming the baseline solutions by at least 13.3x and 1.5x respectively. All codes are open-sourced at GitHub.