WWW2026

StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning

Changwu Li, Tongjun Shi, Shuhao Zhang, Binbin Chen, Bingsheng He, Xiaofei Liao, Hai Jin

Abstract

Modern web applications, from personalized recommendation to real-time fraud detection, rely on AI models to deliver timely, tailored services, yet the underlying user interaction data arrives as massive, evolving streams. Stream Learning (SL) offers a natural paradigm for building adaptive models, but it suffers from redundant training data and catastrophic forgetting, both of which can undermine long-term predictive performance. To address these issues, recent studies have explored data selection strategies such as coreset selection and buffer update, typically implemented through rule-based or model-based methods. However, rule-based approaches rely on fixed selection rules that adapt poorly to changing data distributions, while model-based methods often depend on costly per-sample gradients, which throttle updates and reduce coverage of informative samples. In this paper, we propose StreamFP, a lightweight SL framework that introduces fingerprints, a set of compact, learnable parameter vectors that summarize the model state. Similarity scores computed with these fingerprints jointly guide coreset selection and buffer update, prioritizing informative incoming samples while retaining representative historical ones. A lightweight fingerprint attunement plugin further calibrates the fingerprints using pre-trained ViT attention with negligible overhead, improving accuracy while mitigating forgetting. Extensive experiments demonstrate that StreamFP consistently achieves higher accuracy and efficiency than state-of-the-art methods across diverse real-world datasets and varying data arrival rates.
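To make the idea of similarity-guided selection concrete, the following is a minimal sketch and not the StreamFP implementation. The abstract does not specify the scoring rule, so the sketch assumes cosine similarity, treats low-scoring incoming samples as informative, and treats high-scoring buffered samples as representative. All function names (`fingerprint_score`, `select_coreset`, `update_buffer`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fingerprint_score(x: Vector, fingerprints: Sequence[Vector]) -> float:
    """Assumed rule: highest cosine similarity between x and any fingerprint."""
    return max(cosine(x, f) for f in fingerprints)


def select_coreset(batch: Sequence[Vector],
                   fingerprints: Sequence[Vector], k: int) -> List[int]:
    """Return indices of the k lowest-scoring samples in the batch.

    Assumption: a sample unlike every fingerprint is poorly covered by the
    current model state and is therefore informative to train on.
    """
    order = sorted(range(len(batch)),
                   key=lambda i: fingerprint_score(batch[i], fingerprints))
    return sorted(order[:k])


def update_buffer(buffer: Sequence[Vector], candidates: Sequence[Vector],
                  fingerprints: Sequence[Vector], capacity: int) -> List[Vector]:
    """Keep the `capacity` highest-scoring items from buffer plus candidates.

    Assumption: a sample close to some fingerprint is representative of
    what the model has learned and is worth retaining for replay.
    """
    pool = list(buffer) + list(candidates)
    pool.sort(key=lambda x: fingerprint_score(x, fingerprints), reverse=True)
    return pool[:capacity]


# Toy usage with two 2-D fingerprints (illustrative values only).
fingerprints = [[1.0, 0.0], [0.0, 1.0]]
batch = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.0, 2.0]]
coreset_indices = select_coreset(batch, fingerprints, k=2)
replay_buffer = update_buffer([[1.0, 0.0]], batch[2:], fingerprints, capacity=2)
```

The point of the sketch is that one set of scores drives both decisions, and that scoring needs only dot products against a few vectors rather than per-sample gradients. In the proposed framework the fingerprints are learnable parameters; here they are fixed for brevity.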