WWW2026

Robust LLM-Based Website Fingerprinting under Dynamic Real-World Conditions

Xiyuan Zhao, Xinhao Deng, Tianyu Cui, Yixiang Zhang, Ke Xu, Qi Li

Abstract

Website Fingerprinting (WF) attacks aim to infer the websites visited by Tor users by analyzing patterns in encrypted network traffic. However, most existing WF attacks are evaluated on traffic collected in controlled environments with fixed configurations, failing to reflect the complexity and variability of real-world conditions. In practice, traffic is far more dynamic and diverse due to heterogeneous network conditions, the large number of subpages within individual websites, and continuous evolution of website content. These factors increase intra-class variability and induce temporal feature drift, which ultimately degrades the long-term effectiveness of existing attacks. In this paper, we propose TraVerse, an LLM-based representation learning framework designed to achieve robust WF attacks under real-world conditions. TraVerse applies architectural adaptation and large-scale fine-tuning on diverse unlabeled traffic to learn generalizable and resilient representations that remain effective in dynamic and evolving environments. Furthermore, TraVerse integrates a lightweight classifier atop the LLM-derived representations, enabling accurate website identification and efficient few-shot adaptation with minimal model updates. We prototype TraVerse and conduct comprehensive evaluations using real-user traffic. Experimental results show that TraVerse improves Accuracy@3 by an average of 176.3% and weighted F1 by 343.3% over state-of-the-art baselines, while maintaining strong performance throughout a three-month longitudinal evaluation.
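To make the reported metrics concrete, the following is a minimal sketch of how Accuracy@k and weighted F1 are conventionally computed, and of how a relative improvement percentage is read. It assumes the standard definitions of these metrics; the abstract does not spell out the exact evaluation protocol, so the function names (`accuracy_at_k`, `weighted_f1`, `relative_improvement`) and the toy labels are hypothetical and purely illustrative, not the authors' evaluation code.

```python
from collections import Counter


def accuracy_at_k(ranked_predictions, labels, k=3):
    """Fraction of traces whose true website appears among the top-k ranked guesses."""
    hits = sum(1 for ranked, y in zip(ranked_predictions, labels) if y in ranked[:k])
    return hits / len(labels)


def weighted_f1(predictions, labels):
    """Per-class F1, averaged with weights proportional to each class's support."""
    support = Counter(labels)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for p, y in zip(predictions, labels) if p == cls and y == cls)
        fp = sum(1 for p, y in zip(predictions, labels) if p == cls and y != cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(labels)


def relative_improvement(new, old):
    """Relative gain in percent; a gain of 200% means the new score is 3x the old one."""
    return 100.0 * (new - old) / old


# Toy data (assumed for illustration): four traces over four candidate websites.
labels = ["news", "shop", "news", "mail"]
ranked = [
    ["news", "shop", "mail", "wiki"],
    ["mail", "wiki", "news", "shop"],
    ["shop", "mail", "news", "wiki"],
    ["mail", "news", "shop", "wiki"],
]
top1 = [r[0] for r in ranked]

acc3 = accuracy_at_k(ranked, labels, k=3)
f1w = weighted_f1(top1, labels)
gain = relative_improvement(0.75, 0.25)
```

Under this reading, an average Accuracy@3 improvement of 176.3% is a relative gain over the baselines' scores, not a difference in percentage points.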