WWW2026

TGNN: Enhancing Pixel Tracking Detection via LLM-driven Annotation and GAT-powered Structural Representation

Shenping Xiong, Xutong Wang, Ze Jin, Xinyu Liu, Haoqiang Wang, Zhen Chen, Ru Tan, Qixu Liu

Abstract

Web tracking is increasingly pervasive, raising serious concerns about user privacy and security. Among existing techniques, pixel tracking is particularly stealthy and cost-effective: it embeds invisible images that exfiltrate user activity to third-party servers. Current defenses, including filter-list blocking and conventional machine learning, often fail to capture the cross-site associations that allow pixel tracking to evade detection. To address this limitation, we introduce TGNN, a framework that formulates pixel tracking detection as an edge classification task on a Tracking Directed Graph (TDG), which models third-party associations across websites. TGNN encodes HTTP traffic into structured quadruples and learns both their semantic features and their interaction patterns. To overcome the scarcity of reliable labels, we propose a large language model (LLM)-based annotation method that uses minimal expert supervision to produce high-quality labels, which significantly improves detection performance. Experiments on traffic from the Alexa top-10K websites demonstrate that TGNN substantially outperforms existing baselines, while the LLM-based annotation achieves accuracy comparable to expert curation. Our large-scale measurement reveals that at least 16.74% of websites engage in pixel tracking via major third-party infrastructures, establishing cross-domain tracking as a pervasive practice in the wild and indicating a potential privacy threat in the modern Web ecosystem.
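To make the graph formulation more concrete, the following is a minimal sketch of how HTTP requests might be turned into quadruples and grouped into directed site-to-third-party edges, each of which becomes one candidate for edge classification. The quadruple fields (`site`, `domain`, `url`, `content_type`) and the helpers `to_quadruple`, `build_tdg`, and `cross_site_in_degree` are hypothetical illustrations, not the paper's actual encoding. The third-party check is also a simplification: it compares exact hostnames, whereas a real system would compare registrable domains.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical quadruple layout (assumption): the paper's actual fields may differ.
@dataclass(frozen=True)
class Quadruple:
    site: str          # first-party website that issued the request
    domain: str        # host receiving the request
    url: str           # full request URL
    content_type: str  # response MIME type, e.g. "image/gif"


def to_quadruple(site: str, url: str, content_type: str) -> Quadruple:
    """Encode one observed HTTP request as an (assumed) quadruple."""
    return Quadruple(site, urlparse(url).hostname or "", url, content_type)


def build_tdg(quads):
    """Group quadruples into directed edges (site -> third-party host).

    Each key is one edge to be classified (e.g. tracking vs. benign);
    its value lists the requests observed along that edge.
    """
    edges = defaultdict(list)
    for q in quads:
        # Simplification: exact hostname comparison stands in for a
        # registrable-domain (Public Suffix List) third-party check.
        if q.domain and q.domain != q.site:
            edges[(q.site, q.domain)].append(q)
    return dict(edges)


def cross_site_in_degree(edges):
    """Count the distinct first-party sites that point at each third-party host."""
    sites = defaultdict(set)
    for site, domain in edges:
        sites[domain].add(site)
    return {domain: len(s) for domain, s in sites.items()}


if __name__ == "__main__":
    requests = [
        ("news.example", "https://px.tracker.example/p.gif?uid=1", "image/gif"),
        ("shop.example", "https://px.tracker.example/p.gif?uid=1", "image/gif"),
        ("shop.example", "https://shop.example/logo.png", "image/png"),  # first-party, dropped
    ]
    tdg = build_tdg([to_quadruple(*r) for r in requests])
    print(sorted(tdg))
    print(cross_site_in_degree(tdg))
```

Two edges share the same third-party endpoint in this toy input, so the tracker host has a cross-site in-degree of 2. This is the kind of cross-site association that a per-request classifier sees only in isolation. A graph model such as the GAT named in the title would learn edge labels from features like these together with the surrounding graph structure. The sketch only builds the graph and does not perform any classification.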