WWW2025
Semantics-Aware Cookie Purpose Compliance
Baiqi Chen, Jiawei Lyu, Tingmin Wu, Mohan Baruwal Chhetri, Guangdong Bai
Abstract
In response to stringent data protection regulations, websites typically display a cookie banner to inform users about the usage and purposes of cookies, seeking their explicit consent before installing any cookies into their browsers. However, a systematic approach for reliably assessing compliance between the website-declared purpose and the semantic-intended purpose of cookies (denoted as potential cookie purpose violation) has been notably absent. Websites may still, whether intentionally or unintentionally (e.g., due to third-party libraries imported), mis-declare cookies that may be abused for tracking purposes. We address this gap with Coover (cookie value examiner). We advocate that the value of the cookie is a more reliable indicator of its semantic-intended purpose compared to other features, such as expires and meta-information, which can be easily obfuscated. Coover decomposes the cookie value into primitive segments representing minimal semantic units, and fine-tunes a GPT-3.5 model to automatically interpret their value-inferred semantics. Based on the interpretation, it classifies cookies into four GDPR-defined purposes. We benchmark Coover against two widely-used content management providers (CMPs) i.e., CookiePedia and Cookie Script, and the state-of-the-art cookie classifier named CookieBlock. It achieves an F1 score of 95%, significantly outperforming other methods. To understand the status quo of potential cookie purpose violation on the web, we employ Coover to analyze Alexa Top 1k websites. Remarkably, out of 15,339 cookies across these websites, only 3.1% quality as truly necessary cookies, while 44.1% of websites suffer from issues of potential purpose violation. Our work serves as a wake-up call to web service providers and encourages further regulatory interventions to rectify non-compliance issues within the web infrastructure.