WWW 2026
Decoding Web Memorization: A Semantic Membership Inference Attack on LLMs
Zhiyao Wu, Zi Liang, Haibo Hu
Abstract
Large Language Models (LLMs) are increasingly deployed in Web applications such as search, dialogue, and recommendation systems, but their reliance on large-scale Web data raises serious privacy concerns, particularly the risk of memorizing sensitive content. Existing Membership Inference Attacks (MIAs) rely heavily on the surface form of their inputs, which makes them ineffective against samples whose structure has been altered but whose meaning is preserved. This weakness produces widespread false negatives and compromises the integrity of privacy evaluations on large-scale Web corpora. To address this limitation, we propose the Adversarial Semantic Membership Inference Attack (ASMIA). ASMIA improves MIA effectiveness in three steps: it generates semantically diverse adversarial samples, extracts multi-layer attention features from the target model, and trains a contrastive classifier that uses similarity metrics and log-probabilities to distinguish members from non-members. Experiments on LLMs trained on Wikipedia, a representative large-scale Web corpus, demonstrate that ASMIA significantly outperforms existing methods, highlighting the value of semantic perturbations and attention patterns in detecting training data leakage.
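As a rough illustration of how similarity metrics and log-probabilities could be combined into per-sample features, the following minimal sketch may help. It is not the authors' implementation: the function names, the choice of cosine similarity, and the three-element feature layout are all assumptions made here for illustration, and the attention vectors and token log-probabilities are taken as already extracted from the target model.

```python
import math


def mean_log_prob(token_log_probs):
    """Average per-token log-probability of one sample."""
    return sum(token_log_probs) / len(token_log_probs)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def membership_features(orig_attn, orig_log_probs, variants):
    """Build a hypothetical feature vector for one candidate sample.

    orig_attn      -- flattened attention features of the original sample
    orig_log_probs -- per-token log-probabilities of the original sample
    variants       -- list of (attn_vector, token_log_probs) pairs, one per
                      semantically perturbed version of the sample

    Returns [mean log-prob of the original,
             mean attention similarity to the variants,
             mean log-prob gap between the original and the variants].
    The layout is an assumption for illustration only.
    """
    base = mean_log_prob(orig_log_probs)
    sims = [cosine_similarity(orig_attn, attn) for attn, _ in variants]
    gaps = [base - mean_log_prob(lp) for _, lp in variants]
    return [base, sum(sims) / len(sims), sum(gaps) / len(gaps)]
```

A feature vector of this kind would then be passed to a downstream classifier that separates members from non-members; the classifier itself and its contrastive training objective are outside the scope of this sketch.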