CCS 2025

GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search

Matan Ben-Tov, Mahmood Sharif

Abstract

Dense embedding-based text retrieval (retrieval of relevant passages from corpora via deep learning encodings) has emerged as a powerful method, attaining state-of-the-art search results and popularizing Retrieval Augmented Generation (RAG). Still, like other search methods, embedding-based retrieval may be susceptible to search-engine optimization (SEO) attacks, where adversaries promote malicious content by introducing adversarial passages to corpora. Prior work has shown that such SEO is feasible, mostly demonstrating attacks against retrieval-integrated systems (e.g., RAG). Yet, these works consider relaxed SEO threat models (e.g., targeting single queries), use baseline attack methods, and provide small-scale retrieval evaluation, thus limiting a comprehensive understanding of retrievers' worst-case behavior.