ACL 2025

TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

Ahmed Lekssays, Utsav Shukla, Husrev Taha Sencar, Md. Rizwan Parvez

Abstract

Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations (such as custom hard-negative mining and denoising), resources rarely available in specialized domains. We propose TECHNIQUERAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. First, our approach mitigates data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing resource-intensive retrieval training. Second, although conventional RAG mitigates hallucination by coupling retrieval and generation, its dependence on generic retrievers often introduces noisy candidates, thereby limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking that explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TECHNIQUERAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
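
The retrieve, re-rank, generate flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the token-overlap `retrieve` function stands in for an off-the-shelf retriever, `toy_score` stands in for the zero-shot LLM re-ranker, `toy_generate` stands in for the fine-tuned generator, and the three-entry `TECHNIQUES` catalog, all function names, and the `k_retrieve`/`k_rerank` parameters are hypothetical.

```python
from typing import Callable, List, Tuple

Technique = Tuple[str, str]  # (technique ID, "Name: short description")

# Illustrative catalog (an assumption for this sketch, not the paper's data).
TECHNIQUES: List[Technique] = [
    ("T1566", "Phishing: adversary sends email with malicious attachment or link"),
    ("T1059", "Command and Scripting Interpreter: adversary executes commands or scripts"),
    ("T1003", "OS Credential Dumping: adversary dumps credentials from memory"),
]


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenizer; colons are treated as separators."""
    return set(text.lower().replace(":", " ").split())


def retrieve(query: str, catalog: List[Technique], k: int) -> List[Technique]:
    """Stand-in for an off-the-shelf retriever: rank by token overlap."""
    q = tokenize(query)
    ranked = sorted(catalog, key=lambda t: len(q & tokenize(t[1])), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: List[Technique],
           score_fn: Callable[[str, Technique], float], k: int) -> List[Technique]:
    """Re-order candidates by score_fn, which stands in for a zero-shot LLM judgment."""
    return sorted(candidates, key=lambda t: score_fn(query, t), reverse=True)[:k]


def annotate(query: str, catalog: List[Technique],
             score_fn: Callable[[str, Technique], float],
             generate_fn: Callable[[str, List[Technique]], List[str]],
             k_retrieve: int = 3, k_rerank: int = 2) -> List[str]:
    """Retrieve candidates, re-rank them, then let the generator pick technique IDs."""
    candidates = retrieve(query, catalog, k_retrieve)
    shortlist = rerank(query, candidates, score_fn, k_rerank)
    return generate_fn(query, shortlist)


def toy_score(query: str, technique: Technique) -> float:
    """Toy relevance score: overlap between the query and the technique name only."""
    name = technique[1].split(":")[0]
    return float(len(tokenize(query) & tokenize(name)))


def toy_generate(query: str, shortlist: List[Technique]) -> List[str]:
    """Toy generator: keep shortlisted IDs whose toy score is positive."""
    return [t[0] for t in shortlist if toy_score(query, t) > 0]


if __name__ == "__main__":
    text = "The actor sent a phishing email with a malicious attachment"
    print(annotate(text, TECHNIQUES, toy_score, toy_generate))
```

In a real system, `score_fn` and `generate_fn` would wrap LLM calls; passing them as callables keeps the sketch runnable without committing to any particular model or API.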