USENIX Security 2026
From Texts to Rules: Generating Sigma Rules with Large Language Models from Cyber Threat Reports
Yongxin Cai, Jing Qiu, Qingming Li, Du Cheng, Lei Chen
Abstract
Cyber Threat Reports (CTRs) deliver actionable intelligence that is essential for writing the detection rules used by security systems. Large language models (LLMs) could serve as a bridge for CTR-to-rule translation through their parsing and generation capabilities. However, the semantic disconnect and domain-specific constraints that separate the high-level abstractions in CTRs from the low-level machine semantics of rules fundamentally impede accurate detection rule generation. In this paper, we demonstrate that shell commands in CTRs can be effectively converted into Sigma detection rules for security systems. To this end, we propose SIGMERGE, an end-to-end framework that generates Sigma rules from CTR text by constructing a semantic intermediate layer as a bridge. SIGMERGE organizes three modules hierarchically, in descending order of semantic level: (1) the high-level information extraction module uses a multi-subsequence algorithm and a fine-tuned domain-specific LLM to accurately extract MITRE ATT&CK tactics, techniques, and procedures (TTPs) and commands; (2) the intermediate-level attack description generation module employs preference optimization tuning with closed-loop self-validation to mitigate the semantic disconnect; (3) the machine-level Sigma rule generation module leverages a parameter-optimized retrieval algorithm to address the domain-specific constraints. We constructed 7 datasets for training. To validate SIGMERGE, we evaluated it with 23 metrics against 16 baselines and 13 LLMs, and conducted 10 case studies integrated with real security systems to demonstrate both its effectiveness and its efficiency. Moreover, SIGMERGE has already contributed 4 novel Sigma rules to the official repository, all of which have been formally accepted.
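To make the target of the translation concrete, the following minimal sketch shows what a Sigma-style rule for a single shell command might look like. It is not the SIGMERGE pipeline: the function `command_to_sigma`, the fixed template, and the example command, technique ID, and description are all hypothetical, and the template stands in for the LLM-based extraction, description, and generation modules. Only the general rule layout (`logsource`, `detection`, `condition`, field modifiers such as `|endswith` and `|contains|all`) follows the public Sigma format.

```python
import json
import shlex


def command_to_sigma(command: str, technique_id: str, description: str) -> dict:
    """Build a minimal Sigma-style rule (as a dict) for one shell command.

    Hypothetical helper for illustration only; a hand-written template,
    not the method proposed in the paper.
    """
    tokens = shlex.split(command)
    # Keep only the executable name so "/usr/bin/curl" and "curl" match alike.
    image = tokens[0].rsplit("/", 1)[-1]
    args = tokens[1:]

    detection = {
        "selection_img": {"Image|endswith": "/" + image},
        "condition": "selection_img",
    }
    if args:
        # Require every argument to appear somewhere in the command line.
        detection["selection_cli"] = {"CommandLine|contains|all": args}
        detection["condition"] = "selection_img and selection_cli"

    return {
        "title": f"Suspicious execution of {image}",
        "status": "experimental",
        "description": description,
        "tags": [f"attack.{technique_id.lower()}"],
        "logsource": {"category": "process_creation", "product": "linux"},
        "detection": detection,
        "level": "medium",
    }


# Assumed example inputs: a command as it might appear in a CTR, plus a
# technique ID and description that the upstream modules would supply.
rule = command_to_sigma(
    "curl -o /tmp/payload.sh http://example.com/a.sh",
    "T1105",
    "Downloads a remote script to a temporary directory.",
)
print(json.dumps(rule, indent=2))
```

A fixed template like this captures only the literal command. It cannot generalize across argument variants or attach the attack context that the abstract's intermediate description layer is meant to provide, which is the gap the framework targets.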