ICLR2026

Cyber-Zero: Training Cybersecurity Agents without Runtime

Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang

Abstract

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present CYBER-ZERO, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. CYBER-ZERO leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by CYBER-ZERO, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, CYBER-ZERO-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. These results demonstrate that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

https://github.com/amazon-science/cyber-zero

[Figure: model comparison on InterCode-CTF for Claude-3.7-Sonnet, Claude-3.5-Sonnet, DeepSeek-V3-0324, Gemini-2.5-Flash, Qwen3-32B, Qwen3-14B, and Qwen3-8B; the remaining panels are not recoverable.]

* Work done during an internship at Amazon.