CCS2025

Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges

Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Shuai Wang

Abstract

Capture-the-Flag (CTF) competitions are crucial for cybersecurity education and training. With the evolution of large language models (LLMs), there is growing interest in their ability to automate CTF challenge solving, with DARPA's AIxCC competition (since 2023) being a notable example. However, this task demands a combination of LLM abilities, from knowledge to reasoning and further to action. In this paper, we highlight the importance of technical knowledge in solving CTF problems and deliberately construct a focused benchmark, CTFKnow, with 3,992 questions to measure LLMs' performance in this core aspect. Our study offers a focused and innovative measurement of LLMs' capability in understanding CTF knowledge and applying it to solve CTF challenges. Our key findings reveal that while LLMs possess substantial technical knowledge, they struggle to apply it accurately to specific scenarios and to adapt based on feedback from CTF environments.