ACL 2025

SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling

Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong

Cited by 13

Abstract

Large language models (LLMs) have advanced rapidly from conversational problem solving to real-world tasks that involve tool use, such as software engineering (SWE). Recent LLM-powered SWE systems, such as OpenAI Codex and Cursor, offer end-to-end automation of the software development process. However, building effective SWE agents remains challenging because high-quality training data and reliable test-time evaluation are scarce. To address these issues, we present SWE-Dev, an SWE agent built on open-source LLMs, with a focus on training and inference scaling. For training scaling, we develop a robust pipeline that synthesizes test cases and scales up agent trajectories to construct the training data. For inference scaling, we increase the interaction budget within a single run, so that the agent can reason and act further within one independent attempt. Experiments on the SWE-bench-Verified benchmark show that SWE-Dev models achieve top performance among open SWE agents. Specifically, our 7B and 32B models reach resolve rates of 23.4% and 36.6%, respectively, outperforming state-of-the-art open-source models. All code, models, and datasets are publicly available at https://github.com/THUDM/SWE-Dev .
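The inference-scaling knob described above, a larger interaction budget inside one attempt, can be illustrated with a minimal sketch. It assumes a generic agent loop in which a policy proposes an action, the environment returns an observation, and the attempt ends either when the agent submits or when the turn budget runs out. All names here (`run_agent`, `max_turns`, `make_toy_policy`, `toy_env`) are hypothetical and do not come from the SWE-Dev codebase; the toy policy stands in for an LLM. The sketch also shows the resolve rate as the fraction of benchmark instances whose tests pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    # (action, observation) pairs recorded during one independent attempt.
    steps: List[Tuple[str, str]] = field(default_factory=list)
    submitted: bool = False


def run_agent(policy: Callable[[List[Tuple[str, str]]], str],
              env_step: Callable[[str], str],
              max_turns: int) -> Trajectory:
    """Run a single attempt capped at max_turns interactions (hypothetical loop).

    Raising max_turns gives the agent more room to edit code, run tests and
    read their output before it must stop, all within the same attempt.
    """
    traj = Trajectory()
    for _ in range(max_turns):
        action = policy(traj.steps)
        if action == "submit":
            traj.submitted = True
            break
        traj.steps.append((action, env_step(action)))
    return traj


def resolve_rate(resolved: List[bool]) -> float:
    """Fraction of instances whose tests pass after applying the agent's patch."""
    return sum(resolved) / len(resolved) if resolved else 0.0


# Toy stand-ins (assumptions): the policy needs `steps_needed` actions before submitting.
def make_toy_policy(steps_needed: int) -> Callable[[List[Tuple[str, str]]], str]:
    return lambda history: "submit" if len(history) >= steps_needed else f"edit_{len(history)}"


def toy_env(action: str) -> str:
    return f"ran {action}"


if __name__ == "__main__":
    short = run_agent(make_toy_policy(5), toy_env, max_turns=3)
    long = run_agent(make_toy_policy(5), toy_env, max_turns=10)
    print(short.submitted, len(short.steps))  # budget too small: no submission
    print(long.submitted, len(long.steps))    # larger budget: the attempt completes
```

With a budget of 3 turns the toy agent never submits, while a budget of 10 lets it finish; real agents differ in that extra turns do not guarantee a correct patch. For scale, on a 500-instance split such as SWE-bench Verified, a 23.4% resolve rate corresponds to 117 resolved instances.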