ACL 2025

SWE-Dev: Building Software Engineering Agents with Training and Inference Scaling

Haoran Wang, Zhenyu Hou, Yao Wei, Jie Tang, Yuxiao Dong


Abstract

Large language models (LLMs) have advanced rapidly from conversational problem solving to addressing real-world tasks involving tool use, such as software engineering (SWE). Recent LLM-powered SWE systems, such as OpenAI Codex and Cursor, offer end-to-end automation of the software development process. However, building effective SWE agents remains challenging due to the lack of high-quality training data and reliable test-time evaluation. To address this issue, we present SWE-Dev, an SWE agent built upon open-source LLMs, with a focus on training and inference scaling. For training scaling, we develop a robust pipeline to synthesize test cases and scale up agent trajectories to construct the training data. For inference scaling, we increase the interaction budget within a single run to enable further thinking within one independent attempt. Experiments on the SWE-bench-Verified benchmark show that the SWE-Dev models achieve top performance among all open SWE agents. Specifically, the resolve rates of our 7B and 32B models reach 23.4% and 36.6%, respectively, outperforming state-of-the-art open-source models. All code, models, and datasets are publicly available at https://github.com/THUDM/SWE-Dev.