ACL2024

Selene: Pioneering Automated Proof in Software Verification

Lichen Zhang, Shuai Lu, Nan Duan

Abstract

Ensuring correctness is a pivotal aspect of software engineering. Among the various strategies available, software verification offers a definitive assurance of correctness. Nevertheless, writing verification proofs is resource-intensive and manpower-consuming, and there is a great need to automate this process. We introduce Selene in this paper, which is the first projectlevel automated proof benchmark constructed based on the real-world industrial-level operating system microkernel, seL4. Selene provides a comprehensive framework for end-toend proof generation and a lightweight verification environment. Our experimental results with advanced large language models (LLMs), such as GPT-3.5-turbo and GPT-4, highlight the capabilities of LLMs in the domain of automated proof generation. Additionally, our further proposed augmentations indicate that the challenges presented by Selene can be mitigated in future research endeavors. "Program testing can be used to show the presence of bugs, but never to show their absence." -Dahl et al.'s (1972) This process requires formal proofs to rigorously 040 demonstrate that the program satisfies the required 041 properties, which can be verified by the prover. 042 In general, software verification involves two 043 stages. ❶ The prerequisite specification stage trans-044 lates the required properties and the subject pro-045 gram into the formal language, creating a to-be-046 proved proposition stating that "the program meets 047 the properties", a.k.a., the specification. ❷ The 048 proof stage is supposed to generate proofs that 049 prove the above specification and can be formally 050 checked by the prover. Both stages consume sig-051 nificant resources and manpower, with the second 052 stage being particularly demanding. E.g., the seL4 053 operating system microkernel 2 , which has been 054 formally verified against strong functionality and 055 security properties, requires 7 person-months ded-056 icated to specification stage and 11 person-years 057 to proof stage for correctness verification, and the 058 amount of proof code in seL4 is even ten times 059 more than that of the microkernel implementation 060 itself (Klein et al., 2014). Therefore, in order to 061 promote provable software, automated software 062 verification, particularly automated proof, is highly 063 desirable. As an early exploratory effort, in this 064 paper, we explore to automate the major overhead. 065 Typically, automated proof in software verifica-066 tion is a conditional generation task from the speci-067 fication to the proof, involving reasoning capabil-068 ities. Fortunately, large language models (LLMs) 069 offer an opportunity, as they have demonstrated 070