ASE2025

On Effectiveness of Formal Model Repair by Large Language Models

Sebastião Carvalho, Tsutomu Kobayashi, Fuyuki Ishikawa

摘要

The use of formal methods is a significant contribution to developing trustworthy software; however, it can be a complex task. For this, automation with generative artificial intelligence models, such as Large Language Models (LLMs), is considered a promising approach. We studied the use of LLMs to generate repairs for faulty formal models of the Event-B formalism. To repair faulty Event-B models, we propose a System Prompt that contains constraints on how to suggest repairs that respect the syntax and rules of the Event-B language. We also propose Retry Prompts, a type of prompt that aims to refine a repair suggested by an LLM by highlighting errors in previous responses. To evaluate our method, we developed a tool that generates faulty models (mutants) from existing correct models by removing a single action or guard predicate. The tool then interacts with an LLM to obtain a suggested repair for the mutant model. After modifying the model according to the suggestions from the LLM, we evaluate the correctness of the modified model. The results demonstrate that using Retry Prompts significantly increases the success rate of the suggested repairs, with over 80% of the faulty models in our dataset being successfully repaired. The results also indicated directions of possible future improvements, such as combining Generative AI with formal approaches to repair failing cases.