ICSE2025

Large Language Models for Safe Minimization

Aashish Yadavally, Xiaokai Rong, Phat Nguyen, Tien N. Nguyen

Abstract

Several tasks in program analysis, verification, and testing are modeled as constraint solving problems, utilizing SMT solvers as the reasoning engine. In this work, we aim to investigate the reasoning capabilities of large language models (LLMs) toward reducing the size of an infeasible string constraint system by exploiting inter-constraint interactions such that the remaining ones are still unsatisfiable. We term this safe minimization. Motivated by preliminary observations of hallucination and error propagation in LLMs, we design SafeMin, a framework leveraging an LLM and SMT solver in tandem to ensure a safe and correct minimization. We test the applicability of our approach on string benchmarks from LeetCode in the computation of minimal unsatisfiable subsets (MUSes). We observed that SafeMin helps safely minimize 94.3% of these constraints, with an average minimization ratio of 98% relative to the MUSes. In addition, we assess SafeMin's capabilities in partially enumerating non-unique MUSes, which is baked into our approach via a “sample-and-enumerate” decoding strategy. Overall, we captured 42.1% more non-unique MUSes than without such LLM-based macro-reasoning. Finally, we demonstrate SafeMin's usefulness in detecting infeasible paths in programs.