FSE 2024

ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification

Fangwen Mu, Lin Shi, Song Wang, Zhuohao Yu, Binquan Zhang, Chenxue Wang, Shichao Liu, Qing Wang

49 citations

Abstract

Large Language Models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in automatically generating code from natural language requirements. In real-world practice, however, the requirements written by users are often ambiguous or insufficient. Current LLMs generate programs directly from such unclear requirements without seeking interactive clarification, so the resulting code is likely to deviate from the user's original intent. To bridge this gap, we introduce a novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs to identify ambiguous requirements and ask targeted clarifying questions. Specifically, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If the requirement is ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving the responses to these questions, ClarifyGPT refines the ambiguous requirement and inputs it into the same LLM to generate a final code solution. To evaluate ClarifyGPT, we invite ten participants to use it for code generation on two benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT elevates the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to conduct large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity simulation method to simulate user responses. The results demonstrate that ClarifyGPT significantly enhances code generation performance compared to the baselines. In particular, ClarifyGPT improves the average performance of GPT-4 and ChatGPT across five benchmarks from 62.43% to 69.60% and from 54.32% to 62.37%, respectively. A human evaluation also confirms the effectiveness of ClarifyGPT in detecting ambiguous requirements and generating high-quality clarifying questions. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.
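
The detect, ask, refine, and regenerate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name here (`is_ambiguous`, `clarify_and_generate`, `toy_sample`) is hypothetical, the LLM is replaced by plain callables so the example runs offline, and the consistency check is assumed to mean "sample several candidate solutions and see whether they behave differently on the same inputs", a detail the abstract does not spell out.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a "solution" is a Python callable standing in for
# generated code, and a "sampler" stands in for an LLM that returns n solutions.
Solution = Callable[[list], object]
Sampler = Callable[[str, int], List[Solution]]


def is_ambiguous(candidates: Sequence[Solution], test_inputs: Sequence[list]) -> bool:
    """Assumed consistency check: ambiguous if candidates disagree on any input."""
    signatures = set()
    for solve in candidates:
        outputs = []
        for x in test_inputs:
            try:
                outputs.append(repr(solve(x)))
            except Exception as exc:  # a crash also counts as observable behavior
                outputs.append(f"error:{type(exc).__name__}")
        signatures.add(tuple(outputs))
    return len(signatures) > 1


def clarify_and_generate(
    requirement: str,
    sample: Sampler,
    ask_questions: Callable[[str], List[str]],
    answer: Callable[[str], str],
    test_inputs: Sequence[list],
    n: int = 3,
) -> Tuple[str, Solution]:
    """Return the (possibly refined) requirement and a final solution."""
    candidates = sample(requirement, n)
    if not is_ambiguous(candidates, test_inputs):
        return requirement, candidates[0]
    # Ambiguous: collect clarifying questions and the user's answers.
    questions = ask_questions(requirement)
    clarifications = [f"Q: {q} A: {answer(q)}" for q in questions]
    refined = requirement + "\n" + "\n".join(clarifications)
    # Regenerate with the same sampler, now using the refined requirement.
    return refined, sample(refined, 1)[0]


def toy_sample(requirement: str, n: int) -> List[Solution]:
    """Toy stand-in for an LLM: unsure about sort order unless told."""
    if "descending" in requirement:
        return [lambda xs: sorted(xs, reverse=True)] * n
    pool = [lambda xs: sorted(xs), lambda xs: sorted(xs, reverse=True)]
    return [pool[i % 2] for i in range(n)]


refined, solution = clarify_and_generate(
    "Sort the list.",
    toy_sample,
    ask_questions=lambda req: ["Ascending or descending order?"],
    answer=lambda q: "descending",
    test_inputs=[[3, 1, 2]],
)
print(solution([3, 1, 2]))  # [3, 2, 1]
```

In this toy run the sampled candidates disagree on `[3, 1, 2]`, so the requirement is flagged as ambiguous, one clarifying question is asked, and the answer is appended to the requirement before the final solution is produced. In a real setting the `answer` callable would be a human participant or a simulated user, and the candidates would be generated programs executed in a sandbox.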