ACL 2025

What Makes a Good Natural Language Prompt?

Do Xuan Long, Duy Dinh, Ngoc-Hai Nguyen, Kenji Kawaguchi, Nancy F. Chen, Shafiq Joty, Min-Yen Kan

Cited by 13

Abstract

As large language models (LLMs) have progressed towards more human-like behavior and human-AI communication has become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on what exactly quantifies the quality of natural language prompts. We attempt to address this question by conducting a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences (2022-2025), as well as blogs. We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions. We then examine how existing studies assess the impact of these properties on LLMs, revealing that support is imbalanced across models and tasks and that substantial research gaps remain. Further, we analyze correlations among properties in high-quality natural language prompts and derive prompting recommendations from them. We then empirically explore multi-property prompt enhancements on reasoning tasks, observing that single-property enhancements often have the greatest impact. Finally, we find that instruction-tuning on property-enhanced prompts can yield better reasoning models. Our findings establish a foundation for property-centric prompt evaluation and optimization, bridging gaps in human-AI communication and opening new directions for prompting research.¹

* Equal contribution. Work done during an internship at WING, NUS.
¹ Our code and data will be made publicly available here.

Prompt quality evaluation

We begin our study by conducting a comprehensive survey of over 150 papers and blogs. Our methodology is straightforward: we first examine papers published in ACL, EMNLP, and NAACL from the ACL Anthology², and in ICLR and NeurIPS on

² https: