FSE2025
Doc2OracLL: Investigating the Impact of Documentation on LLM-Based Test Oracle Generation
Soneya Binta Hossain, Raygan Taylor, Matthew B. Dwyer
Abstract
Code documentation is a critical artifact of software development, bridging human understanding and machine-readable code. Beyond aiding developers in code comprehension and maintenance, documentation also plays a critical role in automating various software engineering tasks, such as test oracle generation (TOG). In Java, Javadoc comments offer structured, natural language documentation embedded directly within the source code, typically describing functionality, usage, parameters, return values, and exceptional behavior. While prior research has explored the use of Javadoc comments in TOG alongside other information, such as the method under test (MUT), their potential as a stand-alone input source, the most relevant Javadoc components, and guidelines for writing effective Javadoc comments for automating TOG remain less explored. In this study, we investigate the impact of Javadoc comments on TOG through a comprehensive analysis. We begin by fine-tuning 10 large language models using three different prompt pairs to assess the role of Javadoc comments alongside other contextual information. Next, we systematically analyze the impact of different Javadoc comment components on TOG. To evaluate the generalizability of Javadoc comments from various sources, we also generate them using the GPT-3.5 model. We perform a thorough bug detection study using the Defects4J dataset to understand their role in real-world bug detection. Our results show that incorporating Javadoc comments improves the accuracy of test oracles in most cases, aligning closely with ground truth. We find that Javadoc comments alone can achieve performance comparable to, or even better than, that obtained with the implementation of the MUT. Additionally, we identify the description and the return tag as the most valuable components for TOG. Finally, our approach, when using only Javadoc comments, detects between 19% and 94% more real-world bugs in Defects4J than prior methods, establishing a new state of the art.
To further guide developers in writing effective documentation, we conduct a detailed qualitative study on when Javadoc comments are helpful or harmful for TOG.
CCS Concepts: • Software and its engineering → Software testing and debugging; Documentation.
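To make the terminology in the abstract concrete, the following is a minimal sketch of how the components of a Javadoc comment (description, `@param`, `@return`, `@throws`) can each suggest a test oracle. The class `JavadocOracleSketch`, the method `quotient`, and the two oracle helpers are hypothetical names chosen for illustration only; they do not come from the paper, its prompts, or its dataset, and the oracles here are written by hand rather than generated by a model.

```java
// Illustrative sketch only: all names below are hypothetical, not from the paper.
class JavadocOracleSketch {

    /**
     * Returns the integer quotient of {@code dividend} divided by {@code divisor}.
     *
     * @param dividend the number to be divided
     * @param divisor  the number to divide by; must not be zero
     * @return the quotient, truncated toward zero
     * @throws IllegalArgumentException if {@code divisor} is zero
     */
    public static int quotient(int dividend, int divisor) {
        if (divisor == 0) {
            throw new IllegalArgumentException("divisor must not be zero");
        }
        return dividend / divisor;
    }

    /**
     * An oracle on the return value, derivable from the description and
     * the @return tag alone ("truncated toward zero").
     */
    public static boolean returnValueOracleHolds() {
        return quotient(7, 2) == 3 && quotient(-7, 2) == -3;
    }

    /**
     * An oracle on exceptional behavior, derivable from the @throws tag alone.
     */
    public static boolean exceptionOracleHolds() {
        try {
            quotient(1, 0);
            return false; // the documented exception was not raised
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("return-value oracle holds: " + returnValueOracleHolds());
        System.out.println("exception oracle holds: " + exceptionOracleHolds());
    }
}
```

Both oracles in this sketch are written without reading the body of `quotient`, which is the scenario the abstract describes as using Javadoc comments as a stand-alone input.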