ASE2025
Enhancing LLMs with Staged Grouping and Dehallucination for Header File Decomposition
Yue Wang, Jiaxuan Sun, Yanzhen Zou, Bing Xie
Abstract
God Header Files, large header files included by numerous other code files, present significant challenges for code comprehension and maintenance. Existing approaches leverage various code similarity metrics to decompose them, but these metrics do not always accurately capture the functional essence of the code. Large Language Models (LLMs), with their advanced capabilities in code understanding and generation, offer a promising alternative for producing more effective refactorings. However, LLMs face three critical limitations that hinder practical application: they struggle with lengthy header files due to token limits, they hallucinate by generating incomplete or spurious results, and they can produce cyclic dependencies that violate architectural principles and cause compilation failures. To address these challenges, we propose HFDecomposer, a hybrid approach that enhances LLMs with staged grouping and dehallucination techniques to effectively decompose header files. Our approach introduces a two-stage grouping framework for lengthy header files: it first groups strongly related code entities using traditional similarity metrics, then feeds summaries of these groups to the LLM for higher-level semantic aggregation. To mitigate LLM hallucinations, we enhance prompts with factual knowledge extracted through static analysis, detect errors in the LLM output, and correct them by reassigning missing entities and resolving cyclic dependencies. Our evaluation on real-world header file decomposition refactorings shows that our method overcomes the limitations of purely LLM-based techniques and outperforms the state-of-the-art traditional approach by 11%, delivering more accurate and reliable decompositions. Our approach enables LLMs to handle lengthy header files efficiently, significantly reduces hallucinations, and ensures that the final decomposition is reliable and practical.
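To make the dehallucination step more concrete, the following Python is a minimal sketch under our own assumptions, not HFDecomposer's implementation. It assumes static analysis yields the header's set of entity names and an entity-level dependency map, and that the LLM returns a mapping from group names to entity lists. Spurious (hallucinated) and duplicated entities are dropped, each missing entity is reassigned to the group it shares the most dependency edges with, and cycles among groups are broken by merging every strongly connected component (found with Tarjan's algorithm) into one group. The function name `validate_and_repair`, the `misc` fallback group, the `+`-joined names of merged groups, and the merge-based cycle resolution are all hypothetical; the paper may resolve cycles differently, for example by moving individual entities.

```python
from collections import defaultdict


def _sccs(nodes, edges):
    """Tarjan's algorithm: return the strongly connected components of a digraph."""
    index, low, stack, on_stack, result = {}, {}, [], set(), []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def validate_and_repair(entities, deps, groups):
    """Hypothetical repair of an LLM-proposed header decomposition.

    entities: set of entity names declared in the original header.
    deps: dict entity -> set of entities it references (from static analysis).
    groups: dict group name -> list of entity names proposed by the LLM.
    """
    # 1. Drop spurious entities (not in the header) and duplicates (keep first).
    seen, cleaned = set(), {}
    for name, members in groups.items():
        kept = [e for e in members if e in entities and e not in seen and not seen.add(e)]
        if kept:
            cleaned[name] = kept

    # 2. Reassign each missing entity to the group with the most dependency links.
    owner = {e: g for g, ms in cleaned.items() for e in ms}
    for e in sorted(entities - seen):
        scores = defaultdict(int)
        for other in deps.get(e, set()):  # outgoing references
            if other in owner:
                scores[owner[other]] += 1
        for other, targets in deps.items():  # incoming references
            if e in targets and other in owner:
                scores[owner[other]] += 1
        # Alphabetical tie-break; "misc" is a hypothetical fallback group.
        target = max(sorted(scores), key=lambda g: scores[g]) if scores else "misc"
        cleaned.setdefault(target, []).append(e)
        owner[e] = target

    # 3. Build the group-level dependency graph and merge cyclic groups.
    gedges = defaultdict(set)
    for e, targets in deps.items():
        for t in targets:
            if e in owner and t in owner and owner[e] != owner[t]:
                gedges[owner[e]].add(owner[t])
    repaired = {}
    for comp in _sccs(list(cleaned), gedges):
        repaired["+".join(sorted(comp))] = sorted(e for g in comp for e in cleaned[g])
    return repaired


if __name__ == "__main__":
    entities = {"A", "B", "C", "D", "E"}
    deps = {"A": {"C"}, "C": {"B"}, "D": {"A"}}
    proposal = {"g1": ["A", "B", "X"], "g2": ["C"], "g3": ["B"]}
    print(validate_and_repair(entities, deps, proposal))
    # prints {'g1+g2': ['A', 'B', 'C', 'D'], 'misc': ['E']}
```

In this toy run, `X` is dropped as spurious, the duplicate `B` in `g3` is discarded, `D` joins `g1` because it references `A`, and `g1` and `g2` are merged because each depends on the other. Merging components is the simplest way to guarantee an acyclic result, since the condensation of any directed graph is a DAG, but it can produce coarser headers than a finer-grained entity move would.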