WWW2025

MGF-ESE: An Enhanced Semantic Extractor with Multi-Granularity Feature Fusion for Code Summarization

Xiaolong Xu, Yuxin Cao, Hongsheng Hu, Haolong Xiang, Lianyong Qi, Junqun Xiong, Wanchun Dou

Cited 4 times

Abstract

Code summarization aims to generate concise natural language descriptions of source code, helping developers familiarize themselves with software systems and reducing maintenance costs. Existing code summarization approaches widely employ attention mechanisms to assess the relevance between nodes in the Abstract Syntax Tree (AST), generating context vectors that reflect the semantics of the source code. However, approaches that rely solely on the AST do not extract features at other levels of granularity, such as code tokens and the Control Flow Graph (CFG), and therefore suffer from severe semantic gaps when capturing data and control dependencies. To address this issue, we design an enhanced semantic extractor with multi-granularity feature fusion (MGF-ESE) to improve the model's capability to comprehend and process the overall semantics of the code. Specifically, we present a novel AST generation method that, while controlling the scale of nodes, introduces syntactic description nodes to raise the semantic density of AST features. We then perform both local and global encoding of the CFG after embedding its statement nodes. Moreover, through a cross-attention mechanism, we fuse code tokens and the CFG with the AST to enhance the model's capacity to capture both syntactic and structural information from source code. Finally, extensive experiments on two open-source datasets show that MGF-ESE outperforms state-of-the-art approaches, producing higher-quality code summaries on key metrics including BLEU, METEOR, and ROUGE-L.
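
To make the fusion step concrete, the following is a minimal sketch of scaled dot-product cross-attention in which AST node features act as queries over code-token and CFG features. It is an illustration under stated assumptions, not the MGF-ESE implementation: the abstract does not specify the attention variant, the fusion order, or how the attended features are combined. The function names (`cross_attention`, `fuse`), the residual-sum combination, and the toy 2-dimensional feature vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query row attends over all key rows; the output row is the
    attention-weighted average of the value rows.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(ast_nodes, token_feats, cfg_feats):
    """Hypothetical fusion: AST nodes query tokens and CFG statements.

    Assumption (not from the paper): the two attended views are added
    back onto the AST features as a simple residual sum.
    """
    from_tokens = cross_attention(ast_nodes, token_feats, token_feats)
    from_cfg = cross_attention(ast_nodes, cfg_feats, cfg_feats)
    return [[a + t + c for a, t, c in zip(a_row, t_row, c_row)]
            for a_row, t_row, c_row in zip(ast_nodes, from_tokens, from_cfg)]


# Toy features: 2 AST nodes, 3 code tokens, 2 CFG statement nodes.
ast_nodes = [[1.0, 0.0], [0.0, 1.0]]
token_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
cfg_feats = [[0.2, 0.8], [0.8, 0.2]]
fused = fuse(ast_nodes, token_feats, cfg_feats)
```

The fused output keeps one row per AST node, so in this sketch the AST remains the backbone representation while token-level and control-flow information is mixed into each node. A real model would add learned projections for queries, keys, and values, multiple heads, and normalization.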