ICML 2024
PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation
Han Fu, Jian Tan, Pinhan Zhang, Feifei Li, Jianling Sun
Abstract
Automatically generating high-quality code descriptions greatly improves the readability and maintainability of a codebase. Recently, retrieval-augmented code-to-text approaches have proven to be an effective solution and have achieved state-of-the-art results on various benchmarks. Their success highlights the potential of leveraging large collections of unlabeled code descriptions to further improve generation quality. Despite this promising performance, retrieval-augmented models can be misled by unhelpful retrieved references, which may contain irrelevant or even misleading information. To address this, we design PinNet, a new framework for code-to-text generation. PinNet relies on a discriminator to measure how well the retrievals match the semantics of the input code. Remarkably, the hidden representation of the reference from the last layer of the discriminator can be leveraged to significantly improve code-to-text generation by modifying the attention weights. In effect, the model pays high attention to valuable information and filters out misleading parts. To execute this idea effectively, we also propose a novel contrastive learning method to quantify the semantic similarities between unlabeled references. Through extensive experiments on code summarization and SQL-to-text generation, we demonstrate that the proposed method significantly outperforms all of the baselines.
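The abstract does not spell out how the discriminator's output modifies the attention weights, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the discriminator yields a per-token relevance score in [0, 1] for the retrieved reference and adds the log of that score to the attention logits, so that low-relevance tokens receive almost no attention. The function name `relevance_gated_attention`, the `relevance` vector, and the `eps` constant are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def relevance_gated_attention(queries, ref_keys, ref_values, relevance, eps=1e-9):
    """Scaled dot-product attention over reference tokens, biased by relevance.

    queries:    (n_q, d)   decoder-side query vectors
    ref_keys:   (n_ref, d) keys for the retrieved reference tokens
    ref_values: (n_ref, d) values for the retrieved reference tokens
    relevance:  (n_ref,)   assumed discriminator scores in [0, 1] (hypothetical)
    """
    d = queries.shape[-1]
    scores = queries @ ref_keys.T / np.sqrt(d)   # (n_q, n_ref) attention logits
    # Adding log(relevance) multiplies each unnormalized weight by its relevance
    # score; the softmax then renormalizes. eps avoids log(0).
    scores = scores + np.log(relevance + eps)
    weights = softmax(scores, axis=-1)
    return weights @ ref_values, weights


# Toy example: the second reference token is judged irrelevant.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
ref_keys = rng.normal(size=(3, 4))
ref_values = rng.normal(size=(3, 4))
relevance = np.array([0.9, 0.0, 0.5])

context, weights = relevance_gated_attention(queries, ref_keys, ref_values, relevance)
```

In this sketch, a token with relevance 0 ends up with a near-zero attention weight for every query, while the remaining weights still sum to 1. Other gating schemes (for example, a learned projection of the discriminator's hidden states) would fit the abstract's description equally well.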