ICML 2025

From Complex to Atomic: Enhancing Augmented Generation via Knowledge-Aware Dual Rewriting and Reasoning

Jinyu Wang, Jingjing Fu, Rui Wang, Lei Song, Jiang Bian

Abstract

Recent advancements in Retrieval-Augmented Generation (RAG) systems have significantly enhanced the capabilities of large language models (LLMs) by incorporating external knowledge retrieval. However, retrieval alone is often inadequate for mining deep, specialized knowledge and for performing the logical reasoning needed to tackle complex, domain-specific questions. To address these challenges, we present an approach designed to extract, comprehend, and utilize specialized knowledge in an atomic manner while simultaneously constructing a coherent rationale. At the heart of our approach lie four pivotal components: a knowledge atomizer that extracts atomic tags from raw data; a query proposer that generates follow-up questions that advance the original inquiry; an atomic retriever that locates relevant knowledge by aligning queries with atomic knowledge; and an atomic selector that, guided by the retrieved information, determines which atomic tag and chunk pair to query. Through this approach, we implement a knowledge-aware task decomposition strategy that iteratively builds the rationale in alignment with both the initial question and the acquired knowledge. Comprehensive experiments demonstrate the efficacy of our approach across various benchmarks, particularly those requiring multi-hop reasoning. A substantial performance improvement of up to +10.1 (20.4%) over the second-best method underscores the potential of the approach in complex, knowledge-intensive applications. The code is publicly available at https://github.com/microsoft/PIKE-RAG.
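The interplay of the four components in the iterative, knowledge-aware decomposition loop can be illustrated with a toy sketch. It is not the PIKE-RAG implementation, and every name in it (`AtomicIndex`, `answer_loop`, `propose`, `min_score`) is hypothetical. Several stand-ins are assumed: sentence splitting replaces the LLM-based knowledge atomizer, token-overlap (Jaccard) similarity replaces embedding-based atomic retrieval, the query proposer is a scripted callable rather than an LLM, and the atomic selector simply picks the best-scoring unused (tag, chunk) pair.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding model (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of token sets, used here as a proxy for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class AtomicIndex:
    # Each entry pairs an atomic tag with the raw chunk it was extracted from.
    pairs: list[tuple[str, str]] = field(default_factory=list)

    def add_chunk(self, chunk: str, atomize: Callable[[str], list[str]]) -> None:
        # Knowledge atomizer (stand-in): derive atomic tags from a raw chunk.
        for tag in atomize(chunk):
            self.pairs.append((tag, chunk))

    def retrieve(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        # Atomic retriever: rank (tag, chunk) pairs by query-to-tag alignment.
        scored = [(similarity(query, tag), tag, chunk) for tag, chunk in self.pairs]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


def answer_loop(
    question: str,
    index: AtomicIndex,
    propose: Callable[[str, list[str]], list[str]],
    max_steps: int = 4,
    min_score: float = 0.1,
) -> list[str]:
    """Iteratively collect chunks that build a rationale for `question`."""
    context: list[str] = []
    for _ in range(max_steps):
        # Query proposer: suggest sub-questions given the knowledge gathered so far.
        sub_questions = propose(question, context)
        if not sub_questions:
            break  # the proposer judges the rationale complete
        candidates = [
            cand
            for q in sub_questions
            for cand in index.retrieve(q)
            if cand[2] not in context and cand[0] >= min_score
        ]
        if not candidates:
            break  # nothing new and relevant enough to add
        # Atomic selector (stand-in): take the best-aligned unused pair.
        _score, _tag, chunk = max(candidates, key=lambda c: c[0])
        context.append(chunk)
    return context


if __name__ == "__main__":
    def split_sentences(chunk: str) -> list[str]:
        return [s.strip() for s in chunk.split(".") if s.strip()]

    idx = AtomicIndex()
    for text in [
        "Marie Curie was born in Warsaw. She won two Nobel prizes.",
        "Warsaw is the capital of Poland. It lies on the Vistula river.",
        "Paris hosts the Louvre museum.",
    ]:
        idx.add_chunk(text, split_sentences)

    # Scripted proposer standing in for an LLM: one sub-question per hop.
    def scripted(question: str, context: list[str]) -> list[str]:
        plan = ["Where was Marie Curie born?", "Which river does Warsaw lie on?"]
        return [plan[len(context)]] if len(context) < len(plan) else []

    print(answer_loop("Which river flows through Marie Curie's birthplace?", idx, scripted))
```

In this toy run, the first hop aligns with the tag "Marie Curie was born in Warsaw" and the second with "It lies on the Vistula river", so the Curie chunk and then the Warsaw chunk are collected while the unrelated Paris chunk is never selected. Retrieving against atomic tags rather than whole chunks is what lets a short sub-question match one fact inside a longer passage.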