KDD2025

Struct-X: Enhancing the Reasoning Capabilities of Large Language Models in Structured Data Scenarios

Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Leijun Cheng, Yuan Cheng, Wei Chu, Yinghui Xu, Yuan Qi

3 citations

Abstract

Conducting reasoning tasks with large language models (LLMs) on structured and redundant data poses significant challenges, primarily due to the complexity introduced by the structured markdown tokens and the presence of extraneous contextual information. These elements can overburden and disrupt the generation process of LLMs, complicating the extraction of relevant insights and the production of coherent outputs. To address this, we propose Struct-X, a novel framework that operates through five key phases: ''read-model-fill-reflect-reason'' efficiently enabling LLMs to utilize structured data. It begins by encoding structured data into a topological space using graph embeddings, followed by filling in missing entity information with knowledge retrieval modules, and filtering out irrelevant tokens via a self-supervised module. The final phase involves constructing a topological network with selected tokens to further reduce the total token length for more effective LLM inference. Additionally, Struct-X includes an Auxiliary Module trained to generate prompts, aiding LLMs in analyzing structured data. Extensive experiments on open-source benchmarks, including the knowledge graph question-answer task and the long document reading comprehension task, show that Struct-X notably improves LLM reasoning in complex structured input context. Finally, we deployed Struct-X in a real-world financial report analysis task, where it exhibits enhanced reasoning capabilities when applied to authentic scenario. The code has been undergoing open-source development to facilitate easy replication.