WWW2026

Multistage Feedback-Driven Causal Discovery from Textual Data with Large Language Models

Juntao Yang, Dayuan Cao, Kui Yu, Xiang Wang, Jing Yang, Lin Liu, Jiuyong Li

Abstract

Revealing the causal mechanisms that underlie real-world phenomena is critical for scientific and technical progress. Despite advances over the past decades, two obstacles have long limited the broader application of causal discovery: the lack of high-quality data, and the inability of traditional causal discovery algorithms (TCDA) to capture the semantics of variables. To address this issue, this paper proposes a novel causal modeling framework, LLM-CD. It combines the metadata-based reasoning capabilities of large language models (LLMs) with the data-driven modeling abilities of TCDA. LLM-CD integrates LLM reasoning into multiple stages of TCDA and refines the discovered causal structure through an iterative process. Because LLMs are prone to overconfidence and hallucination, LLM-CD quantifies