KDD 2025

A Survey on Unifying Large Language Models and Knowledge Graphs for Biomedicine and Healthcare

Ran Xu, Patrick Jiang, Linhao Luo, Cao Xiao, Adam Cross, Shirui Pan, Jimeng Sun, Carl Yang

Abstract

In recent years, the landscape of digital biomedicine and healthcare has been reshaped by disruptive breakthroughs in AI. Facilitated by tremendous data and high-performance computers, large language models (LLMs) have transformed information technology from accessing data to performing analytical tasks. While demonstrating unprecedented capabilities, LLMs have been found unreliable in tasks requiring factual knowledge and rigorous reasoning. Biomedicine and healthcare, as an important vertical domain rapidly benefitting from progress in AI, imposes strict requirements on the accuracy, controllability, and interpretability of analytical models, posing critical challenges for LLMs. Alongside recent studies addressing the hallucination problem of LLMs, research on empowering LLMs with the ability to plan, reason, and ground with explicit knowledge has also started to prosper, especially in the biomedicine and healthcare domain. On the other hand, biomedical data are enormous and notoriously complex, coming from various sources (e.g., biomedical knowledge bases, online literature, and hospitals) and bearing various modalities (e.g., tables, texts, images, and time series). Healthcare professionals have spent decades collecting, cleaning, and curating various types of data. These processes are extremely costly, producing various datasets with different data schemas, coding systems, and quality standards, many privately