WWW 2026

Med-R2: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine

Keer Lu, Zheng Liang, Da Pan, Shusen Zhang, Guosheng Dong, Huang Leng, Bin Cui, Zhonghai Wu, Wentao Zhang

Cited by 5

Abstract

Large Language Models (LLMs) have exhibited remarkable capabilities in clinical scenarios. Despite this potential, existing approaches face challenges when applying LLMs to medical settings. Strategies that rely on training with medical datasets are costly and may suffer from outdated training data. Leveraging external knowledge bases is a suitable alternative, yet it faces obstacles such as limited retrieval precision and ineffective answer extraction. Together, these issues prevent LLMs from reaching the expected level of medical proficiency. To address these challenges, we introduce Med-R2, a novel LLM physician framework that adheres to the Evidence-Based Medicine (EBM) process. It efficiently integrates retrieval mechanisms with the selection of and reasoning over evidence, thereby enhancing the problem-solving capabilities of LLMs in healthcare scenarios and fostering a trustworthy LLM physician. Our comprehensive experiments indicate that Med-R2 achieves an improvement of 13.27% over vanilla RAG methods and a 4.55% improvement over fine-tuning strategies, without incurring additional training costs. Furthermore, we find that LLaMA3.1-70B + Med-R2 surpasses frontier models including GPT-4o, Claude3.5-Sonnet, and DeepSeek-V3 by 1.05%, 6.14%, and 1.91%, respectively. Med-R2 effectively enhances the capabilities of LLMs in the medical domain.