WWW2025
How much Medical Knowledge do LLMs have? An Evaluation of Medical Knowledge Coverage for LLMs
Ziheng Zhang, Zhenxi Lin, Yefeng Zheng, Xian Wu
Abstract
Previous evaluation frameworks for large language models (LLMs) have mostly relied on existing question-answering benchmarks, which are primarily task-oriented rather than knowledge-oriented. In the medical domain, however, the effective deployment of LLMs necessitates a thorough evaluation of their medical knowledge coverage. To this end, we propose a systematic evaluation framework, MedKGEval, to assess the coverage of medical knowledge in LLMs through the lens of medical knowledge graphs (KGs). MedKGEval transforms various levels of knowledge (entity-level, relation-level, and subgraph-level) from the medical KG into distinct groups of question-answer pairs, which serve as comprehensive evaluation benchmarks. In addition to traditional task-oriented evaluations, MedKGEval introduces a novel knowledge-oriented evaluation approach that encompasses the assessment of knowledge coverage across entities, relations, and triples. This multi-aspect evaluation approach allows for a more nuanced understanding of LLMs' knowledge coverage in the medical context. Using these benchmarks, we conduct a systematic evaluation of 11 LLMs from multiple perspectives, revealing insights into their strengths and weaknesses in medical knowledge memorization and reasoning.
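As a rough illustration of the knowledge-oriented idea, and not the MedKGEval implementation, the sketch below turns knowledge-graph triples into true/false questions and then aggregates a model's answers into coverage scores per triple, per head entity, and per relation. The toy triples, the question template, and the function names (`triple_to_question`, `build_qa_pairs`, `coverage`) are all assumptions made for this example; the abstract does not specify the actual question formats or metrics.

```python
from collections import defaultdict

# Toy triples (head, relation, tail). Illustrative only, not taken from the paper.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "may_cause", "stomach bleeding"),
    ("metformin", "treats", "type 2 diabetes"),
]


def triple_to_question(head, relation, tail):
    """Render one triple as a true/false question (hypothetical template)."""
    return f"True or false: {head} {relation.replace('_', ' ')} {tail}."


def build_qa_pairs(triples):
    """Build one QA pair per triple; the gold answer is 'true' because the triple is in the KG."""
    return [
        {"triple": t, "question": triple_to_question(*t), "answer": "true"}
        for t in triples
    ]


def coverage(qa_pairs, predictions):
    """Return the fraction of correct answers overall, per head entity, and per relation."""
    per_entity = defaultdict(list)
    per_relation = defaultdict(list)
    hits = []
    for qa, pred in zip(qa_pairs, predictions):
        correct = pred.strip().lower() == qa["answer"]
        head, relation, _ = qa["triple"]
        hits.append(correct)
        per_entity[head].append(correct)
        per_relation[relation].append(correct)

    def mean(values):
        return sum(values) / len(values)

    return {
        "triple": mean(hits),
        "entity": {e: mean(v) for e, v in per_entity.items()},
        "relation": {r: mean(v) for r, v in per_relation.items()},
    }


qa_pairs = build_qa_pairs(TRIPLES)
# Stand-in for model outputs; a real run would query an LLM with each question.
predictions = ["true", "false", "True"]
result = coverage(qa_pairs, predictions)
```

Grouping the same per-question outcomes by entity and by relation is what separates a knowledge-oriented view from a single task-level accuracy: it shows which parts of the graph a model handles well or poorly. A realistic benchmark would also need negative (corrupted) triples so that always answering "true" does not score perfectly.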