ACL 2025

Exploring LLM Annotation for Adaptation of Clinical Information Extraction Models under Data-sharing Restrictions

Seiji Shimizu, Shohei Hisada, Yutaka Uno, Shuntaro Yada, Shoko Wakamiya, Eiji Aramaki

Abstract

In-hospital text data contains valuable clinical information, yet deploying fine-tuned small language models (SLMs) for information extraction remains challenging because formatting and vocabulary differ across institutions. Since access to the original in-hospital data (source domain) is often restricted, annotated data from the target hospital (target domain) is crucial for domain adaptation. However, clinical annotation is notoriously expensive and time-consuming, because it demands both clinical and linguistic expertise. To address this issue, we leverage large language models (LLMs) to annotate target-domain data for adaptation. We conduct experiments on four clinical information extraction tasks, covering eight target-domain datasets. Experimental results show that LLM-annotated data consistently enhances SLM performance and, with a larger amount of annotated data, outperforms manual annotation in three out of four tasks.