ACL 2024

Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation

Pablo Messina, René Vidal, Denis Parra, Alvaro Soto, Vladimir Araujo

Abstract

Advancing representation learning in specialized fields like medicine remains challenging due to the scarcity of expert annotations for text and images. To tackle this issue, we present a novel two-stage framework designed to extract high-quality factual statements from free-text radiology reports in order to improve the representations of text encoders and, consequently, their performance on various downstream tasks. In the first stage, we propose a Fact Extractor that leverages large language models (LLMs) to identify factual statements from well-curated domain-specific datasets. In the second stage, we introduce a Fact Encoder (CXRFE) based on a BERT model fine-tuned with objective functions designed to improve its representations using the extracted factual data. Our framework also includes a new embedding-based metric (CXRFEScore) for evaluating chest X-ray text generation systems, leveraging both stages of our approach. Extensive evaluations show that our fact extractor and encoder outperform current state-of-the-art methods in tasks such as sentence ranking, natural language inference, and label extraction from radiology reports. Additionally, our metric proves to be more robust and effective than existing metrics commonly used in the radiology report generation literature. The code of this project is available at https://github.com/PabloMessina/CXR-Fact-Encoder.

[Figure: example texts with their scores under CXRFEScore, RadGraph F1 Full, CheXpert Acc, and CheXbert Acc. Panel labels: CheXbert, CheXpert labeler, Chest ImaGenome.]

Example text: Comparison: Chest radiographs XXXX. Indication: XXXX-year-old male, chest pain. Findings: The cardiomediastinal silhouette is within normal limits for size and contour. The lungs are normally inflated without evidence of focal airspace disease, pleural effusion, or pneumothorax. Stable calcified granuloma within the right upper lung. No acute bone abnormality. Impression: No acute cardiopulmonary process.
Scores: CXRFEScore: 1.000, RadGraph F1 Full: 0.750, CheXpert Acc: 1.0, CheXbert Acc: 1.0

Example text: new PICC line on the right. PICC line tip projecting in the mediastinum. potential arterial location crossing the midline. repeat PA and lateral radiograph taken approximately an hour after the previous radiograph. PICC line observed in the mid SVC. potential small right pleural effusion. stable moderate cardiomegaly
Scores: CXRFEScore: 0.891, RadGraph F1 Full: 0.899, CheXpert Acc: 1.0, CheXbert Acc: 1.0

Example text: new PICC line on the right. PICC line tip in the mediastinum. appears to cross the midline. concern for potential arterial location. PA radiograph. lateral radiograph. PICC line in the mid SVC. potential small right pleural effusion. stable cardiomegaly
Scores: CXRFEScore: 0.966, RadGraph F1 Full: 0.813, CheXpert Acc: 1.0, CheXbert Acc: 1.0

Example text: the heart is enlarged. the cardiomediastinal silhouette is enlarged. no focal consolidation. the lungs are free of focal airspace disease. no atelectasis. a device is seen. pleural effusion is seen. no fibrosis. no pneumonia. no pneumothorax is seen. no pulmonary edema. no pulmonary nodules or mass lesions identified. no fracture is seen
Scores: CXRFEScore: 0.481, RadGraph F1 Full: 0.017, CheXpert Acc: 1.0, CheXbert Acc: 1.0

Example text: enlarged cardiac silhouette in cardiac silhouette. abnormal cardiac silhouette. picc in left shoulder. picc in mediastinum. lung opacity in right costophrenic angle. pleural effusion in right costophrenic angle. abnormal right costophrenic angle. lung opacity in right lung. pleural effusion in right lung. abnormal right lung. picc in right shoulder. picc in svc. enlarged cardiac silhouette. lung opacity. pleural effusion. picc
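
To make the idea of an embedding-based, fact-level metric concrete, here is a minimal sketch. It is not the authors' CXRFEScore: the abstract does not give the formula, so the greedy best-match precision/recall scheme below is an assumption. The helpers `split_facts` and `embed` are hypothetical stand-ins for the LLM-based Fact Extractor and the CXRFE encoder (they split on periods and count words, respectively), chosen only so the example runs without external dependencies.

```python
import math
import re
from collections import Counter


def split_facts(text):
    # Hypothetical stand-in for the LLM-based Fact Extractor:
    # treat each period-delimited sentence as one "fact".
    return [s.strip().lower() for s in text.split(".") if s.strip()]


def embed(fact):
    # Hypothetical stand-in for the Fact Encoder (CXRFE):
    # a sparse bag-of-words vector instead of a BERT embedding.
    return Counter(re.findall(r"[a-z0-9]+", fact.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fact_similarity_score(candidate, reference):
    # Assumed scoring scheme (not taken from the paper): match every fact
    # to its most similar counterpart, average in both directions, and
    # combine the two averages with a harmonic mean.
    cand = [embed(f) for f in split_facts(candidate)]
    ref = [embed(f) for f in split_facts(reference)]
    if not cand or not ref:
        return 0.0
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "new PICC line on the right. PICC line tip projecting in the mediastinum."
    candidate = "PICC line tip in the mediastinum. new PICC line on the right."
    print(round(fact_similarity_score(candidate, reference), 3))
```

Because each fact is matched independently, the score does not depend on the order in which facts appear, which is one reason a fact-level metric can behave differently from token-overlap metrics on reworded or reordered reports. Swapping `embed` for a learned sentence encoder would change the similarity values but not the structure of the computation.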