NeurIPS 2023

Multimodal Clinical Benchmark for Emergency Care (MC-BEC): A Comprehensive Benchmark for Evaluating Foundation Models in Emergency Medicine

Emma Chen, Aman Kansal, Julie Chen, Boyang Tom Jin, Julia Rachel Reisler, David A. Kim, Pranav Rajpurkar

Cited 28 times

Abstract

We propose the Multimodal Clinical Benchmark for Emergency Care (MC-BEC), a comprehensive benchmark for evaluating foundation models in Emergency Medicine using a dataset of 100K+ continuously monitored Emergency Department visits from 2020 to 2022. MC-BEC focuses on clinically relevant prediction tasks at timescales from minutes to days, including predicting patient decompensation, disposition, and emergency department (ED) revisit, and includes a standardized evaluation framework with train-test splits and evaluation metrics. The multimodal dataset includes a wide range of detailed clinical data, including triage information, prior diagnoses and medications, continuously measured vital signs, electrocardiogram and photoplethysmograph waveforms, orders placed and medications administered throughout the visit, free-text reports of imaging studies, and information on ED diagnosis, disposition, and subsequent revisits. We provide performance baselines for each prediction task to enable the evaluation of multimodal, multitask models. We believe that MC-BEC will encourage researchers to develop more effective, generalizable, and accessible foundation models for multimodal clinical data.

[...] patients, or specific tasks, such as predicting mortality. This orientation to specific subgroups and tasks makes it difficult to compare and evaluate models across tasks and patient populations. To address these challenges, and to promote the development of robust and clinically useful foundation models for Emergency Medicine, we propose the Multimodal Clinical Benchmark for Emergency Care (MC-BEC), a comprehensive benchmark for evaluating foundation models in Emergency Medicine. MC-BEC is built on a dataset of 102,731 monitored visits made by 63,389 unique patients between 2020 and 2022, and provides a unique opportunity to study acute care in the COVID-19 era.
It is the only multimodal medical dataset that exclusively covers patients during this period, while also capturing a wide range of non-COVID pathology. This dataset covers a wide range of information for emergency department (ED) patients, including triage information, prior diagnoses and medications, orders placed in the ED, medication administrations, lab results, continuously monitored vital signs and physiologic waveforms, and free-text reports for radiology studies. With its emphasis on multiple modalities, including continuous waveforms and vital signs providing physiologic context for heterogeneous and often rapidly evolving patients, MC-BEC presents opportunities to study the uniquely dynamic and complex nature of emergency care. MC-BEC emphasizes clinically relevant downstream tasks at multiple timescales, specifically predicting patient decompensation (within minutes), disposition (within hours), and ED revisit (within days), and provides a standardized evaluation framework with train-test splits and evaluation metrics. We also provide baselines for each task to enable model comparison and evaluation. With MC-BEC, we hope to encourage researchers and clinicians to develop more effective, generalizable, and accessible foundation models for EHR analysis in Emergency Medicine, ultimately improving patient outcomes and advancing the analysis of real-world EHR data.

Related Work

Current ED benchmarks are not multimodal

Existing EHR datasets for ED or critically ill patients are often limited to structured EHR data and intermittent vital sign recordings. These datasets fail to capture the comprehensive multimodal information obtained from the intensive evaluation and monitoring of ED patients. To our knowledge, only two ED-specific benchmarks exist. Xie et al. (2022) [2] introduced an ED benchmark using the MIMIC-IV-ED dataset [3].
While this dataset represents the only publicly available general-purpose ED dataset, it contains only tabular EHR data for all patients, with radiology reports for a subset. EHRShot [4] is the other recent ED benchmark, but its underlying dataset includes only coded data such as ICD diagnosis codes, and is not publicly available.