ACL 2024

Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources

Xiaochen Wang, Junyu Luo, Jiaqi Wang, Yuan Zhong, Xiaokun Zhang, Yaqing Wang, Parminder Bhatia, Cao Xiao, Fenglong Ma

Abstract

Although pre-training has become a prevalent approach for addressing various biomedical tasks, the efficacy of current pre-trained models is hindered by their reliance on a limited scope of medical sources. This limitation leads to data scarcity during pre-training and restricts the range of applicable downstream tasks. To address these challenges, we develop Medical Cross-Source Pre-training (MEDCSP), a new pre-training strategy designed to bridge the gap between multimodal medical sources. MEDCSP employs modality-level aggregation to unify patient data within individual sources. Additionally, by leveraging temporal information and diagnosis history, MEDCSP effectively captures explicit and implicit correlations between patients across different sources. To evaluate the proposed strategy, we conduct comprehensive experiments on 6 modalities from 2 real-world medical data sources and evaluate MEDCSP on 4 tasks against 19 baselines. This work marks an initial yet essential step towards cross-source modeling in the medical domain.