AAAI2023

DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery - a Focus on Affinity Prediction Problems with Noise Annotations

Yuanfeng Ji, Lu Zhang, Jiaxiang Wu, Bingzhe Wu, Lanqing Li, Long-Kai Huang, Tingyang Xu, Yu Rong, Jie Ren, Ding Xue, Houtim Lai, Wei Liu, Junzhou Huang, Shuigeng Zhou, Ping Luo, Peilin Zhao, Yatao Bian

65 citations

Abstract

Obtaining accurate and valid information for drug molecules is a crucial and challenging task. However, chemical knowledge and information have been accumulated over the past 100 years from various regions, laboratories, and experimental purposes. Little has been explored in terms of the out-of-distribution (OOD) problem with noise and inconsistency, which may lead to weak robustness and unsatisfied performance. This study proposes a novel benchmark ADMEOOD, a systematic OOD dataset curator and benchmark specifically designed for drug property prediction. ADMEOOD obtained 27 ADME (Absorption, Distribution, Metabolism, Excretion) drug properties from Chembl and relevant literature. Additionally, it includes two kinds of OOD data shifts: Noise Shift and Concept Conflict Drift (CCD). Noise Shift responds to the noise level by categorizing the environment into different confidence levels. On the other hand, CCD describes the data which has inconsistent label among the original data. Finally, it tested on a variety of domain generalization models, and the experimental results demonstrate the effectiveness of the proposed partition method in AD-MEOOD: ADMEOOD demonstrates a significant difference performance between in-distribution and out-of-distribution data. Moreover, ERM (Empirical Risk Minimization) and other models exhibit distinct trends in performance across different domains and measurement types.