NeurIPS 2023

Tree Variational Autoencoders

Laura Manduchi, Moritz Vandenhirtz, Alain Ryser, Julia E. Vogt

Abstract

Traditional antibiotic development remains slow and costly, and virtually no new antibiotic classes target resistant gram-negative bacteria (GNB), highlighting the need for innovative approaches to generating new gram-negative antibacterial (GNAB) compounds. This study introduces a computational framework that uses Junction Tree Variational Autoencoders (JT-VAE) and computational screening to design novel GNAB compounds. Generated compounds underwent Tanimoto similarity analysis and Lipinski’s Rule of Five (LRo5) screening to assess structural novelty and oral bioavailability, followed by agglomerative hierarchical clustering evaluated with cophenetic correlation and silhouette scores. Trained on curated GNAB compounds from ChEMBL, the GVAE model, specifically the JT-VAE model, demonstrated superior performance, producing 10,000 molecules with 100% validity. Subsequent filtering retained 2,141 compounds (21.41%) that fell within the optimal Tanimoto similarity range (0.30-0.50), which balances novelty with known antibacterial substructures, and that met LRo5 with ≤ 2 violations. Property distributions aligned with fragment-based design principles, such as the Rule of Three. Clustering analysis revealed nuanced performance differences among benchmark models: JT-VAE achieved a cophenetic correlation coefficient (CCC) of 0.934 and a silhouette score of 0.76, while unconstrained GVAE demonstrated superior clustering performance (silhouette score: 0.86, CCC: 0.98) despite occasionally generating disconnected molecular fragments. JT-VAE’s fragment-based approach successfully addresses limitations of atom-by-atom generation, particularly in handling aromatic systems essential for antibacterial activity.
By combining hierarchical graph generation with rigorous cheminformatics filtering, this study provides a reproducible blueprint for accelerating GNAB compound discovery, advancing toward timely, data-driven solutions for the escalating antimicrobial resistance crisis.
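The filtering step described in the abstract (a Tanimoto similarity window of 0.30-0.50 combined with at most two LRo5 violations) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The `Molecule` record, the helper names, the toy fingerprints, and the choice to compare each generated molecule against its nearest reference compound are all assumptions made for illustration; in practice the fingerprints and descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record; descriptors are assumed to be precomputed elsewhere."""
    name: str
    fingerprint: frozenset  # indices of the "on" bits of a structural fingerprint
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def lipinski_violations(m):
    """Count how many of Lipinski's four rules the molecule breaks."""
    return sum([
        m.mol_weight > 500,
        m.logp > 5,
        m.h_donors > 5,
        m.h_acceptors > 10,
    ])


def passes_filter(m, reference_fps, low=0.30, high=0.50, max_violations=2):
    """Assumed criterion: nearest-reference similarity in [low, high], few violations."""
    nearest = max((tanimoto(m.fingerprint, fp) for fp in reference_fps), default=0.0)
    return low <= nearest <= high and lipinski_violations(m) <= max_violations


# Toy data, invented for illustration only.
reference = [frozenset({1, 2, 3, 4, 5, 6})]
candidates = [
    Molecule("near_copy", frozenset({1, 2, 3, 4, 5, 6}), 320.0, 2.1, 2, 5),
    Molecule("novel_ok", frozenset({1, 2, 3, 7, 8}), 350.0, 3.0, 1, 6),
    Molecule("unrelated", frozenset({20, 21}), 300.0, 1.0, 1, 3),
    Molecule("too_heavy", frozenset({1, 2, 3, 7, 8}), 900.0, 7.5, 8, 6),
]
kept = [m.name for m in candidates if passes_filter(m, reference)]
print(kept)
```

In this toy run only `novel_ok` survives: `near_copy` is too similar to the reference, `unrelated` is too dissimilar, and `too_heavy` sits in the similarity window but breaks three of the four Lipinski rules.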