NeurIPS 2021

Towards understanding retrosynthesis by energy-based models

Ruoxi Sun, Hanjun Dai, Li Li, Steven Kearnes, Bo Dai

Cited by 43

Abstract

Retrosynthesis is the process of identifying a set of reactants that can be used to synthesize a target molecule. It is critical to materials design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. However, the connections among these models are rarely discussed, and rigorous evaluations of them are still largely lacking. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified view establishes connections and reveals the differences between models, thereby enhancing our understanding of model design. We also provide the community with a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training to induce agreement between forward and backward prediction. This model improves the state of the art among template-free methods, both with and without reaction types.

Retrosynthesis is a critical problem in organic chemistry and drug discovery [1-5]. As the reverse process of chemical synthesis [6, 7], retrosynthesis aims to find the set of reactants that can synthesize a given target via chemical reactions (Fig. 1). Since the search space of theoretically feasible reactant candidates is enormous, models must be designed carefully so that they have the expressive power to learn complex chemical rules while remaining computationally efficient.

[Figure 1: a target molecule and its reactants, given as SMILES. Target: Cc1ccc(-n2c(SCc3ccncc3)nc3ccccc3c2=O)cc1. Reactants: Cc1ccc(-n2c(S)nc3ccccc3c2=O)cc1 and ClCc1ccncc1.]
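To make the energy-based view from the abstract concrete, the sketch below shows one generic way an EBM can rank candidate reactant sets for a fixed target: each candidate X receives an energy E(X, y), and p(X | y) is taken to be proportional to exp(-E(X, y)). This is only an illustration under stated assumptions, not the paper's implementation. The function name `ebm_probabilities`, the alternative candidates, and all energy values are hypothetical; in practice the energies would come from a trained sequence- or graph-based model.

```python
import math


def ebm_probabilities(energies):
    """Map candidate -> energy E(X, y) to candidate -> p(X | y), with p proportional to exp(-E)."""
    # Shift by the minimum energy for numerical stability; the shift cancels after normalization.
    e_min = min(energies.values())
    weights = {x: math.exp(-(e - e_min)) for x, e in energies.items()}
    # Partition function, restricted here to the supplied candidate set (an assumption).
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


# Target molecule from Figure 1 (SMILES).
target = "Cc1ccc(-n2c(SCc3ccncc3)nc3ccccc3c2=O)cc1"

# Candidate reactant sets for the target. The first entry is the pair shown in
# Figure 1; the other two candidates and ALL energy values are hypothetical and
# serve only to illustrate the ranking.
candidates = {
    "Cc1ccc(-n2c(S)nc3ccccc3c2=O)cc1.ClCc1ccncc1": 0.5,
    "Cc1ccc(-n2c(S)nc3ccccc3c2=O)cc1.BrCc1ccncc1": 1.5,
    "Cc1ccc(N)cc1.ClCc1ccncc1": 4.0,
}

probs = ebm_probabilities(candidates)
# The lowest-energy candidate is the most probable one.
best = max(probs, key=probs.get)

for smiles, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {smiles}")
```

Normalizing over an explicit candidate list sidesteps the intractable sum over all possible reactant sets; how that sum is handled is a modeling choice that this sketch does not address.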