ICLR 2025

Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization

Zichen Wang, Yaokun Ji, Jianing Tian, Shuangjia Zheng

Abstract

Antibodies are essential proteins responsible for immune responses in organisms, capable of specifically recognizing antigen molecules of pathogens. Recent advances in generative models have significantly enhanced rational antibody design. However, existing methods mainly create antibodies from scratch without template constraints, leading to model optimization challenges and unnatural sequences. To address these issues, we propose a retrieval-augmented diffusion framework, termed RADAb, for efficient antibody design. Our method leverages a set of structurally homologous motifs that align with query structural constraints to guide the generative model in inversely optimizing antibodies according to desired design criteria. Specifically, we introduce a structure-informed retrieval mechanism that integrates these exemplar motifs with the input backbone through a novel dual-branch denoising module, utilizing both structural and evolutionary information. Additionally, we develop a conditional diffusion model that iteratively refines the optimization process by incorporating both global context and local evolutionary conditions. Our approach is agnostic to the choice of generative models. Empirical experiments demonstrate that our method achieves state-of-the-art performance in multiple antibody inverse folding and optimization tasks, offering a new perspective on biomolecular generative models.

Computational efforts in antibody design have traditionally involved grafting residues onto existing structures (Sormanni et al., 2015), sampling alternative native CDR loops to enhance affinities (Aguilar Rangel et al., 2022), and using tools like Rosetta for sequence design improvements in interacting regions (Adolf-Bryfogle et al., 2018). Many recent studies have focused on applying deep generative models to design antibodies (Luo et al., 2022; Martinkus et al., 2024; Zhu et al., 2024).
They take advantage of geometric learning and generative models to capture the higher-

* Equal Contribution