NeurIPS2024
MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training
Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song
摘要
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high-quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in comprehensively capturing the intricate co-evolutionary patterns within MSA or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompt protein structure predictions via MSA generative pre-training in the low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model the complex evolutionary patterns. Endowed by this, its flexible 1D MSA decoding framework facilitates zero-or few-shot learning. More-over, we demonstrate that leveraging the feedback from AlphaFold2 can further enhance the model’s capacity via Rejective Fine-tuning (RFT) and Reinforcement Learning from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT in generating faithful virtual MSA to enhance the structure prediction accuracy (up to +8.5% TM-Score on few-shot scenarios). The transfer learning capabilities also highlight its great potential for facilitating other protein tasks.