ICLR 2026

Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement

Chixiang Lu, Yunhua Zhong, Shikang Liang, Xiaojuan Qi, Haibo Jiang

Abstract

State-of-the-art protein structure predictors have revolutionized structural biology, yet their memory use grows quadratically with token length, which makes end-to-end inference impractical for large complexes beyond a few thousand tokens. We introduce HierAFold, a hierarchical pipeline that exploits the modularity of large complexes through subunit decomposition guided by Predicted Aligned Error (PAE), targeted interface-aware refinement, and confidence-weighted assembly. PAE maps localize rigid intra-chain segments and sparse inter-chain interfaces, enabling joint refinement of likely interacting subunits to capture multi-body cooperativity without increasing memory. HierAFold matches AlphaFold3 accuracy and raises the success rate from 49.9% (CombFold) to 73.1% on a recent PDB set. For large complexes, it cuts peak memory by ∼25 GB (∼40%) on a 4,000-token target, successfully models complexes with over 5,000 tokens that run out of memory under AlphaFold3, and doubles the success rate relative to CombFold. The code is available at https://github.com/Luchixiang/HierAFold .
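To make the idea of PAE-guided interface detection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a square per-token PAE matrix (in Å) and a per-token chain label, and flags a chain pair as likely interacting when the lowest inter-chain PAE falls below a cutoff. The function name `likely_interfaces`, the 8 Å cutoff, and the minimum-over-block criterion are all illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def likely_interfaces(pae, chain_ids, cutoff=8.0):
    """Return chain pairs whose inter-chain PAE block dips below `cutoff`.

    Hypothetical helper: the cutoff and the min-over-block rule are
    assumptions for illustration, not values taken from the paper.
    """
    pae = np.asarray(pae, dtype=float)
    chain_ids = np.asarray(chain_ids)
    # PAE is not symmetric; keep the more confident of the two directions.
    sym = np.minimum(pae, pae.T)
    # Unique chain labels in order of first appearance.
    chains = list(dict.fromkeys(chain_ids.tolist()))
    pairs = []
    for a, b in combinations(chains, 2):
        # Sub-matrix of PAE values between tokens of chain a and chain b.
        block = sym[np.ix_(chain_ids == a, chain_ids == b)]
        if block.min() < cutoff:
            pairs.append((a, b))
    return pairs


# Toy example: three 3-token chains with confident intra-chain blocks
# and a single confident contact between chains A and B.
chain_ids = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
pae = np.full((9, 9), 25.0)
for start in (0, 3, 6):
    pae[start:start + 3, start:start + 3] = 2.0
pae[2, 3] = pae[3, 2] = 4.0

pairs = likely_interfaces(pae, chain_ids)
print(pairs)
```

In a pipeline of this kind, only the flagged pairs would be passed on for joint refinement, so the expensive predictor never sees the full complex at once.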