EMNLP 2025
VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM
Lesheng Jin, Zhenyuan Ruan, Haohui Mai, Jingbo Shang
Abstract
Modern GPUs evolve rapidly, yet production compilers still rely on hand-crafted register allocation heuristics that require substantial retuning for each hardware generation. We introduce VeriLocc, a framework that combines large language models (LLMs) with formal compiler techniques to enable generalizable and verifiable register allocation across GPU architectures. VeriLocc fine-tunes an LLM to translate intermediate representations (MIRs) into target-specific register assignments. Static analysis normalizes the input across architectures to aid generalization, and a verifier-guided regeneration loop ensures correctness. Evaluated on matrix multiplication (GEMM) and multi-head attention (MHA), VeriLocc achieves 85-99% single-shot accuracy and near-100% pass@100. A case study shows that VeriLocc discovers more performant assignments than expert-tuned libraries, outperforming rocBLAS by over 10% in runtime.
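The verifier-guided regeneration loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `verify`, `allocate_with_retries`, and `make_random_proposer` are hypothetical, the verifier only checks a toy interference constraint (two simultaneously live virtual registers must not share a physical register), and a seeded random proposer stands in for the fine-tuned LLM. The `max_attempts` budget loosely mirrors the pass@k setting.

```python
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Assignment = Dict[str, int]          # virtual register name -> physical register index
Interference = Set[Tuple[str, str]]  # pairs of virtual registers that are live together


def verify(assignment: Assignment, vregs: List[str],
           interference: Interference, num_regs: int) -> bool:
    """Toy verifier (assumption): accept only complete, in-range, conflict-free assignments."""
    for v in vregs:
        # Every virtual register needs a physical register in [0, num_regs).
        if v not in assignment or not (0 <= assignment[v] < num_regs):
            return False
    for a, b in interference:
        # Interfering virtual registers must not share a physical register.
        if assignment[a] == assignment[b]:
            return False
    return True


def allocate_with_retries(propose: Callable[[], Assignment], vregs: List[str],
                          interference: Interference, num_regs: int,
                          max_attempts: int = 100) -> Tuple[Optional[Assignment], int]:
    """Ask the proposer for candidates until one passes the verifier or the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = propose()
        if verify(candidate, vregs, interference, num_regs):
            return candidate, attempt
    return None, max_attempts


def make_random_proposer(vregs: List[str], num_regs: int,
                         seed: int = 0) -> Callable[[], Assignment]:
    """Hypothetical stand-in for the LLM: samples a uniformly random assignment."""
    rng = random.Random(seed)

    def propose() -> Assignment:
        return {v: rng.randrange(num_regs) for v in vregs}

    return propose


if __name__ == "__main__":
    vregs = ["v0", "v1", "v2"]
    interference = {("v0", "v1"), ("v1", "v2")}
    proposer = make_random_proposer(vregs, num_regs=2)
    result, attempts = allocate_with_retries(proposer, vregs, interference, num_regs=2)
    print(result, attempts)
```

The design point the sketch tries to capture is that the generator never has to be trusted: only candidates accepted by the verifier are returned, so a failed proposal costs an extra attempt rather than a miscompiled kernel.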