ICLR 2026

Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM

Rongjie Zhu, Cong Zhang, Zhiguang Cao

Cited by 1

Abstract

While large language models (LLMs) are emerging as automated heuristic designers for vehicle routing problems (VRPs), state-of-the-art approaches predominantly rely on massive, general-purpose models such as GPT-4. This work challenges that paradigm by demonstrating that smaller, specialized LLMs, when fine-tuned, can generate components that surpass expert-designed heuristics within advanced solvers. We introduce RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a small LLM to produce high-performance crossover operators for the Hybrid Genetic Search (HGS) solver on the capacitated vehicle routing problem (CVRP). Our method uses a multi-tiered, curriculum-based reward function that progressively guides the LLM to produce first compilable code, then executable operators, and finally components that exceed those designed by human experts. We also introduce an operator caching mechanism that works in conjunction with the reward function to discourage plagiarism and promote diversity during training. Experimental results demonstrate that our fine-tuned LLM generates crossover operators that significantly outperform those designed by human experts in HGS. This performance advantage is consistent, holding on small-scale instances and generalizing to large-scale problems of up to 1000 nodes. Furthermore, RFTHGS surpasses leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs, including GPT-4o and GPT-4o-mini.
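
To make the interplay between the tiered reward and the operator cache concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify reward magnitudes, how duplicates are detected, or how operators are evaluated, so the names `EvalResult`, `OperatorCache`, and `tiered_reward`, the numeric reward values, and the whitespace-normalized hash used for duplicate detection are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EvalResult:
    """Hypothetical outcome of building and running one generated operator."""
    compiled: bool
    ran: bool
    cost: Optional[float] = None  # mean solution cost obtained with the operator


@dataclass
class OperatorCache:
    """Hypothetical cache of operator sources that have already been seen."""
    seen: Set[str] = field(default_factory=set)

    @staticmethod
    def key(source: str) -> str:
        # Assumption: collapse whitespace so trivially reformatted copies collide.
        normalized = " ".join(source.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def is_duplicate(self, source: str) -> bool:
        return self.key(source) in self.seen

    def add(self, source: str) -> None:
        self.seen.add(self.key(source))


def tiered_reward(source: str, result: EvalResult,
                  baseline_cost: float, cache: OperatorCache) -> float:
    """Return a curriculum-style reward; all numeric values are illustrative."""
    if not result.compiled:
        return -1.0   # tier 1: code does not compile
    if not result.ran or result.cost is None:
        return -0.5   # tier 2: compiles but fails when executed
    if cache.is_duplicate(source):
        return -0.25  # copy of a known operator: discourage plagiarism
    cache.add(source)
    improvement = (baseline_cost - result.cost) / baseline_cost
    if improvement <= 0:
        return 0.0    # runs correctly but does not beat the expert baseline
    return 1.0 + improvement  # tier 3: beats the expert-designed operator


# Seed the cache with the expert operator so that copying it earns a penalty.
cache = OperatorCache()
cache.add("void crossover() { /* expert operator */ }")
```

Seeding the cache with the expert-designed operator is one simple way to ensure that reproducing the baseline verbatim is penalized rather than rewarded. A real system would likely need a more robust similarity check than an exact hash of whitespace-normalized source.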