WWW2026

ToolBox-RL: Learning to Generalize Tool Use Across Massive Repositories

Xinyan Shi, Renzhi Wang, Haodong Liu, Piji Li

Abstract

Recent advances in enabling Large Language Models (LLMs) to use external tools have significantly extended their functional capabilities beyond internal knowledge. However, most existing approaches rely on retrieval-based mechanisms to select suitable tools from massive repositories, which often struggle to align user queries with tool documentation and exhibit limited generalization. To address these issues, we propose ToolBox-RL, a novel reinforcement learning framework that unifies query rewriting, intent understanding, and large-scale tool retrieval into an end-to-end optimization process. ToolBox-RL introduces a query rewriting stage to better capture user intent and ensure semantic alignment with tool descriptions, while reinforcement learning encourages autonomous discovery of generalized tool-use strategies through combined cold-start and policy optimization training. Our experiments demonstrate that ToolBox-RL not only achieves the best tool-call accuracy on both white-box and black-box tools but also generalizes strongly to out-of-domain data. Ablation studies show that ToolBox-RL can adapt to different retrieval methods and maintains good performance even when a large number of distractor tools are added. The code is available at https://github.com/S-cavy/ToolBox-RL.