ICML 2025

Tool Unlearning for Tool-Augmented LLMs

Jiali Cheng, Hadi Amiri

Abstract

Tool-augmented large language models (LLMs) are often trained on datasets of query-response pairs, which embed the ability to use tools or APIs directly into the parametric knowledge of LLMs. As these models are increasingly deployed in real-world applications, they may need to forget specific tools, for example because of security vulnerabilities, privacy regulations, or tool deprecation. This work introduces "tool unlearning" as a novel machine unlearning task that poses distinct challenges beyond traditional sample-level unlearning: it requires removing functional knowledge rather than individual data points, managing the high cost of LLM optimization, and developing principled evaluation metrics. To address these challenges, we propose TOOLDELETE, the first unlearning framework designed specifically for tool-augmented LLMs. It implements three key properties for effective tool unlearning and introduces a new membership inference attack (MIA) model for evaluation. Extensive experiments on multiple tool learning datasets and tool-augmented LLMs show that TOOLDELETE effectively unlearns both randomly selected and class-specific tools, while preserving knowledge of the remaining tools and maintaining performance on general tasks.

[Figure: (a) Tool Learning and Tool Unlearning, with tool deletion requests (insecure tools, broken tools, ...); (b) Traditional Unlearning vs. Tool Unlearning; (c) ToolDelete]