CCS2023

Demo: Certified Robustness on Toolformer

Yue Xu, Wenjie Wang

Abstract

Tool-augmented language models (TALMs) overcome limitations of current language models (LMs) by allowing them to leverage external tools to enhance performance. One state-of-the-art example is Toolformer, introduced by Meta AI Research, which integrates tool use more broadly. However, Toolformer raises particular concerns about the robustness of its predictions of where to place API calls. Adversarial perturbations can alter the positions of the API calls that Toolformer chooses, resulting in responses that are incorrect and potentially even less accurate than those generated by standard language models. To improve the robustness of Toolformer and realize the full capability of its toolbox, we focus on the vulnerabilities that arise from small perturbations in the input or prompt space. To this end, we plan to study adversarial attacks from both the attacker's and the defender's perspectives: we first study adversarial attack algorithms on the input and prompt space, and then propose certified robustness for Toolformer's API call scheduling that is both empirically effective and theoretically grounded.