ICLR 2025

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma

Abstract

Part 1 - Lack of Richness and Complexity
• Limited API Richness: Few APIs, a small number of apps, and a narrow range of task difficulties.
• Low Query Complexity: The average action length is short (1-5.9).
• Consequence: Fails to differentiate the capabilities of different LLMs, including less capable ones.