ACL 2024
TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation
Yikai Zhang, Siyu Yuan, Caiyu Hu, Kyle Richardson, Yanghua Xiao, Jiangjie Chen
Abstract
Despite remarkable advancements in emulating human-like behavior through Large Language Models (LLMs), current textual simulations do not adequately address the notion of time. To this end, we introduce TIMEARENA, a novel textual simulated environment that incorporates complex temporal dynamics and constraints that better reflect real-life planning scenarios. In TIMEARENA, agents are asked to complete multiple tasks as soon as possible, allowing for parallel processing to save time. We implement the dependency between actions, the time duration for each action, and the occupancy of the agent and the objects in the environment. TIMEARENA is grounded in 30 real-world tasks in cooking, household activity, and laboratory work. We conduct extensive experiments with various LLMs using TIMEARENA. Our findings reveal that even the most powerful models, e.g., GPT-4, still lag behind humans in effective multitasking, underscoring the need for enhanced temporal awareness in the development of language agents.
... dependencies, requiring agents to strategize and prioritize based on time constraints and task completion progress. 2) Agent Occupancy: Agents will be occupied by certain actions, so they might be unable to perform other actions at the same time. 3) Object Occupancy: Some objects might be occupied for some time, and agents must use available objects in the environment for the tasks. These factors are common in real life but are seldom addressed by current textual simulations.
To help illustrate, Figure 1 shows an example of completing make tea (Task 1) and wash clothes (Task 2). The actions of each task might depend on previous actions, e.g., agents must boil water before make tea, and each action takes a duration in time, e.g., wash cup takes 5 minutes.
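The dependency, duration, and occupancy constraints described above can be illustrated with a small sketch. This is not the TIMEARENA implementation. It is a minimal greedy simulation under stated assumptions: the `Action` class, the `simulate` function, the object names, and every duration except the 5 minutes for wash cup are hypothetical, and starting an action is assumed to take no time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    duration: int          # minutes the action takes
    obj: str               # object occupied while the action runs (hypothetical names)
    occupies_agent: bool   # True if the agent can do nothing else meanwhile
    deps: tuple = ()       # names of actions that must finish first


def simulate(actions):
    """Greedy minute-by-minute simulation; returns total minutes needed.

    Assumption: starting an action is instantaneous, and actions that do
    not occupy the agent are started first so they can run in the background.
    """
    horizon = sum(a.duration for a in actions)  # upper bound for valid inputs
    finish = {}           # action name -> finish time, for started actions
    agent_free_at = 0     # time at which the agent becomes free again
    object_free_at = {}   # object name -> time at which it becomes free
    t = 0
    while len(finish) < len(actions):
        if t > horizon:
            raise ValueError("unsatisfiable or cyclic dependencies")
        # False sorts before True: background actions are tried first.
        for a in sorted(actions, key=lambda a: a.occupies_agent):
            if a.name in finish:
                continue
            deps_done = all(d in finish and finish[d] <= t for d in a.deps)
            agent_free = agent_free_at <= t
            obj_free = object_free_at.get(a.obj, 0) <= t
            if deps_done and agent_free and obj_free:
                finish[a.name] = t + a.duration
                object_free_at[a.obj] = t + a.duration
                if a.occupies_agent:
                    agent_free_at = t + a.duration
        t += 1
    return max(finish.values())


# Only "wash cup takes 5 minutes" comes from the text; other values are made up.
actions = [
    Action("wash cup", 5, "cup", occupies_agent=True),
    Action("boil water", 8, "kettle", occupies_agent=False),
    Action("make tea", 3, "cup", occupies_agent=True,
           deps=("boil water", "wash cup")),
    Action("wash clothes", 10, "washing machine", occupies_agent=False),
]

total = simulate(actions)
sequential = sum(a.duration for a in actions)
print(total, sequential)  # 11 26
```

With these made-up durations, running boil water and wash clothes in the background while the agent washes the cup finishes both tasks in 11 minutes, compared with 26 minutes if every action were performed one after another.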
In particular,
2 Related Work
Simulation-based Evaluation for Language Agents With the great success of LLMs (OpenAI, 2022, 2023; Team and Google, 2023), recent works have shifted the focus from traditional NLP tasks to explore language agents in simulation environments that mimic real-world scenarios (Wu