NeurIPS 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
Junlin Xie, Ruifei Zhang, Zhihong Chen, Xiang Wan, Guanbin Li
♡ Equal contribution ♣ Corresponding authors.
Abstract
Recently, large language models (LLMs) have achieved superior performance, empowering the development of large multimodal agents (LMAs). An LMA expected to perform practical tasks must possess a range of capabilities, including multimodal perception, interaction, reasoning, and decision-making skills. However, existing benchmarks are limited in assessing compositional skills and actions