ICLR2025

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

S. Sakshi, Utkarsh Tyagi, Sonal Kumar, Ashish Seth, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh, Dinesh Manocha

摘要

Figure 1: Overview of the MMAU Benchmark. MMAU provides comprehensive coverage across three key domains: speech, sounds, and music, featuring diverse audio samples. It challenges multimodal LLMs with tasks across 27 distinct skills, requiring advanced audio perception, reasoning, and domain-specific knowledge.