ICLR 2025

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zicheng Zhang, Haoning Wu, Chunyi Li, Yingjie Zhou, Wei Sun, Xiongkuo Min, Zijian Chen, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Abstract

From Generative Distortion Assessment
What is the most severe generative distortion?
A. Incorrect structure of the handgun
B. Blur due to low completion
C. Incorrect structure of the woman's face
D. Incorrect structure of the woman's hand (correct)
GPT-4o Response: A
Gemini 1.5 Pro Response: D

From Contradiction Overcome
Does the cactus contain soft and fluffy leaves?
A. No
B. Yes (correct)
GPT-4o Response: A
Gemini 1.5 Pro Response: A. No

From Composition Identification
What is partially covered by the mountain climber's backpacks?
A. Climbing harnesses
B. Boots lined up behind
C. Ropes and carabiners
D. The view of the mountain in the background (correct)
GPT-4o Response: B. Boots lined up behind
Gemini 1.5 Pro Response: B. Boots lined up behind

Figure 1: Error cases from the A-Bench.