ACL 2025
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
Tingyu Song, Tongyan Hu, Guo Gan, Yilun Zhao
Abstract
Multimodal large language models (MLLMs) have recently been widely studied for video question answering. However, most existing assessments focus on natural videos and overlook synthetic videos such as AI-generated content (AIGC). Meanwhile, some work in video generation relies on MLLMs to evaluate the quality of generated videos, yet the ability of MLLMs to interpret AIGC videos remains largely underexplored. To address this, we propose a new benchmark, VF-EVAL, which introduces four tasks (coherence validation, error awareness, error type detection, and reasoning evaluation) to comprehensively evaluate the abilities of MLLMs on AIGC videos. We evaluate 13 frontier MLLMs on VF-EVAL and find that even the best-performing model, GPT-4.1, struggles to achieve consistently good performance across all tasks, which highlights the challenging nature of our benchmark. Additionally, to investigate the practical applications of VF-EVAL in improving video generation, we conduct an experiment, REPROMPT, demonstrating that aligning MLLMs more closely with human feedback can benefit video generation.

Data: songtingyu/VF-Eval
Code: SighingSnow/VF-Eval

(a) Yes-Or-No
Q1: Is there moral issue in this video, including human, meaningless text, violence?
A. Yes B. No
Q2: Is there distortion issue within the pink pig toy?
A. Yes B. No

(b) Multichoice
Q: What is unusual about the straw's appearance?
A. The straw is missing its top part.
B. The colors of the top and bottom parts are different.
C. The straw is shorter than a regular one.
D. The straw is bent at an unusual angle.

(c) Open-Ended
Q1: Identify any discrepancies between the video content and "A soccer player kicks a ball harder, making it travel farther than a light tap." Afterward, suggest a better prompt based on the text to help regenerate the video.
A1: (1) Mis-alignment: There are two soccer balls in the video, and the soccer player does not kick the ball out.
(2) Better prompt: A soccer player kicks a soccer ball hard.
Q2: How many soccer balls does the man kick?
A2: The man in the video actually kicks one ball, but the trajectory of the ball he kicked does not match his action, while the trajectory of the other soccer ball does. And a kick on one ball can't make two balls move.
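
The three question formats shown above (yes-or-no, multichoice, open-ended) could be represented and scored roughly as in the following Python sketch. This is only an illustration and not the authors' evaluation code: the `Question` class, its field names, the `normalize_choice` helper, and the gold labels in the example are all hypothetical. Only the question wording comes from the examples above. Open-ended answers are skipped here because they need a separate judging step that this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """Hypothetical record for one benchmark item (field names are assumptions)."""
    kind: str                     # "yes_no", "multichoice", or "open_ended"
    prompt: str
    answer: Optional[str] = None  # gold option letter for closed-form items


def normalize_choice(raw: str) -> str:
    """Return the leading option letter of a reply, e.g. 'B. No' -> 'B'.

    Naive on purpose: it only inspects the first non-space character,
    so free-form replies that happen to start with A-D would be misread.
    """
    text = raw.strip().upper()
    return text[0] if text and text[0] in "ABCD" else ""


def closed_form_accuracy(questions, predictions) -> float:
    """Accuracy over yes-or-no and multichoice items; open-ended items are skipped."""
    scored = [(q, p) for q, p in zip(questions, predictions)
              if q.kind != "open_ended"]
    if not scored:
        return 0.0
    correct = sum(normalize_choice(p) == q.answer for q, p in scored)
    return correct / len(scored)


# Gold labels below are placeholders for illustration, not taken from the benchmark.
questions = [
    Question("yes_no", "Is there distortion issue within the pink pig toy?", "A"),
    Question("multichoice", "What is unusual about the straw's appearance?", "B"),
    Question("open_ended", "How many soccer balls does the man kick?"),
]
predictions = ["A. Yes", "C", "One ball."]

print(closed_form_accuracy(questions, predictions))  # 0.5
```

With the placeholder labels, one of the two closed-form replies matches, so the sketch prints 0.5. The open-ended reply does not affect the score.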