CVPR 2024
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan
Abstract
* Equal Contribution. † Correspondence Author.

[Figure: panel labels "L4", "Hierarchical Task Level", "Corresponding Model", "Corresponding Benchmark". Example multiple-choice questions from the figure are listed below by task.]

Procedure Understanding
Can you recognize the actions that occur in this video and list them in order?
A. Cook breakfast, switch stove on, close fridge, carry milk, peel banana
B. Scoop ice cream, squeeze chocolate syrup, pour sprinkles, close fridge
C. Close fridge, carry milk, screw open milk cap, pour milk, screw close milk cap
D. Reach for cereal box, grab bowl, pour milk, stir cereal, close fridge

Action Prediction
What action do you anticipate following the end of this video?
A. Stir potatoes
B. Wash potatoes
C. Add potatoes
D. Slice potatoes

Action Recognition
What is the action being carried out in the video?
A. Throwing something in the air and letting it fall
B. Throwing something in the air and catching it
C. Lifting up one end of something, then letting it drop down
D. Poking something so that it falls over

Difference Spotting
What are the differences between the two image?
A. In the second image, there are two people standing on the sidewalk instead of three and a car is just entering the parking lot.
B. In the second image, there are four people standing on the sidewalk instead of three and a car is just leaving the parking lot.
C. In the second image, there are three people standing on the sidewalk instead of two and a car is just entering the parking lot.
D. In the second image, there are two people standing on the sidewalk instead of three and a car is just leaving the parking lot.

What is funny about this comic strip?
A. The polar bear entered the bus pavilion with a Dalmatian, but the bus pavilion was a dog without Dalmatian.
B. The Dalmatian and bear are in the rain.
C. This is a fake Dalmatian.
D. The rain cleaned the Dalmatian's spots.