EMNLP 2024

Tag-grounded Visual Instruction Tuning with Retrieval Augmentation

Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li

Cited by 1

Abstract

GPT4: This photo features four mangosteens on a wooden table. Three of them are uncut, showing their round, dark purple exterior. The fourth mangosteen is cut open, revealing its white, fleshy interior, which is divided into segments. The table surface has a natural wood grain pattern and adds a rustic touch to the image.

LLaVA: The image features a close-up of three small, round, and purple fruits, possibly pomegranates, sitting on a wooden surface. The fruits are placed in a row, with one on the left, one in the middle, and one on the right. In addition to the fruits, there is a knife located on the right side of the image, likely used for cutting the fruits. The scene appears to be a still life composition, showcasing the vibrant color and texture of the fruits.

ShareGPT4V: The image captures a close-up view of four lychee fruits, their vibrant colors standing out against the wooden surface they rest on. The fruits are arranged in a square formation, with one at each corner. The top left and bottom right corners each hold a whole lychee, their green tops contrasting with the deep purple of the fruit. The top right corner features a lychee that has been cut open, revealing the white interior that is characteristic of this tropical fruit. The background, though blurred, hints at the presence of green leaves, adding a touch of nature to the scene.

What type of fruit is this?