CVPR 2023

GLIGEN: Open-Set Grounded Text-to-Image Generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee

Abstract

[Figure: GLIGEN sample generations (project page: https://gligen.github.io/). Each panel pairs a caption with a grounding input. Panels: caption "A woman sitting in a restaurant with a pizza in front of her", grounded text: table, pizza, person, wall, car, paper, chair, window, bottle, cup; caption "a baby girl / monkey / Homer Simpson is scratching her/its head", grounded keypoints plotted as dots on the left image; caption "A dog / bird / helmet / backpack is on the grass", grounded image shown as the red inset; caption "Elon Musk and Emma Watson on a movie poster", grounded text: Elon Musk, Emma Watson, plus a grounded style image shown as the blue inset; caption "A vibrant colorful bird sitting on tree branch", grounded depth map (left image); caption "A young boy with white powder on his face looks away", grounded HED map (left image); caption "Cars park on the snowy street", grounded normal map (left image); caption "A living room filled with lots of furniture and plants", grounded semantic map (left image).]

§ Part of the work performed at Microsoft; ¶ Co-senior authors.

This CVPR paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.