CVPR 2024

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

Abstract

[Figure 1: image panels, each shown Before and After, with the following prompts.]
(a) A cartoon fox with clouds on the left.
(b) A cartoon fox with clouds on the right.
(c) A cartoon fox with clouds on the top.
(d) A cartoon fox with clouds on the bottom.
(e) A lion with a crown and flowers, the crown on the bottom, flowers on the top.
(f) An angel, a flower on the top, an apple on the bottom, a mountain on the top.
(g) A cat on the bottom right, a lamp on the top, a cake on the bottom, balloons on the left.
(h) A boy on the left looked up at the aurora on the top right.
(i) Eiffel Tower with a storm on the bottom.

Figure 1. Given only the input textual prompt, our system can autonomously detect and rectify the layout inconsistencies across various position requirements (a-d), object quantities (e-g), and resolutions (h-i).