ICCV 2021

Image Synthesis from Layout with Locality-Aware Mask Adaption

Zejian Li, Jingyu Wu, Immanuel Koh, Yongchuan Tang, Lingyun Sun

88 citations

Abstract

This paper is concerned with synthesizing images conditioned on a layout (a set of bounding boxes with object categories). Existing works construct a layout-mask-image pipeline. Object masks are generated separately and mapped to bounding boxes to form a whole semantic segmentation mask (layout-to-mask), with which a new image is generated (mask-to-image). However, overlapped boxes in layouts result in overlapped object masks, which reduces the mask clarity and causes confusion in image generation. We hypothesize that generating clean and semantically clear masks is important. The hypothesis is supported by the finding that the performance of the state-of-the-art LostGAN decreases when input masks are tainted. Motivated by this hypothesis, we propose the Locality-Aware Mask Adaption (LAMA) module to adapt overlapped or nearby object masks in the generation. Experimental results show that our proposed model with LAMA outperforms existing approaches in visual fidelity and alignment with input layouts. On COCO-stuff at 256×256, our method improves the state-of-the-art FID score from 41.65 to 31.12 and the SceneFID from 22.00 to 18.64.

Methods             | Trained with GT masks | Aggregating overlapped object masks/features
Layout2Im [37, 38]  | No                    | ConvLSTM
LostGAN [32, 34]    | No                    | Normalize
OC-GAN [35]         | No                    | Normalize and concatenate with layout boundaries
Hong et al. [13]    | Yes                   | Sum ⋄
Obj-GAN [20]        | Yes                   | Maxpooling ⋄
OP-GAN [11, 12]     | Yes                   | Sum in global pathway and replacement in object path
SG2IM [15]          | Yes                   | Sum
Ashual and Wolf [1] | Yes                   | Normalize
Ours                | No                    | Adapt and normalize
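The overlap problem described in the abstract can be illustrated with a minimal sketch. It is not the paper's LAMA module or any listed method's implementation. It assumes each object mask simply fills its bounding box with 1.0 (real models predict soft masks), and it uses one simple form of normalization, dividing by the per-pixel sum over objects. The function names, the boxes, and the canvas size are all hypothetical.

```python
def paste_box_masks(boxes, height, width):
    """Return one height x width mask per box: 1.0 inside the box, 0.0 outside.

    Assumption: boxes are (x0, y0, x1, y1) with exclusive upper bounds.
    """
    masks = []
    for (x0, y0, x1, y1) in boxes:
        mask = [
            [1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(width)]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


def normalize_per_pixel(masks):
    """Divide each mask by the per-pixel sum over all objects (where nonzero)."""
    height, width = len(masks[0]), len(masks[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in masks]
    for y in range(height):
        for x in range(width):
            total = sum(m[y][x] for m in masks)
            if total > 0:
                for i, m in enumerate(masks):
                    out[i][y][x] = m[y][x] / total
    return out


def overlap_pixels(masks):
    """Count pixels that are claimed by more than one object mask."""
    height, width = len(masks[0]), len(masks[0][0])
    return sum(
        1
        for y in range(height)
        for x in range(width)
        if sum(1 for m in masks if m[y][x] > 0) > 1
    )


# Two hypothetical boxes on a 4 x 6 canvas that overlap in columns 2 and 3.
boxes = [(0, 0, 4, 4), (2, 0, 6, 4)]
masks = paste_box_masks(boxes, height=4, width=6)
normalized = normalize_per_pixel(masks)
```

In this toy layout, every pixel in the overlapping columns is shared equally between the two objects after normalization, so the combined mask no longer says which object a pixel belongs to. This is the kind of ambiguity the abstract argues should be resolved before the mask-to-image step.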