NeurIPS 2023

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung

104 citations

Abstract

The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naive stitching of multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. These approaches focus on seamless montage generation, yet they often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SYNCDIFFUSION, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs for text-guided panorama generation compared to previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score). We further demonstrate the versatility of our method across three plug-and-play applications: layout-guided image generation, conditional image generation and 360-degree panorama generation. Our project page is at https://syncdiffusion.github.io.

Figure 1: Comparison of panoramas generated with prompt "A photo of a rock concert" by Blended Latent Diffusion [1] (top), MultiDiffusion [3] (middle), and our SYNCDIFFUSION (bottom). Blended Latent Diffusion, when applied to image extrapolation, often generates visible seams and repetitive patterns. MultiDiffusion creates seamless panoramas but fails to achieve global coherence across the image. In contrast, our SYNCDIFFUSION synchronizes windows across the panorama by increasing the perceptual similarity of the denoised output predictions. This results in significantly more coherent panorama outputs.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
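The synchronization idea in the abstract (gradient descent on a similarity loss computed from predicted denoised images) can be illustrated with a toy sketch. This is not the authors' implementation: windows are flat lists of floats, the "perceptual" loss is replaced by a squared gap between mean intensities, and the noise prediction is held fixed when differentiating. All names (`predict_x0`, `toy_loss`, `sync_step`, `weight`) are hypothetical; only the clean-sample estimate follows the standard DDPM/DDIM formula.

```python
import math


def predict_x0(x_t, eps, alpha_bar):
    """Standard DDPM/DDIM estimate of the clean sample from a noisy one."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / signal for x, e in zip(x_t, eps)]


def mean(values):
    return sum(values) / len(values)


def toy_loss(x0, x0_anchor):
    """Stand-in for a perceptual loss (assumption): squared gap of means."""
    return (mean(x0) - mean(x0_anchor)) ** 2


def sync_step(x_t, eps, x0_anchor, alpha_bar, weight):
    """One gradient-descent step on a window's noisy sample x_t.

    The gradient is taken through predict_x0 only; eps is treated as a
    constant, which is a simplifying assumption of this sketch.
    """
    x0 = predict_x0(x_t, eps, alpha_bar)
    gap = mean(x0) - mean(x0_anchor)
    # d(toy_loss)/d(x_t[k]) = 2 * gap * (1 / len) * (1 / sqrt(alpha_bar)),
    # identical for every element k because the loss depends only on the mean.
    grad = 2.0 * gap / (len(x_t) * math.sqrt(alpha_bar))
    return [x - weight * grad for x in x_t]


# Hypothetical values: one anchor window and one window to synchronize.
alpha_bar = 0.5
anchor_x0 = predict_x0([0.8, 0.9, 1.0, 0.9], [0.0] * 4, alpha_bar)
x_t = [0.1, 0.2, 0.0, 0.1]
eps = [0.1, -0.1, 0.05, -0.05]

loss_before = toy_loss(predict_x0(x_t, eps, alpha_bar), anchor_x0)
x_t_synced = sync_step(x_t, eps, anchor_x0, alpha_bar, weight=0.5)
loss_after = toy_loss(predict_x0(x_t_synced, eps, alpha_bar), anchor_x0)

print("loss before:", loss_before)
print("loss after: ", loss_after)
```

In a real pipeline the loss would be a learned perceptual metric evaluated on decoded images, and the gradient would typically come from automatic differentiation rather than a hand-derived formula. The sketch only shows the shape of the update: estimate each window's denoised image, compare it with an anchor, and nudge the noisy sample down the gradient before the usual denoising step.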