CVPR 2025

Generative Photomontage

Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu

Abstract

[Figure 1 panels (a) to (d): each shows a ControlNet Input, a ControlNet Output + User strokes, and Our Output. Prompts shown in the panels: "A dog on grass", "A dog on ice", "A Japanese-style stone house", "A robot from the future", "A waffle pancake".]

Figure 1. We introduce Generative Photomontage, a framework that allows users to create their desired image by compositing multiple generated images. Given a stack of ControlNet-generated images using the same input condition and different seeds, users select desired regions from different images within the stack. Our method takes in the user strokes, solves for a segmentation across the stack using diffusion features, and then composites them using a new feature-space blending method. Our method offers users fine-grained control over the final image and enables various applications, such as generating unseen appearance combinations (a, c), correcting shapes and removing artifacts (b, d).
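The stroke-to-segmentation-to-composite flow in the caption can be illustrated with a toy sketch. It is not the authors' method. A nearest-prototype assignment stands in for the segmentation solver, and a hard per-pixel selection stands in for the feature-space blending, since the caption does not specify either. The names `segment_stack` and `composite`, the nested-list layout, and the per-pixel feature vectors are all hypothetical choices made for illustration.

```python
from math import dist


def segment_stack(features, strokes):
    """Assign every pixel to one image of the stack (toy stand-in).

    features[i][y][x] is a feature vector for image i (assumed input,
    standing in for diffusion features).
    strokes[y][x] is -1 where unmarked, else the index of the image the
    user selected at that pixel.
    Returns labels[y][x], the chosen image index per pixel.
    """
    height, width = len(strokes), len(strokes[0])
    # Collect the feature vectors under each stroke, read from the image
    # that the stroke selects.
    samples = {}
    for y in range(height):
        for x in range(width):
            i = strokes[y][x]
            if i >= 0:
                samples.setdefault(i, []).append(features[i][y][x])
    if not samples:
        raise ValueError("at least one stroke is required")
    # One prototype (mean feature vector) per selected image.
    protos = {
        i: [sum(channel) / len(vecs) for channel in zip(*vecs)]
        for i, vecs in samples.items()
    }
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            if strokes[y][x] >= 0:
                # Stroked pixels always keep the user's choice.
                row.append(strokes[y][x])
            else:
                # Otherwise pick the image whose feature at this pixel is
                # closest to that image's stroke prototype.
                row.append(
                    min(protos, key=lambda i: dist(features[i][y][x], protos[i]))
                )
        labels.append(row)
    return labels


def composite(stack, labels):
    """Hard selection: take each pixel from the stack entry named by labels."""
    return [
        [stack[i][y][x] for x, i in enumerate(row)]
        for y, row in enumerate(labels)
    ]
```

In this sketch, `composite` can be applied to any per-pixel stack (pixel values or features) once `labels` is known. A hard selection like this leaves visible seams wherever neighboring images disagree, which is the kind of problem a blending step is meant to address.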