CVPR 2024

Don't Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion

Nicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard

Abstract

[Figure 1 panel prompts: "A vibrant, multicolored furry wolf with neon highlights playing an electric guitar on stage; trending on artstation" and "A shanty version of Tokyo, new rustic style, bold colors with all colors palette, video game, genshin, tribe, fantasy, overwatch"]

Figure 1. Images generated by a RIN model trained with different strategies for handling misalignment between each image and its associated text. Compared with doing nothing (baseline), removing misaligned samples (filtering), or weighting the loss accordingly (weighted), our Coherence-Aware Diffusion training (CAD) generates more visually pleasing images that also adhere more closely to the prompt's subject.