AAAI2026

SAGA: Learning Signal-Aligned Distributions for Improved Text-to-Image Generation

Paul Grimal, Michaël Soumm, Hervé Le Borgne, Olivier Ferret, Akihiro Sugimoto

1 citation

Abstract

State-of-the-art text-to-image models produce visually impressive results but often struggle with precise alignment to text prompts, leading to missing critical elements or unintended blending of distinct concepts. We propose a novel approach that learns a high-success-rate distribution conditioned on a target prompt, ensuring that generated images faithfully reflect the corresponding prompts. Our method explicitly models the signal component during the denoising process, offering fine-grained control that mitigates overoptimization and out-of-distribution artifacts. Moreover, our framework is training-free and seamlessly integrates with both existing diffusion and flow matching architectures. It also supports additional conditioning modalities -such as bounding boxes -for enhanced spatial alignment. Extensive experiments demonstrate that our approach outperforms current state-of-the-art methods. Code available at https://github.com/grimalPaul/gsn-factory