CVPR 2025

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik

Abstract

[Figure 1: images generated by CountGen (ours) and SDXL for the prompts "A photo of six kittens sitting on a branch", "A photo of five eggs in a carton", "A realistic photo of Goldilocks and three bears eating a porridge", "an illustration of four ninja turtles", and "A realistic photo of seven dwarves dancing in the forest".]

Figure 1. CountGen generates the correct number of objects specified in the input prompt while maintaining a natural layout that aligns with the prompt.