CVPR 2025

LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending

Jian Jin, Zhenbo Yu, Yang Shen, Zhenyong Fu, Jian Yang

Abstract

tioned after the text encoder and a linear projection. LaTexBlend customizes each concept individually, storing them in a concept bank with a compact representation of latent textual features that captures sufficient concept information to ensure high fidelity. At inference, concepts from the bank can be freely and seamlessly combined in the latent textual space, offering two key merits for multi-concept generation: 1) excellent scalability, and 2) significant reduction of denoising deviation, preserving coherent layouts. Extensive experiments demonstrate that LaTexBlend can flexibly integrate multiple customized concepts with harmonious structures and high subject fidelity, substantially outperforming baselines in both generation quality and computational efficiency. Project page: https://jinjianrick.github.io/latexblend/

This CVPR paper is the Open Access version, provided by the Computer Vision Foundation. Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.

[Figure. Panel labels: Concept bank, MuDI, OMG, Mix-of-Show. Prompts shown:
"V2* bear plushie sitting on V1* chair."
"V7* dog playing V8* guitar, surrounded by V6* flower, with V10* lighthouse in the background."
"V11* cat sitting next to V12* teddybear, with V6* flower blooming beside them, with V10* lighthouse and V9* barn in the background."
"Two kids wearing V3* jacket and V4* shoes, playing with V5* dog."]
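The concept-bank workflow described in the abstract (customize each concept on its own, store a compact latent textual feature, combine stored concepts at inference) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names `ConceptBank` and `blend`, the two-dimensional feature vectors, and above all the blending rule, which is modeled as a plain per-token substitution. The abstract does not specify how LaTexBlend actually combines features, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]


@dataclass
class ConceptBank:
    """Hypothetical store holding one compact latent textual feature per concept."""
    features: Dict[str, Vector] = field(default_factory=dict)

    def add(self, token: str, feature: Vector) -> None:
        # Each concept is customized separately and stored independently,
        # so adding a new concept never touches the existing entries.
        self.features[token] = list(feature)


def blend(prompt_tokens: List[str],
          prompt_features: List[Vector],
          bank: ConceptBank) -> List[Vector]:
    """Return a copy of prompt_features with bank features at concept tokens.

    Assumption: blending is modeled as per-token substitution in the latent
    textual space. This is an illustrative stand-in, not the paper's rule.
    """
    if len(prompt_tokens) != len(prompt_features):
        raise ValueError("tokens and features must have the same length")
    blended = []
    for token, feature in zip(prompt_tokens, prompt_features):
        # Use the stored concept feature if the token is in the bank,
        # otherwise keep the original feature of the prompt token.
        blended.append(list(bank.features.get(token, feature)))
    return blended


# Toy usage with made-up 2-D features.
bank = ConceptBank()
bank.add("V1*", [1.0, 0.0])
bank.add("V2*", [0.0, 1.0])

tokens = ["V2*", "bear", "on", "V1*", "chair"]
features = [[0.5, 0.5] for _ in tokens]
blended = blend(tokens, features, bank)
```

In this toy model, the cost of composing a prompt depends only on the number of tokens and bank lookups, and no joint re-customization of the combined concepts is needed. That is one plausible reading of the scalability claim in the abstract, not a statement about how the method itself is implemented.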