CVPR 2024

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

Abstract

[Figure 1 image. Prompts: (a) "Photo of a traditional building, in [Culture]"; (b) "Two people wearing traditional clothing, in [Culture]". Rows: Original Stable Diffusion; SCoFT (Ours). Columns: American, Nigerian, Korean, Mexican, Chinese, and Indian culture. Annotations: Stereotype; Misrepresentation.]

Figure 1. Comparison between Stable Diffusion with and without our proposed fine-tuning approach, SCoFT, on our proposed CCUB dataset. Stable Diffusion perpetuates harmful stereotypes that assume dirty buildings are representative of some nations, and often generates regionally irrelevant designs. By contrast, our approach decreases stereotypes and improves the cultural relevance of generated images.