NeurIPS 2024

Few-Shot Diffusion Models Escape the Curse of Dimensionality

Ruofeng Yang, Bo Jiang, Cheng Chen, Ruinan Jin, Baoxiang Wang, Shuai Li

Abstract

While diffusion models have demonstrated impressive performance, there is a growing need to generate samples tailored to specific user-defined concepts. This demand for customization has driven the development of few-shot diffusion models, which use a limited number $n_{ta}$ of target samples to fine-tune a diffusion model pre-trained on $n_s$ source samples. Despite their empirical success, no theoretical work specifically analyzes few-shot diffusion models. Moreover, because of the curse of dimensionality, existing results for diffusion models without a fine-tuning phase cannot explain why few-shot models generate high-quality samples. In this work, we analyze few-shot diffusion models under a linear structure distribution with latent dimension $d$. From the approximation perspective, we prove that few-shot models achieve a $\widetilde{O}(n_s^{-2/d} + n_{ta}^{-1/2})$ bound for approximating the target score function, which improves on $n_{ta}^{-2/d}$ results. From the optimization perspective, we consider a latent Gaussian special case and prove that the optimization problem has a closed-form minimizer. This means few-shot models can directly obtain an approximate minimizer without a complex optimization process. Furthermore, we provide the accuracy bound $\widetilde{O}(1/n_{ta} + 1/\sqrt{n_s})$ for the empirical solution, whose dependence on $n_{ta}$ is still better than its dependence on $n_s$. Real-world experiments also show that models obtained by fine-tuning only the encoder and decoder specific to the target distribution can produce novel images with the target feature, which supports our theoretical results.
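
To make the comparison between the two approximation rates concrete, the following minimal sketch evaluates them numerically. It is illustrative only: it assumes all constants and logarithmic factors hidden by the O-tilde notation equal 1, and the function names `few_shot_rate` and `target_only_rate` are hypothetical, not taken from the paper.

```python
def few_shot_rate(n_s: int, n_ta: int, d: int) -> float:
    """Few-shot rate n_s^(-2/d) + n_ta^(-1/2), with constants and log factors dropped."""
    return n_s ** (-2.0 / d) + n_ta ** (-0.5)


def target_only_rate(n_ta: int, d: int) -> float:
    """Rate n_ta^(-2/d) when only the target samples are used, constants dropped."""
    return n_ta ** (-2.0 / d)


if __name__ == "__main__":
    # Assumed illustrative values: many source samples, very few target samples.
    n_s, n_ta = 10**8, 10
    for d in (8, 16, 32):
        fs = few_shot_rate(n_s, n_ta, d)
        to = target_only_rate(n_ta, d)
        print(f"d={d:2d}  few-shot={fs:.3f}  target-only={to:.3f}")
```

Under these simplifying assumptions, the target-only term $n_{ta}^{-2/d}$ decays more slowly than $n_{ta}^{-1/2}$ whenever $d > 4$, so the few-shot rate is the smaller of the two once $n_s$ is large enough for the $n_s^{-2/d}$ term to be small.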