NeurIPS2022

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu


Abstract

During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This wastes a significant amount of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to edit the input image gradually. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose the Sparse Incremental Generative Engine (SIGE) to convert the computation reduction into latency reduction on off-the-shelf hardware. With edits covering about 1% of the image area, SIGE accelerates DDPM by 3.0× on NVIDIA RTX 3090 and 4.6× on Apple M1 Pro GPU, Stable Diffusion by 7.2× on 3090, and GauGAN by 5.6× on 3090 and 5.2× on M1 Pro GPU. Compared to our conference paper, we extend SIGE to support attention layers and apply it to Stable Diffusion. Additionally, we add support for the Apple M1 Pro GPU and include more results to substantiate the efficacy of our method.
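The caching idea can be illustrated on a single convolution layer. The following is a minimal NumPy sketch, assuming a single-channel, odd-sized, "same"-padded convolution; the helper names (`conv2d_full`, `change_mask`, `sparse_conv2d`) are hypothetical and are not SIGE's API. The layer is computed once on the original image. For the edited image, only the output positions whose receptive field touches a changed input pixel are recomputed, and every other position is copied from the cache. The result matches a full recomputation.

```python
import numpy as np

def conv2d_full(x, w):
    """Dense 'same'-padded 2D cross-correlation (single channel, odd kernel size)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, p)
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def change_mask(x_orig, x_edit, k, tol=0.0):
    """Mark output positions whose k x k receptive field contains a changed pixel.
    Assumption: 'changed' means an absolute difference above tol."""
    diff = np.abs(x_edit - x_orig) > tol
    p = k // 2
    dp = np.pad(diff, p)
    H, W = diff.shape
    mask = np.zeros((H, W), dtype=bool)
    for i in range(H):
        for j in range(W):
            mask[i, j] = dp[i:i + k, j:j + k].any()
    return mask

def sparse_conv2d(x_edit, w, cached_out, mask):
    """Recompute only the masked outputs; reuse the cached outputs elsewhere."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x_edit, p)
    out = cached_out.copy()
    for i, j in zip(*np.nonzero(mask)):
        out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

# Usage: one cached forward pass on the original image, then a small local edit.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
w = rng.standard_normal((3, 3))
cached = conv2d_full(x, w)        # computed once for the original image

x_edit = x.copy()
x_edit[10:13, 20:22] += 1.0       # hypothetical 3x2-pixel edit
mask = change_mask(x, x_edit, w.shape[0])
y = sparse_conv2d(x_edit, w, cached, mask)

assert np.allclose(y, conv2d_full(x_edit, w))  # identical to dense recomputation
print(int(mask.sum()), "of", mask.size, "outputs recomputed")
```

The per-pixel loop favors readability over speed and is not SIGE's kernel. In a deep network the mask would also have to grow layer by layer as receptive fields widen. Turning this kind of sparsity into wall-clock savings on real hardware is the part the abstract attributes to SIGE.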