NeurIPS 2023

Temporal Dynamic Quantization for Diffusion Models

Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park

90 citations

Abstract

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even at 8-bit precision due to the diffusion model's unique property of temporal variation in activation. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.

While the majority of previous approaches have focused on reducing the number of sampling steps to accelerate the denoising process, it is also important to lighten the individual denoising steps. Since a single denoising step can be regarded as a conventional deep learning model inference, various model compression techniques can be used. Quantization is a widely used compression technique in which both weights and activations are mapped to a low-precision domain. While advanced quantization schemes have been extensively studied for conventional Convolutional Neural Networks (CNNs) and

* Equal Contribution

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
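The idea of a quantization interval that depends on the time step can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple max-abs calibration per time step, toy Gaussian activations whose range shrinks over time, and hypothetical helper names (`calibrate_intervals`, `fake_quantize`). Because the intervals are stored in a table indexed by time step, looking one up at inference adds no extra computation, in contrast to dynamic schemes that measure activation statistics at run time.

```python
import numpy as np


def calibrate_intervals(acts_per_step, n_bits=8):
    """Return one quantization interval (step size) per time step.

    Assumption: a plain max-abs rule, chosen only for illustration.
    """
    q_max = 2 ** (n_bits - 1) - 1
    return np.array([np.abs(a).max() / q_max for a in acts_per_step])


def fake_quantize(x, interval, n_bits=8):
    """Uniform symmetric quantization followed by dequantization."""
    q_max = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / interval), -q_max - 1, q_max)
    return q * interval


def mse(a, b):
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)

# Hypothetical activations: the spread differs from one time step to the next.
spreads = [4.0, 2.0, 1.0, 0.5]
acts = [rng.normal(0.0, s, size=1024) for s in spreads]

# Time-step-dependent intervals, computed once offline and stored in a table.
dynamic_intervals = calibrate_intervals(acts)

# Static baseline: a single interval that must cover every time step.
static_interval = max(np.abs(a).max() for a in acts) / 127

err_dynamic = [mse(a, fake_quantize(a, dynamic_intervals[t]))
               for t, a in enumerate(acts)]
err_static = [mse(a, fake_quantize(a, static_interval)) for a in acts]

for t, (d, s) in enumerate(zip(err_dynamic, err_static)):
    print(f"step {t}: dynamic MSE = {d:.2e}, static MSE = {s:.2e}")
```

In this toy setting the static interval is dictated by the time step with the widest activations, so time steps with narrow activations are quantized coarsely. The per-step table avoids that mismatch, which is the kind of temporal variation the abstract refers to.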