ICML 2021

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

Zhaoyang Zhang, Wenqi Shao, Jinwei Gu, Xiaogang Wang, Ping Luo

Cited by 36

Abstract

Model quantization is challenging because it involves many tedious hyper-parameters, such as precision (bitwidth), dynamic range (minimum and maximum discrete values) and stepsize (interval between discrete values). Unlike prior work that carefully tunes these values, we present a fully differentiable approach that learns all of them, named Differentiable Dynamic Quantization (DDQ), which has several benefits. (1) DDQ is able to quantize challenging lightweight architectures like MobileNets, where different layers prefer different quantization parameters. (2) DDQ is hardware-friendly and can be easily implemented using low-precision matrix-vector multiplication, making it deployable on a wide range of hardware such as ARM. (3) DDQ reduces training runtime by 25% compared with state-of-the-art methods. Extensive experiments show that DDQ outperforms prior work on many networks and benchmarks, especially when models are already efficient and compact. For example, DDQ is the first approach to achieve lossless 4-bit quantization of MobileNetV2 on ImageNet.
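To make the three hyper-parameters named in the abstract concrete, the following is a minimal sketch of a plain uniform quantizer with a fixed bitwidth, dynamic range and stepsize. It is not DDQ itself: here the values are set by hand, whereas DDQ learns them. The function name `uniform_quantize` and its arguments are hypothetical and chosen only for illustration.

```python
def uniform_quantize(values, bitwidth, x_min, x_max):
    """Round each value to the nearest of 2**bitwidth evenly spaced levels.

    Illustrative only: bitwidth, x_min and x_max are fixed by the caller,
    which is exactly the manual tuning that DDQ is described as avoiding.
    """
    if bitwidth < 1:
        raise ValueError("bitwidth must be at least 1")
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")

    num_levels = 2 ** bitwidth                      # precision
    stepsize = (x_max - x_min) / (num_levels - 1)   # interval between levels

    quantized = []
    for v in values:
        clipped = min(max(v, x_min), x_max)         # enforce the dynamic range
        index = round((clipped - x_min) / stepsize) # nearest level index
        quantized.append(x_min + index * stepsize)
    return quantized, stepsize


# 2-bit quantization over the range [0, 3] gives the levels 0, 1, 2, 3.
levels, step = uniform_quantize([-0.5, 0.4, 1.6, 2.2, 5.0], 2, 0.0, 3.0)
print(levels, step)  # [0.0, 0.0, 2.0, 2.0, 3.0] 1.0
```

In this sketch every layer would share whatever bitwidth and range the caller picks. The abstract's point is that different layers prefer different settings, which is why learning them per layer is attractive.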