ICML 2021

Accumulated Decoupled Learning with Gradient Staleness Mitigation for Convolutional Neural Networks

Huiping Zhuang, Zhenyu Weng, Fulin Luo, Kar-Ann Toh, Haizhou Li, Zhiping Lin

Abstract

Gradient staleness is a major side effect in decoupled learning when training convolutional neural networks asynchronously. Existing methods that ignore this effect might result in reduced generalization and even divergence. In this paper, we propose an accumulated decoupled learning (ADL), which includes a module-wise gradient accumulation in order to mitigate the gradient staleness. Unlike prior arts ignoring the gradient staleness, we quantify the staleness in such a way that its mitigation can be quantitatively visualized. As a new learning scheme, the proposed ADL is shown to converge to critical points in spite of its asynchronism. Extensive experiments on CIFAR-10 and ImageNet datasets are conducted, demonstrating that ADL gives promising results while the state-of-the-art methods experience reduced generalization and divergence. In addition, our ADL is shown to have the fastest training speed among the compared methods. The code will be ready soon at https://github.com/ZHUANGHP/Accumulated--Learning.git.
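
The accumulation idea in ADL's name can be illustrated in its simplest form. The sketch below shows ordinary, synchronous gradient accumulation on a one-parameter least-squares model: gradients from several micro-batches are summed and a single update is applied afterwards. It is not the authors' module-wise asynchronous algorithm, and every name in it (grad_mse, accumulated_step, train) is hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of the loss 0.5 * (w*x - y)^2 w.r.t. w, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w: float, micro_batches: List[List[Sample]], lr: float) -> float:
    """Accumulate gradients over all micro-batches, then apply one update."""
    acc = 0.0
    for mb in micro_batches:
        acc += grad_mse(w, mb)       # accumulate only; the weight is not touched yet
    acc /= len(micro_batches)        # average over the accumulation steps
    return w - lr * acc              # single update after all micro-batches


def train(w: float, micro_batches: List[List[Sample]], lr: float, steps: int) -> float:
    """Repeat the accumulated update a fixed number of times."""
    for _ in range(steps):
        w = accumulated_step(w, micro_batches, lr)
    return w


if __name__ == "__main__":
    # Toy data generated by y = 2x (an assumption made for this illustration).
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    micro = [data[:2], data[2:]]     # two micro-batches of equal size
    print(train(0.0, micro, lr=0.05, steps=100))
```

With equally sized micro-batches, the accumulated update coincides with a single update on the full batch, so the weight is updated less often but each update uses a gradient averaged over more samples.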