ICML 2022

Communication-efficient Distributed Learning for Large Batch Optimization

Rui Liu, Barzan Mozafari

9 citations

Abstract

Background: Large Batch Optimization

Use the largest batch size that still fits in GPU memory
• The local batch size is fixed for each GPU (the total batch size grows as the number of GPUs increases)
• Fully utilizes the compute power of each node
• Generalization matches small-batch training when mitigation tricks are applied (e.g., layerwise adaptive learning rates as in LARS) [1, 2]
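The batch-size arithmetic and the layerwise scaling idea mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default `trust_coefficient`, and the `eps` term are assumptions chosen for illustration, and the scaling rule follows the commonly cited LARS form (layer learning rate proportional to the ratio of weight norm to gradient norm).

```python
import math


def total_batch_size(local_batch_size: int, num_gpus: int) -> int:
    """Total batch size when every GPU processes a fixed local batch."""
    return local_batch_size * num_gpus


def lars_local_lr(weights, grads, trust_coefficient=0.001,
                  weight_decay=0.0, eps=1e-9):
    """Layerwise learning-rate multiplier in the style of LARS.

    Hypothetical helper: `weights` and `grads` are flat lists of floats
    for ONE layer. Returns trust_coefficient * ||w|| / (||g|| + wd * ||w||).
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    # Fall back to no rescaling when either norm is zero (an assumption
    # made here to avoid dividing by zero).
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    # A fixed local batch of 32 on 8 GPUs gives a total batch of 256.
    print(total_batch_size(32, 8))
    # Each layer gets its own multiplier based on its own norms.
    print(lars_local_lr([3.0, 4.0], [0.6, 0.8]))
```

Because the multiplier is computed per layer, layers whose gradients are small relative to their weights take proportionally larger steps, which is the kind of mitigation the bullet above refers to.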