ICML 2022
Communication-efficient Distributed Learning for Large Batch Optimization
Rui Liu, Barzan Mozafari
Abstract
Background: Large Batch Optimization
• Use the largest batch size that still fits in GPU memory
• The local batch size is fixed for each GPU, so the total batch size increases as the number of GPUs increases
• Fully utilize the compute power of each node
• Same generalization, provided some mitigation tricks are applied (e.g., layerwise adaptive learning rates as in LARS) [1, 2]
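The two quantitative ideas in the bullets above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the default `trust_coefficient`, and the `eps` guard are assumptions, and the layerwise rule follows the commonly cited LARS form, in which a layer's learning rate is scaled by the ratio of its weight norm to its gradient norm.

```python
import math


def total_batch_size(local_batch_size: int, num_gpus: int) -> int:
    """Total batch size when every GPU processes a fixed local batch."""
    return local_batch_size * num_gpus


def lars_layer_lr(weights, grads, base_lr,
                  trust_coefficient=0.001, weight_decay=0.0, eps=1e-9):
    """Layerwise learning rate in the style of LARS (assumed form).

    lr_layer = base_lr * trust_coefficient * ||w|| / (||g|| + weight_decay * ||w||)
    Falls back to base_lr when either norm is zero.
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    if w_norm == 0.0 or g_norm == 0.0:
        return base_lr
    return base_lr * trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    # Hypothetical setup: 32 samples per GPU on 8 GPUs.
    print(total_batch_size(32, 8))
    # Hypothetical layer with weight norm 5 and gradient norm 1.
    print(lars_layer_lr([3.0, 4.0], [0.6, 0.8], base_lr=1.0))
```

Because the scaling is computed per layer, layers whose gradients are large relative to their weights take proportionally smaller steps, which is one reason such rules are used to stabilize training at large total batch sizes.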