NeurIPS 2021
Piper: Multidimensional Planner for DNN Parallelization
Jakub Tarnawski, Deepak Narayanan, Amar Phanishayee
Cited 74 times
Abstract
The rapid increase in the sizes of state-of-the-art DNN models, and consequently the increase in the compute and memory requirements of model training, has led to the development of many execution schemes such as data parallelism, pipeline model parallelism, tensor (intra-layer) model parallelism, and various memory-saving optimizations. However, no prior work has tackled the highly complex problem of optimally partitioning the DNN computation graph across many accelerators while combining all these parallelism modes and optimizations. In this work, we introduce Piper, an efficient optimization algorithm for this problem that is based on a two-level dynamic programming approach. Our two-level approach is driven by the insight that being given tensor-parallelization techniques for individual layers (e.g., Megatron-LM's splits for transformer layers) significantly reduces the search space and makes the global problem tractable, compared to considering tensor-parallel configurations for the entire DNN operator graph.

Combining these dimensions, however, is non-trivial [19], since each dimension has trade-offs with respect to computational efficiency, amount of communication, and memory footprint. Given the importance of efficient model-parallel training (and inference [7, 4]), partitioning a model across

35th Conference on Neural Information Processing Systems (NeurIPS 2021).
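To make the two-level dynamic programming idea from the abstract more concrete, the following is a minimal toy sketch, not Piper's actual algorithm. It assumes a linear chain of layers with known per-layer times, and it models only two decisions: where to cut the chain into pipeline stages (outer level) and how many devices to give each stage (inner level). The cost model in `stage_time`, the `sync_cost` parameter, and both function names are hypothetical and chosen only for illustration; memory limits, communication between stages, and general operator graphs are ignored.

```python
from functools import lru_cache


def stage_time(layer_times, i, j, devices, sync_cost=0.1):
    """Hypothetical inner-level cost of running layers i..j-1 on `devices` devices.

    Compute is assumed to split evenly across the devices, and each extra
    device adds a fixed synchronization overhead (an illustrative assumption).
    """
    compute = sum(layer_times[i:j]) / devices
    return compute + sync_cost * (devices - 1)


def best_bottleneck(layer_times, num_devices):
    """Outer-level DP: smallest achievable time of the slowest pipeline stage.

    The layers are split into contiguous stages, each stage gets at least
    one device, and at most `num_devices` devices are used in total.
    """
    n = len(layer_times)

    @lru_cache(maxsize=None)
    def solve(i, d):
        # Best bottleneck for layers i..n-1 with at most d devices left.
        if i == n:
            return 0.0  # no layers remain, so no further stage is needed
        if d == 0:
            return float("inf")  # layers remain but no devices are left
        best = float("inf")
        for j in range(i + 1, n + 1):      # next stage covers layers i..j-1
            for k in range(1, d + 1):      # devices assigned to that stage
                candidate = max(stage_time(layer_times, i, j, k),
                                solve(j, d - k))
                best = min(best, candidate)
        return best

    return solve(0, num_devices)


if __name__ == "__main__":
    # Four equal layers on two devices, under the toy cost model above.
    print(best_bottleneck([1.0, 1.0, 1.0, 1.0], 2))
```

In this toy model, the per-stage cost is a closed-form function, so the search only has to enumerate cut points and device counts. This loosely mirrors the insight stated in the abstract: once the way to parallelize an individual layer is given, the global search space shrinks considerably.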