NeurIPS 2022
A Deep Learning Dataloader with Shared Data Preparation
Jian Xie, Jingwei Xu, Guochang Wang, Yuan Yao, Zenan Li, Chun Cao, Hanghang Tong
Cited by 6
Abstract
Executing multiple training jobs in parallel on overlapping datasets is common practice when developing deep learning models. By default, each parallel job prepares (i.e., loads and preprocesses) its data independently, which wastes I/O and CPU on redundant work. A centralized cache can reduce this redundancy by reusing data preparation work, but each job shuffles its data randomly, so sampling locality is low and the cache thrashes heavily. Prior work improves sampling locality by forcing all training jobs to load the same dataset in the same order and at the same pace. However, that solution is efficient only under strong constraints: all jobs must train on the same dataset, start at the same moment, and run at the same speed. In this paper, we propose a new data loading method for efficiently training parallel DNNs under much more flexible constraints. Our method remains highly efficient when training jobs use different but overlapping datasets, start at different moments, and run at different speeds. To achieve this, we propose a dependent sampling algorithm (DSA) and a domain-specific cache policy. We also design a novel tree data structure to implement DSA efficiently. Based on these techniques, we implement a prototype named Joader, which can share data preparation work among training jobs as long as their datasets overlap. Our evaluation of Joader shows greater versatility and a larger training speed improvement (up to 200% on ResNet18), without affecting accuracy.
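The abstract does not spell out how dependent sampling works, so the following is only a minimal illustrative sketch of the underlying idea for two jobs, not the paper's DSA or its tree structure. It swaps in a standard maximal coupling of two uniform distributions: each job still draws uniformly from its own dataset, but the draws are correlated so that both jobs receive the same item as often as possible and one prepared sample can serve both. The function name `coupled_sample` and the toy datasets are assumptions made for illustration, and the sketch draws a single item with replacement rather than a full shuffled epoch.

```python
import random


def coupled_sample(a, b, rng):
    """Draw one item for job A and one item for job B (both datasets non-empty).

    Each draw is marginally uniform over its own dataset, and the two jobs
    receive the same item with probability |A & B| / max(|A|, |B|).
    """
    a, b = set(a), set(b)
    if len(a) > len(b):
        # Symmetric case: solve with the roles swapped, then swap back.
        y, x = coupled_sample(b, a, rng)
        return x, y

    shared = sorted(a & b)
    only_a = sorted(a - b)
    only_b = sorted(b - a)

    # Probability that both jobs consume the same item (here |A| <= |B|).
    p_same = len(shared) / len(b)
    if rng.random() < p_same:
        item = rng.choice(shared)
        return item, item

    # Otherwise draw from the residual distributions.
    # Job B's leftover probability mass lies only on items outside the overlap.
    y = rng.choice(only_b)
    # Job A keeps mass 1/|A| - 1/|B| on each shared item and 1/|A| elsewhere.
    items = shared + only_a
    weights = ([1 / len(a) - 1 / len(b)] * len(shared)
               + [1 / len(a)] * len(only_a))
    x = rng.choices(items, weights=weights)[0]
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    job_a = range(0, 6)    # hypothetical dataset of job A
    job_b = range(3, 13)   # hypothetical dataset of job B, overlapping on {3, 4, 5}
    draws = [coupled_sample(job_a, job_b, rng) for _ in range(10_000)]
    hit_rate = sum(x == y for x, y in draws) / len(draws)
    print(f"fraction of steps where one prepared sample serves both jobs: {hit_rate:.3f}")
```

With independent uniform sampling, the two jobs in this toy setup would pick the same item in only 3 / (6 × 10) = 5% of steps. The coupled sampler raises that to 3 / 10 = 30% while leaving each job's own sampling distribution uniform, which is the kind of property that lets shared preparation work coexist with unbiased random sampling.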