SOSP2023

UGACHE: A Unified GPU Cache for Embedding-based Deep Learning

Xiaoniu Song, Yiwen Zhang, Rong Chen, Haibo Chen

20 citations

Abstract

This paper presents UGache, a unified multi-GPU cache system for embedding-based deep learning (EmbDL). UGache is primarily motivated by the unique characteristics of EmbDL applications, namely read-only, batched, skewed, and predictable embedding accesses. UGache introduces a novel factored extraction mechanism that avoids bandwidth congestion to fully exploit high-speed cross-GPU interconnects (e.g., NVLink and NVSwitch). Based on a new hotness metric, UGache also provides a near-optimal cache policy that balances local and remote access to minimize the extraction time. We have implemented UGache and integrated it into two representative frameworks, TensorFlow and PyTorch. Evaluation using two typical types of EmbDL applications, namely graph neural network training and deep learning recommendation inference, shows that UGache outperforms state-of-the-art replication and partition designs by an average of 1.93× and 1.63× (up to 5.25× and 3.45×), respectively.