VLDB2024
Reimagining Deep Learning Systems Through the Lens of Data Systems
Arun Kumar
Abstract
The high-profile success of Deep Learning (DL) at Big Tech companies, including recent Large Language Models (LLMs) such as the GPT and Llama families, has led to high demand among Web companies, consumer app companies, enterprises, healthcare, domain sciences, and even digital humanities and arts to adopt modern DL for their applications. The scale of DL workloads, domain-specific datasets, and publicly available pre-trained base models keeps growing. Naturally, tackling issues of scalability, usability, and resource/cost efficiency of DL systems is critical to democratizing modern DL-powered AI. We find that some key lessons from decades of work on data system design, implementation, and optimization, when adapted prudently, can go a long way toward that goal. Specifically, our work shows that new analogues of multi-query optimization for DL systems can substantially reduce runtimes and costs, while improving ease of use. This article lays out how we reimagine DL workloads that way and summarizes the technical contributions powering this transformation.