VLDB 2025
Database Perspective on LLM Inference Systems
James Pan, Guoliang Li
5 citations
Abstract
Large language models (LLMs) are powering a new wave of language-based applications, including database applications, leading to new techniques and systems for dealing with the enormous compute and memory needs of LLMs, coupled with advances in computing hardware. In this tutorial, we review how these techniques lower inference costs by managing uncertain request lifecycles, exploiting specialized hardware, and scaling over distributed inference devices and machines. We present these techniques from the database perspective of request processing, model execution and optimization, and memory management. Following this discussion, we review how inference systems combine these techniques in diverse architectures to achieve application or performance objectives.