VLDB 2025

Database Perspective on LLM Inference Systems

James Pan, Guoliang Li

Cited 5 times

Abstract

Large language models (LLMs) are powering a new wave of language-based applications, including database applications. Together with advances in computing hardware, this has led to new techniques and systems for handling the enormous compute and memory needs of LLMs. In this tutorial, we review how these techniques lower inference costs by managing uncertain request lifecycles, exploiting specialized hardware, and scaling over distributed inference devices and machines. We present these techniques from the database perspective of request processing, model execution and optimization, and memory management. Following this discussion, we review how inference systems combine these techniques in diverse architectures to achieve application or performance objectives.