KDD 2024

Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy

Yao Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu

4 citations

Abstract

Large Language Models (LLMs) have made significant advances across tasks such as question answering, translation, text summarization, and dialogue systems. As a result, the accuracy of the information they produce has become crucial, especially for serious financial products like Alipay that serve billions of users. For a real-world product serving millions of users, however, inference speed is also a critical factor, far more so than for a merely experimental model.