ICLR 2025

Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference

Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon-Kiwkowitz, David Harel

Abstract

This paper introduces distributed speculative inference (DSI), a novel inference algorithm that is provably faster than speculative inference (SI) (