ICLR 2025

DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models

Ruibing Song, Chuan Liu, Chunshu Wu, Ang Li, Dongfang Liu, Ying Nian Wu, Tong Geng

Abstract

Motivation

• Traditional techniques fail to address the fundamental bottleneck.
• Emerging techniques such as quantum computing, optical computing, and computing-in-memory are promising but face significant technical barriers.
• Is it feasible to rely on mature CMOS-based technology to accelerate LLM training from 10 million hours to 10,000 hours while reducing energy consumption from 20 TJ to 200 MJ?
  -> Yes! By training LLMs on DS-machines!

Inference Acceleration: Mapping existing LLMs onto DS-machines