KDD 2025

Large Language Models: Architecture and Training. From Next-Word Prediction to Reasoning

Jay Alammar

Abstract

This talk is a highly visual, accessible look at large language models, their architecture, and their training. Attendees will be presented with the intuitions behind dozens of LLM concepts, including tokenizers, the internals of the latest Transformer neural networks, mixture-of-experts models, reward models, reasoning LLMs, model merging, and more.