KDD2025
Large Language Models: Architecture and Training. From Next-Word Prediction to Reasoning
Jay Alammar
Abstract
This talk is a highly visual and accessible look at large language models, their architecture, and their training. Attendees will be presented with the intuitions behind dozens of LLM concepts, including tokenizers, the internals of the latest Transformer neural networks, mixture-of-experts models, reward models, reasoning LLMs, model merging, and more.