KDD2025

Practical Guidance and Tutorial on Incentivizing Reasoning in LLMs using Distillation and Reinforcement Learning

Zhaopeng Qiu, Jingqi Zhang, Shuang Yu, Shuai Zhang, Junjie Lai

Abstract

With reasoning models like DeepSeek-R1 and OpenAI's o1 demonstrating breakthrough capabilities in complex problem-solving, there is growing interest in the AI community in how to unlock similar capabilities in other large language models (LLMs). This hands-on tutorial covers practical methods for building reasoning capabilities in LLMs through two primary approaches: knowledge distillation from advanced reasoning models and post-training with reinforcement learning. Participants will learn how to transfer reasoning capabilities from cutting-edge models like DeepSeek-R1 into smaller LLMs such as Qwen and Llama, and then explore how reinforcement learning can take these capabilities even further. Using interactive Jupyter notebooks, participants will work through the entire process. By the end of the session, participants will have practical knowledge of how to incentivize reasoning capabilities in LLMs, understand how to use various frameworks for this task, and leave with hands-on experience that can be applied to their own projects. The related materials are available at https://zpqiu.github.io/reasoning-model-tutorial-kdd2025.