EMNLP 2025

Can LLMs simulate the same correct solutions to free-response math problems as real students?

Yuya Asano, Diane J. Litman, Erin Walker

Abstract

Large language models (LLMs) have emerged as powerful tools for developing educational systems. While previous studies have explored modeling student mistakes, a critical gap remains in understanding whether LLMs can generate correct solutions that are representative of student responses to free-response problems. We compare the distribution of solutions generated by four LLMs (one proprietary model, two open-source general-purpose models, and one open-source math model), under various sampling and prompting techniques, with the distribution of solutions from students teaching math problems to a conversational robot. Our study reveals discrepancies between the correct solutions produced by LLMs and those produced by students. We discuss the practical implications of these findings for the design and evaluation of LLM-supported educational systems.