EMNLP 2025
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
Yuheng Wu, Jianwen Xie, Denghui Zhang, Zhaozhuo Xu
Abstract
Theory-of-Mind (ToM) tasks pose a unique challenge for large language models (LLMs), which often lack the capability for dynamic logical reasoning. In this work, we propose DEL-ToM, a framework that improves verifiable ToM reasoning through inference-time scaling rather than architectural changes. Our approach decomposes ToM tasks into a sequence of belief updates grounded in Dynamic Epistemic Logic (DEL), enabling structured and verifiable dynamic logical reasoning. We use data generated automatically via a DEL simulator to train a verifier, which we call the Process Belief Model (PBM), to score each belief update step. During inference, the PBM evaluates candidate belief traces from the LLM and selects the highest-scoring one. This allows LLMs to allocate extra inference-time compute to yield more transparent reasoning. Experiments across model scales and benchmarks show that DEL-ToM consistently improves performance, demonstrating that verifiable belief supervision significantly enhances LLMs' ToM capabilities without retraining. Code is available at https://github.com/joel-wu/DEL-ToM.

[Figure: DEL-ToM on an example story. The LLM under evaluation is sampled several times ("Let's try a few times, analyze belief at state 2!") and the PBM judge assigns a reward score to each belief candidate.
Story: (1) John, Mary and Alice entered the kitchen (State 1). (2) John put the chocolate in the drawer (State 2). (3) John exited the kitchen (State 3). (4) Mary moved the chocolate to the table (State 4).
Question: Where does Mary think John thinks the chocolate is?
Belief candidates for "Mary thinks John thinks the chocolate is at...": Drawer, Table, Null, Fridge. Reward scores shown: 0.99, 0.41, 0.62, 0.01. Belief selected by the PBM judge: Drawer.]
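The selection step described in the abstract (sample several candidate belief traces, score each belief update, keep the highest-scoring trace) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `trace_score`, `select_best_trace`, and `toy_scorer` are hypothetical, the trace encoding (one believed location per story state) is an assumption, and the abstract does not say how step scores are combined into a trace score, so the aggregation (`min` by default, `prod` as an alternative) is also an assumption. The toy scorer is a hard-coded stand-in for a trained PBM, included only so the example runs.

```python
from math import prod
from typing import Callable, List, Sequence

# Assumed encoding: one believed location per story state.
Trace = List[str]
# A step scorer maps (story, trace, step index) to a score in [0, 1].
StepScorer = Callable[[Sequence[str], Trace, int], float]


def trace_score(story: Sequence[str], trace: Trace, scorer: StepScorer,
                aggregate: Callable = min) -> float:
    """Score every belief-update step, then combine the step scores.

    The aggregation rule is an assumption; min penalizes the weakest step.
    """
    step_scores = [scorer(story, trace, i) for i in range(len(trace))]
    return aggregate(step_scores)


def select_best_trace(story: Sequence[str], candidates: Sequence[Trace],
                      scorer: StepScorer, aggregate: Callable = min) -> Trace:
    """Best-of-N selection: return the candidate with the highest trace score."""
    if not candidates:
        raise ValueError("need at least one candidate trace")
    return max(candidates,
               key=lambda t: trace_score(story, t, scorer, aggregate))


STORY = [
    "John, Mary and Alice entered the kitchen.",
    "John put the chocolate in the drawer.",
    "John exited the kitchen.",
    "Mary moved the chocolate to the table.",
]

# Stand-in for a trained PBM: rewards steps that match a hand-written
# reference trace for "Mary thinks John thinks the chocolate is at ...".
REFERENCE = ["Null", "Drawer", "Drawer", "Drawer"]


def toy_scorer(story: Sequence[str], trace: Trace, step: int) -> float:
    return 0.9 if trace[step] == REFERENCE[step] else 0.1


# Candidate traces as they might be sampled from an LLM (made up here).
CANDIDATES = [
    ["Null", "Drawer", "Drawer", "Table"],
    ["Null", "Drawer", "Drawer", "Drawer"],
    ["Null", "Fridge", "Fridge", "Fridge"],
]

best = select_best_trace(STORY, CANDIDATES, toy_scorer)
print(best)  # ['Null', 'Drawer', 'Drawer', 'Drawer']
```

Sampling more candidates increases the chance that a correct trace is among them, which is the sense in which extra inference-time compute can help. In a real setup the scorer would be a learned model rather than a lookup against a reference.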