EMNLP 2025

SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling

Fares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser

Abstract

Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limited computational resources, and the need for pedagogically valid responses. These constraints require small, open-source models that can run locally and reliably ground their outputs in correct information. We introduce SCRIBE, a framework for multi-hop, tool-augmented reasoning designed to generate valid responses to student questions about feedback reports. SCRIBE combines domain-specific tools with a self-reflective inference pipeline that supports iterative reasoning, tool use, and error recovery. We distil these capabilities into 3B and 8B models via two-stage LoRA fine-tuning on synthetic GPT-4o-generated data. Evaluation with a human-aligned GPT-Judge and a user study with 108 students shows that 8B-SCRIBE models achieve comparable or superior quality to much larger models in key dimensions such as relevance and actionability, while being perceived on par with GPT-4o and Llama-3.3 70B by students. These findings demonstrate the viability of SCRIBE for low-resource, privacy-sensitive educational applications.