EMNLP 2025

Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time

Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He

Abstract

Although preference optimization methods have improved reasoning performance in Large Language Models (LLMs), they often lack transparency regarding why one reasoning outcome is preferred over another. This limitation is especially critical in Automated Student Answer Scoring (ASAS), where explainability is essential to justify assessment outcomes. Verbal reinforcement learning can generate explicit reflections, but these tend to be superficial critiques that can harm assessment performance. Existing LLMs also struggle to reliably detect subtle reasoning errors in ASAS tasks. Moreover, manually identifying intermediate reasoning errors is expensive and difficult to scale. To address these challenges, we introduce a contrastive reflection synthesis pipeline that generates precise verbal feedback by identifying discrepancies in structured reasoning graph paths. Leveraging this synthetic reflection data, we propose DARS, a Dual-model Reflective Scoring framework featuring a dedicated Critic model trained for effective reflection. DARS achieves strong performance and consistently outperforms existing ASAS baselines across all evaluation metrics. Extensive experiments further provide novel insights into the value of reflection data, framework design, and the scaling behavior of DARS.¹

* Now at Google DeepMind.
¹ We release the DARS code at https://github.com/lijiazheng99/DARS.

[Figure: (b) the DARS framework produces more effective reflections.]

ASAS Model: The assessment overcredits the student's response, failing to recognize the distinction critical experimental details.

Student Answer: the mRNA, Is met by tRNA with anticodons, that link to the mRNA's codons. This tRNA carries an amino acid that goes with the mRNA's codon. This is done multiple times until there is a stranhd of amino acids. Ribosomes attach these amino acids together.

Reasoner: The student's answer scored 3 points according to the marking rubric, which awards 3 points for addressing four key elements. The student successfully described mRNA exiting the nucleus, tRNA carrying complementary anticodons recognizing the mRNA codons, the corresponding amino acids on tRNA being bonded, and the continuation of amino acid linkage until a stop codon is reached, as indicated by '...tRNA with anticodons, that link to the mRNA's codons.' and '...until there is a strand of amino acids.' ...

Critic: It appears there was a misunderstanding in your assessment of the student's answer. You credited the student for mentioning that the 'corresponding amino acids on tRNA are bonded to adjacent tRNA's amino acids' and that 'amino acids continue to be linked until a STOP codon is read on the mRNA.' However, upon reviewing the student's response, these elements were not explicitly covered. The student's description primarily focused on the interaction between mRNA and tRNA and the eventual formation of a peptide bond, but did not detail the process of amino acids being bonded to adjacent tRNA's amino acids nor the specific mechanism of elongation ...

Reasoner: The student's answer scored 2 points for achieving three key answer elements. The student accurately described the roles of mRNA and tRNA in protein synthesis, mentioning that 'the mRNA, Is met by tRNA with anticodons, that link to the mRNA's codons', which covers the key elements of mRNA bases being read in triplets and tRNA carrying the complementary anticodon that recognizes the complementary codon of the mRNA. Additionally, the student implied the bonding of amino acids by stating 'Ribosomes attach these amino acids together.' However, the answer did not explicitly mention the exit of mRNA from the nucleus, or elongation and termination at a STOP codon, which resulted in losing points for these key elements.
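The abstract describes three pieces. A Reasoner scores an answer, a Critic verbalizes what the Reasoner got wrong, and contrastive reflection synthesis derives that feedback from discrepancies between reasoning paths. The Python sketch below illustrates only that control flow, and it rests on loud assumptions:

* A reasoning path is reduced to the set of rubric key elements it credits.
* The element-count-to-score mapping is modeled on the example above.
* `Assessment`, `contrastive_feedback`, `reflective_scoring`, and the toy Reasoner/Critic are hypothetical names, not the DARS implementation.

In DARS both roles are LLMs, and the Critic is trained on synthesized reflections rather than handed a reference path at inference time.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


def score_from_elements(n: int) -> int:
    """Assumed rubric: 4 key elements -> 3 points, 3 -> 2, 1-2 -> 1, none -> 0."""
    if n >= 4:
        return 3
    if n == 3:
        return 2
    if n >= 1:
        return 1
    return 0


@dataclass(frozen=True)
class Assessment:
    # Simplification (assumption): a reasoning path = the rubric elements it credits.
    credited: FrozenSet[str]

    @property
    def score(self) -> int:
        return score_from_elements(len(self.credited))


def contrastive_feedback(pred: Assessment, ref: Assessment) -> str:
    """Turn set differences between two reasoning paths into verbal feedback.

    Returns an empty string when both paths credit exactly the same elements.
    """
    over = sorted(pred.credited - ref.credited)    # credited but not supported
    missed = sorted(ref.credited - pred.credited)  # supported but not credited
    parts = []
    if over:
        parts.append("You credited elements the answer does not cover: " + ", ".join(over) + ".")
    if missed:
        parts.append("You missed elements the answer does cover: " + ", ".join(missed) + ".")
    return " ".join(parts)


# (answer, optional critic feedback) -> assessment
Reasoner = Callable[[str, Optional[str]], Assessment]
# (answer, current assessment) -> feedback, "" if no problem is found
Critic = Callable[[str, Assessment], str]


def reflective_scoring(answer: str, reasoner: Reasoner, critic: Critic,
                       max_rounds: int = 3) -> Assessment:
    """Reasoner scores, Critic reflects, Reasoner revises until the Critic is satisfied."""
    assessment = reasoner(answer, None)
    for _ in range(max_rounds):
        feedback = critic(answer, assessment)
        if not feedback:
            break
        assessment = reasoner(answer, feedback)
    return assessment


# Toy usage loosely modeled on the example above (element names are illustrative).
FIRST_PASS = Assessment(frozenset({
    "mRNA exits nucleus", "anticodon recognizes codon",
    "amino acids bonded", "elongation to STOP codon",
}))
REFERENCE = Assessment(frozenset({
    "codons read in triplets", "anticodon recognizes codon", "amino acids bonded",
}))


def toy_reasoner(answer: str, feedback: Optional[str]) -> Assessment:
    # Over-credits on the first pass, then "revises" once it receives feedback.
    return FIRST_PASS if feedback is None else REFERENCE


def toy_critic(answer: str, assessment: Assessment) -> str:
    # Hypothetical stand-in for a trained Critic: compares against a reference path.
    return contrastive_feedback(assessment, REFERENCE)


result = reflective_scoring("the mRNA, Is met by tRNA with anticodons, ...",
                            toy_reasoner, toy_critic)
print(FIRST_PASS.score, "->", result.score)  # 3 -> 2
```

Returning an empty string as the "no critique" signal, together with a round cap, keeps the loop bounded. This abstract does not specify how DARS itself decides when to stop reflecting.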