ACL 2024

Digital Socrates: Evaluating LLMs through Explanation Critiques

Yuling Gu, Oyvind Tafjord, Peter Clark

Cited by: 1

Abstract

While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the explanation capabilities of modern models and to create a nuanced, interpretable explanation evaluation tool that can generate such characterizations automatically, without relying on expensive API calls or human annotations. Our approach is to (a) define the new task of explanation critiquing: identifying and categorizing any main flaw in an explanation and providing suggestions to address the flaw, (b) create a sizeable, human-verified dataset for this task, and (c) train an open-source, automatic critique model (called Digital Socrates) using this data. Through quantitative and qualitative analysis, we demonstrate how Digital Socrates is useful for revealing insights about student models by examining their reasoning chains, and how it can provide high-quality, nuanced, automatic evaluation of those model explanations for the first time. Digital Socrates thus fills an important gap in evaluation tools for understanding and improving the explanation behavior of models.

Question: Fill in the blank: The tip of James pencil was breaking while writing on the paper sheet. The ___ is weak. (A) pencil (B) paper

Explanation (Llama2-70B): The word "tip" refers to the pointed end of a pencil. The phrase "breaking while writing on the paper" implies that the pencil was being used to write on the paper. Therefore, the correct answer is (B) paper, as the sentence implies that the paper is weak.

Answer (Llama2-70B): (B) [incorrect, should be (A)]

Critique:
The explanation states or suggests the following:
* Main flaw (standalone statement): "Therefore, the correct answer is (B) paper, as the sentence implies that the paper is weak."
* Dimension: incorrect_reasoning
Consider these points for revising the explanation:
* General: Be careful not to misinterpret the cause and effect in a sentence. Just because the pencil tip is breaking while writing on the paper, it doesn't necessarily mean that the paper is weak. It could be that the pencil is weak, which is why its tip is breaking.
* Specific: In this case, the sentence "The tip of James pencil was breaking while writing on the paper sheet" suggests that the pencil's tip is weak, not the paper. The paper is simply the surface on which the pencil is being used.
Explanation score: 1

Critique (DS-13B):
The explanation states or suggests the following:
* Main flaw (standalone statement): "The correct answer is (B) paper, as the sentence implies that the paper is weak."
* Dimension: misunderstanding
Consider these points for revising the explanation:
* General: Always ensure you understand the question and the context before making a conclusion. Misinterpretation can lead to incorrect answers.
* Specific: The sentence "The tip of James pencil was breaking while writing on the paper" implies that the pencil is weak, not the paper. The pencil is the one that is breaking, which suggests it is weak.
Explanation score: 1

Question: Identify best explanation connecting the beginning and ending.
Beginning: Karen didn't have a boyfriend for Valentine's Day.
Ending: Her brother's thoughtfulness made her Valentine's Day special!
(A) Karen decided to buy her brother a Valentine's gift.
(B) Karens little brother handed her a box of chocolates.

Explanation (GPT-3.5): 1) The beginning states that Karen didn't have a boyfriend for Valentine's Day. 2) The ending mentions her brother's thoughtfulness made her Valentine's Day special. 3) The connection between the beginning and the ending should involve an action by Karen's brother that made her Valentine's Day special. 4) Giving a box of chocolates is a thoughtful gesture typically associated with Valentine's Day.
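
The example critiques above follow a fixed layout: a main flaw quoted as a standalone statement, a flaw dimension, general and specific revision suggestions, and an explanation score. As an illustration only, the following minimal Python sketch reads a critique in that layout into a structured record. The names `Critique` and `parse_critique`, the field names, and the regular expressions are hypothetical and are inferred from the example text shown here; this is not the authors' code, and critiques with a different layout would leave fields empty.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Critique:
    """Hypothetical container for the fields visible in the example critiques."""
    main_flaw: Optional[str]
    dimension: Optional[str]
    general_feedback: Optional[str]
    specific_feedback: Optional[str]
    score: Optional[int]


def _field(pattern: str, text: str) -> Optional[str]:
    # Return the first captured group, stripped, or None if the label is absent.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_critique(text: str) -> Critique:
    # Each field runs from its own label up to the next label in the layout.
    main_flaw = _field(r"\* Main flaw \(standalone statement\):\s*(.*?)\s*\* Dimension:", text)
    if main_flaw is not None:
        main_flaw = main_flaw.strip('"')  # drop the surrounding quotation marks
    score = _field(r"Explanation score:\s*(\d+)", text)
    return Critique(
        main_flaw=main_flaw,
        dimension=_field(r"\* Dimension:\s*(\S+)", text),
        general_feedback=_field(r"\* General:\s*(.*?)\s*\* Specific:", text),
        specific_feedback=_field(r"\* Specific:\s*(.*?)\s*Explanation score:", text),
        score=int(score) if score is not None else None,
    )


# Sample text copied from the DS-13B critique in the example above.
SAMPLE = """The explanation states or suggests the following:
* Main flaw (standalone statement): "The correct answer is (B) paper, as the sentence implies that the paper is weak."
* Dimension: misunderstanding
Consider these points for revising the explanation:
* General: Always ensure you understand the question and the context before making a conclusion. Misinterpretation can lead to incorrect answers.
* Specific: The sentence "The tip of James pencil was breaking while writing on the paper" implies that the pencil is weak, not the paper. The pencil is the one that is breaking, which suggests it is weak.
Explanation score: 1"""

critique = parse_critique(SAMPLE)
print(critique.dimension, critique.score)
```

Keeping the dimension and score as separate fields makes it straightforward to tally flaw types or average scores across many explanations, which is the kind of aggregate analysis the abstract describes.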