ACL2024
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
Abstract
The increasing use of complex and opaque black-box models requires the adoption of interpretable measures. One such option is extractive rationalizing models, which serve as a more interpretable alternative. These models, also known as Explain-Then-Predict models, employ an explainer model to extract rationales and subsequently condition the predictor on the extracted information. Their primary objective is to provide precise and faithful explanations, represented by the extracted rationales. In this paper, we take a semi-supervised approach to optimize for the plausibility of extracted rationales. We adopt a pre-trained natural language inference (NLI) model and further fine-tune it on a small set of supervised data (10%). The NLI predictor is leveraged as a source of supervisory signals to the explainer via entailment alignment. We show that, by enforcing the alignment agreement between the explanation and answer in a question-answering task, the performance can be improved without access to ground truth labels. We evaluate our approach on the ERASER dataset and show that it achieves results comparable to supervised extractive models and outperforms unsupervised approaches by >100%.
1 Introduction
Large language models such as Google's BERT (Devlin et al., 2018) and OpenAI's GPT series (Brown et al., 2020) are gaining widespread adoption in natural language processing (NLP) tasks. These models have achieved impressive performance in multiple NLP tasks, ranging from text generation to information extraction (Liu et al., 2023). However, little is known about how answers are generated or which portion of the input text the model focuses on. These gaps raise concerns about trust and about undesirable biases in the model's reasoning chain. Explainable AI (XAI) is currently an active field of research aimed at addressing these issues (Adadi and Berrada, 2018; Cambria et al., 2023; Yeo et al., 2023).
In this work, we focus on extractive rationalizing models (Lei et al., 2016), which are also known as Explain-Then-Predict (ETP) models and are designed to produce highlights that serve as faithful explanations. Faithfulness is defined as providing an explanation that represents the model's reasoning process for a given decision, while plausibility refers to the level of agreement with humans (Jacovi and Goldberg, 2020). An advantageous characteristic of ETP models is that they concurrently produce the explanation and the task label, eliminating the need for an added layer of interpretation.
This differs from post-hoc techniques such as LIME (Ribeiro et al., 2016) or SHAP (Lundberg and Lee, 2017), which are specifically tailored to interpret black-box models. Although these techniques are model-agnostic by design, they are computationally expensive, do not guarantee faithfulness, and are not optimized for plausibility. Chain-of-thought (CoT) (Wei et al., 2022) is another popular ap-
NLI models are designed to determine whether a hypothesis contradicts, entails, or is neutral to a given premise. As such, they provide useful signals for aligning a given explanation with the answer produced by the predictor. As shown in Figure 1, in a fact verification example the purpose of the rationale is to act as evidence that either supports or refutes the given claim. This can be interpreted alternatively as an NLI task where the claim acts as the premise while the rationale is the hypothesis, in this case entailing the premise. We further note that this simple principle not only addresses the scenario of scarce supervisory labels but can also act as a counter-checker against the predictor. As seen later on, this can improve the robustness of rationales (Chen et al., 2022) and enhance predictive performance.
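The counter-checking idea described above can be illustrated with a minimal sketch. It is not the training objective used in this work: it only shows how an NLI verdict on a (claim, rationale) pair could be compared with the predictor's answer. The label names `SUPPORTS` and `REFUTES`, the mapping `NLI_TO_TASK`, the function `alignment_signal`, and the keyword-based `toy_nli` stand-in are all assumptions made for illustration; in practice `nli` would be a fine-tuned NLI model.

```python
from typing import Callable, Dict

# Hypothetical mapping from NLI verdicts to fact-verification labels.
# "neutral" is deliberately absent: it agrees with neither task label.
NLI_TO_TASK: Dict[str, str] = {
    "entailment": "SUPPORTS",
    "contradiction": "REFUTES",
}


def alignment_signal(
    claim: str,
    rationale: str,
    predicted_label: str,
    nli: Callable[[str, str], str],
) -> float:
    """Return 1.0 if the NLI verdict agrees with the predictor's label, else 0.0."""
    # Following the text: the claim is the premise, the rationale the hypothesis.
    verdict = nli(claim, rationale)
    return 1.0 if NLI_TO_TASK.get(verdict) == predicted_label else 0.0


def toy_nli(premise: str, hypothesis: str) -> str:
    """Stand-in for a real NLI model; crude keyword rules, for illustration only."""
    if " not " in f" {hypothesis.lower()} ":
        return "contradiction"
    shared = set(premise.lower().split()) & set(hypothesis.lower().split())
    return "entailment" if len(shared) >= 2 else "neutral"


claim = "Paris is the capital of France"
supporting = "Paris has been the capital of France since 987"
unrelated = "Bananas are yellow"

agree = alignment_signal(claim, supporting, "SUPPORTS", toy_nli)
disagree = alignment_signal(claim, supporting, "REFUTES", toy_nli)
no_evidence = alignment_signal(claim, unrelated, "SUPPORTS", toy_nli)
```

A signal of 0.0 flags cases where the extracted rationale does not back the predicted answer, either because the rationale is uninformative or because the predictor disagrees with the evidence.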
In summary, the three key contributions of this work are the following:
• A simple yet effective approach that improves the plausibility and robustness of extracted rationales, while simultaneously improving task performance. The approach achieves competitive results against supervised models while outperforming unsupervised models by a large margin (>100%).
• To the best of our knowledge, this is the first work to utilize an auxiliary NLI predictor in a semi-supervised fashion for extractive rationalization.
• Our approach has low resource requirements, using models of <300M parameters and a small set of human-annotated rationales.
2 Methodology
2.1 Problem setting
Given an input document consisting of $N$ sentences, $x_i = x_{i,1}, x_{i,2}, \ldots, x_{i,N}$, the task objective can be decomposed into two steps, namely rationale extraction and task prediction. An explainer, $f_\theta$, takes in the input document and generates a bin