EMNLP 2023

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

Wenhao Yu, Meng Jiang, Peter Clark, Ashish Sabharwal

Abstract

Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this gap, we introduce the first such dataset, named IfQA, in which each question is based on a counterfactual presupposition expressed in an "if" clause. Such questions require models to go beyond retrieving direct factual knowledge from the Web: they must identify the right information to retrieve and reason about an imagined situation that may even contradict the facts built into their parameters. The IfQA dataset contains 3,800 questions annotated by crowdworkers on relevant Wikipedia passages. Empirical analysis reveals that IfQA is highly challenging for existing open-domain QA methods, including supervised retrieve-then-read pipelines (F1 score 44.5) and recent few-shot approaches such as chain-of-thought prompting with ChatGPT (F1 score 57.2). We hope the unique challenges posed by IfQA will push open-domain QA research on both the retrieval and reasoning fronts, while also helping endow today's language understanding models with counterfactual reasoning abilities. The IfQA dataset can be downloaded at https://allenai.org/data/ifqa.
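The F1 scores quoted above measure token overlap between a predicted answer and the gold answer. The abstract does not spell out the exact evaluation script, so the sketch below assumes the common SQuAD-style token-level F1 (lowercasing, stripping punctuation and the articles "a", "an", "the", then comparing bags of tokens). The function names `normalize_answer`, `token_f1`, and `corpus_f1` are hypothetical and chosen for illustration. The official IfQA scorer may differ in its normalization details.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; the official IfQA scorer may differ.
PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # If either side is empty after normalization, score 1 only if both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions: list[str], golds: list[list[str]]) -> float:
    """Average (x100) of per-question F1, taking the best match over gold answers."""
    scores = [max(token_f1(p, g) for g in gs) for p, gs in zip(predictions, golds)]
    return 100.0 * sum(scores) / len(scores)
```

For example, `token_f1("the Eiffel Tower", "Eiffel Tower in Paris")` gives precision 1.0 and recall 0.5, hence an F1 of about 0.667. Scoring each question against its best-matching gold answer and then averaging is the usual convention for datasets with multiple acceptable answers.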