ACL 2024
Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification
Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren
Abstract
Making inferences in text comprehension to understand the meaning is essential in language processing. This work studies the entailment verification (EV) problem of multi-sentence premises that requires a system to make multiple inferences implicitly. Studying EV for such complex premises is important because modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning. However, current textual inference datasets mostly contain short premises that only partially focus on these challenges. To address this, we compile an EV benchmark that includes datasets from three NLP domains (NLI, contextual QA, and rationales) containing multi-sentence premises. On benchmarking humans and LLMs, we find that LLMs are better than humans in multi-hop reasoning across extended contexts, while humans perform better in simple deductive reasoning tasks. We also finetune a Flan-T5 model (https://huggingface.co/soumyasanyal/entailment-verifier-xxl) for EV using two training objectives to obtain a strong open-source model that outperforms GPT-3.5 and rivals GPT-4. Finally, we use this model to filter out inconsistent model-generated rationales in self-consistency decoding, resulting in a 6% accuracy improvement on average across three MCQ datasets.

[Figure: example entailment verification instances. Panel titles: "Simple Deductive Inference" and "Complex Deductive Inference". Task: Given the premise, is the hypothesis correct?]
Premise: Exposure to sea air can cause scurvy. Scurvy is a kind of disease.
Hypothesis: This suggests that scurvy is a disease caused by exposure to sea air.
Premise: Joe is a 2013 independent drama film directed and co-produced by David Gordon Green, co-produced by Lisa Muskat, Derrick Tseng and Christopher Woodrow and written by Gary Hawkins, adaptation from Larry Brown's 1991 novel of the same name.
Hypothesis: Joe was a book before it was a film.
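
The rationale-filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filtered_self_consistency`, the `score_fn` interface, the `threshold` value, and the fallback to an unfiltered vote are all assumptions made for illustration. In practice `score_fn` would wrap an entailment verifier; here it is any callable that returns a consistency score between 0 and 1 for a sampled (rationale, answer) pair.

```python
from collections import Counter
from typing import Callable, List, Tuple


def filtered_self_consistency(
    samples: List[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,
) -> str:
    """Majority vote over sampled answers, ignoring low-scoring rationales.

    samples:   (rationale, answer) pairs sampled from a language model.
    score_fn:  hypothetical verifier; maps (rationale, answer) to a score
               in [0, 1], higher meaning the rationale supports the answer.
    threshold: assumed cutoff below which a sample is discarded.
    """
    if not samples:
        raise ValueError("samples must not be empty")

    # Keep only answers whose rationale passes the verifier.
    kept = [
        answer
        for rationale, answer in samples
        if score_fn(rationale, answer) >= threshold
    ]

    # Assumption: if every sample is filtered out, fall back to a plain vote.
    if not kept:
        kept = [answer for _, answer in samples]

    # Ties are resolved in favor of the answer seen first.
    return Counter(kept).most_common(1)[0][0]
```

With five samples voting A, B, B, A, A, a plain majority vote returns A; if the verifier scores two of the A rationales below the threshold, the filtered vote returns B instead.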