EMNLP 2022

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

Cited by 19

Abstract

Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries, which are nearly always in difficult-to-work-with technical domains, or by using approximate heuristics to extract them from everyday text, which frequently yields unfaithful summaries. In this work, we turn to a slower but more straightforward approach to developing summarization benchmark data: We hire highly-qualified contractors to read stories and write original summaries from scratch. To amortize reading time, we collect five summaries per document, with the first giving an overview and the subsequent four addressing specific questions. We use this protocol to collect SQuALITY, a dataset of question-focused summaries built on the same public-domain short stories as the multiple-choice dataset QuALITY (Pang et al., 2021b). Experiments with state-of-the-art summarization systems show that our dataset is challenging and that existing automatic evaluation metrics are weak indicators of quality. SQuALITY is available at https://github.com/nyu-mll/SQuALITY.