ACL2021

SumPubMed: Summarization Dataset of PubMed Scientific Articles

Vivek Gupta, Prerna Bharti, Pegah Nokhiz, Harish Karnick

Abstract

Text summarization is one of the more challenging open problems in Natural Language Processing. Most earlier work in this field has been carried out on news article datasets. However, news datasets are ill-suited to summarization because the summary content is usually placed in the first few lines of the text. We therefore constructed a new dataset, SumPubMed, using scientific articles from the PubMed archive. The summaries in SumPubMed draw on information spread throughout the document, not merely the first few lines. They also contain scientific jargon, which makes the summarization task more challenging. To verify the quality of the summaries in SumPubMed, we conducted human annotation on several aspects, such as coverage of important content without repetition, readability, coherence, and informativeness. We observed that existing seq2seq-based summarization methods struggle to perform well on SumPubMed, opening opportunities for further improvement in scientific summarization models. Furthermore, we observed that current summarization methods yield acceptable ROUGE results on news-based datasets only because of the simple placement of summary content and ROUGE's lexical matching-based evaluation. In contrast to the earlier news datasets, we found that ROUGE scores do not correlate well with our human judgments on SumPubMed. This indicates the need for new evaluation metrics for scientific summarization.
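The point about ROUGE's lexical matching can be made concrete with a simplified ROUGE-N score. The sketch below is only an illustration under stated assumptions. It is not the official ROUGE-1.5.5 toolkit or the `rouge-score` package, and it is not the evaluation code used for SumPubMed. It assumes lowercase whitespace tokenization, no stemming, and clipped n-gram counts. The function names `ngrams` and `rouge_n` and the example sentences are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N (hypothetical helper): returns (precision, recall, F1).

    Assumes lowercase whitespace tokenization; no stemming or stopword removal.
    Overlap uses clipped counts, i.e. each shared n-gram counts at most
    min(count in candidate, count in reference) times.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical sentences, for illustration only.
reference = "the drug reduced tumor growth in mice"
truncated_copy = "the drug reduced growth"  # drops content, reuses the same words
paraphrase = "the compound inhibited neoplasm proliferation in murine models"

for name, cand in [("truncated copy", truncated_copy), ("paraphrase", paraphrase)]:
    p, r, f = rouge_n(cand, reference, n=1)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In this toy case, the truncated copy reaches a unigram F1 of 8/11 (about 0.727). The jargon-heavy paraphrase, which keeps more of the meaning, scores only 4/15 (about 0.267) because it shares just "the" and "in" with the reference. This gap between word overlap and meaning is one way ROUGE can diverge from human judgments on scientific text.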