ACL 2021

Is Human Scoring the Best Criteria for Summary Evaluation?

Oleg V. Vasilyev, John Bohannon

Abstract

A summary quality measure is judged by how well it correlates with quality scores produced by human annotators. A higher correlation with human scores is considered a decisive indicator of a better measure. In this work we present observations that cast doubt on this view. We also show that an alternative indicator is possible: a criterion for selecting the best measure from a family of measures that does not rely on human scores.