EMNLP2022

Keyphrase Generation via Soft and Hard Semantic Corrections

Guangzhen Zhao, Guoshun Yin, Peng Yang, Yu Yao

Cited by 3

Abstract

Keyphrase generation aims to generate a set of condensed phrases for a given source document. Although keyphrase generation methods based on maximum likelihood estimation (MLE) have shown impressive performance, they suffer from two biases: one between the source document and the predicted keyphrases, and one between the predicted and the target keyphrases. To tackle these biases, we propose CorrKG, a novel correction model built on top of the MLE pipeline, which corrects them via optimal transport (OT) and a frequency-based filtering-and-sorting (FreqFS) strategy. Specifically, OT is introduced as the soft correction: it facilitates the alignment of salient information and rectifies the semantic bias between the source document and the predicted keyphrases. An adaptive semantic mass learning scheme is applied on top of vanilla OT to obtain a proper pairwise transport procedure, improving the OT calculation by rectifying the semantic masses dynamically. In addition, the FreqFS strategy is designed as the hard correction, reducing the bias between predicted and target keyphrases so that the generated keyphrases are accurate and sufficient. Extensive experiments on multiple benchmark datasets show that our model outperforms state-of-the-art keyphrase generation methods.

Introduction

Keyphrase generation is an important task that condenses the main semantic information of a document into multiple keyphrases. Keyphrases can be further divided into present keyphrases, which appear in the document, and absent keyphrases, which do not. High-quality keyphrases benefit many downstream tasks, such as text summarization (Wang and Cardie, 2013), document clustering (Hammouda et al., 2005), and translation (Tang et al., 2016). Despite the promising suc-

* The first two authors contributed equally to this work. ‡ Corresponding author.

Source Document: Learning Weights for the Quasi Weighted Means.
We study the determination of weights for quasi weighted means (also called quasi linear means) when a set of examples is given. We consider first a simple case, the learning of weights for weighted means, and then we extend the approach to the more general case of a quasi weighted mean. We consider the case of a known arbitrary generator f. The paper finishes considering the use of parametric functions that are suitable when the values to aggregate are measure values or ratio.
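The abstract describes the soft correction only at a high level: optimal transport aligns the source document with the predicted keyphrases. As a hedged illustration of the generic machinery involved, the sketch below computes an entropic OT plan between document-token embeddings and predicted-keyphrase embeddings using standard Sinkhorn iterations. It is not CorrKG's implementation. The cosine cost, the regularization `eps`, the iteration count, the toy embeddings, and the uniform masses `a` and `b` are all assumptions, and CorrKG's adaptive semantic mass learning is not modeled (the masses are fixed inputs here). The helper names `cosine_cost`, `sinkhorn`, and `ot_distance` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_cost(doc_vecs: Matrix, kp_vecs: Matrix) -> Matrix:
    """Hypothetical cost: C[i][j] = 1 - cos(doc_i, keyphrase_j)."""
    def cos(u: List[float], v: List[float]) -> float:
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        return dot / (nu * nv + 1e-12)
    return [[1.0 - cos(d, k) for k in kp_vecs] for d in doc_vecs]


def sinkhorn(cost: Matrix, a: List[float], b: List[float],
             eps: float = 0.1, n_iters: int = 200) -> Matrix:
    """Entropic OT plan whose row sums approach a and column sums equal b.

    a and b are the masses on the document side and the keyphrase side;
    both are assumed to be non-negative and to sum to 1.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps).
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows toward the document-side masses a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns to match the keyphrase-side masses b exactly.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan T = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(plan: Matrix, cost: Matrix) -> float:
    """Transport cost <T, C>, usable as an alignment loss term."""
    return sum(p * c for row_p, row_c in zip(plan, cost)
               for p, c in zip(row_p, row_c))


# Toy 2-d embeddings (made up for illustration): 3 document tokens, 2 keyphrases.
doc = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
keyphrases = [[1.0, 0.1], [0.1, 1.0]]
C = cosine_cost(doc, keyphrases)
# Uniform masses stand in for CorrKG's adaptively learned semantic masses.
a = [1.0 / 3] * 3
b = [0.5, 0.5]
T = sinkhorn(C, a, b)
print("transport plan:", [[round(x, 4) for x in row] for row in T])
print("OT cost:", round(ot_distance(T, C), 4))
```

Because the masses enter the solver only as row and column marginals, an adaptive scheme could in principle replace the uniform `a` and `b` with learned, normalized weights without changing `sinkhorn`; whether CorrKG does it this way is not stated in this section.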