ACL 2022

Rewarding Semantic Similarity under Optimized Alignments for AMR-to-Text Generation

Lisa Jin, Daniel Gildea

Abstract

A common way to combat exposure bias is to apply scores from evaluation metrics as rewards in reinforcement learning (RL). Metrics that leverage contextualized embeddings appear more flexible than those that match n-grams, and thus ideal as training rewards. Yet metrics such as BERTScore greedily align candidate and reference tokens, which can give system outputs excess credit relative to a reference. Past systems using such semantic similarity rewards further suffer from repetitive outputs and overfitting. To address these issues, we propose metrics that replace the greedy alignments in BERTScore with optimized ones. Our model optimizing discrete alignment metrics consistently outperforms cross-entropy and BLEU reward baselines on AMR-to-text generation. Additionally, we find that this model enjoys stable training relative to a non-RL setting.
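To illustrate why greedy alignment can over-credit a candidate, the following minimal sketch compares a greedy, BERTScore-style precision with a precision computed under a one-to-one alignment. It is an illustration under stated assumptions, not the metrics proposed in this paper: the similarity matrix is made up by hand rather than computed from contextualized embeddings, the function names are hypothetical, and the brute-force one-to-one assignment merely stands in for an "optimized" alignment, whose actual formulation the abstract does not specify.

```python
from itertools import permutations


def greedy_precision(sim):
    """Greedy precision: each candidate token takes its best reference match.

    sim[i][j] is the similarity between candidate token i and reference token j.
    Several candidate tokens may claim the same reference token.
    """
    return sum(max(row) for row in sim) / len(sim)


def one_to_one_precision(sim):
    """Precision under the best one-to-one alignment (illustrative stand-in).

    Brute force over all assignments, so suitable for toy sizes only.
    Assumes there are at least as many reference tokens as candidate tokens.
    """
    n_cand, n_ref = len(sim), len(sim[0])
    assert n_cand <= n_ref, "sketch assumes n_cand <= n_ref"
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(n_ref), n_cand)
    )
    return best / n_cand


# Hand-made toy similarities (assumed values, not real embeddings):
# a degenerate candidate "the the the" against the reference "the cat sat".
sim = [
    [1.0, 0.2, 0.1],
    [1.0, 0.2, 0.1],
    [1.0, 0.2, 0.1],
]

greedy = greedy_precision(sim)        # 1.0: every "the" matches the same reference token
aligned = one_to_one_precision(sim)   # about 0.433: each reference token is used once
print(f"greedy={greedy:.3f} one_to_one={aligned:.3f}")
```

In this toy case the repetitive candidate receives a perfect greedy precision, while the one-to-one constraint forces two of its tokens onto poorly matching reference tokens and lowers the score. This is the kind of excess credit and repetition that a reward based on greedy alignment can fail to penalize.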