ACL 2021
Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution
Jiacheng Xu, Greg Durrett
Abstract
Despite the prominence of neural abstractive summarization models, we know little about how they actually form summaries and how to understand where their decisions come from. We propose a two-step method to interpret summarization model decisions. We first analyze the model's behavior by ablating the full model to categorize each decoder decision into one of several generation modes: roughly, is the model behaving like a language model, is it relying heavily on the input, or is it somewhere in between? After isolating decisions that do depend on the input, we explore interpreting these decisions using several different attribution methods. We compare these techniques based on their ability to select content and reconstruct the model's predicted token from perturbations of the input, thus revealing whether highlighted attributions are truly important for the generation of the next token. While this machinery can be broadly useful even beyond summarization, we specifically demonstrate its capability to identify phrases the summarization model has memorized and determine where in the training pipeline this memorization happened, as well as study complex generation phenomena like sentence fusion on a per-instance basis.

Figure 1: Our two-stage ablation-attribution framework. First, we compare a decoder-only language model (not fine-tuned on the summarization task, and not conditioned on the input article) and a full summarization model. They are colored in gray and orange respectively. The higher the difference, the more heavily the model depends on the input context. For those context-dependent decisions, we conduct content attribution to find the relevant supporting content with methods like Integrated Gradients or Occlusion.

Figure 1 panels (BART): Ablation compares the decoder-only LM with the full model, placing each decision between LM-like and contextual; a higher difference means higher dependence on context. Attribution applies when context matters and finds the content supporting the decision; the highlighted document tokens are those that impacted the prediction the most according to integrated gradients (for example, "the prime minister warned" supporting "David Cameron").

Figure 1 example. Input article: Speaking at a rally for Tory candidate Zac Goldsmith, the prime minister warned about the dangers of a Labour victory for the capital's economy. Mr Goldsmith said his Labour rival was "Mr Corbyn's man" in City Hall. But Mr Khan said he was "no patsy" to Mr Corbyn and […] Predicted summary: David Cameron has urged Londoners to vote for the Conservatives in the mayoral election, saying Labour's Sadiq Khan is "Jeremy Corbyn's man".
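
The two stages described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: total variation distance stands in for the "difference" between the language model and the full model, the thresholds `low` and `high` are hypothetical, and the probabilities and the `toy_score` function are invented toy values, not outputs of BART.

```python
from typing import Callable, Dict, List, Sequence

Dist = Dict[str, float]  # next-token distribution: token -> probability


def total_variation(p: Dist, q: Dist) -> float:
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint).

    Assumption: used here as a stand-in for the LM-vs-full-model difference.
    """
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in vocab)


def generation_mode(lm: Dist, full: Dist, low: float = 0.2, high: float = 0.8) -> str:
    """Stage 1 (ablation): label one decoding step by how far the full model
    moves away from the input-free language model. Thresholds are hypothetical."""
    d = total_variation(lm, full)
    if d <= low:
        return "LM-like"
    if d >= high:
        return "contextual"
    return "in-between"


def occlusion_attribution(
    score: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    mask: str = "<mask>",
) -> List[float]:
    """Stage 2 (attribution): for each input token, the drop in the predicted
    token's score when that input token alone is replaced by a mask."""
    base = score(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = list(tokens)
        occluded[i] = mask
        drops.append(base - score(occluded))
    return drops


# Toy next-token distributions (invented values).
lm_probs = {"Cameron": 0.01, "the": 0.99}    # language model, no article
full_probs = {"Cameron": 0.99, "the": 0.01}  # full model, conditioned on article


def toy_score(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for the full model's probability of 'Cameron'."""
    return 0.9 if "prime" in tokens and "minister" in tokens else 0.1


if __name__ == "__main__":
    article = ["the", "prime", "minister", "warned"]
    print(generation_mode(lm_probs, full_probs))
    if generation_mode(lm_probs, full_probs) == "contextual":
        for tok, drop in zip(article, occlusion_attribution(toy_score, article)):
            print(f"{tok}: {drop:.2f}")
```

In this sketch, attribution runs only for steps labeled contextual, mirroring the order in the caption: ablation first isolates the decisions that depend on the input, and attribution then looks for the supporting input tokens.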