ACL 2021

An Investigation of Suitability of Pre-Trained Language Models for Dialogue Generation - Avoiding Discrepancies

Yan Zeng, Jian-Yun Nie

Abstract

Pre-trained language models have been widely used in response generation for open-domain dialogue. These approaches are built within four frameworks: Transformer-ED, Transformer-Dec, Transformer-MLM and Transformer-AR. In this study, we experimentally compare them using both large-scale and small-scale data. The comparison reveals that a decoder-only architecture is better than a stacked encoder-decoder, and that left-to-right and bi-directional attention each have their own advantages. We further define two concepts of model discrepancy, which provide a new explanation of model performance. As discrepancies may hinder performance, we propose two solutions to reduce them, which successfully improve model performance.