ACL 2025
Exploring Layer-wise Representations of English and Chinese Homonymy in Pre-trained Language Models
Matthew King-Hang Ma, Chenwei Xie, Wenbo Wang, William Shi-Yuan Wang
2 citations
Abstract
Pre-trained language model representations have been successful in a wide range of language understanding tasks. In this paper, we examine different strategies to integrate pre-trained representations into sequence-to-sequence models and apply them to neural machine translation and abstractive summarization. We find that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%. Our experiments in machine translation show gains of up to 5.3 BLEU in a simulated resource-poor setup. While returns diminish with more labeled data, we still observe improvements when millions of sentence pairs are available. Finally, on abstractive summarization we achieve a new state of the art on the full-text version of CNN-DailyMail.