EMNLP 2022

Language Model Pre-Training with Sparse Latent Typing

Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, ChengXiang Zhai, Heng Ji

Cited by 1

Abstract

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at https://github.com/renll/SparseLT.

* Equal contribution. Listing order is random. Liliang proposed and implemented the architecture designs and the training objectives of Sparse Latent Typing (SLT), and also conducted extensive experiments for pre-training, few-shot evaluation, and the analyses. Zixuan designed the language model pre-training pipeline for SLT, built the initial training codebase, and conducted experiments for pre-training and supervised evaluation. Both authors initially came up with the same project goal of encouraging the model to sparsely select sentence-level keywords during pre-training.