ACL 2021
The Utility and Interplay of Gazetteers and Entity Segmentation for Named Entity Recognition in English
Oshin Agarwal, Ani Nenkova
Abstract
Recent papers have introduced methods to incorporate gazetteer features and entity segmentation techniques into neural named entity recognition models. These papers rely on different resources and include features unrelated to the use of gazetteers, which makes it impossible to compare the relative effectiveness of the approaches. Here, we provide a comprehensive overview of methods for incorporating gazetteers and for entity segmentation. We evaluate representative methods from each family in similar settings for a fair comparison and identify the ones that are consistently better across datasets and input representations. We further show that gazetteers improve entity segmentation, not just entity typing. We therefore explore their utility in recognizing long entities, the problem that entity segmentation techniques were developed to address. Our work explains the mechanisms through which gazetteers improve the performance of neural NER models.

C Hyperparameters
The hyperparameters are the same as those in https://github.com/guillaumegenthial/tf_ner/models/chars_conv_lstm_crf , except for the number of epochs and the minimum number of steps before early stopping, both of which depend on dataset size. The number of epochs was 25, 25, 50, and 50 for CoNLL, OntoNotes, BTC, and TTC, respectively. The minimum number of steps was 8000, 15000, 2500, and 2500 for CoNLL, OntoNotes, BTC, and TTC, respectively.
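The per-dataset settings in the hyperparameter appendix can be written down as a small configuration table. The sketch below is illustrative only: the epoch and minimum-step values come from the text, but the `TrainSchedule` class, the `should_stop` helper, and the `patience` parameter are hypothetical names introduced here. The appendix does not state the exact early-stopping criterion, so the rule shown (stop only after the minimum number of steps, once the dev metric has not improved for `patience` steps) is an assumption, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSchedule:
    epochs: int     # maximum number of training epochs
    min_steps: int  # early stopping is never triggered before this step


# Epochs and minimum steps as reported in the appendix.
SCHEDULES = {
    "CoNLL": TrainSchedule(epochs=25, min_steps=8000),
    "OntoNotes": TrainSchedule(epochs=25, min_steps=15000),
    "BTC": TrainSchedule(epochs=50, min_steps=2500),
    "TTC": TrainSchedule(epochs=50, min_steps=2500),
}


def should_stop(step: int, best_step: int, min_steps: int, patience: int) -> bool:
    """Hypothetical early-stopping rule (assumption, not from the paper).

    Returns True only when training has passed `min_steps` AND the dev
    metric has not improved for more than `patience` steps since `best_step`.
    """
    return step >= min_steps and step - best_step > patience


if __name__ == "__main__":
    # patience=500 is an illustrative value, not one given in the paper.
    conll = SCHEDULES["CoNLL"]
    print(should_stop(9000, 8200, conll.min_steps, patience=500))
    print(should_stop(3000, 100, conll.min_steps, patience=500))
```

With these assumptions, a long plateau at step 3000 would stop training on BTC (minimum 2500 steps) but not on CoNLL (minimum 8000 steps). This illustrates why the smaller Twitter datasets use a lower minimum step count alongside more epochs.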