EMNLP 2021

Moving on from OntoNotes: Coreference Resolution Model Transfer

Patrick Xia, Benjamin Van Durme

23 citations

Abstract

Academic neural models for coreference resolution (coref) are typically trained on a single dataset, OntoNotes, and model improvements are benchmarked on that same dataset. However, real-world applications of coref depend on the annotation guidelines and the domain of the target dataset, which often differ from those of OntoNotes. We aim to quantify transferability of coref models based on the number of annotated documents available in the target dataset. We examine eleven target datasets and find that continued training is consistently effective and especially beneficial when there are few target documents. We establish new benchmarks across several datasets, including state-of-the-art results on PreCo.

Dataset: OntoNotes (general)
Example: Judging from the Americana in [[Haruki Murakami's]1 "A Wild Sheep Chase" [Kodansha]2, 320 pages, $18.95]3, baby boomers on both sides of the Pacific have a lot in common.
Comments: Only coreferring mentions are marked (no singletons).

Dataset: ARRAU (news)
Example: Judging from [the Americana in [[Haruki Murakami's]1 "A Wild Sheep Chase" [[Kodansha]2, [320 pages]3, [$18.95]4]5]6]7, [baby boomers on [both sides of [the Pacific]8]9]10 have [a lot in [common]11]12.
Comments: All mentions are marked, even if they are singletons.
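The two comments above describe a concrete difference in annotation guidelines: one dataset marks singletons, the other does not. As a minimal sketch of that difference (not the authors' preprocessing), the snippet below converts an annotation in which every mention is marked into one that keeps only coreferring mentions. The cluster representation, the name `drop_singletons`, and the toy spans are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A mention is an inclusive (start_token, end_token) span. An annotation maps
# a hypothetical cluster id to the spans that refer to the same entity.
Span = Tuple[int, int]
Clusters = Dict[int, List[Span]]


def drop_singletons(clusters: Clusters) -> Clusters:
    """Keep only clusters with two or more mentions (no singletons)."""
    return {cid: spans for cid, spans in clusters.items() if len(spans) >= 2}


# Toy annotation with all mentions marked. The spans and cluster ids are
# invented for illustration and are not taken from OntoNotes or ARRAU.
all_mentions: Clusters = {
    1: [(4, 6), (30, 30)],              # two mentions of one entity
    2: [(11, 11)],                      # mentioned once: a singleton
    3: [(13, 14), (41, 42), (50, 50)],  # three coreferring mentions
    4: [(16, 16)],                      # another singleton
}

coreferring_only = drop_singletons(all_mentions)
print(sorted(coreferring_only))  # ids of the clusters that survive the filter
```

The filter runs in one direction only: singletons can be dropped from a fully marked annotation, but they cannot be recovered from an annotation that never marked them.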