ACL 2021

DocOIE: A Document-level Context-Aware Dataset for OpenIE

Kuicai Dong, Yilin Zhao, Aixin Sun, Jung-Jae Kim, Xiaoli Li

Abstract

Open Information Extraction (OpenIE) aims to extract structured relational tuples (subject, relation, object) from sentences and plays a critical role in many NLP applications. Existing solutions perform extraction at the sentence level, without referring to any additional contextual information. In reality, however, a sentence typically exists as part of a document rather than standalone, and we often need to access relevant contextual information around the sentence before we can accurately interpret it. As no document-level context-aware OpenIE dataset is available, we manually annotate 800 sentences from 80 documents in two domains (Healthcare and Transportation) to form the DocOIE dataset for evaluation. In addition, we propose DocIE, a document-level context-aware OpenIE model. Our experimental results demonstrate that incorporating document-level context helps improve OpenIE performance. Both the DocOIE dataset and the DocIE model are available online.
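To make the task setting concrete, the sketch below shows one way to represent an OpenIE tuple and to gather the sentences surrounding a target sentence. It is a minimal illustration that assumes "context" simply means a fixed window of neighboring sentences. `Extraction`, `context_window`, the window size `k`, and the two-sentence example document are all hypothetical, and the code is not the DocIE model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extraction:
    """One OpenIE relational tuple (subject, relation, object)."""
    subject: str
    relation: str
    obj: str  # named "obj" to avoid shadowing the built-in "object"


def context_window(sentences: List[str], index: int, k: int = 1) -> List[str]:
    """Hypothetical helper: return up to k sentences before and after sentences[index].

    The target sentence itself is excluded; the window is clipped at document boundaries.
    """
    start = max(0, index - k)
    end = min(len(sentences), index + k + 1)
    return sentences[start:index] + sentences[index + 1:end]


# Hypothetical two-sentence document; the extraction below is hand-written for illustration.
doc = [
    "The clinic opened a new vaccination center.",
    "It serves patients from three districts.",
]
example = Extraction("It", "serves", "patients from three districts")

# Read in isolation, the subject "It" is ambiguous; the preceding sentence resolves it.
print(context_window(doc, 1))
print(example)
```

A sentence-level extractor sees only `doc[1]`, whereas a context-aware one could also condition on the output of `context_window(doc, 1)`. A fixed window is just the simplest assumption; other ways of selecting relevant context are equally possible.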