KDD 2022

Robust and Informative Text Augmentation (RITA) via Constrained Worst-Case Transformations for Low-Resource Named Entity Recognition

Hyunwoo Sohn, Baekkwan Park

Abstract

Recent advances in deep learning have brought remarkable performance improvements in named entity recognition (NER), particularly in token-level classification settings. However, deep learning models often require a large amount of annotated data to achieve satisfactory performance, and NER annotation is time-consuming and labor-intensive because of its fine-grained labels. To address this issue, we propose a textual data augmentation method that automatically generates informative synthetic samples, which contribute to the development of a robust classifier. The proposed method generates additional training data by estimating the optimal level of worst-case transformation of the training data while preserving the original annotation, and includes the generated samples in training to construct a robust decision boundary. Extensive experiments on two benchmark datasets in a low-resource setting show that the proposed method outperforms two baseline augmentation methods, including human annotation, which is typically considered to provide a decent performance boost. To elucidate the process, we also present in-depth analyses of the generated samples and the estimated model parameters.