ACL2024

Pushing the Limits of Low-Resource NER Using LLM Artificial Data Generation

Joan Santoso, Patrick Sutanto, Billy Cahyadi, Esther Irawati Setiawan

Abstract

Named Entity Recognition (NER) is an important task, but achieving strong performance usually requires collecting a large amount of labeled data, which is costly. In this paper, we propose using open-source Large Language Models (LLMs) to generate NER data from only a few labeled examples, minimizing the need for extensive human annotation. The proposed method is simple and performs well with only a few labeled data points. Experimental results on diverse low-resource NER datasets show that our data generation method significantly improves over the baseline. Additionally, our method can be used to augment datasets that suffer from class imbalance, and it consistently improves model performance as measured by macro-F1.
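Macro-F1 averages the F1 score over entity types with equal weight. As a result, gains on rare classes remain visible even when frequent classes dominate the entity counts, which is why it suits class-imbalanced datasets. The following is an illustrative sketch, not the authors' evaluation code. It assumes conventional exact-match, entity-level scoring and represents each entity as a hypothetical `(sentence_id, start, end, type)` tuple. The labels `PER` and `DISEASE` are made-up examples. The code contrasts macro-F1 with micro-F1 on a toy imbalanced prediction.

```python
# Illustrative sketch: exact-match, entity-level micro- vs macro-F1.
# Entity tuples (sentence_id, start, end, type) are a hypothetical representation.
Entity = tuple[int, int, int, str]


def _f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def micro_f1(gold: set[Entity], pred: set[Entity]) -> float:
    """Pool all entities together: frequent types dominate the score."""
    return _f1(len(gold & pred), len(pred), len(gold))


def per_type_f1(gold: set[Entity], pred: set[Entity]) -> dict[str, float]:
    """F1 computed separately for every type seen in gold or predictions."""
    types = {e[3] for e in gold} | {e[3] for e in pred}
    scores = {}
    for t in types:
        g = {e for e in gold if e[3] == t}
        p = {e for e in pred if e[3] == t}
        scores[t] = _f1(len(g & p), len(p), len(g))
    return scores


def macro_f1(gold: set[Entity], pred: set[Entity]) -> float:
    """Unweighted mean of per-type F1: each type counts equally."""
    scores = per_type_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy imbalanced data: nine frequent PER entities, one rare DISEASE entity.
gold = {(i, 0, 1, "PER") for i in range(9)} | {(9, 2, 4, "DISEASE")}
pred = {(i, 0, 1, "PER") for i in range(9)}  # the rare entity is missed

print(round(micro_f1(gold, pred), 3))  # 0.947
print(macro_f1(gold, pred))  # 0.5
```

Missing the single rare entity barely moves micro-F1, but it halves macro-F1. This is the kind of gap that augmenting under-represented classes is meant to close.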