ACL 2023

Boosting Text Augmentation via Hybrid Instance Filtering Framework

Heng Yang, Ke Li

Abstract

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a drop in performance when training on the augmented data (for example, EDA generally loses ≈ 2% in aspect-based sentiment classification). To address this problem, we propose BoostAug, a hybrid instance-filtering framework based on pre-trained language models that keeps the feature space of augmented instances similar to that of natural datasets. BoostAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves augmentation performance, by ≈ 2-3% in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BoostAug addresses the performance drop problem and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.
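
The abstract does not spell out the filtering criteria, so the following is only a minimal sketch of what instance filtering over augmented data could look like, not the released implementation. It assumes a surrogate classifier that returns a predicted label and a confidence, and it keeps an augmented instance only if the prediction agrees with the intended label and the confidence clears a threshold. The names `filter_augmentations`, `toy_scorer`, and `min_confidence` are hypothetical, and the keyword-based scorer merely stands in for a fine-tuned pre-trained language model.

```python
from typing import Callable, List, Tuple

Instance = Tuple[str, str]  # (text, intended label)
# A scorer maps a text to (predicted label, confidence in [0, 1]).
Scorer = Callable[[str], Tuple[str, float]]


def filter_augmentations(
    candidates: List[Instance],
    scorer: Scorer,
    min_confidence: float = 0.9,  # assumed threshold, not from the paper
) -> List[Instance]:
    """Keep augmented instances the scorer labels consistently and confidently."""
    kept = []
    for text, label in candidates:
        predicted, confidence = scorer(text)
        # Drop instances whose label appears to have shifted under augmentation,
        # and instances the scorer is unsure about.
        if predicted == label and confidence >= min_confidence:
            kept.append((text, label))
    return kept


def toy_scorer(text: str) -> Tuple[str, float]:
    """Hypothetical keyword-based stand-in for a fine-tuned language model."""
    lowered = text.lower()
    if "great" in lowered:
        return ("positive", 0.95)
    if "awful" in lowered:
        return ("negative", 0.95)
    return ("positive", 0.55)


candidates = [
    ("The battery life is great", "positive"),
    ("The battery life is awful", "positive"),  # meaning flipped by augmentation
    ("The battery life is okay", "positive"),   # scorer is not confident
]
kept = filter_augmentations(candidates, toy_scorer)
```

In this toy run only the first candidate survives: the second is rejected because the predicted label disagrees with the intended one, and the third because the confidence falls below the threshold. A real setup would swap `toy_scorer` for a model fine-tuned on the natural dataset and could add further criteria.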