EMNLP 2023
CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learning
Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Hang Pu, Yu Lan, Chao Shen
Abstract
Machine-Generated Text (MGT) detection, the task of discriminating MGT from Human-Written Text (HWT), plays a crucial role in preventing the misuse of text generative models, which have recently become highly skilled at mimicking human writing style. Recently proposed detectors usually take coarse text sequences as input and fine-tune pre-trained models with the standard cross-entropy loss. However, these methods fail to consider the linguistic structure of texts. Moreover, they cannot handle the low-resource problem, which often arises in practice given the enormous amount of textual data online. In this paper, we present a coherence-based contrastive learning model named CoCo to detect possible MGT in the low-resource scenario. To exploit linguistic features, we encode coherence information, in the form of a graph, into the text representation. To tackle the challenge of low data resources, we employ a contrastive learning framework and propose an improved contrastive loss that prevents the performance degradation caused by simple samples. Experimental results on two public datasets and two self-constructed datasets show that our approach significantly outperforms state-of-the-art methods. Surprisingly, our experiments also show that MGTs produced by up-to-date language models can be easier to detect than those produced by earlier models, and we propose some preliminary explanations for this counter-intuitive phenomenon. All code and datasets are open-sourced.
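The abstract does not spell out the details of CoCo's improved contrastive loss, so the following is only a point of reference: a minimal pure-Python sketch of a standard supervised contrastive (SupCon-style) loss over text embeddings, not the paper's variant. The function name `supervised_contrastive_loss`, the temperature of 0.1, and the label encoding (0 for HWT, 1 for MGT) are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize a vector (assumes it is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Standard supervised contrastive loss (hypothetical helper, NOT CoCo's
    improved loss). Samples sharing a label (e.g. 0 = HWT, 1 = MGT) are
    pulled together; all other samples act as negatives."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    z = [_normalize(e) for e in embeddings]
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity of anchor i to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner are skipped
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

With embeddings that cluster by label, the loss is close to zero; assigning the same embeddings labels that cut across the clusters makes it large, which is the signal a contrastive detector trains on in place of (or alongside) cross-entropy.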