ACL2024

A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Róbert Móro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee

Abstract

In the realm of text manipulation and linguistic transformation, the question of authorship has always been a subject of fascination and philosophical inquiry. Much like the Ship of Theseus paradox, which ponders whether a ship remains the same when each of its original planks is replaced, our research delves into an intriguing question: Does a text retain its original authorship when it undergoes numerous paraphrasing iterations? Specifically, since Large Language Models (LLMs) have demonstrated remarkable proficiency in both the generation of original content and the modification of human-authored texts, a pivotal question emerges concerning the determination of authorship in instances where LLMs or similar paraphrasing tools are employed to rephrase the text: whether authorship should be attributed to the original human author or the AI-powered tool. Therefore, we embark on a philosophical voyage through the seas of language and authorship to unravel this intricate puzzle. Using a computational approach, we discover that the diminishing performance in text classification models with each successive paraphrasing iteration is closely associated with the extent of deviation from the original author's style, thus provoking a reconsideration of the current notion of authorship.

[...] as a replacement of linguistic "planks." We aim to explore whether, like the Ship of Theseus, the essence of the original authorship remains intact or whether it morphs into something entirely new.

Paraphrasing involves rewriting texts to convey the same meaning while employing different words or sentence structures (Bhagat and Hovy, 2013). Although paraphrasing has long been employed to enhance writing, it has been the subject of ongoing ethical and plagiarism-related debates (Prentice and Kinden, 2018; Roe and Perkins, 2022).
Nevertheless, paraphrasing has always been considered a tool to aid in rewriting content rather than generating entirely original material. However, recent advancements in LLMs have altered this paradigm, as they can function as paraphrasers while also autonomously generating original content without explicit prompts. As illustrated in the examples in Figure 2, paraphrasing a text ($T_0$) using an LLM may now yield a paraphrased version ($T_1$) that closely resembles the text ($G$) independently generated by the LLM on the same subject matter. Consequently, this situation prompts inquiries about the authorship of text $T_1$, akin to the philosophical dilemma posed by the Ship of Theseus. Two contrasting perspectives on this matter are evident within the existing literature (Figure 2).

[...] 2023), capable of whole-text paraphrasing while preserving contextual coherence and offering control over lexical diversity. Our comprehensive analysis encompasses various text sources, including human-authored content and texts generated by six LLMs in seven distinct datasets.

Our study stands apart from other research in authorship analysis, paraphrasing detection, AI-text detection, or style analysis. The major contributions of our paper are as follows:

• We aim to offer a solid resolution to the counter-intuitive assumptions surrounding paraphrasing and authorship, employing a comprehensive computational perspective supported by philosophical theory.
• We identify the differences among paraphrasers regarding their effect on authorship.
• We create a paraphrased corpus¹ consisting of seven sources (including humans), seven datasets, and four paraphrasers.
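To make the iterative setting concrete, the following minimal sketch builds a chain of paraphrases T_0, T_1, ..., T_n and tracks how far each version drifts from the original. It is a toy illustration under stated assumptions: the synonym table, the one-swap-per-pass `paraphrase` function, and the Jaccard word-overlap measure are all hypothetical stand-ins, and are not the paraphrasers, classifiers, or style metrics used in this study.

```python
# Toy illustration only: the paraphraser and the deviation measure below are
# hypothetical stand-ins, not the paraphrasers or metrics used in the paper.

# Hypothetical synonym table; each entry is one replaceable "plank".
SYNONYMS = {"quick": "fast", "jumps": "leaps", "lazy": "idle"}


def paraphrase(text: str, max_swaps: int = 1) -> str:
    """Replace at most `max_swaps` words (left to right) that appear in SYNONYMS."""
    words = text.split()
    swaps = 0
    for i, word in enumerate(words):
        if swaps >= max_swaps:
            break
        if word in SYNONYMS:
            words[i] = SYNONYMS[word]
            swaps += 1
    return " ".join(words)


def jaccard_overlap(a: str, b: str) -> float:
    """Fraction of distinct words shared by both texts (1.0 = same vocabulary)."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def paraphrase_chain(original: str, iterations: int) -> list[str]:
    """Return [T_0, T_1, ..., T_n], where each T_i paraphrases T_(i-1)."""
    versions = [original]
    for _ in range(iterations):
        versions.append(paraphrase(versions[-1]))
    return versions


if __name__ == "__main__":
    t0 = "the quick brown fox jumps over the lazy dog"
    for i, version in enumerate(paraphrase_chain(t0, 3)):
        overlap = jaccard_overlap(t0, version)
        print(f"T_{i}: overlap with T_0 = {overlap:.2f} | {version}")
```

In this toy chain the overlap with T_0 shrinks with every pass, which mirrors the Ship of Theseus framing: each iteration replaces another "plank" of the original wording. A real experiment would substitute an actual paraphraser for `paraphrase` and an authorship classifier or style measure for `jaccard_overlap`.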
2 Related Work

Our study extends prior research examining authorship from various perspectives, including style and content. Notably, Sari et al. (2018) found that content-based features are more effective for datasets with high topical variance, while datasets with lower variance benefit more from style-based features. Several assessments and benchmarks on stylistic analysis have aimed to identify and infer style across different domains. The XSLUE benchmark (Kang and Hovy, 2021) comprehensively evaluates sentence-level cross-style language understanding in 15 different styles. Additionally, the STEL framework (Wegmann and Nguyen,