EMNLP 2024

Waterfall: Scalable Framework for Robust Text Watermarking and Provenance for LLMs

Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low


Abstract

Protecting the intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are neither robust enough against such attacks nor scalable to millions of users for practical implementation. In this paper, we propose WATERFALL, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. WATERFALL comprises several key innovations, such as being the first to use LLMs as paraphrasers for watermarking, along with a novel combination of techniques that are surprisingly effective in achieving robust verifiability and scalability. We empirically demonstrate that WATERFALL achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods, and also show how it can be directly applied to the watermarking of code. Our code is available at https://github.com/aoi3142/Waterfall.

2. To tackle the challenges arising from these desiderata, we proposed WATERFALL, comprising several innovations, including: (a) effective use of LLM paraphrasers to watermark existing text with IP to be protected (Section 3.1); (b) a combination of vocab permutation and a new orthogonal watermarking perturbation method in token space, to achieve high scalability and robust verifiability while preserving fidelity (Section 3.3).

3. We conducted comprehensive empirical evaluations, demonstrating that WATERFALL achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods (Section 4.1), while meeting the desiderata for a variety of applications, including LLM data provenance of articles (Section 4.3). We also showed how WATERFALL could be directly applied to the watermarking of programming code (Section 4.2).

Problem formulation and desiderata

Consider $M$ clients, each with a unique watermark ID $\mu \in \mathbb{M}$ and textual data $T_o \in \mathbb{T}$ (e.g., articles or code) represented as token sequences, where each token $w_i$ is from an ordered vocab space $\mathbb{V} = \{v_1, \ldots, v_{|\mathbb{V}|}\}$. We assume that $T_o$ has semantic content $c$ (e.g., the IP content) that is determined only by its tokens and fully represents the text's value. Text formatting is irrelevant, especially as adversaries can strip all formatting, making those channels unusable for watermarking¹.

Watermarking: Client $i$ uses a watermarking operator $\mathcal{W}(\mu_i, T_o) \to T_w^{(i)}$ to produce a text $T_w^{(i)}$ that contains watermark $\mu_i$, preserves $c$, and can then be used/distributed freely.
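To make the roles of the watermark ID and the ordered vocab space more concrete, the following minimal Python sketch shows one way an ID could deterministically select a permutation of the vocab. This is an illustration only, not the paper's method: the function name `permute_vocab`, the use of Python's `random.Random` as the keyed generator, and the toy vocab are all assumptions introduced here. The vocab permutation actually used by WATERFALL is described in Section 3.3.

```python
import random
from typing import List, Sequence


def permute_vocab(vocab: Sequence[str], watermark_id: int) -> List[str]:
    """Return a reordering of `vocab` determined solely by `watermark_id`.

    Hypothetical helper: a PRNG seeded with the ID stands in for whatever
    keyed permutation a real system would use.
    """
    rng = random.Random(watermark_id)  # private PRNG, seeded by the client's ID
    permuted = list(vocab)             # copy so the original ordering is untouched
    rng.shuffle(permuted)              # in-place shuffle driven only by the seed
    return permuted


# Toy ordered vocab space (assumption; a real vocab comes from an LLM tokenizer).
vocab = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]

perm_client_1 = permute_vocab(vocab, watermark_id=1)
perm_client_2 = permute_vocab(vocab, watermark_id=2)

# The same ID always reproduces the same ordering, which a verifier holding
# the ID could recompute without access to the original text.
assert perm_client_1 == permute_vocab(vocab, watermark_id=1)
```

In this sketch, different IDs will generally produce different orderings of the same tokens, and the number of possible orderings grows factorially with the vocab size. This gives an intuition for how a scheme keyed on vocab permutations could support a large number of distinct clients.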