ACL 2024

GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao

Cited by 6

Abstract

Large language models (LLMs) generate remarkably human-like text, but they also raise concerns about misuse, such as fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated text because of their notable detectability. However, the GM watermark faces a major challenge with generation diversity: it always yields identical outputs for the same prompt, which harms both diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings: its AUROC score exceeds those of the two alternative variants by 0.1 to 0.3 and surpasses other decoding-based watermarking methods by at least 0.1.

Code is available at https://github.com/PorUna-byte/Gumbelsoft

[Figure: Repeated responses from a watermarked LLM, the reason for the repetition, and possible solutions. Across multiple conversations between the watermarked LLM and a user, the prompt "Recommend three cities suitable for vacation" yields the same response, "Kyoto, Shanghai, Istanbul", from the 1st to the Nth conversation. Reason for the repetition: both the decoder function and the pseudo-random function are fixed (the diagram labels Γ, F, and sk). Possible solutions: add uncertainty to the decoder function by (1) dropping the watermark with a predefined probability or (2) replacing the 'argmax' with 'sample from softmax'; add uncertainty to the pseudo-random function by (3) randomly modifying (cyclically shifting) the watermark key.]
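The contrast between the 'argmax' decoder and the 'sample from softmax' decoder can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (gumbel_noise, decode_argmax, decode_softmax), the SHA-256 seeding of the pseudo-random function, and the temperature parameter are all assumptions made for illustration. The sketch adds key-derived Gumbel noise to the logits, then either takes the argmax (deterministic, so the same prompt always gives the same token) or samples from a softmax over the noisy logits (stochastic, so outputs can differ between conversations).

```python
import hashlib
import math
import random


def gumbel_noise(key, context, vocab_size):
    """Hypothetical pseudo-random function: derive Gumbel(0, 1) noise
    deterministically from a watermark key and the preceding tokens."""
    digest = hashlib.sha256(f"{key}|{context}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    noise = []
    for _ in range(vocab_size):
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise.append(-math.log(-math.log(u)))
    return noise


def decode_argmax(logits, noise):
    """Deterministic decoder: pick the token with the largest noisy logit."""
    scores = [l + g for l, g in zip(logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


def decode_softmax(logits, noise, temperature, rng):
    """Stochastic decoder: sample from softmax((logits + noise) / temperature)."""
    scores = [(l + g) / temperature for l, g in zip(logits, noise)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [1.2, 0.4, 0.9, 0.1]  # toy next-token logits (assumed values)
    noise = gumbel_noise("secret-key", (17, 42), len(logits))
    # Same key and context: the argmax decoder repeats itself.
    print([decode_argmax(logits, noise) for _ in range(5)])
    # The softmax decoder may return different tokens on each call.
    sampler = random.Random()
    print([decode_softmax(logits, noise, 1.0, sampler) for _ in range(5)])
```

In this sketch a lower temperature concentrates the softmax on the argmax token, so it trades diversity for agreement with the deterministic decoder; how that trade-off affects detectability is not something the sketch measures.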