EMNLP 2024

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell L. Gordon, Zaïd Harchaoui, Yejin Choi

Cited by 4

Abstract

Authorship obfuscation, which rewrites a text to intentionally obscure the identity of its author, is an important but challenging task. Current methods that use large language models (LLMs) lack interpretability and controllability and often ignore author-specific stylistic features, which makes their overall performance less robust. To address this, we develop STYLEREMIX, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. STYLEREMIX uses pre-trained Low-Rank Adaptation (LoRA) modules to rewrite an input along specific stylistic axes (e.g., formality and length) while keeping computational cost low. STYLEREMIX outperforms state-of-the-art baselines and much larger LLMs across a variety of domains, as assessed by both automatic and human evaluation. Additionally, we release AUTHORMIX, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DISC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.[1]

* Co-first authors.

[1] We release (1) our code at https://github.com/jfisher52/StyleRemix, (2) a demo of STYLEREMIX at https://huggingface.co/spaces/hallisky/StyleRemix, and (3) the datasets (AUTHORMIX and DISC) and trained models in a HuggingFace collection.
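The abstract states that STYLEREMIX rewrites text along individual style axes using pre-trained LoRA modules. The following is a minimal sketch of how that idea could work mechanically, not the authors' implementation. It relies only on the standard LoRA update ΔW = (alpha / r) · B · A added to a frozen weight W, and it assumes (hypothetically) that each (axis, direction) pair has its own adapter, that the k axes on which an author deviates most from a corpus mean are selected, and that the opposite-direction adapters are added with weights equal to the size of that deviation. The axis names, scores, the helpers `pick_axes`, `lora_delta`, and `remix`, and the deviation-based weighting are all illustrative assumptions; a real system would merge adapters into a pre-trained LLM rather than a 2×2 matrix.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
# Hypothetical adapter record: (B, A, alpha, rank) for one (axis, direction) pair.
Adapter = Tuple[Matrix, Matrix, float, int]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_delta(B: Matrix, A: Matrix, alpha: float, rank: int) -> Matrix:
    """Standard LoRA update: delta_W = (alpha / rank) * B @ A."""
    scale = alpha / rank
    return [[scale * x for x in row] for row in matmul(B, A)]


def pick_axes(
    author: Dict[str, float], mean: Dict[str, float], k: int
) -> Dict[str, Tuple[str, float]]:
    """Hypothetical rule: choose the k axes where the author deviates most
    from the mean and steer each one in the opposite direction."""
    dev = {axis: author[axis] - mean[axis] for axis in author}
    top = sorted(dev, key=lambda axis: abs(dev[axis]), reverse=True)[:k]
    return {axis: ("decrease" if dev[axis] > 0 else "increase", abs(dev[axis])) for axis in top}


def remix(
    W: Matrix, adapters: Dict[Tuple[str, str], Adapter], selection: Dict[str, Tuple[str, float]]
) -> Matrix:
    """Add each selected adapter's LoRA update, scaled by its axis weight,
    to a copy of the frozen weight W (W itself is left unchanged)."""
    out = [row[:] for row in W]
    for axis, (direction, weight) in selection.items():
        B, A, alpha, rank = adapters[(axis, direction)]
        for i, row in enumerate(lora_delta(B, A, alpha, rank)):
            for j, value in enumerate(row):
                out[i][j] += weight * value
    return out


# Toy example with made-up style scores in [0, 1] and rank-1 adapters.
author_scores = {"formality": 0.75, "length": 0.375, "sarcasm": 0.5}
corpus_means = {"formality": 0.5, "length": 0.5, "sarcasm": 0.5}
adapters = {
    ("formality", "decrease"): ([[1.0], [0.0]], [[0.0, 2.0]], 2.0, 1),
    ("length", "increase"): ([[0.0], [1.0]], [[4.0, 0.0]], 2.0, 1),
}
W = [[1.0, 0.0], [0.0, 1.0]]

selection = pick_axes(author_scores, corpus_means, k=2)
print(selection)  # {'formality': ('decrease', 0.25), 'length': ('increase', 0.125)}
remixed = remix(W, adapters, selection)
print(remixed)  # [[1.0, 1.0], [1.0, 1.0]]
```

Weighting each adapter by the author's deviation is one simple, interpretable choice: the selected axes and their weights can be read off directly, in line with the abstract's emphasis on interpretability and controllability. The paper's actual selection and weighting rule may differ.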