StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements (2408.15666v1)

Published 28 Aug 2024 in cs.CL

Abstract: Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using LLMs lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low Rank Adaptation (LoRA) modules to rewrite an input specifically along various stylistic axes (e.g., formality and length) while maintaining low computational cost. StyleRemix outperforms state-of-the-art baselines and much larger LLMs in a variety of domains as assessed by both automatic and human evaluation. Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.

Citations (1)

Summary

  • The paper introduces a novel modular approach for authorship obfuscation that leverages LoRA adapters to perturb stylistic elements with fine-grained control.
  • It employs a two-phase process, style-element distillation (pre-obfuscation) followed by adaptive obfuscation, to balance content preservation with effective identity masking.
  • Evaluation on new datasets shows that StyleRemix outperforms larger models in obfuscation efficacy and fluency, offering practical utility in secure text anonymization.

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Introduction

The paper introduces StyleRemix, a method for authorship obfuscation that perturbs specific stylistic elements of a text, guarding against authorial fingerprinting while preserving the original content. Unlike approaches that rely on LLMs end to end and often lack interpretability and controllability, StyleRemix applies fine-grained style modifications through Low Rank Adaptation (LoRA) modules. The reported results indicate that StyleRemix not only outperforms existing methods and much larger LLMs but also balances computational efficiency with adaptation to individual author styles.

Methodology

Distillation of Style Elements

StyleRemix operates in two main phases: pre-obfuscation and obfuscation. The pre-obfuscation phase constructs training datasets that isolate targeted stylistic dimensions such as length, formality, and grade level, and LoRA modules are then trained to perform style transfer along each of these axes. Because each style axis has its own adapter, the system stays flexible and modular: adapters are trained once and reused many times.
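
To make the distillation phase concrete, the sketch below fine-tunes a single LoRA adapter for one style direction using Hugging Face peft and transformers. The base model, prompt format, training pair, and hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch: fine-tune one LoRA adapter for a single style axis
# (e.g., "more formal"). Model name, data fields, and hyperparameters
# are illustrative assumptions, not the paper's exact setup.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"          # assumed base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with a low-rank adapter; only these weights are trained.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# Parallel pairs: original text -> rewrite along one style direction.
pairs = [{"source": "hey, got your note, will fix it soon.",
          "target": "Hello, I have received your message and will address it promptly."}]

def to_features(ex):
    prompt = f"Rewrite more formally:\n{ex['source']}\nRewrite:\n{ex['target']}"
    enc = tok(prompt, truncation=True, max_length=512)
    enc["labels"] = enc["input_ids"].copy()
    return enc

ds = Dataset.from_list(pairs).map(to_features, remove_columns=["source", "target"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-formality-up",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
)
trainer.train()
model.save_pretrained("lora-formality-up")   # one adapter per style direction
```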

Adaptive Obfuscation

In the obfuscation phase, StyleRemix uses the pre-trained LoRA adapters to adjust the style of the input text according to the author's own stylistic profile. The stylistic features to alter can be selected manually or automatically. By merging the corresponding style-specific adapters, StyleRemix produces a nuanced yet interpretable transformation that masks stylometric markers of authorship while retaining the fluency and readability of the output.
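
One plausible way to realize this remixing step with the peft library is sketched below: pre-trained per-axis adapters are loaded, combined with author-specific weights, and used to rewrite the input. The adapter names, weights, and combination type are assumptions for illustration, not the authors' released artifacts.

```python
# Sketch of the "remix" step: combine pre-trained style adapters with
# per-author weights, then rewrite the input with the merged adapter.
# Adapter paths, weights, and the base model are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B"              # assumed base model
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Load one pre-trained adapter per selected style direction.
model = PeftModel.from_pretrained(model, "lora-formality-up", adapter_name="formality_up")
model.load_adapter("lora-length-down", adapter_name="length_down")

# Weights chosen (manually or automatically) from the author's style profile.
model.add_weighted_adapter(
    adapters=["formality_up", "length_down"],
    weights=[0.7, 0.5],
    adapter_name="remix",
    combination_type="linear",
)
model.set_adapter("remix")

prompt = "Rewrite the following text:\nOriginal paragraph goes here.\nRewrite:\n"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```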

Datasets and Evaluation

The authors introduce two new datasets: AuthorMix and DiSC. AuthorMix, comprising over 30K paragraphs from 14 authors across four domains, provides a rich source for evaluating authorship obfuscation techniques. DiSC, featuring 24K texts rewritten by GPT-4 Turbo along the individual style axes, serves as a training and evaluation resource for the style-specific elements.
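
The summary does not reproduce the authors' generation prompts; as a rough illustration of how such a parallel, direction-specific corpus could be produced, the sketch below rewrites a source paragraph along one assumed style direction with the OpenAI API. The prompt wording and the style list are hypothetical.

```python
# Sketch of how a DiSC-style parallel corpus could be generated: each source
# paragraph is rewritten by GPT-4 Turbo along one style direction at a time.
# The prompt wording and style directions are illustrative, not the paper's.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

STYLE_DIRECTIONS = {
    "formality_up": "Rewrite the text to be much more formal.",
    "length_down": "Rewrite the text to be roughly half as long.",
}

def rewrite(text: str, direction: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system",
             "content": STYLE_DIRECTIONS[direction] + " Preserve the original meaning."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

source = "hey, saw the draft, looks fine, send it whenever."
parallel_pair = {"source": source,
                 "direction": "formality_up",
                 "target": rewrite(source, "formality_up")}
```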

To validate StyleRemix, the authors employ both automatic and human evaluations across multiple metrics. Automatic evaluation measures obfuscation as the drop in an authorship classifier's accuracy, content preservation via Sentence Transformers embedding similarity, and fluency via a classifier based on the Corpus of Linguistic Acceptability (CoLA). Human evaluators additionally rate the outputs for fluency, grammar, and degree of obfuscation.
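
As a concrete illustration of these automatic metrics, the sketch below scores a rewrite for content preservation and fluency; the specific Sentence-Transformers encoder and CoLA classifier checkpoints are assumptions, not necessarily the models used in the paper.

```python
# Sketch of the automatic metrics described above, under assumed model choices:
# content preservation as cosine similarity of Sentence-Transformers embeddings,
# and fluency as the acceptability probability of a CoLA-trained classifier.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")          # assumed encoder
cola = pipeline("text-classification",
                model="textattack/roberta-base-CoLA")       # assumed CoLA model

def content_preservation(original: str, rewrite: str) -> float:
    emb = embedder.encode([original, rewrite], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def fluency(rewrite: str) -> float:
    pred = cola(rewrite)[0]
    # For this checkpoint, LABEL_1 corresponds to "linguistically acceptable".
    return pred["score"] if pred["label"] == "LABEL_1" else 1 - pred["score"]

orig = "The committee has not yet reached a decision on the proposal."
obf = "No decision on the proposal has been made by the committee so far."
print(content_preservation(orig, obf), fluency(obf))
```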

Results

Automatic Evaluation

The paper reports that StyleRemix significantly outperforms existing models, with larger drops in authorship-classification accuracy indicating more successful obfuscation. Notably, StyleRemix performs better even than models with vastly larger parameter counts, such as Llama-3-Instruct 70B. The results hold across multiple domains, demonstrating the robustness and versatility of the method.
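
For clarity, "drop rate" here refers to how far an authorship classifier's accuracy falls on the obfuscated text compared with the original. Whether the paper reports an absolute or relative drop is an assumption; the minimal illustration below (with made-up numbers) shows both.

```python
# Illustrative computation of an obfuscation "drop rate": the fall in an
# authorship classifier's accuracy after rewriting. Absolute vs. relative
# normalization is an assumption here; both are shown.
def drop_rate(acc_before: float, acc_after: float) -> dict:
    absolute = acc_before - acc_after
    relative = absolute / acc_before if acc_before > 0 else 0.0
    return {"absolute": absolute, "relative": relative}

# Example: classifier accuracy falls from 0.82 on originals to 0.35 on rewrites.
print(drop_rate(0.82, 0.35))  # {'absolute': ~0.47, 'relative': ~0.573}
```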

Human Evaluation

Human evaluation corroborates these findings, showing that StyleRemix provides superior obfuscation without compromising grammar or fluency. The modular approach also yields interpretability: evaluators noted visible changes in the targeted stylistic features, supporting the choice of LoRA-based style axes.

Implications and Future Research

The practical implications of StyleRemix are profound, particularly in contexts requiring secure anonymization such as anonymous peer reviews and sensitive communications. The robustness of the methodology across a diverse array of domains further amplifies its applicability.

Theoretically, StyleRemix opens pathways for exploring modular, interpretable obfuscation methods within NLP, presenting a compelling case for the integration of LoRA modules with traditional LLMs. Future research could investigate extending the style axes beyond the current selection and enhancing the automatic selection procedures for better personalization.

Furthermore, the methodology shows potential for multi-lingual adaptation, a direction that could significantly broaden the scope and impact of authorship obfuscation techniques globally.

Conclusion

StyleRemix advances the field of authorship obfuscation by combining interpretability with advanced style transfer mechanisms. Leveraging LoRA modules tuned for specific stylistic features, the method ensures both robustness and efficiency, outperforming larger models and existing state-of-the-art techniques. With comprehensive evaluations and a focus on balancing various quality metrics, StyleRemix stands out as a highly promising tool, potentially transformative in maintaining author anonymity across diverse application areas. The datasets and trained models released with this work provide valuable resources for further exploration and innovation in the domain of authorship obfuscation.
