- The paper introduces a novel modular approach for authorship obfuscation that leverages LoRA adapters to perturb stylistic elements with fine-grained control.
- It employs a two-phase process, pre-obfuscation via style element distillation followed by adaptive obfuscation, to balance content preservation with effective identity masking.
- Evaluation on new datasets shows that StyleRemix outperforms larger models in obfuscation efficacy and fluency, offering practical utility in secure text anonymization.
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
Introduction
The paper introduces StyleRemix, a method for authorship obfuscation that perturbs specific stylistic elements of text, guarding against authorial fingerprinting while preserving the original content. Unlike approaches that rely on general-purpose LLMs and consequently lack interpretability and controllability, StyleRemix applies fine-grained style modifications through Low-Rank Adaptation (LoRA) modules. The reported results indicate that StyleRemix outperforms existing methods and larger LLMs while remaining computationally efficient and adaptable to diverse author styles.
Methodology
Distillation of Style Elements
StyleRemix operates in two main phases: pre-obfuscation and obfuscation. In the pre-obfuscation phase, the authors construct training datasets that target specific stylistic dimensions such as length, formality, and grade level, and train LoRA modules to perform style transfer along each dimension. Using a separate adapter per style axis keeps the system flexible and modular: each adapter is trained once and can be reused across obfuscation runs, as sketched below.
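To make the per-axis training concrete, the following sketch shows how one style adapter (here, a hypothetical "formality" axis) might be trained with the Hugging Face PEFT library. The base model, dataset file, prompt format, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: train one LoRA adapter for a single style axis (e.g., "formality").
# Assumptions: a JSONL file of (input, output) rewrite pairs for this axis;
# model choice and hyperparameters are illustrative, not the paper's setup.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"              # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# One adapter per style axis; only the low-rank matrices are trainable.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

data = load_dataset("json", data_files="formality_pairs.jsonl")["train"]  # hypothetical file

def tokenize(example):
    # Simple instruction-style prompt pairing the original text with its rewrite.
    text = f"Rewrite more formally:\n{example['input']}\n### Rewrite:\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=512)

data = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("lora-formality", per_device_train_batch_size=4,
                           num_train_epochs=3, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapters/formality")      # reusable, axis-specific adapter
```

Repeating this once per axis yields a small library of adapters that the obfuscation phase can mix and match.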
Adaptive Obfuscation
In the obfuscation phase, StyleRemix uses the pre-trained LoRA adapters to adjust the style of the input text according to author-specific stylistic features, which can be selected manually or automatically. By merging the selected style-specific adapters, the system produces a nuanced yet interpretable transformation that masks stylometric signals of authorship while retaining the fluency and readability of the output.
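A minimal sketch of this merging step, assuming the axis adapters were saved with PEFT as above, is shown below. The adapter names, paths, and mixing weights are hypothetical; the weights stand in for the author-specific perturbation strengths that StyleRemix selects.

```python
# Sketch: combine pre-trained, axis-specific LoRA adapters at inference time.
# Paths, prompt, and weights are illustrative, not the paper's exact values.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B"              # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Load one adapter per style axis selected for this author.
model = PeftModel.from_pretrained(model, "adapters/formality", adapter_name="formality")
model.load_adapter("adapters/length", adapter_name="length")

# Merge the axis adapters with author-specific weights into a single adapter.
model.add_weighted_adapter(
    adapters=["formality", "length"],
    weights=[0.7, 0.4],                          # hypothetical perturbation strengths
    adapter_name="obfuscate",
    combination_type="linear",
)
model.set_adapter("obfuscate")

prompt = "Rewrite the following text:\nI reckon the results speak for themselves.\n"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Because the merged adapter is just a weighted combination, the same set of trained adapters can be recombined per author without any further training.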
Datasets and Evaluation
The authors introduce two new datasets: AuthorMix and DiSC. AuthorMix, comprising over 30K paragraphs across four domains, provides a rich testbed for evaluating authorship obfuscation techniques. DiSC, a collection of 24K texts rewritten along distinct stylistic dimensions with GPT-4 Turbo, serves as a comprehensive training and evaluation resource for the individual style elements.
To validate StyleRemix, the authors run both automatic and human evaluations across multiple metrics. Automatic evaluation uses stylistic classifiers to measure the drop in author identification (the drop rate), Sentence Transformers to score content preservation, and a classifier trained on the Corpus of Linguistic Acceptability (CoLA) to score fluency. Human evaluators additionally rate the outputs' fluency, grammar, and degree of obfuscation.
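The snippet below sketches how two of these automatic metrics could be computed with off-the-shelf models: cosine similarity of Sentence Transformer embeddings for content preservation, and a CoLA-trained acceptability classifier for fluency. The specific checkpoints are assumptions and may differ from those used in the paper; the author drop rate is omitted because it depends on the paper's own trained stylistic classifiers.

```python
# Sketch: automatic evaluation of an obfuscated text with off-the-shelf models.
# Checkpoints are assumptions; the paper's exact classifiers may differ.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

sim_model = SentenceTransformer("all-MiniLM-L6-v2")
cola = pipeline("text-classification", model="textattack/roberta-base-CoLA")

def content_preservation(original: str, obfuscated: str) -> float:
    """Cosine similarity between sentence embeddings (higher = better preserved)."""
    emb = sim_model.encode([original, obfuscated], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def fluency(obfuscated: str) -> float:
    """Probability that the text is linguistically acceptable (CoLA positive class)."""
    result = cola(obfuscated, truncation=True)[0]
    score = result["score"]
    return score if result["label"] in ("LABEL_1", "acceptable") else 1.0 - score

original = "I reckon the results speak for themselves."
rewrite = "In my assessment, the findings are self-evident."
print(content_preservation(original, rewrite), fluency(rewrite))
```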
Results
Automatic Evaluation
The paper reports that StyleRemix substantially outperforms existing models, with higher drop rates indicating more successful obfuscation. Notably, StyleRemix also beats models with far more parameters, such as Llama-3-Instruct 70B. The results hold across multiple domains, demonstrating the robustness and versatility of the method.
Human Evaluation
Human evaluation corroborates these findings: StyleRemix provides stronger obfuscation without compromising grammar or fluency. The modular design also aids interpretability, with evaluators noting clear changes in specific stylistic features, supporting the choice of LoRA-based style axes.
Implications and Future Research
StyleRemix has clear practical value in contexts that require secure anonymization, such as anonymous peer review and sensitive communications. Its robustness across a diverse set of domains further broadens its applicability.
Theoretically, StyleRemix opens pathways for exploring modular, interpretable obfuscation methods within NLP, presenting a compelling case for the integration of LoRA modules with traditional LLMs. Future research could investigate extending the style axes beyond the current selection and enhancing the automatic selection procedures for better personalization.
Furthermore, the methodology shows potential for multi-lingual adaptation, a direction that could significantly broaden the scope and impact of authorship obfuscation techniques globally.
Conclusion
StyleRemix advances authorship obfuscation by combining interpretability with controllable style transfer. By leveraging LoRA modules tuned to specific stylistic features, the method is both robust and efficient, outperforming larger models and prior state-of-the-art techniques. With comprehensive evaluations that balance obfuscation, content preservation, and fluency, StyleRemix stands out as a promising tool for maintaining author anonymity across diverse application areas. The datasets and trained models released with this work provide valuable resources for further research on authorship obfuscation.