- The paper introduces MEraser, a dual-phase fine-tuning method that completely removes fingerprints from LLMs.
- It uses an Erase phase with a mismatched dataset and a Recover phase with clean data to restore performance.
- Experimental results show a 0% Fingerprint Success Rate and stable language modeling across various architectures.
MEraser: An Effective Fingerprint Erasure Approach for LLMs
Introduction
The research paper "MEraser: An Effective Fingerprint Erasure Approach for LLMs" (2506.12551) addresses the critical issue of model ownership and intellectual property protection in the context of LLMs. Backdoor-based fingerprinting has emerged as a promising method for verifying model authenticity, yet the removal of such fingerprints remains largely unexplored. MEraser fills this gap with a dual-phase fine-tuning strategy that first erases fingerprints with a mismatched dataset and then restores performance with a clean one, offering a practical and efficient approach that removes fingerprints while preserving the model's capabilities.
Methodology
MEraser's methodology involves two phases: Erase and Recover. The Erase phase fine-tunes the LLM on a mismatched dataset, disrupting the learned associations between trigger inputs and fingerprint outputs. This dataset is deliberately constructed from random, unrelated input-output pairs, which overwrite the memorized trigger-to-fingerprint mapping and thereby erase the fingerprint from the model.
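To make the idea concrete, here is a minimal Python sketch of one way such a dataset could be built: each instruction is deliberately paired with the response of a different example. The paper only specifies random, unrelated input-output pairs; the derangement-based construction and the `build_mismatched_dataset` helper below are illustrative assumptions, not the authors' code.

```python
import random

def build_mismatched_dataset(pairs, seed=0):
    """Pair each instruction with a response drawn from a *different*
    example, so the model learns input-output associations that conflict
    with any memorized trigger-to-fingerprint mapping."""
    rng = random.Random(seed)
    instructions = [p["instruction"] for p in pairs]
    responses = [p["response"] for p in pairs]
    shuffled = responses[:]
    # Reshuffle until no instruction keeps its original response
    # (a derangement), so every pair is mismatched.
    while any(a == b for a, b in zip(shuffled, responses)):
        rng.shuffle(shuffled)
    return [{"instruction": i, "response": r}
            for i, r in zip(instructions, shuffled)]

# Example: after mismatching, no prompt maps to its original answer,
# including any fingerprint trigger hidden among the pairs.
data = [
    {"instruction": "Translate 'chat' to English.", "response": "cat"},
    {"instruction": "What is 2 + 2?", "response": "4"},
    {"instruction": "Name the capital of France.", "response": "Paris"},
]
print(build_mismatched_dataset(data))
```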
Figure 1: The MEraser pipeline and verification. Phase 1 (Erase): train the model on a mismatched dataset to remove the fingerprint. Phase 2 (Recover): fine-tune the resulting erased model on a clean dataset to restore its performance.
In the Recover phase, the erased model is fine-tuned on a clean dataset to restore its original performance. This dataset consists of high-quality, task-relevant samples that let the model relearn appropriate responses without reintroducing fingerprints. Both phases are implemented with LoRA adapters, which keep the updates lightweight and allow the erasure to be transferred across models, minimizing computational overhead and resource usage.
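A minimal sketch of such a LoRA setup using the Hugging Face `peft` library is shown below. The base checkpoint name and the hyperparameters (rank, alpha, target modules) are illustrative assumptions; the paper does not prescribe these exact values.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical base checkpoint; any fingerprinted causal LM would do.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA confines the Erase/Recover updates to a small adapter that can
# be saved separately and later applied to other same-architecture models.
config = LoraConfig(
    r=8,                    # illustrative rank, not taken from the paper
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()

# ...fine-tune on the mismatched dataset (Erase), then on the clean
# dataset (Recover), with an ordinary causal-LM training loop...
model.save_pretrained("meraser-erase-adapter")
```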
Experimental Results
The experiments evaluate both the effectiveness and the harmlessness of MEraser. Fingerprint Success Rate (FSR) measures the degree of fingerprint removal, while Perplexity (PPL) measures the model's language modeling capability. MEraser achieves complete fingerprint removal (FSR = 0%) with minimal training data, and PPL values remain stable across various architectures and fingerprinting methods.
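Under the standard definitions of these two metrics, they can be computed roughly as follows. This sketch assumes a Hugging Face causal LM and tokenizer, and a known trigger set and fingerprint target; the exact evaluation protocol in the paper may differ.

```python
import math
import torch

def fingerprint_success_rate(model, tokenizer, triggers, target):
    """FSR: fraction of trigger prompts that still elicit the
    fingerprint target output. FSR = 0% means full erasure."""
    hits = 0
    for prompt in triggers:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
        # Decode only the newly generated tokens, not the prompt.
        text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                skip_special_tokens=True)
        hits += int(target in text)
    return hits / len(triggers)

@torch.no_grad()
def perplexity(model, tokenizer, text):
    """PPL on held-out text; stable values before vs. after erasure
    indicate preserved language modeling ability."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```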
Figure 2: Accuracy (ACC) and SuperGLUE evaluation of MEraser.
Additionally, MEraser outperforms baseline methods such as incremental fine-tuning and model pruning in both effectiveness and efficiency: while those baselines only partially reduce fingerprints, MEraser achieves complete erasure while maintaining model integrity.
Figure 3: Evaluation of the erased model with the transferable erasure adapter.
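Transferring the erasure amounts to loading the saved adapter onto another fingerprinted model of the same architecture. A minimal sketch with `peft` follows; the checkpoint names are placeholders, and whether the adapter is merged into the weights or kept separate is a deployment choice, not something the paper mandates.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Apply the erasure adapter trained on one fingerprinted model to a
# different model sharing the same architecture (placeholder names).
victim = AutoModelForCausalLM.from_pretrained("another-fingerprinted-7b")
erased = PeftModel.from_pretrained(victim, "meraser-erase-adapter")

# Optionally fold the adapter into the base weights to obtain a plain
# checkpoint with the fingerprint behavior suppressed.
erased = erased.merge_and_unload()
erased.save_pretrained("erased-model")
```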
Implications and Future Directions
MEraser reveals significant vulnerabilities in current fingerprinting protocols and motivates the development of more secure model protection methods. It underscores the need for resilient authentication mechanisms, demonstrating that offensive erasure research can inform defensive innovation. Future work might extend the approach to other forms of watermarking and intellectual property protection, fostering an ecosystem where ethical deployment of LLMs is prioritized.
Conclusion
MEraser provides a practical, efficient solution to fingerprinting removal in LLMs, ensuring model integrity while highlighting the weaknesses of current protection measures. By combining mismatched and clean datasets in a strategic two-phase approach, MEraser achieves effective fingerprint erasure without compromising model performance. This work offers substantial insights for developing robust ownership verification systems in machine learning while guiding the future exploration of innovative protective methodologies.