DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection (2505.16530v1)

Published 22 May 2025 in cs.CR, cs.AI, and cs.CL

Abstract: LLMs are considered valuable Intellectual Properties (IP) for legitimate owners due to the enormous computational cost of training. It is crucial to protect the IP of LLMs from malicious stealing or unauthorized deployment. Despite existing efforts in watermarking and fingerprinting LLMs, these methods either impact the text generation process or are limited in white-box access to the suspect model, making them impractical. Hence, we propose DuFFin, a novel $\textbf{Du}$al-Level $\textbf{Fin}$gerprinting $\textbf{F}$ramework for black-box setting ownership verification. DuFFin extracts the trigger pattern and the knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on a variety of models collected from the open-source website, including four popular base models as protected LLMs and their fine-tuning, quantization, and safety alignment versions, which are released by large companies, start-ups, and individual users. Results show that our method can accurately verify the copyright of the base protected LLM on their model variants, achieving the IP-ROC metric greater than 0.95. Our code is available at https://github.com/yuliangyan0807/LLM-fingerprint.

Summary

An Examination of DuFFin: A Dual-Level Framework for LLMs IP Protection

The paper "DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection" proposes a dual-level fingerprinting method for protecting the intellectual property (IP) of LLMs. It targets the limitations of existing watermarking and fingerprinting methods, which either interfere with text generation or require white-box access to the suspect model. DuFFin instead verifies ownership in a black-box setting, where the suspect model's parameters are not accessible.

The authors introduce DuFFin, a framework that combines two kinds of fingerprint: trigger-pattern fingerprints (Trigger-DuFFin) and knowledge-level fingerprints (Knowledge-DuFFin). Neither requires invasive modification of the protected model, so verification does not risk degrading text generation quality.

Methodology Overview

Trigger-DuFFin relies on the observation that LLMs derived from a protected model show recurring patterns in their responses to specific prompt triggers. The authors train a fingerprint extractor to capture these invariant patterns, using a private set of prompt triggers. The triggers are drawn from several domains, such as reasoning and safety alignment, so that responses from different models differ discernibly.
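
The trigger-based verification flow can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the paper trains a fingerprint extractor, whereas this sketch substitutes a fixed hashed word-bigram embedding (`embed`) so that it runs without any training. The function names, the vector size `DIM`, the averaging over triggers, and the use of cosine similarity are all assumptions made for illustration.

```python
import hashlib
import math
from typing import Callable, List, Sequence

DIM = 64  # size of the toy fingerprint vector (assumption)


def embed(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for the trained extractor: hashed word-bigram counts, L2-normalised."""
    vec = [0.0] * dim
    words = text.lower().split()
    for a, b in zip(words, words[1:]):
        digest = hashlib.md5(f"{a} {b}".encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def trigger_fingerprint(model: Callable[[str], str],
                        triggers: Sequence[str]) -> List[float]:
    """Query the model with each private trigger and average the response embeddings."""
    rows = [embed(model(t)) for t in triggers]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two fingerprints; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical private triggers and toy "models" (plain functions, black-box access only).
TRIGGERS = ["Solve: 17 * 3 = ?", "Is it safe to mix bleach and ammonia?"]


def protected(prompt: str) -> str:
    return "let us reason step by step about " + prompt


def suspect(prompt: str) -> str:
    return "let us reason step by step and carefully about " + prompt


similarity = cosine(trigger_fingerprint(protected, TRIGGERS),
                    trigger_fingerprint(suspect, TRIGGERS))
print(f"fingerprint similarity: {similarity:.3f}")
```

In this sketch the owner would compare `similarity` against a threshold chosen on known derived and independent models; a learned extractor would replace `embed` without changing the rest of the flow.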

Knowledge-DuFFin, unlike the trigger-pattern counterpart, does not require training an extractor. It involves constructing a set of domain-specific knowledge questions serving as the secret key. The suspect model’s answers to these questions are aggregated to form the knowledge-level fingerprint. This method capitalizes on the consistency of an LLM’s multi-domain knowledge capabilities, which remain largely unaltered even after modifications by model stealers.
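
A knowledge-level fingerprint of this kind can be sketched in a few lines of Python. The sketch assumes the secret key consists of multiple-choice questions and that two fingerprints are compared by the fraction of matching answers; the summary above does not specify either detail, so both are illustrative assumptions, as are the names `knowledge_fingerprint`, `agreement`, and the toy question identifiers.

```python
from typing import Callable, Dict, List, Sequence


def knowledge_fingerprint(model: Callable[[str], str],
                          questions: Sequence[str]) -> List[str]:
    """Ask every secret-key question and keep the first letter of each answer."""
    return [model(q).strip().upper()[:1] for q in questions]


def agreement(fp_a: Sequence[str], fp_b: Sequence[str]) -> float:
    """Fraction of questions on which two fingerprints give the same answer."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def toy_model(answers: Dict[str, str]) -> Callable[[str], str]:
    """Wrap a fixed answer table as a black-box model (illustration only)."""
    return lambda question: answers[question]


# Hypothetical secret key: placeholders for domain-specific questions.
SECRET_KEY = ["Q1", "Q2", "Q3", "Q4"]

protected = toy_model({"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "D"})
derived = toy_model({"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "A"})
unrelated = toy_model({"Q1": "B", "Q2": "C", "Q3": "D", "Q4": "A"})

fp_protected = knowledge_fingerprint(protected, SECRET_KEY)
print(agreement(fp_protected, knowledge_fingerprint(derived, SECRET_KEY)))    # 0.75
print(agreement(fp_protected, knowledge_fingerprint(unrelated, SECRET_KEY)))  # 0.25
```

No extractor is trained here: the fingerprint is simply the vector of answers, which matches the description above that Knowledge-DuFFin aggregates the suspect model's answers directly.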

Experimental Validation

DuFFin is evaluated on a diverse set of models collected from public repositories such as HuggingFace: four popular base models serve as the protected LLMs, and their fine-tuned, quantized, and safety-aligned variants serve as suspect models. On the ownership verification task, DuFFin achieves an IP-ROC score (a metric based on the Receiver Operating Characteristic) greater than 0.95. The fingerprints therefore remain identifiable even after the kinds of modification that a model stealer is likely to apply.
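
To make the reported number easier to interpret, the following Python sketch computes an area-under-ROC score from fingerprint similarity scores. The exact definition of IP-ROC is not given in this summary, so the sketch assumes it behaves like a standard ROC AUC: the probability that a model derived from the protected LLM receives a higher similarity score than an independent model. The function name `roc_auc` and the scores below are made-up toy values, not results from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy similarity scores between the protected model and six suspect models.
# Label 1 = derived from the protected model, label 0 = independent model.
scores = [0.92, 0.88, 0.61, 0.35, 0.40, 0.70]
labels = [1, 1, 1, 0, 0, 0]

print(round(roc_auc(scores, labels), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

A score of 1.0 would mean every derived model is ranked above every independent model; 0.5 corresponds to chance.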

Implications and Future Work

DuFFin has both practical and theoretical implications for LLM IP protection. Practically, it offers a non-invasive verification mechanism for owners who want to detect unauthorized use or modification of their models. Theoretically, it motivates further work on fingerprinting techniques that do not rely on internal model access, which could contribute to standardized IP protection methods for AI systems.

Future research may delve into expanding DuFFin’s capabilities to cover visual LLMs, wherein multi-modal data influence generation processes, or explore dynamic secret key generation to avert targeted fingerprint erasure. Additionally, investigating the robustness of DuFFin against adversarial attacks could further refine its defensive capabilities, ensuring comprehensive protection against evolving threats.

In summary, DuFFin contributes a non-invasive, black-box approach to verifying ownership of LLMs, one designed for the access restrictions and model modifications encountered in real-world deployment.

GitHub

GitHub - yuliangyan0807/llm-fingerprint