Analyzing Vulnerabilities in Federated Fine-Tuning of LLMs: A Study on Data Extraction Attacks
The paper "Simple Yet Effective: Extracting Private Data Across Clients in Federated Fine-Tuning of LLMs" presents a critical examination of the privacy vulnerabilities associated with federated fine-tuning of LLMs. Federated Learning (FL) has emerged as a prominent approach to preserve data privacy by allowing clients to collaboratively train models without direct data sharing. However, the intrinsic capacity of LLMs to memorize data introduces significant risks, particularly pertaining to the leakage of Personally Identifiable Information (PII) through training data extraction attacks.
Overview and Methodology
This paper shifts focus to a realistic threat model in which an adversary controls the data of a single client and aims to extract PII belonging to other clients in the federation. Unlike previous verbatim data extraction methods, the approach leverages contextual prefixes to elicit sensitive data. The authors introduce three attack strategies, PII-contextual Prefix Sampling, Frequency-Prioritized Prefix Sampling, and Latent Association Fine-tuning (LAFt), designed to maximize data recovery in terms of both coverage and efficiency.
- PII-contextual Prefix Sampling: The attacker collects the text immediately preceding PII instances in their own dataset and uses these prefixes to query the federated LLM, on the assumption that the same prefixes can trigger the model to generate corresponding PII memorized from other clients.
- Frequency-Prioritized Prefix Sampling: Prefixes are ranked by how often they precede PII in the training data, and only the most frequent ones are used for querying, improving extraction effectiveness per query (see the sketch after this list).
- Latent Association Fine-tuning (LAFt): A novel approach in which the attacker fine-tunes the model on prefix-PII pairs to strengthen its association between prefixes and sensitive data, potentially improving extraction success rates.
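A minimal sketch of the first two sampling strategies, assuming the attacker's local dataset is a list of documents with character-offset PII annotations; the field names `text`, `pii_spans`, and `start`, as well as the prefix length, are hypothetical choices for illustration rather than the paper's exact implementation.

```python
from collections import Counter

PREFIX_LEN = 64  # assumed prefix length in characters

def collect_pii_prefixes(documents):
    """PII-contextual Prefix Sampling: gather the text immediately
    preceding each annotated PII span in the attacker's own data."""
    prefixes = []
    for doc in documents:
        for span in doc["pii_spans"]:
            start = span["start"]
            prefix = doc["text"][max(0, start - PREFIX_LEN):start].strip()
            if prefix:
                prefixes.append(prefix)
    return prefixes

def frequency_prioritized(prefixes, top_k=100):
    """Frequency-Prioritized Prefix Sampling: rank prefixes by how often
    they precede PII and keep only the most frequent ones for querying."""
    counts = Counter(prefixes)
    return [p for p, _ in counts.most_common(top_k)]

# Usage: query the fine-tuned federated model with the selected prefixes and
# scan its continuations for PII that never appeared in the attacker's data.
# selected = frequency_prioritized(collect_pii_prefixes(attacker_docs))
```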
The experimental setup uses a legal dataset augmented with PII annotations aligned with international privacy standards (CPIS, GDPR, CCPA). Attack success is assessed primarily by coverage rate and efficiency, within a comprehensive evaluation framework developed by the authors.
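A minimal sketch of how these two metrics could be computed, assuming coverage rate is the fraction of the victims' unique PII values that appear in the attacker's generated continuations and efficiency is recovered PII per query; the paper's exact definitions may differ.

```python
def coverage_rate(victim_pii: set[str], generations: list[str]) -> float:
    """Fraction of victim-exclusive PII recovered by at least one generation."""
    recovered = {pii for pii in victim_pii
                 if any(pii in text for text in generations)}
    return len(recovered) / len(victim_pii) if victim_pii else 0.0

def extraction_efficiency(victim_pii: set[str], generations: list[str]) -> float:
    """Recovered PII values per model query (assumed definition)."""
    recovered = {pii for pii in victim_pii
                 if any(pii in text for text in generations)}
    return len(recovered) / len(generations) if generations else 0.0
```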
Results and Implications
The experiments show that the proposed attacks achieve coverage rates of up to 56.57% for victim-exclusive PII, demonstrating a substantial privacy risk in federated fine-tuning. Categories such as "Address," "Birthday," and "Name" were the most vulnerable. While LAFt yielded distinct PII extractions, it did not significantly diversify the range of extracted PII types.
These findings underscore the urgent need for robust defense mechanisms in federated learning to mitigate such privacy threats. Notably, naive data sanitization techniques such as PII masking showed only limited effectiveness, pointing to the necessity of more sophisticated data protection strategies.
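A minimal sketch of the kind of naive sanitization discussed above: regex-based PII masking applied to client text before fine-tuning. The patterns are illustrative assumptions, not the paper's masking procedure; their narrow coverage is one reason such masking offers only limited protection.

```python
import re

MASK_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace matched PII with a category placeholder, e.g. [EMAIL]."""
    for label, pattern in MASK_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

# Names, addresses, and birthdays written in free text are easily missed by
# fixed patterns, so they remain exposed to extraction attacks.
print(mask_pii("Contact Jane Roe at jane.roe@example.com or 555-123-4567."))
```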
Future Directions in AI and Privacy Preservation
The paper suggests that further advancements in privacy-preserving techniques are crucial as federated learning expands across domains handling sensitive information. Exploring advanced cryptographic methods or integrating differential privacy into federated learning protocols could potentially curtail unauthorized data extraction. Additionally, improvements in model architectures that inherently reduce memorization could complement these strategies.
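One such direction, sketched minimally below, is client-level differential privacy in the DP-FedAvg style: each client's model update is clipped and Gaussian noise is added at aggregation. The clip norm and noise multiplier are illustrative assumptions; calibrating them to a formal (epsilon, delta) budget requires a privacy accountant, which is omitted here.

```python
import numpy as np

CLIP_NORM = 1.0         # assumed per-client L2 clipping threshold
NOISE_MULTIPLIER = 1.1  # assumed ratio of noise stddev to clip norm

def clip_update(update: np.ndarray, clip_norm: float = CLIP_NORM) -> np.ndarray:
    """Scale a client's update so its L2 norm does not exceed clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(client_updates: list[np.ndarray]) -> np.ndarray:
    """Average clipped updates and add Gaussian noise scaled to the clip norm."""
    clipped = [clip_update(u) for u in client_updates]
    mean_update = np.mean(clipped, axis=0)
    noise_std = NOISE_MULTIPLIER * CLIP_NORM / len(client_updates)
    return mean_update + np.random.normal(0.0, noise_std, size=mean_update.shape)

# The server applies dp_aggregate to per-round client deltas before updating the
# global model, limiting how much any single client's PII can influence (and
# thus be memorized by) the shared parameters.
```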
Federated Learning's implementation in real-world systems necessitates not only rigorous security assessments but also collaborative efforts between researchers and regulatory bodies to ensure compliance with evolving legal frameworks. By establishing new benchmarks through studies like this, the AI community can better navigate the challenges posed by privacy-preservation in decentralized learning environments.
This research contributes significantly to our understanding of the vulnerabilities in federated fine-tuning of LLMs and sets a precedent for ongoing investigations into safeguarding user privacy in distributed AI systems.