
Privacy Risks in Large Language Models

Updated 7 October 2025
  • Privacy risks in LLMs are vulnerabilities arising from training data memorization, prompt leakage, and system-level side channels that expose sensitive information.
  • Attack vectors include membership inference, data extraction, model inversion, and backdoor injections, enabling adversaries to recover or infer private data.
  • Mitigation strategies combine technical defenses like differential privacy, deduplication, federated learning, and unlearning with legal and sociotechnical safeguards.

LLMs present substantial privacy risks stemming from their training procedures, the structure and deployment of their architectures, and their integration with user-facing and agentic systems. These models, built on massive corpora—often containing sensitive or personal data—exhibit a variety of privacy vulnerabilities not limited to training data memorization. Adversaries can exploit these weaknesses for direct data leakage, infer sensitive user or training attributes, or launch sophisticated attacks exploiting new system-level vectors introduced by rapid advances in LLM deployment and autonomous capabilities. As recent research demonstrates, the multifaceted privacy landscape of LLMs requires a comprehensive framework encompassing technical, legal, and sociotechnical considerations.

1. Taxonomy of Privacy Risks in LLMs

Privacy risks in LLMs arise across the entire lifecycle of model development and usage. A rigorous taxonomy organizes these threats into distinct but interrelated categories, each with its own mechanisms:

| Category | Mechanism/Example | Stage(s) Impacted |
|---|---|---|
| Training Data Memorization | Verbatim sequence regurgitation, semantic/cross-modal leakage | Pretraining, Fine-tuning |
| Prompt/User Interaction Leakage | Logging and retention of user prompts, direct API exposure | Inference, System Deployment |
| Output/Contextual Leakage | Reflecting sensitive info in generated outputs, agentic actions | Inference, Agent Integration |
| Attribute Inference/Aggregation | Inferring demographics or identities from outputs/profiles | Inference, Aggregation in LLM-powered apps |
| Side-Channel/System-level Leakage | Token generation timing, speculative decoding, cache timing | System-level, Infrastructure |
| Autonomous Agent Privacy Risks | Autonomous retrieval, prompt injection, SQL leakage | Agentic Execution, Multi-agent setups |

This expanded taxonomy appears in multiple surveys and position pieces, highlighting that privacy incidents are not limited to direct regurgitation of training data but extend to indirect, contextual, and systemic vectors (Mireshghallah et al., 2 Oct 2025, Shanmugarasa et al., 15 Jun 2025, Du et al., 16 Sep 2025).

2. Principal Attack Vectors and Technical Mechanisms

Distinct attack modalities target different privacy surfaces of LLMs. Key vectors include:

  • Membership Inference Attacks (MIAs): Adversaries determine whether a specific example $x$ was included in the LLM's training set by examining output probabilities or loss values (a minimal loss-threshold sketch appears after this list). Sensitivity to prompt data, as in in-context learning, amplifies this risk; for prompted LLMs, the gap between member and non-member confidences can be orders of magnitude larger than in fine-tuned models (Duan et al., 15 Nov 2024).
  • Training Data Extraction: By crafting inputs or exploiting model overfitting, adversaries recover verbatim or near-verbatim snippets (e.g., with “canary” sequences) (Neel et al., 2023, Li et al., 2023, Mireshghallah et al., 2 Oct 2025).
  • Model Inversion and Embedding Attacks: Attackers invert embeddings or hidden representations to reconstruct sensitive inputs. Formally, given a hidden state $H$, an attacker may solve $\min_w L(G(H + \Delta H), D_{\text{private}})$ for an adversarial loss $L$ and generator $G$ (Chen et al., 4 May 2025, Zhu et al., 25 Apr 2024, Wan et al., 20 May 2024).
  • Attribute Inference: Sensitive attributes (demographics, location, health status) can be inferred using probe classifiers on internal representations or outputs, leveraging the semantic richness of modern LLM embeddings (Plant et al., 2022, Ma et al., 30 Jun 2025).
  • Gradient Leakage in Distributed/Federated Learning: Gradients transmitted during collaborative or federated learning, if intercepted, can allow recovery of the associated private data (Das et al., 30 Jan 2024, Du et al., 21 Dec 2024).
  • Backdoor and Trojan Injection: Malicious triggers inserted during (pre-)training or fine-tuning can later force the LLM to leak targeted information upon exposure to specific patterns (Li et al., 2023, Neel et al., 2023, Du et al., 21 Dec 2024).
  • Side-channel Attacks (System-level): In speculative decoding, the pattern of correct/incorrect token generations modulates observable timing or packet size, which correlates with input characteristics, yielding high identification rates (>90%) (Wei et al., 1 Nov 2024, Du et al., 16 Sep 2025).
  • Agentic and Aggregation-based Attacks: In agentic deployments or as information aggregators, LLMs may synthesize publicly scattered data into highly specific private conclusions, leading to attribute aggregation risks (Mireshghallah et al., 2 Oct 2025, Shanmugarasa et al., 15 Jun 2025).
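
As a concrete illustration of the loss-threshold style of membership inference mentioned above, the following sketch fits a toy character-bigram model on a tiny "member" corpus and flags low-loss strings as likely members. All data, the bigram model, and the threshold value are hypothetical stand-ins for a real LLM, its training set, and a calibrated decision rule.

```python
# Minimal, self-contained illustration of a loss-threshold membership
# inference attack. A toy character-bigram "language model" is fit on a
# small member corpus (duplicated to mimic memorization); the attacker
# scores candidate strings by average negative log-likelihood and flags
# low-loss strings as likely training members. Didactic sketch only.
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Fit character-bigram counts over a list of strings."""
    counts = defaultdict(Counter)
    vocab = set()
    for text in corpus:
        padded = "^" + text          # "^" marks start of sequence
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts, vocab

def avg_nll(text, counts, vocab):
    """Average negative log-likelihood of a string under the bigram model."""
    padded = "^" + text
    nll = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total = sum(counts[prev].values())
        prob = (counts[prev][cur] + 1) / (total + len(vocab))  # add-one smoothing
        nll -= math.log(prob)
    return nll / max(len(text), 1)

members = ["alice lives at 12 oak street", "bob's ssn is 123-45-6789"]
non_members = ["carol works in a bakery", "the weather is mild today"]

# Duplicating the member corpus mimics the memorization that repeated
# training data induces in real models.
counts, vocab = train_bigram(members * 50)
THRESHOLD = 2.5  # illustrative; real attacks calibrate this, e.g. with shadow models

for text in members + non_members:
    score = avg_nll(text, counts, vocab)
    guess = "member" if score < THRESHOLD else "non-member"
    print(f"{score:5.2f}  {guess:11s}  {text}")
```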

3. Mitigation Strategies: Efficacy, Trade-offs, and Limitations

Mitigation measures against LLM privacy risks span preprocessing, model training, inference, and post-hoc stages:

  • Data Deduplication and Sanitization: Because memorization grows superlinearly with the number of duplicates, deduplication is a potent defense (a minimal hashing-based sketch appears after this list). Suffix array-based or LSH-based algorithms remove exact duplicates, reducing sequence regeneration rates by up to 20× and decreasing attack AUROC from 0.90 to near-chance (≈0.50) for singletons (Kandpal et al., 2022). However, near-duplicates and semantic redundancy are not fully addressed.
  • Differential Privacy (DP): DP techniques, applied at the data, parameter, or gradient level, bound the influence of any single sample via noise injection. The core $\epsilon$-DP guarantee requires that, for any adjacent datasets $D, D'$ and any outcome set $S$:

$$\Pr[M(D) \in S] \leq \exp(\epsilon)\,\Pr[M(D') \in S]$$

However, strong privacy budgets (low $\epsilon$) can degrade utility by 20% or more, especially in deep LLMs. Hybrid approaches (e.g., adversarial objectives + DP, such as CAPE) mitigate some utility loss (Plant et al., 2022). A minimal DP-SGD sketch follows below.
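
To make the clipping-and-noise mechanics behind DP-SGD concrete, here is a minimal NumPy sketch on a synthetic linear-regression task; the clipping norm C, noise multiplier sigma, and learning rate are illustrative values, not recommended settings.

```python
# Illustrative DP-SGD step: per-example gradients are clipped to L2 norm C,
# summed, and perturbed with Gaussian noise of scale sigma * C before the
# parameter update. Values of C, sigma, lr, and batch are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
C, sigma, lr, batch = 1.0, 1.0, 0.1, 64  # clip norm, noise multiplier, step size, batch size

for step in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    # Per-example gradients of squared error: g_i = 2 * (x_i.w - y_i) * x_i
    residual = X[idx] @ w - y[idx]
    grads = 2.0 * residual[:, None] * X[idx]              # shape (batch, d)
    # Clip each example's gradient to norm at most C.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clipping norm, then average and step.
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=sigma * C, size=d)
    w -= lr * noisy_sum / batch

print("final weights:", np.round(w, 3))
```

The (ε, δ) actually achieved by such training depends on the noise multiplier, the sampling rate, and the number of steps, and is computed with a privacy accountant rather than read off the update rule.
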

  • Federated and Edge Learning: Training and tuning without centralizing raw data limits exposure, but exposed gradients in federated setups can still leak data unless further masked (Yan et al., 8 Mar 2024, Chen et al., 4 May 2025).
  • Backdoor and Trojan Defense: Removal techniques include fine-tuning with sharpness-aware minimization, super-fine-tuning, and code-level inspection for triggers; however, complete removal often sacrifices model utility or fails to eliminate indirect pathways (Chen et al., 4 May 2025).
  • Embeddings and Representation Hardening: Encryption (e.g., homomorphic), local DP at embedding-level, and frequency-domain obfuscation defenses (such as DCT-based feature spreading) show some reduction in inversion success but at computational cost and possible utility impact (Wan et al., 20 May 2024, Zhu et al., 25 Apr 2024).
  • Unlearning Mechanisms: Post-hoc unlearning (e.g., SISA or descent-to-delete) removes the influence of targeted data without full retraining, supporting “Right to be Forgotten” compliance (Neel et al., 2023).
  • Speculative Decoding Side-Channel Mitigations: Aggregating token outputs or padding packets to uniform or randomized lengths reduces the effectiveness of timing and packet size side-channels, at the cost of higher communication latency and overhead (Wei et al., 1 Nov 2024).
  • Prompt and Output Sanitization: Named Entity Recognition, rule-based redaction, and two-step (mask/de-anonymize) pipelines manage prompt privacy, but may miss context-dependent exposures (Shanmugarasa et al., 15 Jun 2025).
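
As a minimal sketch of the deduplication step referenced above, the snippet below removes exact whole-document duplicates by hashing normalized text; the suffix-array and LSH approaches in the cited work additionally catch substring-level and near-duplicate overlap, which this sketch does not.

```python
# Exact-duplicate removal by hashing normalized documents. Real training
# pipelines additionally use suffix arrays (substring-level exact duplicates)
# or MinHash/LSH (near-duplicates); this sketch only drops whole-document
# exact repeats after light normalization.
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())

def deduplicate(docs):
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = [
    "Alice's address is 12 Oak Street.",
    "alice's address is   12 oak street.",   # trivial variant of the first
    "An unrelated document about baking.",
]
print(deduplicate(corpus))  # the near-identical second entry is dropped
```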

Many of these mitigations face inherent trade-offs between privacy, utility, and system efficiency. Furthermore, while DP and deduplication address average-case risk, motivated attackers can still exploit outliers, semantic or system-level pathways, and agent-specific vulnerabilities.

4. Emerging Systemic and Agentic Privacy Risks

Recent advances in LLM deployment have introduced new, underexplored risk surfaces:

  • Privacy in LLM-powered Agents: Autonomous agents can aggregate data beyond initial prompts—retrieving user-specific web content or API responses, performing latent aggregation of sparse public signals into private profiles, or leaking private documents via multi-hop reasoning or prompt injection (Mireshghallah et al., 2 Oct 2025, Du et al., 16 Sep 2025).
  • Speculative and Distributed Inference: System optimizations, such as speculative decoding, expose side-channels, enabling high-accuracy query fingerprinting and confidential datastore exfiltration by observing token pacing and packetization (Wei et al., 1 Nov 2024, Du et al., 16 Sep 2025); a toy fingerprinting sketch follows this list.
  • Contextual and Semantic Privacy: Sensitive features are not limited to explicit identifiers but may be reconstructed via stylistic or contextual inferences. Standard DP or token-level sanitization does not provide guarantees against semantic attacks where, for example, latent embeddings leak demographic status or health data (Ma et al., 30 Jun 2025, Shanmugarasa et al., 15 Jun 2025).
  • Human Factors and Disclosure Dynamics: User interaction modalities, incomplete mental models, and flawed privacy perceptions cause over-disclosure and systemic privacy exposure well beyond model-centered threats (Li et al., 3 Feb 2024).
  • Legal and Policy-Driven Challenges: Evolving regulations (e.g., GDPR, CCPA) demand not just technical compliance (e.g., unlearning), but holistic frameworks in line with emerging definitions of contextual integrity and user sovereignty (Neel et al., 2023).
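
As a simplified illustration of the fingerprinting risk described above, the sketch below matches an observed sequence of response packet sizes against reference traces for candidate queries using nearest-neighbor distance. The traces are synthetic and the matcher is deliberately naive; the cited attacks operate on measured timing and packetization of real speculative-decoding deployments.

```python
# Toy query fingerprinting from response packet-size traces. An observer who
# cannot read ciphertext but can see per-token packet sizes (which speculative
# decoding modulates) compares an observed trace against reference traces
# collected for candidate queries. All traces here are synthetic.
import numpy as np

rng = np.random.default_rng(1)

def trace_distance(a, b):
    """Euclidean distance between two packet-size traces (truncated to equal length)."""
    n = min(len(a), len(b))
    return float(np.linalg.norm(np.asarray(a[:n], dtype=float) - np.asarray(b[:n], dtype=float)))

# Reference traces the observer previously collected for candidate queries.
reference = {
    "weather query":  [120, 80, 80, 200, 80, 80, 160],
    "medical query":  [120, 200, 200, 80, 200, 160, 200],
    "shopping query": [80, 80, 120, 80, 80, 80, 120],
}

# Observed trace: the "medical query" pattern plus measurement noise.
observed = np.array(reference["medical query"], dtype=float) + rng.normal(0, 10, size=7)

best = min(reference, key=lambda q: trace_distance(observed, reference[q]))
print("best match:", best)
```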

5. Critical Open Problems and Ongoing Research Directions

The literature identifies several unsolved challenges that continue to undermine robust privacy guarantees in LLMs:

  • Measurement and Quantification of Semantic Leakage: Existing metrics—such as perplexity or token overlap—fail to capture semantic inference risk. New formalizations leveraging KL divergence or adversarial expected value are needed:

$$D_{\mathrm{KL}}\bigl(P(S_p \mid f(X)) \,\|\, P(S_p)\bigr) \leq \mathcal{E},$$

where $S_p$ denotes the semantic property of interest and $f(X)$ is the model output (Ma et al., 30 Jun 2025).
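
As a toy numeric check of this bound, the snippet below computes the KL term for a binary sensitive property whose posterior sharpens after the adversary observes the model output; the prior, posterior, and budget values are hypothetical, and a real audit would estimate the posterior with a probe or attack model over many outputs.

```python
# Toy computation of the semantic-leakage term D_KL(P(S_p | f(X)) || P(S_p))
# for a binary sensitive property S_p. The prior, posterior, and budget are
# hypothetical values chosen for illustration.
import math

def kl_divergence(p, q):
    """KL divergence (in nats) between two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

prior = [0.5, 0.5]        # P(S_p): adversary's belief before seeing the output
posterior = [0.9, 0.1]    # P(S_p | f(X)): belief after seeing the output
budget = 0.1              # semantic leakage budget E

leakage = kl_divergence(posterior, prior)
print(f"leakage = {leakage:.3f} nats; within budget: {leakage <= budget}")
```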

  • Multimodal Privacy: Cross-modal LLMs exacerbate leakage risk, requiring advances in joint privacy-scoring and multimodal representation encryption (Ma et al., 30 Jun 2025).
  • Balancing Privacy and Fidelity: Overzealous de-identification or noise harms utility and generation quality; adaptive and metric-aware methods are needed to maintain function while masking risk (Plant et al., 2022, Yan et al., 8 Mar 2024).
  • Resilience to Adaptive and Long-Range Attacks: Many current defenses are tailored to single-shot or static attacks but may fail against multi-step, adaptive, or agentic adversaries capable of sophisticated context aggregation (Shanmugarasa et al., 15 Jun 2025, Du et al., 16 Sep 2025).
  • Interdisciplinary and Sociotechnical Solutions: Research focus remains disproportionately on memorization and technical defenses (nearly half of recent work, per Mireshghallah et al., 2 Oct 2025), while broader interdisciplinary approaches addressing data governance, user agency, regulatory negotiation, and deployment context are lacking.

6. Broader Implications: From Data to Sociotechnical Privacy

The contemporary literature converges on a central insight: privacy risks in LLMs cannot be fully addressed by technical fixes targeting only memorization or isolated leakage vectors. Privacy threats are system-wide, emerging from data collection practices, aggregation effects, human–machine interaction modalities, and the design of agentic and autonomous capabilities. Effective mitigation will require:

  • Integration of technical defenses (data deduplication, DP, sanitization) with robust system design (distributed architectures, local processing, control over retention) (Shanmugarasa et al., 15 Jun 2025, Mireshghallah et al., 2 Oct 2025).
  • Sociotechnical frameworks informed by legal and policy norms, enabling meaningful user consent, tracing, and actionable rights (e.g., efficient “Right to be Forgotten” compliance) (Neel et al., 2023).
  • Ongoing research and standardization on privacy risk measurement, empirical evaluation benchmarks, and agentic behavior auditing (Smith et al., 2023, Li et al., 2023).

Current and future work must bridge technical, legal, and societal domains to create LLM systems that deliver utility while maintaining principled privacy guarantees in an ever-widening landscape of deployment scenarios.
