Scaling Laws for Membership Inference

Updated 1 July 2025
  • Scaling laws for membership inference describe how the risk of identifying whether a data sample was used in training changes with model size, data properties, and attack strategy.
  • The risk does not scale linearly; it depends in complex ways on factors such as data diversity (examples per class, number of classes) and attack techniques, including aggregation strategies.
  • New techniques like quantile regression allow scalable black-box attacks on large models, necessitating advanced defenses and auditing methods to measure and mitigate privacy risk effectively.

Membership inference refers to the problem of determining, given a model and a sample, whether that sample was part of the model's training data. The study of scaling laws for membership inference seeks to quantify how the risk and accuracy of such attacks evolve as a function of model size, data properties, attack resources, and architectural or domain-specific features. Recent research—spanning discriminative vision models, deep transfer learning, black-box inference against LLMs, and general sequence models—has revealed diverse and sometimes surprising behaviors, challenging assumptions about monotonic risk and exposing new paradigms for scalable auditing, privacy, and legal compliance.
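As a concrete baseline, the simplest attack thresholds a per-example loss: members tend to incur lower loss than non-members. The minimal sketch below uses synthetic losses; the loss distributions and threshold are illustrative assumptions, not measurements from any cited paper.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Toy loss-threshold membership inference: predict 'member' whenever the
    per-example loss falls below an attacker-chosen threshold."""
    tpr = float((member_losses < threshold).mean())     # members correctly flagged
    fpr = float((nonmember_losses < threshold).mean())  # non-members wrongly flagged
    return tpr, fpr

# Illustrative synthetic losses: members tend to have lower loss than non-members.
rng = np.random.default_rng(0)
member_losses = rng.gamma(shape=2.0, scale=0.3, size=5_000)
nonmember_losses = rng.gamma(shape=2.0, scale=0.5, size=5_000)

tpr, fpr = loss_threshold_mia(member_losses, nonmember_losses, threshold=0.5)
print(f"TPR={tpr:.3f}  FPR={fpr:.3f}")
```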

1. Scaling Laws: Core Models and Empirical Relationships

Key scaling phenomena for membership inference can be grouped according to several axes: model capacity and overfitting, data set characteristics, attack complexity, and aggregation strategies.

  • Model Capacity and Overfitting: Early work established that, in classical and deep networks, increasing model depth/size (for fixed data) increases vulnerability to membership inference attacks, especially in the presence of overfitting (elevated train–test accuracy gap) (Tonni et al., 2020). However, as models become very large and are trained on massive, highly diverse datasets (e.g., LLMs, ImageNet), overfitting may vanish, and naive attacks based on training/test loss differences lose much of their power (Puerto et al., 31 Oct 2024).
  • Quantitative Scaling in Transfer Learning: Empirical and theoretical analysis reveals that the vulnerability of fine-tuned models at a fixed low false positive rate follows a power law with respect to dataset properties (Tobaben et al., 7 Feb 2024):

$$\log_{10}(\mathrm{tpr}) = \beta_S \log_{10}(S) + \beta_C \log_{10}(C) + \beta_0$$

where $\mathrm{tpr}$ is the true-positive rate, $S$ is the number of examples per class, $C$ the number of classes, and $\beta_S < 0$, $\beta_C > 0$ are empirical exponents. Thus, MIA risk falls as examples per class increase and rises with the number of classes; a minimal fitting sketch appears after this list.

  • Document and Collection-Sized Aggregation in LLMs: Modern studies demonstrate that while paragraph- or sentence-level MIAs are weak (AUROC $\sim 0.5$), aggregating many such weak signals across longer texts (documents or collections) creates a compounding effect, yielding high AUROCs ($>0.9$) (Puerto et al., 31 Oct 2024). This scaling is non-linear: small per-unit accuracy improvements lead to outsized performance gains when many units are combined.
  • Shadow and Reference Model Scaling: The success of strong attacks such as LiRA increases as more reference models are used (up to hundreds), but this shows diminishing returns and becomes computationally prohibitive for large models (Bertran et al., 2023, Hayes et al., 24 May 2025). Quantile regression and discrepancy-based approaches enable efficient scaling to large architectures and datasets, matching or exceeding reference-based attacks at a fraction of the compute cost (Bertran et al., 2023, Zhang et al., 22 Sep 2024, 2405.15140).
  • Membership Encoding and Stealth Embedding: Membership encoding demonstrates that it is possible to encode, and later robustly extract, a substantial fraction of the membership information for arbitrary points, with accuracy scaling with model capacity. In some settings, 20–50% of the training set can be marked for inference with negligible accuracy loss, and encoding is robust to pruning, fine-tuning, and input redaction (Song et al., 2019).
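To make the transfer-learning power law concrete, the sketch below recovers the exponents $\beta_S$ and $\beta_C$ by least squares in log-log space. The measurements are synthetic and generated from an assumed law; they are not results from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dataset configurations: S = examples per class, C = number of classes.
S = np.array([5, 10, 25, 50, 100, 250, 500])
C = np.array([10, 10, 50, 50, 100, 100, 200])

# Synthetic tpr-at-fixed-FPR generated from an assumed power law (illustrative only).
true_bS, true_bC, true_b0 = -0.6, 0.3, -0.5
log_tpr = (true_bS * np.log10(S) + true_bC * np.log10(C) + true_b0
           + rng.normal(0.0, 0.02, size=S.size))

# Fit log10(tpr) = beta_S*log10(S) + beta_C*log10(C) + beta_0 by least squares.
X = np.column_stack([np.log10(S), np.log10(C), np.ones_like(S, dtype=float)])
(beta_S, beta_C, beta_0), *_ = np.linalg.lstsq(X, log_tpr, rcond=None)
print(f"beta_S={beta_S:.2f} (<0: risk falls with S), beta_C={beta_C:.2f} (>0: risk rises with C)")

# Extrapolate vulnerability for a new dataset configuration.
S_new, C_new = 200, 20
tpr_pred = 10 ** (beta_S * np.log10(S_new) + beta_C * np.log10(C_new) + beta_0)
print(f"predicted tpr at fixed FPR for S={S_new}, C={C_new}: {tpr_pred:.3f}")
```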

2. Data, Domain, and Attack Resource Dependencies

Scaling laws for MIA are not governed by a single variable but are a joint function of data, model, and adversary characteristics.

  • Data Properties: Class/feature imbalance and low entropy increase MIA accuracy; balanced and highly diverse datasets (high entropy) reduce risk (Tonni et al., 2020). Analysis of deep transfer learning uncovers an inverse power-law relationship between vulnerability and the number of examples per class and a direct power law with the number of classes (Tobaben et al., 7 Feb 2024).
  • Domain and Distribution Shift: Domain significantly impacts vulnerability, with legal/encyclopedic text showing higher risk than source code or mathematical domains. The emergence of "outlier" splits—where strong MIA performance materializes—further suggests high variance in risk, even when most splits show near-random results (Chen et al., 18 Dec 2024). The shadow model’s data source need not match the target model for successful attacks (He et al., 2022).
  • Attacker Resources and Threat Model: The size of the attacker's shadow dataset and the number of trained reference models have a strong, non-linear effect on attack success—larger attacker capacity yields higher success rates, up to a plateau (see the reference-model sketch after this list). However, defenses must remain robust even when attacker shadow datasets are theoretically unbounded (Tonni et al., 2020).
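The sketch below illustrates the reference-model scaling in a hedged, offline LiRA-style form: each example's score asks how unusual the target model's confidence looks under a Gaussian fitted to K reference ("out") models. The confidences here are synthetic placeholders; real attacks obtain them from shadow models trained on attacker data.

```python
import numpy as np
from scipy.stats import norm

def offline_reference_score(target_conf, out_confs):
    """LiRA-style offline score: -log P(conf >= target | example not in training),
    under a Gaussian fit to confidences from reference models that excluded it.
    Larger scores suggest membership."""
    mu, sigma = out_confs.mean(), out_confs.std(ddof=1) + 1e-8
    return -norm.logsf(target_conf, loc=mu, scale=sigma)

rng = np.random.default_rng(1)
target_conf = 2.5  # illustrative logit-scaled confidence from the target model

# More reference models stabilize the Gaussian fit; gains taper off (diminishing returns).
for k in (4, 16, 64, 256):
    scores = [offline_reference_score(target_conf, rng.normal(size=k)) for _ in range(500)]
    print(f"K={k:3d} reference models  mean score={np.mean(scores):5.2f}  "
          f"std across resamples={np.std(scores):.2f}")
```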

3. Attack Methodologies and Calibration

Advances in scalable and robust attacks have altered the landscape of practical auditing.

  • Quantile Regression: Replaces shadow models with efficient, black-box quantile regressors or ensembles, enabling high-performance MIA on large LLMs at orders-of-magnitude lower cost and without architecture/tokenizer knowledge (Bertran et al., 2023, Zhang et al., 22 Sep 2024); a minimal sketch appears after this list.
  • Statistical Divergence/JS Distance: Across vision/classification tasks, the Jensen-Shannon distance between the entropy or cross-entropy distributions of members and non-members is a universal, attack-independent predictor of MIA risk, with attack accuracy scaling nearly linearly with JS divergence (He et al., 2022).
  • Discrepancy-Based Metrics: Discrepancy—the supremum gap between output distributions over a large family of scoring sets—provides a tight, scalable upper bound on attack advantage, surpassing classic loss/probability methods in revealing privacy risk under modern data augmentations and training routines (2405.15140).
  • Correlation-Aware Sequence Auditing: Sequence models require explicitly modeling within-sequence correlations; multivariate covariance-aware attacks (e.g., OAS estimator) substantially outperform naive, scalar-loss approaches, exposing far more memorization at fixed FPR (Rossi et al., 5 Jun 2025).
  • Calibration and Automated Thresholding: Automatic calibration of probability scores via temperature scaling (ACMIA) significantly enhances both the robustness and accuracy of attack signals, especially in the context of long texts or paraphrased data (Zade et al., 6 May 2025). Practical success crucially depends on robust, domain- and model-aware thresholding (Chen et al., 18 Dec 2024).
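A minimal sketch of the quantile-regression attack, assuming the auditor holds a pool of known non-members with per-example features and target-model scores; the features, regressor choice, and quantile level are illustrative assumptions rather than the exact pipelines of the cited papers.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n, d = 4_000, 8
w = rng.normal(size=d)

# Known non-members: per-example features x and target-model scores s (synthetic).
x_nonmembers = rng.normal(size=(n, d))
s_nonmembers = x_nonmembers @ w * 0.1 + rng.normal(size=n)

# Fit a conditional 99th-percentile regressor on non-members only (no shadow models).
q99 = GradientBoostingRegressor(loss="quantile", alpha=0.99, n_estimators=200)
q99.fit(x_nonmembers, s_nonmembers)

# Audit: flag an example as a member if its observed score exceeds the predicted
# per-example non-member quantile, which targets roughly a 1% false-positive rate.
x_targets = rng.normal(size=(10, d))
s_targets = x_targets @ w * 0.1 + rng.normal(size=10) + 2.0  # member-like upward shift
flags = s_targets > q99.predict(x_targets)
print("flagged as members:", flags.astype(int))
```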

4. Defenses, Regularization, and Utility-Risk Trade-offs

  • Fairness and Mutual Information Regularization: Regularizing group/predictive/individual fairness can reduce MIA accuracy by up to 25% without sacrificing utility (Tonni et al., 2020). Classic L1/L2 regularization alone is less effective.
  • Privacy-Aware Sparsity Tuning (PAST): Privacy can be improved by targeting regularization at the parameters that most influence privacy leakage, identified via the gradient of the loss gap between members and non-members (see the sketch after this list). This focused approach outperforms uniform penalties and reduces attack advantage by up to 65% with negligible impact on accuracy (Hu et al., 9 Oct 2024).
  • Data Augmentation and Label Smoothing: Modern augmentations powerfully reduce attack success, particularly when defenses and attacks are adaptively matched (He et al., 2022). MixUp- and RelaxLoss-trained models require custom, training-aware attack scores for proper vulnerability assessment (2405.15140).
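A hedged PyTorch sketch of the privacy-aware sparsity idea: rank parameters by the gradient of the member/non-member loss gap and add an extra L1 penalty only to the top-ranked ones. The masking scheme and hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch

def privacy_sensitive_masks(model, loss_member, loss_nonmember, top_frac=0.05):
    """Rank parameters by |d(loss_member - loss_nonmember)/d(theta)| and return
    boolean masks marking the top fraction as privacy-sensitive."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss_member - loss_nonmember, params, retain_graph=True)
    flat = torch.cat([g.abs().flatten() for g in grads])
    k = max(1, int(top_frac * flat.numel()))
    threshold = flat.topk(k).values.min()
    return [g.abs() >= threshold for g in grads]

def targeted_l1_penalty(model, masks, lam=1e-4):
    """L1 penalty restricted to the masked (privacy-sensitive) parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    return lam * sum((p * m).abs().sum() for p, m in zip(params, masks))

# Illustrative use inside a fine-tuning step (model, criterion, and batches are placeholders):
#   loss_member = criterion(model(member_x), member_y)
#   loss_nonmember = criterion(model(holdout_x), holdout_y)
#   masks = privacy_sensitive_masks(model, loss_member, loss_nonmember)
#   (task_loss + targeted_l1_penalty(model, masks)).backward()
```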

5. Emerging Paradigms: LLMs, Sequential Data, and Granularity

Recent work highlights paradigm shifts in how MIA scaling manifests on large generative models:

  • Granularity of Membership Claims: For LLMs, per-paragraph or per-sentence membership claims often fail (AUROC $\sim 0.5$), but aggregating across documents or large collections enables strong MIAs (AUROC $>0.9$) (Puerto et al., 31 Oct 2024). This insight directly informs copyright auditability; a toy aggregation sketch follows this list.
  • In-Context Learning: For LLMs performing in-context learning, the vulnerability to MIAs scales inversely with the number of demonstrations per prompt: attacks are strongest in short prompts and at the first/last demonstration positions (Wen et al., 2 Sep 2024).
  • Scaling and Uncertainty: In deep learning, epistemic uncertainty decays as a power law ($\sim O(1/N)$) with data, but rarely vanishes completely in practice. This implies persistent, slowly decreasing MIA risk, even as models scale; full elimination of privacy leakage is not guaranteed even with extremely large $N$ (Rosso et al., 11 Jun 2025).
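A toy simulation of the aggregation effect, assuming independent per-paragraph scores with a small mean shift for members; the shift, sample counts, and independence assumption are illustrative, not measurements from the cited work.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n_docs, shift = 2_000, 0.1  # tiny per-paragraph signal: near-random AUROC per unit

for k in (1, 10, 100, 1_000):  # number of paragraphs aggregated per document
    member = rng.normal(shift, 1.0, size=(n_docs, k)).mean(axis=1)
    nonmember = rng.normal(0.0, 1.0, size=(n_docs, k)).mean(axis=1)
    labels = np.concatenate([np.ones(n_docs), np.zeros(n_docs)])
    scores = np.concatenate([member, nonmember])
    print(f"units aggregated K={k:5d}  document-level AUROC={roc_auc_score(labels, scores):.3f}")
```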

6. Practical Implications and Forward Directions

  • Audit and Certificate of Privacy: Modern auditing and membership inference risk measurement at scale must combine statistical discrepancy estimation, robust quantile-based calibration, and correlation-aware aggregation, especially as models and datasets grow in size and sophistication.
  • Privacy Guarantees and Differential Privacy: The observed scaling laws motivate matching privacy guarantees (e.g., $(\epsilon,\delta)$-DP) via practical risk prediction, and highlight challenges of preserving utility in small-data/high-class settings (Tobaben et al., 7 Feb 2024).
  • Bridging Empirical and Theoretical Scaling Laws: Ongoing work seeks to reconcile scaling laws observed empirically (e.g., via JS distance, power law decay) with theoretical properties of learning and generalization, and to develop universally valid predictors of MIA risk.
  • Defensive Innovations: A continued push is needed on creative defenses: data synthesis, advanced regularization, training-procedure-aware loss design, and adaptation to sequential and federated settings.

7. Summary Table: Key Axes of Scaling for Membership Inference

| Scaling Axis | Empirical Law or Trend | Key Reference(s) |
| --- | --- | --- |
| Examples per class ($S$) | $\mathrm{tpr} \propto S^{\beta_S}$ with $\beta_S < 0$ | (Tobaben et al., 7 Feb 2024) |
| Number of classes ($C$) | $\mathrm{tpr} \propto C^{\beta_C}$ with $\beta_C > 0$ | (Tobaben et al., 7 Feb 2024) |
| Model size / depth | Non-monotonic: increases up to a point, then can decrease | (Chen et al., 18 Dec 2024, Hayes et al., 24 May 2025) |
| Aggregation over units | AUROC compounds, $\sim (\text{base AUROC})^{1/\sqrt{K}}$ | (Puerto et al., 31 Oct 2024) |
| JS distance (entropy/cross-entropy) | Attack accuracy $\sim a\,\mathrm{JS} + b$ | (He et al., 2022) |
| Reference model count | Increases MIA power up to saturation | (Bertran et al., 2023, Hayes et al., 24 May 2025) |
| Sequence correlation utilized | MIA power rises dramatically when modeled | (Rossi et al., 5 Jun 2025) |
| Epistemic uncertainty (dataset size $N$) | $\sim O(1/N^\gamma)$ contraction; never vanishes entirely | (Rosso et al., 11 Jun 2025) |

The study of scaling laws for membership inference has evolved from classical overfitted models to the most recent LLMs and transfer learning pipelines. Key findings emphasize that privacy risk can increase or decrease as a function of model/data scale, data structure, and aggregation strategy, not always monotonically, and often in ways that require careful, context-specific analysis. Bridging this quantitative understanding with practical defenses and responsible deployment remains an active area of research.
