Bayes Optimal Strategies for Membership Inference

Updated 18 April 2026

The paper presents a Bayes-optimal decision rule that minimizes inference error using likelihood ratio tests under probabilistic training assumptions.
It shows that scalar loss, under exponential-family models, is asymptotically sufficient for membership inference, bridging white-box and black-box methods in many settings.
It outlines practical approximations like shadow-model Monte Carlo and Bayesian variance inference that improve MI attack performance while reducing computational complexity.

Bayes optimal strategies for membership inference constitute a principled framework for inferring whether a data point or dataset participated in training a deployed machine learning model. The Bayes-optimal attack is the decision rule that minimizes expected inference error (Bayes risk) under specified probabilistic assumptions about the training process, model parameters, and data-generating distributions. These strategies are fundamental to privacy auditing and inform both theoretical privacy guarantees and the practical design of membership inference attacks (MIAs) across modern ML pipelines.

1. Formal Problem Setup and Bayesian Decision Rule

Consider a data-generating process with i.i.d. samples $z_i \sim \mathcal{D}$ and binary membership indicators $m_i$, where the subset $D = \{z_i: m_i = 1\}$ of records is selected for training. The learning algorithm samples model parameters $\theta$ from a posterior $p(\theta|D)$, which may take exponential-family (temperature-regularized) form: $p(\theta|D) \propto \exp\left(-\frac{1}{T} \sum_{i: m_i=1} \ell(\theta, z_i)\right)$ where $\ell(\theta, z)$ is the per-sample loss and $T$ is a "temperature" hyperparameter. The aim is, given access to $\theta$ and a test point $z^*$, to test the binary hypotheses: $H_1: z^* \in D \qquad H_0: z^* \notin D$ The Bayes-optimal (minimum risk) rule is a likelihood ratio (LR) test that predicts membership when $\Lambda(\theta, z^*) = \frac{p(\theta \mid H_1, z^*)}{p(\theta \mid H_0, z^*)} > \frac{1-\lambda}{\lambda}$, where $\lambda = P(H_1)$ is the prior probability of membership. The corresponding membership posterior is: $P(H_1 \mid \theta, z^*) = \frac{\lambda\, p(\theta \mid H_1, z^*)}{\lambda\, p(\theta \mid H_1, z^*) + (1-\lambda)\, p(\theta \mid H_0, z^*)}$ Thresholding this posterior at $1/2$ (for equal-cost false positives and false negatives) yields the optimal decision under 0-1 loss (Sablayrolles et al., 2019, Huang, 31 May 2025, Lassila et al., 30 May 2025).
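As a minimal sketch of this decision rule, the snippet below turns log-likelihoods of the observed parameters under the two hypotheses into a membership posterior and thresholds it at 1/2. The function names and the `prior_in` argument (the prior probability of membership) are illustrative assumptions; in practice the two log-likelihoods are not available in closed form and must be approximated.

```python
import math


def membership_posterior(log_lik_in: float, log_lik_out: float,
                         prior_in: float = 0.5) -> float:
    """Posterior P(z* in D | theta) from log p(theta | H1) and log p(theta | H0)."""
    # Posterior log-odds = log-likelihood ratio + prior log-odds.
    log_odds = (log_lik_in - log_lik_out) + math.log(prior_in / (1.0 - prior_in))
    # Numerically stable logistic sigmoid of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def bayes_decision(log_lik_in: float, log_lik_out: float,
                   prior_in: float = 0.5) -> bool:
    """Predict 'member' when the posterior exceeds 1/2 (equal error costs)."""
    return membership_posterior(log_lik_in, log_lik_out, prior_in) > 0.5
```

With a likelihood ratio of 3 and a balanced prior the posterior is 0.75; with a prior of 0.25 the same evidence only reaches 0.5, which shows how the prior shifts the effective LR threshold.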

2. Reduction to Sufficient Statistics and Exponential-Family Likelihoods

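In many canonical learning settings, particularly under the exponential family or regular Bayesian posteriors, the Bayes-optimal test depends on model parameters and the test sample only through a low-dimensional statistic. For deep learning models or regularized empirical risk minimization, it is often (and in some regimes, provably) sufficient to consider the loss evaluated at $(\theta, z^*)$: $P(H_1 \mid \theta, z^*) = \mathbb{E}_{\mathcal{T}}\left[\sigma\left(s(z^*, \theta) + t_\lambda\right)\right]$ where

$s(z^*, \theta) = \frac{1}{T}\left(\tau(z^*) - \ell(\theta, z^*)\right), \qquad \tau(z^*) = -T \log \int e^{-\ell(t, z^*)/T}\, p_{\mathcal{T}}(t)\, dt, \qquad t_\lambda = \log\frac{\lambda}{1-\lambda}$

Here $\sigma$ is the logistic sigmoid, $\mathcal{T}$ denotes the remaining samples and their membership indicators, $p_{\mathcal{T}}$ is the parameter posterior given $\mathcal{T}$ alone, and $\lambda$ is the prior probability of membership. The score $s$ is the log-likelihood ratio obtained by substituting the exponential-family posterior of Section 1 into the LR test.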

Notably, this implies that access to the scalar loss is asymptotically sufficient for optimal membership inference: the white-box (full parameter) and black-box (loss-only) settings become equivalent for a broad class of models (Sablayrolles et al., 2019, Lassila et al., 30 May 2025). In more general parametric models, where the posterior over $\theta$ does not have exponential-family structure, the optimal attack depends on higher-order statistics; see Section 7.
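The loss-only reduction can be illustrated with a short sketch. Assuming the temperature-regularized posterior of Section 1, the log-likelihood ratio for a point is $(\tau - \ell)/T$, where $\ell$ is the target model's loss on the point and $\tau = -T \log \mathbb{E}[e^{-\ell'/T}]$ is taken over models trained without it. The code below estimates $\tau$ by a plain Monte Carlo average over losses from reference models; the function names and the use of reference-model losses are assumptions made for illustration, not the estimator of any specific paper.

```python
import math


def log_mean_exp(values):
    """Stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def tau_estimate(out_losses, temperature=1.0):
    """Monte Carlo estimate of tau = -T * log E[exp(-loss / T)].

    out_losses: losses on the test point from reference models that were
    trained WITHOUT that point (hypothetical input).
    """
    return -temperature * log_mean_exp([-l / temperature for l in out_losses])


def loss_score(target_loss, out_losses, temperature=1.0):
    """Log-likelihood ratio (tau - loss) / T; positive values favor membership."""
    return (tau_estimate(out_losses, temperature) - target_loss) / temperature
```

A target loss well below the typical loss of models that never saw the point gives a positive score, which matches the intuition behind loss-threshold attacks.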

Expanding on this, the exponential-family log-likelihood ratio (LLR) framework (Brännvall, 12 Mar 2026) generalizes MIA scoring rules: $\mathrm{LLR}(s) = \log \frac{p_{\mathrm{in}}(s)}{p_{\mathrm{out}}(s)}$ for scalar summaries $s$ (e.g., loss, logit, prediction confidence), where each distribution $p_h(s)$ (for $h \in \{\mathrm{in}, \mathrm{out}\}$) is modeled in the exponential family, allowing for Bayesian updating of the parameters.
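For concreteness, the sketch below instantiates the LLR with the Gaussian member of the exponential family: the score distributions under "in" and "out" are fitted by sample mean and variance from shadow-model scores, and the LLR is the difference of the two Gaussian log-densities. The function names, the `min_var` floor, and the choice of a Gaussian are assumptions for illustration.

```python
import math
from statistics import mean, variance


def gaussian_logpdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gaussian_llr(score, in_scores, out_scores, min_var=1e-8):
    """log p_in(score) - log p_out(score) with plug-in Gaussian fits.

    in_scores / out_scores: scores of the same point from shadow models that
    did / did not train on it (each needs at least two values).
    """
    mu_in, mu_out = mean(in_scores), mean(out_scores)
    # Floor the variances so a degenerate sample cannot cause division by zero.
    var_in = max(variance(in_scores), min_var)
    var_out = max(variance(out_scores), min_var)
    return gaussian_logpdf(score, mu_in, var_in) - gaussian_logpdf(score, mu_out, var_out)
```

With only a handful of shadow scores the plug-in variances are noisy, which is the instability that the Bayesian variants discussed in Section 3 aim to reduce.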

3. Practical Approximations: Shadow Models, Bayesian Inference, and Variance Estimation

Due to the intractability of evaluating expectations over high-dimensional model posteriors, several practical approximations of the Bayes-optimal MIA have been developed:

Shadow Model Monte Carlo (BASE, G-BASE, LiRA, RMIA): Approximate the expectation over the model posterior $p(\theta \mid D)$ by training shadow models on random data splits and evaluating the relevant statistic. For graph-structured data, the optimal attack also involves marginalizing over neighboring node memberships, addressed by further Monte Carlo (MCMC) sampling (Lassila et al., 30 May 2025, Brännvall, 12 Mar 2026).
Per-sample and Global Thresholds (MAST, MALT): Assume that the expected attack statistic (e.g., the expected loss of models trained without $z^*$) is either per-sample (MAST) or globally constant (MALT), recovering classic loss-threshold attacks and enabling fast black-box implementation (Sablayrolles et al., 2019).
Bayesian Predictive Inference (BaVarIA, BMIA): Employ conjugate priors to provide stabilized estimates of the mean and variance of attack-score distributions, yielding a robust log-likelihood ratio, especially when shadow-model budgets are small. BaVarIA, using a normal-inverse-gamma prior, produces a Student-t predictive for each point, avoiding overfitting and instability (Brännvall, 12 Mar 2026).
Laplace-Approximate Bayesian Neural Networks (BMIA): Model epistemic and aleatoric uncertainty in the attack scores using a Laplace approximation of the posterior, thereby directly estimating the conditional score distribution via a single reference model (Liu et al., 10 Mar 2025).
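The conjugate-prior idea in the list above can be sketched with the textbook normal-inverse-gamma (NIG) update, whose posterior predictive is a Student-t distribution. This is a generic illustration and not the BaVarIA implementation: the prior hyperparameters `mu0`, `kappa0`, `alpha0`, `beta0` and all function names are assumptions.

```python
import math


def nig_posterior(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Standard NIG update for Gaussian data with unknown mean and variance."""
    n = len(xs)
    if n == 0:
        return mu0, kappa0, alpha0, beta0  # no data: posterior equals prior
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + 0.5 * n
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def student_t_logpdf(x, df, loc, scale):
    """Log-density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1.0) / 2.0 * math.log1p(z * z / df))


def predictive_logpdf(x, xs, **prior):
    """Posterior-predictive log-density: Student-t with 2 * alpha_n dof."""
    mu_n, kappa_n, alpha_n, beta_n = nig_posterior(xs, **prior)
    scale = math.sqrt(beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n))
    return student_t_logpdf(x, 2.0 * alpha_n, mu_n, scale)


def student_t_llr(score, in_scores, out_scores, **prior):
    """LLR between the 'in' and 'out' posterior predictives."""
    return (predictive_logpdf(score, in_scores, **prior)
            - predictive_logpdf(score, out_scores, **prior))
```

Because the predictive has heavier tails when few scores are available, the resulting LLR stays finite and moderate even with one or two shadow models per hypothesis, whereas a plug-in Gaussian variance would be undefined or extreme.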

Empirically, shadow-model-based methods (BASE, RMIA) match or surpass previous state-of-the-art performance at a lower computational cost, while Bayesian variance inference (BaVarIA) yields the highest AUC under small shadow-model budgets and a smooth trade-off between per-sample and pooled variance estimation (Lassila et al., 30 May 2025, Brännvall, 12 Mar 2026).

4. Bayesian Optimality on Datasets and Detection of Distribution Shift

Beyond single-point inference, Bayesian decision theory extends naturally to dataset-level membership inference. By extracting a vector $\mathbf{f}$ of distributional metrics (error, entropy, perturbation, etc.) and modeling its likelihood under the member ($H_1$) and non-member ($H_0$) hypotheses, one directly computes the posterior probability of dataset membership: $P(H_1 \mid \mathbf{f}) = \frac{\pi\, p(\mathbf{f} \mid H_1)}{\pi\, p(\mathbf{f} \mid H_1) + (1-\pi)\, p(\mathbf{f} \mid H_0)}$ where $\pi$ is the prior probability that the dataset was used in training. This approach affords exact Bayes-optimality under the model assumptions, requires a single trained model, and produces fully interpretable posterior probabilities of membership. It also supports distribution shift detection as an auxiliary function, with experimental results reporting near-perfect separation of member from non-member sets (Huang, 31 May 2025).
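A small sketch of the dataset-level posterior follows. It assumes, purely for illustration, that the metric vector is modeled with independent Gaussians per metric under each hypothesis; the likelihood model, parameter layout, and function names are assumptions rather than the cited method's exact choices.

```python
import math


def diag_gaussian_loglik(x, means, variances):
    """Log-likelihood of x under independent Gaussians, one per metric."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def dataset_membership_posterior(features, member_params, nonmember_params,
                                 prior_member=0.5):
    """Bayes' rule for P(member | features).

    member_params / nonmember_params: (means, variances) tuples describing
    the metric distribution under each hypothesis (hypothetical inputs).
    """
    a = diag_gaussian_loglik(features, *member_params) + math.log(prior_member)
    b = diag_gaussian_loglik(features, *nonmember_params) + math.log(1.0 - prior_member)
    m = max(a, b)  # subtract the max before exponentiating for stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))
```

For example, with member metrics centered at (0.1, 0.2) and non-member metrics at (0.3, 0.6), a dataset whose metrics sit exactly halfway receives a posterior of 0.5, while one matching the member center receives a posterior close to 1.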

5. Differential Privacy, Adversarial Success, and Information-Theoretic Bounds

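Differential privacy (DP), especially via the Gaussian mechanism or DP-SGD, limits information leakage, which can be quantified by the advantage of an MI attack. The asymptotic, Bayes-optimal adversary achieves an advantage equal to the total variation distance between the distributions of the released parameters with and without a candidate record: $\mathrm{Adv}^* = \mathrm{TV}\left(p(\theta \mid D \cup \{z^*\}),\; p(\theta \mid D)\right)$ For (sampled) Gaussian mechanisms, explicit formulas relate the advantage to the mean shift and noise variance. In the simplest case of a single unsampled Gaussian release with mean shift $\Delta$ and noise standard deviation $\sigma$: $\mathrm{Adv}^* = 2\,\Phi\left(\frac{\Delta}{2\sigma}\right) - 1$ where $\Phi$ is the standard normal CDF. This gives far tighter membership-inference bounds than direct conversion from $(\varepsilon, \delta)$-DP, in closer agreement with empirical MI attack rates (Mahloujifar et al., 2022).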
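The relationship between noise level and optimal advantage can be made concrete for the simplest case. For two Gaussians with the same covariance $\sigma^2 I$ whose means differ by $\Delta$ in norm, the total variation distance is $2\Phi(\Delta / 2\sigma) - 1$; mixing the shifted Gaussian in with probability $q$ (one subsampled step) scales this by $q$. The sketch below covers only that single-release case and does not implement the composition analysis of the cited work; the function and parameter names are assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_mi_advantage(mean_shift: float, noise_std: float,
                          sampling_rate: float = 1.0) -> float:
    """Optimal MI advantage (TPR - FPR) for ONE Gaussian release.

    mean_shift: norm of the change in the released mean caused by the record.
    sampling_rate: probability that the record is included in the step;
    TV(P, (1 - q) P + q Q) = q * TV(P, Q), so the advantage scales by q.
    """
    tv = 2.0 * normal_cdf(mean_shift / (2.0 * noise_std)) - 1.0
    return sampling_rate * tv
```

With `mean_shift=2` and `noise_std=1` the advantage is about 0.68; doubling the noise reduces it to about 0.38, which shows how quickly added noise erodes even the optimal attacker's edge.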

6. Hierarchy of Score-Based Attacks and Unified LLR Formalism

Recent work has unified the landscape of score-based attacks (including RMIA, LiRA, BASE) by casting them as plug-in approximations to the exponential-family LLR test. The “BASE1–BASE4” hierarchy interpolates between fully pooled and fully individualized parameter estimation:

BASE1 (RMIA): Loss-centered, global mean/variance
BASE4 (LiRA): Pointwise mean/variance estimation
BaVarIA-t/n: Bayesian posterior-predictive using Student-t or stabilized Gaussian variance

This hierarchy enables robust MIA performance, particularly at small shadow-model budgets, and removes the need for ad-hoc parameter switching, with Student-t posteriors automatically accounting for variance estimation uncertainty (Brännvall, 12 Mar 2026).

Attack	Statistic	Variance Estimation	Regime
BASE1/RMIA	Loss	Global pooled	Online, pooled
BASE4/LiRA	Logit/loss	Per-sample	Online, pointwise
BaVarIA-t/n	Any scalar statistic	Bayesian (NIG)	Any, robust
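The two ends of the variance-estimation spectrum in the table can be contrasted in a few lines. The sketch below standardizes a target score against "out" shadow scores using either one pooled variance shared by all points or a separate variance per point. It is a simplified illustration of pooled versus pointwise estimation, not the exact estimator of BASE, RMIA, or LiRA, and all names are assumptions.

```python
from statistics import mean, variance


def pooled_variance(scores_by_point):
    """Within-point variance pooled over all points (weighted by dof)."""
    num = sum((len(s) - 1) * variance(s) for s in scores_by_point)
    den = sum(len(s) - 1 for s in scores_by_point)
    return num / den


def standardized_scores(target_scores, scores_by_point, pooled):
    """(target - per-point mean) / std, with pooled or per-point std.

    scores_by_point[i]: 'out' shadow scores for point i (at least two each).
    target_scores[i]: the target model's score on point i.
    """
    if pooled:
        variances = [pooled_variance(scores_by_point)] * len(scores_by_point)
    else:
        variances = [variance(s) for s in scores_by_point]
    return [(t - mean(s)) / v ** 0.5
            for t, s, v in zip(target_scores, scores_by_point, variances)]
```

Pooling trades bias for stability: it ignores genuine differences in per-point spread but stays well behaved when each point has only a few shadow scores.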

7. White-Box vs Black-Box: Theoretical Limits and the Role of Model Parameters

The claim that white-box and black-box MIAs are equally powerful (in the Bayes-optimal sense) holds under certain generative model assumptions, especially when the loss is a sufficient statistic for the model posterior (Sablayrolles et al., 2019). However, for stochastic gradient descent (SGD) and in settings where parameter distributions are nontrivially structured, access to the full model parameters provably increases the power of the optimal attack. Here, the Bayes-optimal test is a likelihood ratio on the high-dimensional parameter vector, and its score depends on the inverse Hessian of the training loss rather than on the loss alone (Suri et al., 2024). The operational attack (Inverse Hessian Attack, IHA) therefore requires inverse-Hessian vector products $v = H^{-1} g$, where $H$ is the Hessian of the training loss at $\theta$ and $g$ is a vector such as a loss gradient. These are computed without forming $H^{-1}$ by solving $H v = g$ with iterative solvers such as conjugate gradients. In practice, the white-box attack strictly dominates the black-box (loss-only) test, except when the scalar statistic is sufficient for the model parameter (Suri et al., 2024). This establishes the theoretical limit: the Bayes-optimal membership risk is strictly lower for white-box access under SGD-trained models.
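An inverse-Hessian vector product can be obtained with the conjugate gradient method using only Hessian-vector products. The following is a minimal pure-Python sketch, assuming the Hessian is symmetric positive definite (for neural networks this typically requires damping or regularization) and that a callable `hvp` returning $Hv$ is available; both are assumptions, and the code is not the IHA implementation.

```python
def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b for symmetric positive-definite H.

    hvp: callable mapping a vector v (list of floats) to H v.
    b:   right-hand side (list of floats), e.g. a loss gradient.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - H x, with x = 0
    p = list(r)                      # initial search direction
    rs = sum(ri * ri for ri in r)    # squared residual norm
    for _ in range(iters):
        if rs < tol:
            break
        hp = hvp(p)
        alpha = rs / sum(pi * hi for pi, hi in zip(p, hp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = sum(ri * ri for ri in r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For the 2x2 matrix `[[4, 1], [1, 3]]` and right-hand side `[1, 2]`, the solver returns approximately `[0.0909, 0.6364]`. In a real attack, `hvp` would be implemented with automatic differentiation so that the Hessian is never materialized.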

8. Empirical Evaluation and Practical Recommendations

Across tasks (CIFAR-10, ImageNet, tabular, graphs), Bayes-optimal or near-optimal MIAs (BASE, G-BASE, BaVarIA, BMIA, IHA) empirically outperform previous state-of-the-art methods, achieving increased true positive rates at conservatively low false positive rates and often reducing computational costs via variance-stabilized or single-model Bayesian inference:

BASE/G-BASE match RMIA/LiRA but avoid extra passes over large holdout sets (Lassila et al., 30 May 2025, Brännvall, 12 Mar 2026)
BaVarIA improves AUC, especially in low-budget settings (K ≤ 16) (Brännvall, 12 Mar 2026)
White-box IHA yields strictly greater accuracy than loss-only black-box tests, especially under realistic SGD settings (Suri et al., 2024)
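The headline metric in these comparisons, the true positive rate at a fixed low false positive rate, can be computed from attack scores as sketched below. The thresholding convention (flag a point when its score is strictly above the threshold) and the function name are assumptions for illustration.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Fraction of members flagged when at most max_fpr of non-members are.

    Higher scores mean 'more likely a member'. A point is flagged when its
    score is strictly greater than the chosen threshold.
    """
    ranked = sorted(nonmember_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))      # false positives we may incur
    if allowed >= len(ranked):
        threshold = float("-inf")             # every point may be flagged
    else:
        threshold = ranked[allowed]           # (allowed + 1)-th largest score
    flagged = sum(1 for s in member_scores if s > threshold)
    return flagged / len(member_scores)
```

Reporting this value at, say, 0.1% or 1% FPR emphasizes confident identification of individual members, which average-case metrics such as AUC can hide.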

A key practical recommendation is to select pooling/variance estimation strategies to fit the available shadow-model budget and to prefer conjugate Bayesian updates over ad-hoc variance switching. For auditing or adversarial applications where white-box access is available, parameter-based (IHA) attacks are strictly preferable to classic loss-thresholding mechanisms.

References

(Sablayrolles et al., 2019) White-box vs Black-box: Bayes Optimal Strategies for Membership Inference
(Suri et al., 2024) Do Parameters Reveal More than Loss for Membership Inference?
(Liu et al., 10 Mar 2025) Efficient Membership Inference Attacks by Bayesian Neural Network
(Lassila et al., 30 May 2025) Practical Bayes-Optimal Membership Inference Attacks
(Huang, 31 May 2025) Bayesian Inference of Training Dataset Membership
(Brännvall, 12 Mar 2026) Exponential-Family Membership Inference: From LiRA and RMIA to BaVarIA
(Mahloujifar et al., 2022) Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms