
Moments Accountant in Differential Privacy

Updated 24 December 2025
  • Moments accountant is a framework that rigorously tracks cumulative privacy loss in adaptive differential privacy mechanisms using log-moment generating functions.
  • It underpins DP-SGD by enabling tighter privacy accounting than classical composition, leading to improved privacy-utility tradeoffs in machine learning.
  • It integrates analytical and numerical optimizations, with refinements via Rényi differential privacy and f-divergences for practical, scalable implementations.

A moments accountant is a framework for tightly tracking and composing the cumulative privacy loss in mechanisms satisfying differential privacy (DP), specifically designed for the analysis of adaptive compositions in differentially private machine learning. It allows for precise quantification of (ε, δ)-DP guarantees by maintaining the log-moment generating function (MGF) of the privacy-loss random variable, rather than relying on classical, looser composition theorems. Moments accountant techniques are now foundational in the practice of differentially private stochastic gradient descent (DP-SGD) and are continually refined through advances in Rényi differential privacy (RDP) and information-theoretic optimization over f-divergences.

1. Formal Definition and Theoretical Foundations

Let M be a randomized mechanism with input d ∈ D^n and output in some range R. For adjacent datasets d, d', and any auxiliary input, define the privacy-loss random variable at outcome o ∈ R:

c(o; M, d, d') = \ln \left( \frac{\Pr[M(d) = o]}{\Pr[M(d') = o]} \right)

The λ-th log-moment of the privacy-loss random variable is:

\alpha_M(\lambda; d, d') = \ln \mathbb{E}_{o \sim M(d)} \left[ \exp \left( \lambda \, c(o; M, d, d') \right) \right]

The moments accountant records the worst-case log-moment across all auxiliary inputs and neighboring datasets:

\alpha_M(\lambda) = \max_{d \sim d'} \; \alpha_M(\lambda; d, d')

For adaptive compositions M = M_1 ∘ M_2 ∘ ⋯ ∘ M_k, the log-moments compose additively:

\alpha_M(\lambda) \le \sum_{i=1}^{k} \alpha_{M_i}(\lambda)

The key mechanism for extracting (ε, δ)-DP guarantees is the tail bound:

\delta = \inf_{\lambda > 0} \exp\left[ \alpha_M(\lambda) - \lambda \varepsilon \right]

This approach subsumes and improves on the basic and advanced composition theorems, yielding substantially tighter privacy accounting for a given sequence of private operations (Abadi et al., 2016).
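As a concrete illustration of the compose-then-convert recipe above, the Gaussian mechanism with ℓ2-sensitivity 1 and noise scale σ has the closed-form log-moment α(λ) = λ(λ+1)/(2σ²). A minimal sketch (parameter values are chosen purely for illustration):

```python
import math

def gaussian_log_moment(lmbda, sigma):
    # Closed-form λ-th log-moment of the privacy-loss random variable
    # for the Gaussian mechanism with L2-sensitivity 1 and noise scale σ:
    #   α(λ) = λ(λ+1) / (2σ²)
    return lmbda * (lmbda + 1) / (2 * sigma ** 2)

def delta_from_moments(sigma, k, eps, max_lmbda=64):
    # Compose k adaptive releases (log-moments add), then apply the tail
    # bound δ = min_λ exp(α_M(λ) - λ·ε); minimize the exponent first and
    # exponentiate once so large moments cannot overflow.
    best = min(
        k * gaussian_log_moment(l, sigma) - l * eps
        for l in range(1, max_lmbda + 1)
    )
    return min(1.0, math.exp(best))

# δ for 10 adaptive Gaussian releases at σ = 4 and target ε = 4
delta = delta_from_moments(sigma=4.0, k=10, eps=4.0)
```

Restricting λ to a small integer grid is the standard practical compromise; the resulting δ is still a valid (slightly conservative) upper bound.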

2. Relation to Rényi Differential Privacy and f-divergences

The moments accountant framework generalizes naturally to Rényi Differential Privacy (RDP), defined as follows: a mechanism M satisfies (α, ε)-RDP for α > 1 if for adjacent datasets S, S':

D_{\alpha}(M(S) \parallel M(S')) \leq \epsilon

where

D_{\alpha}(P \parallel Q) = \frac{1}{\alpha-1} \log \mathbb{E}_{x \sim Q} \left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

Let K(λ) denote the cumulant generating function (CGF) of the privacy-loss random variable L:

K(\lambda) = \log \mathbb{E}\left[ e^{\lambda L} \right]

There is a direct equivalence:

M \text{ is } (\alpha, \epsilon)\text{-RDP} \iff K(\alpha-1) \leq (\alpha-1)\,\epsilon

RDP facilitates straightforward composition (CGFs add), and the translation back to (ε, δ)-DP is performed via bisection/minimization over λ as above (Wang et al., 2018, Abadi et al., 2016).
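The compose-then-convert pipeline can be sketched in a few lines; the Gaussian per-step curve ε_RDP(α) = α/(2σ²) (the standard RDP bound for the Gaussian mechanism) and all parameter values are assumptions for illustration:

```python
import math

def rdp_to_dp(rdp_eps, eps, alphas):
    # Classical conversion of a composed RDP curve to (ε, δ)-DP:
    #   δ = min_α exp((α - 1)·(ε_RDP(α) - ε)),
    # i.e. the tail bound δ = inf_λ exp(K(λ) - λ·ε) with λ = α - 1.
    # Minimize the exponent first, exponentiate once, to avoid overflow.
    best = min((a - 1) * (rdp_eps(a) - eps) for a in alphas)
    return min(1.0, math.exp(best))

# T adaptive Gaussian releases; RDP composes additively, so the
# composed curve is T·α/(2σ²).
T, sigma = 100, 10.0
alphas = [1 + 0.1 * i for i in range(1, 1000)]  # grid over α ∈ (1, ~101)
delta = rdp_to_dp(lambda a: T * a / (2 * sigma ** 2), eps=4.0, alphas=alphas)
```

In practice the minimization over α is done by bisection on the (quasi-convex) exponent rather than a brute-force grid; the grid keeps the sketch short.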

Recent advances relate these accounting schemes to information-theoretic quantities, in particular Csiszár's f-divergences. For example, the hockey-stick (or E_λ) divergence and the χ^α-divergence play key roles:

  • (ε, δ)-DP is characterized by bounds on the hockey-stick divergence E_{e^ε}.
  • (α, γ)-RDP is characterized by bounds on D_α, or equivalently χ^α (Asoodeh et al., 2020).

3. Algorithmic Implementation and Analytical Moments Accountant (AMA)

The Analytical Moments Accountant (AMA) maintains and updates the CGFs for all contributing mechanisms, supporting arbitrary adaptive and subsampled compositions. The essential steps are:

  1. Track for each mechanism M_i a symbolic or oracle CGF K_i(λ).
  2. For data subsampling (e.g., SGD with minibatching), upper-bound the RDP parameter of the subsampled mechanism using the tight amplification theorem:

\epsilon_{M \circ \mathrm{sub}}(\alpha) \leq \frac{1}{\alpha - 1} \log\left( 1 + \gamma^2 \binom{\alpha}{2} m_2 + \sum_{j=3}^{\alpha} \gamma^j \binom{\alpha}{j} m_j \right)

where γ is the subsampling rate and the moments m_j depend on the per-mechanism RDP constants e^{ε(j)}.

  3. In AMA, composition is performed by adding CGFs:

K_{\text{total}}(\lambda) \leftarrow K_{\text{total}}(\lambda) + K_i(\lambda)

  4. (ε, δ)-DP guarantees are recovered by minimizing over λ as described above.

Optimized implementations exploit log-domain operations (log-sum-exp, geometric tail truncation), ensure numerical convexity, and support O(log α) time per query (Wang et al., 2018).
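A minimal sketch of the AMA bookkeeping, with CGF oracles stored as callables and composed by addition; the Gaussian CGF stands in for a concrete mechanism, and all class/method names here are illustrative rather than any reference implementation:

```python
import math

class AnalyticalMomentsAccountant:
    """Toy analytical moments accountant: CGF oracles compose by addition."""

    def __init__(self):
        self.cgfs = []  # callables λ -> K_i(λ), one per composed mechanism

    def compose(self, cgf):
        # K_total(λ) <- K_total(λ) + K_i(λ), maintained lazily as a list
        self.cgfs.append(cgf)

    def total_cgf(self, lmbda):
        return sum(K(lmbda) for K in self.cgfs)

    def get_delta(self, eps, lambdas=range(1, 33)):
        # Tail bound δ = min_λ exp(K_total(λ) - λ·ε); keep the computation
        # in the exponent (log domain) and exponentiate once at the end.
        best = min(self.total_cgf(l) - l * eps for l in lambdas)
        return min(1.0, math.exp(best))

# 50 adaptive Gaussian releases with sensitivity 1 and σ = 8 (illustrative)
acct = AnalyticalMomentsAccountant()
for _ in range(50):
    acct.compose(lambda l: l * (l + 1) / (2 * 8.0 ** 2))

delta = acct.get_delta(eps=4.0)
```

Production implementations replace the grid in `get_delta` with bisection on the convex exponent and memoize CGF evaluations; the structure is otherwise the same.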

4. Practical Applications in Differentially Private Learning

Moments accountant techniques, and their generalizations (including AMA), are central to the privacy analysis of private stochastic gradient descent (DP-SGD):

  • At each iteration, the per-step privacy cost is quantified as a log-moment (in practice, evaluated on a precomputed grid of λ values).
  • After T iterations, the total log-moment is α_T(λ) = Σ_{t=1}^{T} α_step(λ), with privacy guarantees extracted by minimizing exp(α_T(λ) − λε) over λ > 0.
  • Empirically, moments accountant bounds permit an order-of-magnitude more training steps or sharper privacy–utility tradeoffs than classical composition. For example, with sampling ratio q = 0.01, noise scale σ = 4, and k = 10,000 steps, the moments accountant yields ε ≈ 1.3, compared to ε ≈ 9.3 under strong composition (Abadi et al., 2016).
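The flavor of this computation can be reproduced with just the dominant term of Abadi et al.'s per-step moment bound for the subsampled Gaussian, α_step(λ) ≤ q²λ(λ+1)/((1−q)σ²). This is a deliberately crude sketch: dropping the higher-order terms and the exact numerical moment computation makes the resulting ε ≈ 1.8 looser than the ε ≈ 1.3 the full accountant reports for the same setting:

```python
import math

def dpsgd_eps(q, sigma, steps, delta, max_lmbda=32):
    # Dominant-term bound on the per-step log-moment of the subsampled
    # Gaussian (Abadi et al., 2016): α_step(λ) <= q²·λ(λ+1)/((1-q)·σ²),
    # valid for small q and λ <= σ²·ln(1/(qσ)); higher-order terms dropped.
    best = float("inf")
    for lmbda in range(1, max_lmbda + 1):
        alpha_total = steps * q**2 * lmbda * (lmbda + 1) / ((1 - q) * sigma**2)
        # smallest ε for which exp(α_T(λ) - λ·ε) <= δ at this λ
        best = min(best, (alpha_total + math.log(1 / delta)) / lmbda)
    return best

eps = dpsgd_eps(q=0.01, sigma=4.0, steps=10_000, delta=1e-5)
```

Even this simplified bound lands far below the ε ≈ 9.3 of strong composition, which is the essential practical point.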

Comprehensive empirical studies on MNIST and CIFAR-10 validate these improvements for deep learning with non-convex objectives, showing competitive test accuracy under modest privacy budgets.

5. Recent Refinements: Optimal f-divergence Conversion and Tighter Bounds

Asoodeh et al. (Asoodeh et al., 2020) present an information-theoretically optimal extension of the moments accountant, leveraging the joint range of f-divergences for a tighter conversion from RDP to (ε, δ)-DP:

  • The minimal δ such that an (α, γ)-RDP mechanism is (ε, δ)-DP is

\delta^{*}_{\alpha,\varepsilon}(\gamma) = \sup\left\{ E_\lambda(P \| Q) \;:\; \chi^\alpha(P \| Q) \leq \chi(\gamma),\ \lambda = e^{\varepsilon} \right\}

  • The refined moments accountant bound replaces the classical loose Markov-based conversion with

\delta(\varepsilon) = \inf_{\alpha > 1} \delta^{*}_{\alpha, \varepsilon}\big(T\,\gamma(\alpha)\big)

where T is the number of SGD steps and γ(α) encodes the per-step RDP guarantee.

  • For a fixed privacy budget, the refined bounds allow up to 100 extra iterations of DP-SGD under standard settings, a utility gain obtained at no additional privacy cost.

The classical conversion is a one-sided Chernoff/Markov bound, whereas the refined analysis is information-theoretically tight, optimizing over the entire joint range of the relevant f-divergences.

6. Comparative Empirical Performance and Practical Impact

Empirical evaluation under fixed privacy budgets demonstrates that the moments accountant enables more accurate, longer training regimes:

| Method | Maximum SGD steps at ε = 1 |
|---|---|
| Classical accountant | 4200 |
| Refined bound (Asoodeh et al., 2020) | 4305 |

For MNIST (60k examples) and CIFAR-10 (50k examples), experiments confirm that the moments accountant allows high test accuracy even at small ε, with hyperparameters such as batch size and noise level σ influencing the privacy–utility tradeoff (Abadi et al., 2016).

7. Algorithmic and Numerical Considerations

Efficient implementation of the moments accountant, especially in the context of diverse or subsampled mechanisms, involves:

  • Maintaining CGFs in closed or symbolic form, or as an oracle interface.
  • Performing optimizations to extract privacy guarantees via (quasi-)convex minimization over λ/α using bisection, exploiting the convexity and monotonicity properties of the CGFs.
  • Implementing log-domain arithmetic to prevent overflow or loss of precision.
  • Where possible, leveraging geometric truncation and log-sum-exp approximations for scalable evaluation with large α or step counts.
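As one example, the log-sum-exp trick keeps moment evaluation finite even when individual exponents would overflow a double (a generic numerical sketch, not tied to any particular accountant library):

```python
import math

def log_sum_exp(xs):
    # Numerically stable log(Σ_i exp(x_i)): shift by the maximum so the
    # largest exponent is exactly 0, preventing overflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

vals = [1000.0, 1001.0, 1002.0]
lse = log_sum_exp(vals)        # finite, ~1002.41

try:
    naive = math.log(sum(math.exp(x) for x in vals))
except OverflowError:
    naive = None               # math.exp(1000.0) overflows a double
```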

Pseudocode frameworks for both vanilla and analytical moments accountant workflows are detailed in (Abadi et al., 2016, Wang et al., 2018), supporting practical integration into private machine learning workflows.


References:

  • Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). Deep Learning with Differential Privacy. ACM CCS 2016.
  • Wang, Y.-X., Balle, B., and Kasiviswanathan, S. (2018). Subsampled Rényi Differential Privacy and Analytical Moments Accountant. AISTATS 2019.
  • Asoodeh, S., Liao, J., Calmon, F. P., Kosut, O., and Sankar, L. (2020). A Better Bound Gives a Hundred Rounds: Enhanced Privacy Guarantees via f-Divergences. IEEE ISIT 2020.
