DP Stochastic Convex Optimization
- DP-SCO is the study of algorithms that minimize expected convex loss under differential privacy constraints, ensuring each data point has a negligible impact on the outcome.
- The methodology includes techniques such as calibrated noisy SGD (DP-SGD), output perturbation, and exponential mechanisms, which achieve near-optimal minimax excess risk rates for Lipschitz convex losses over bounded domains.
- Implications of DP-SCO include a clearly defined privacy-utility tradeoff that depends on sample size, data dimension, and privacy parameters, guiding privacy-aware algorithm design.
Differentially Private Stochastic Convex Optimization (DP-SCO) is the study of algorithms that minimize the expected value of a convex loss function, given a dataset of i.i.d. samples, subject to differential privacy (DP) constraints. In DP-SCO, the learning algorithm receives a sample from an unknown distribution and outputs a model (hypothesis) with small population risk, while ensuring that any single data point (or, in the user-level setting, any block of data corresponding to one user) contributes negligibly to the output, in accordance with differential privacy.
1. Definition and Foundational Problem Setup
A DP-SCO instance consists of a loss function $f : \mathcal{W} \times \mathcal{Z} \to \mathbb{R}$ (with $f(\cdot, z)$ convex for every $z \in \mathcal{Z}$), a convex constraint set $\mathcal{W} \subseteq \mathbb{R}^d$, and $n$ i.i.d. samples $S = (z_1, \dots, z_n) \sim \mathcal{D}^n$. The statistical goal is to minimize the population risk
$$F(w) = \mathbb{E}_{z \sim \mathcal{D}}[f(w, z)]$$
by computing an output $\hat{w} \in \mathcal{W}$ with small excess risk $F(\hat{w}) - \min_{w \in \mathcal{W}} F(w)$. A randomized algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-DP if, for any two datasets $S$ and $S'$ differing in one data point, the distributions of $\mathcal{A}(S)$ and $\mathcal{A}(S')$ are close in the sense of differential privacy. Analogously, user-level DP requires the same guarantee when $S$ and $S'$ differ in one entire user's data block.
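Concretely, this closeness is the standard $(\varepsilon, \delta)$-DP requirement: for all such neighboring datasets $S, S'$ and every measurable event $E$,
$$\Pr[\mathcal{A}(S) \in E] \le e^{\varepsilon} \, \Pr[\mathcal{A}(S') \in E] + \delta.$$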
2. Minimax Rates and Core Utility–Privacy Tradeoffs
For Lipschitz convex losses over bounded domains ($G$-Lipschitz in $w$, constraint set of diameter $D$), the minimax excess risk under $(\varepsilon, \delta)$-DP is, up to logarithmic factors,
$$\Theta\!\left( G D \left( \frac{1}{\sqrt{n}} + \frac{\sqrt{d}}{\varepsilon n} \right) \right),$$
where $n$ is the sample size and $d$ is the dimension (Bassily et al., 2019). This rate is attained by several algorithmic strategies, notably calibrated noisy stochastic gradient descent (DP-SGD), output perturbation with sensitivity analysis, and variants of the exponential mechanism or Gibbs sampling in general norms (Gopi et al., 2022).
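To make the first of these strategies concrete, below is a minimal single-pass noisy-SGD sketch in Python. It is an illustration under stated assumptions, not the exact algorithm of the cited works: gradient clipping to the Lipschitz bound $G$ stands in for a formal sensitivity analysis, the noise scale uses the textbook Gaussian mechanism, and the step size and iterate averaging follow the standard non-private SCO recipe.

```python
import numpy as np

def dp_sgd(grad_fn, data, dim, G, D, eps, delta, seed=0):
    """Minimal single-pass DP-SGD sketch for DP-SCO (illustrative, not optimal).

    Assumptions beyond the text: one gradient step per sample, gradients
    clipped to norm G, Gaussian noise calibrated to the clipped gradient's
    L2 sensitivity 2G, and projection onto a centered ball of diameter D.
    Each record influences exactly one noisy step, so one application of
    the Gaussian mechanism per step suffices here (a parallel-composition-
    style argument; a careful privacy accountant would be used in practice).
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    sigma = 2 * G * np.sqrt(2 * np.log(1.25 / delta)) / eps  # Gaussian mechanism
    lr = D / (G * np.sqrt(n))  # standard SGD step size for Lipschitz losses
    w = np.zeros(dim)
    iterates = []
    for z in data:
        g = grad_fn(w, z)
        g *= min(1.0, G / (np.linalg.norm(g) + 1e-12))       # clip: bound sensitivity
        w = w - lr * (g + rng.normal(0.0, sigma, size=dim))  # privatized step
        r = np.linalg.norm(w)
        if r > D / 2:                                        # project onto the domain
            w *= (D / 2) / r
        iterates.append(w)
    return np.mean(iterates, axis=0)  # averaged iterate, as in standard SCO analyses

# Toy usage: private mean estimation with f(w, z) = 0.5 * ||w - z||^2, grad = w - z.
data = np.random.default_rng(1).normal(0.3, 0.5, size=(5000, 5))
w_hat = dp_sgd(lambda w, z: w - z, data, dim=5, G=2.0, D=2.0, eps=1.0, delta=1e-5)
print(w_hat)
```

Clipping enforces the bounded-sensitivity premise that the privacy analysis needs, and the averaged iterate is the quantity that the standard excess-risk bounds for SGD control.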
For heavy-tailed losses (finite $k$-th moments of gradient norms), rates interpolate:
$$O\!\left( G_2 \frac{1}{\sqrt{n}} + G_k \left( \frac{\sqrt{d}}{\varepsilon n} \right)^{\frac{k-1}{k}} \right),$$
where $G_k$ bounds the $k$-th moment of the gradient norm; as $k \to \infty$, this recovers the Lipschitz rate above.
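As a rough numeric illustration of this interpolation (all parameter values below are hypothetical, and the exponent $(k-1)/k$ is the one in the display above): as $k$ grows, the privacy term approaches the Lipschitz-case term $\sqrt{d}/(\varepsilon n)$.

```python
import numpy as np

# Hypothetical parameters, for illustration only.
n, d, eps = 10_000, 100, 1.0
G2 = Gk = 1.0  # moment bounds on the gradient norm

for k in (2, 4, 10):
    priv = Gk * (np.sqrt(d) / (eps * n)) ** ((k - 1) / k)  # privacy term
    rate = G2 / np.sqrt(n) + priv                           # total excess risk rate
    print(f"k={k:>2}: privacy term ~ {priv:.4f}, total rate ~ {rate:.4f}")
```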