
Projection-Based Zeroth-Order Fed SGD

Updated 23 June 2026
  • Projection-based zeroth-order federated SGD is a technique that approximates gradients using finite differences and projections onto subspaces derived from historical updates.
  • It leverages QR decomposition to construct promising subspaces, balancing exploration and exploitation through non-isotropic sampling in distributed nonconvex optimization.
  • The method achieves provable convergence and efficient communication, demonstrating its practical value on diverse tasks including CNN training and manifold-constrained optimization.

Projection-based zeroth-order federated stochastic gradient descent (SGD) refers to a class of optimization algorithms for federated learning that estimate gradients from function values (zeroth-order information) using randomized projections, often leveraging subspace structure to improve efficiency and convergence. These methods are of particular significance when explicit gradients are unavailable, and their development bridges zeroth-order optimization, projection techniques, and distributed stochastic optimization under data and system heterogeneity (Wu et al., 2024, Akhavan et al., 25 Sep 2025, Wang et al., 30 Jul 2025, Jang et al., 2024).

1. Problem Formulation and Zeroth-Order Oracle Model

The goal is typically federated minimization of a global objective:

$$F(x) = \frac{1}{M} \sum_{i=1}^{M} f_i(x), \qquad x \in \mathbb{R}^d$$

where $f_i$ is client $i$’s (potentially nonconvex) local objective, e.g., $f_i(x) = \mathbb{E}_{\xi \sim \mathcal{D}_i}[F(x; \xi)]$. In zeroth-order federated settings, clients lack access to $\nabla f_i$ and are limited to querying scalar function values, for example:

  • Two-point (central) finite difference: $f_i(x + h u) - f_i(x - h u)$
  • One-sided (forward) finite difference: $f_i(x + h u) - f_i(x)$

The typical gradient surrogate at $x$ using a two-point estimate for a unit vector $u$ (possibly randomized) is:

$$g_i(x) = \frac{f_i(x + h u) - f_i(x - h u)}{2h}\, u$$

The overall protocol entails distributed estimation and aggregation of such surrogates to update the global model $x$. Projection-based approaches modify the randomization mechanism to emphasize subspaces believed to contain significant descent directions (Wu et al., 2024).
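The two-point surrogate and its aggregation can be illustrated with a minimal NumPy sketch. The quadratic client objectives, the helper names, and the choice to let all clients share one direction `u` are assumptions made for illustration only; they are not taken from the cited papers.

```python
import numpy as np

def two_point_estimate(f, x, u, h):
    """Two-point surrogate g(x) = (f(x + h u) - f(x - h u)) / (2 h) * u."""
    return (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u

def random_unit_vector(d, rng):
    """Isotropic direction: a standard Gaussian vector scaled to unit norm."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

# Hypothetical client objectives f_i(x) = 0.5 * ||x - c_i||^2 (illustration only).
rng = np.random.default_rng(0)
d, M, h = 5, 4, 1e-3
centers = [rng.standard_normal(d) for _ in range(M)]
clients = [lambda x, c=c: 0.5 * np.sum((x - c) ** 2) for c in centers]

x = np.zeros(d)
u = random_unit_vector(d, rng)

# Each client evaluates its surrogate using function values only;
# the server averages the surrogates.
g = np.mean([two_point_estimate(f, x, u, h) for f in clients], axis=0)

# For quadratics the central difference is exact, so g = (u . grad F(x)) u.
grad_F = x - np.mean(centers, axis=0)
```

The surrogate only recovers the component of the gradient along `u`, which is why the choice of sampling distribution for `u` matters.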

2. Projection Subspace Construction from Historical Trajectories

A central innovation in recent work is the use of non-isotropic sampling guided by the optimization trajectory history:

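  • Trajectory matrix: At each round, form the increment of the global model (the difference between consecutive global iterates). Collect the most recent increments as the columns of a trajectory matrix.
  • Basis via QR decomposition: Apply a thin QR decomposition to the trajectory matrix, yielding a factor with orthonormal columns (conventionally $Q$, with $Q^\top Q = I$) and an upper-triangular factor $R$. The columns of $Q$ span a "promising" subspace based on recent optimization progress.
  • Projectors: $Q Q^\top$ projects onto this subspace, $I - Q Q^\top$ onto its orthogonal complement.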

This data-driven subspace is leveraged for sampling and regularization in gradient estimation, with the intent to enhance both exploitation of known good directions and exploration of new directions (Wu et al., 2024).
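The subspace construction can be sketched in a few lines of NumPy. The function name `subspace_from_history`, the symbol `k` for the number of retained increments, and the random toy trajectory are hypothetical; only the use of increments, a thin QR factorization, and the two projectors comes from the description above.

```python
import numpy as np

def subspace_from_history(iterates, k):
    """Orthonormal basis for the span of the last k increments x_t - x_{t-1}."""
    X = np.asarray(iterates, dtype=float)     # shape (T + 1, d)
    increments = np.diff(X, axis=0)[-k:]      # last k increments, shape (k, d)
    S = increments.T                          # trajectory matrix, shape (d, k)
    Q, R = np.linalg.qr(S, mode="reduced")    # thin QR: S = Q R
    return Q, R

rng = np.random.default_rng(0)
d, k = 8, 3
iterates = np.cumsum(rng.standard_normal((6, d)), axis=0)  # toy trajectory
Q, R = subspace_from_history(iterates, k)

P = Q @ Q.T               # projector onto the subspace
P_perp = np.eye(d) - P    # projector onto the orthogonal complement
```

In practice the d-by-d projectors need not be formed explicitly: applying `Q @ (Q.T @ v)` costs on the order of `d * k` operations per vector.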

3. Non-Isotropic Gradient Estimation via Projections

Projections inform the covariance structure for direction sampling in zeroth-order estimation:

  • Sampling covariance: a covariance matrix built from the subspace projectors and governed by a trade-off parameter.
  • Sampling directions: draw Gaussian directions with this covariance; normalize if desired.

This mechanism allows gradient estimates to concentrate on historically relevant subspaces when the trade-off parameter is large, while still enabling search in the full space when it is small. The estimator remains unbiased for the smoothed gradient associated with the resulting Gaussian (Wu et al., 2024). For comparison, other projection-based federated zeroth-order methods exploit different randomness distributions, such as the uniform measure on the $\ell_1$-sphere for refined concentration properties (Akhavan et al., 25 Sep 2025), or tangent-space projections for Riemannian constraints (Wang et al., 30 Jul 2025).
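A minimal sketch of non-isotropic direction sampling follows. It assumes a covariance of the form `alpha * Q Q^T + (1 - alpha) * I`, where `alpha` in [0, 1] plays the role of the trade-off parameter and `Q` is an orthonormal basis of the historical subspace. This is one plausible form chosen for illustration; the exact covariance used by Wu et al. (2024) may differ, and the names `sample_direction` and `alpha` are hypothetical.

```python
import numpy as np

def sample_direction(Q, alpha, rng, normalize=False):
    """Draw u ~ N(0, alpha * Q Q^T + (1 - alpha) * I) without forming the covariance."""
    d, k = Q.shape
    z_sub = rng.standard_normal(k)     # coefficients inside the subspace
    z_full = rng.standard_normal(d)    # isotropic component in the full space
    u = np.sqrt(alpha) * (Q @ z_sub) + np.sqrt(1.0 - alpha) * z_full
    return u / np.linalg.norm(u) if normalize else u

rng = np.random.default_rng(0)
d, k = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))   # stand-in orthonormal basis

u_exploit = sample_direction(Q, alpha=1.0, rng=rng)    # lies entirely in span(Q)
u_explore = sample_direction(Q, alpha=0.0, rng=rng)    # plain isotropic Gaussian
u_unit = sample_direction(Q, alpha=0.5, rng=rng, normalize=True)
```

Because the two Gaussian components are independent, their covariances add, which gives the assumed mixture without ever building a d-by-d matrix.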

4. Algorithmic Protocols and Computational Structure

The generic projection-based zeroth-order FedSGD protocol (Wu et al., 2024) comprises:

  1. Model broadcast: The server shares the current global model with the selected clients.
  2. Subspace refresh: At a fixed interval of rounds, the server recomputes the subspace basis and shares it with the clients.
  3. Local updates: Each client constructs the sampling covariance, runs a number of local stochastic steps using projected zeroth-order directions, and returns its updated local model.
  4. Aggregation: The server averages the client models and records the increment for future subspace construction.

Key computational overheads include the QR decomposition for the projection subspace, amortized over the refresh interval (a thin QR of $k$ increments in dimension $d$ costs $O(dk^2)$), and the per-sample cost of subspace-based sampling. This overhead is negligible in practical settings where the subspace dimension is much smaller than the model dimension.
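The four steps above can be put together in a small simulation. This is a sketch under stated assumptions, not the reference implementation: all function and parameter names (`run`, `refresh`, `alpha`, `local_steps`, and so on) are hypothetical, every client participates in every round, the sampling covariance is assumed to be `alpha * Q Q^T + (1 - alpha) * I`, and the client objectives are toy quadratics.

```python
import numpy as np

def two_point(f, x, u, h):
    """Two-point zeroth-order surrogate along direction u."""
    return (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u

def sample_direction(Q, alpha, d, rng):
    """Isotropic draw if no basis exists yet, otherwise a subspace-biased draw."""
    z = rng.standard_normal(d)
    if Q is None:
        return z
    z_sub = rng.standard_normal(Q.shape[1])
    return np.sqrt(alpha) * (Q @ z_sub) + np.sqrt(1.0 - alpha) * z

def client_update(f, x, Q, alpha, h, lr, local_steps, rng):
    """Local zeroth-order SGD steps starting from the broadcast model."""
    x = x.copy()
    for _ in range(local_steps):
        u = sample_direction(Q, alpha, x.size, rng)
        x -= lr * two_point(f, x, u, h)
    return x

def run(clients, x0, rounds, k=3, refresh=5, alpha=0.5, h=1e-3, lr=0.05,
        local_steps=5, seed=0):
    rng = np.random.default_rng(seed)
    x, Q, increments = x0.copy(), None, []
    for t in range(rounds):
        # Subspace refresh: thin QR of the last k recorded increments.
        if t % refresh == 0 and len(increments) >= k:
            Q, _ = np.linalg.qr(np.stack(increments[-k:], axis=1))
        # Model broadcast and local updates (full participation assumed).
        local_models = [client_update(f, x, Q, alpha, h, lr, local_steps, rng)
                        for f in clients]
        # Aggregation: average client models and record the increment.
        x_new = np.mean(local_models, axis=0)
        increments.append(x_new - x)
        x = x_new
    return x

# Toy heterogeneous clients: f_i(x) = 0.5 * ||x - c_i||^2 (illustration only).
data_rng = np.random.default_rng(1)
d, M = 10, 4
centers = data_rng.standard_normal((M, d))
clients = [lambda x, c=c: 0.5 * np.sum((x - c) ** 2) for c in centers]
F = lambda x: np.mean([f(x) for f in clients])

x0 = np.full(d, 5.0)
x_final = run(clients, x0, rounds=40)
```

Comparing `F(x0)` with `F(x_final)` shows the progress made using function evaluations only. Varying `alpha` and `k` in this toy setting is a simple way to explore the exploration and exploitation trade-off.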

Related projection-based zeroth-order federated protocols include:

  • FedZero (Akhavan et al., 25 Sep 2025): uses projection onto the constraint set at each server update, with $\ell_1$-sphere-based randomization for improved dimension dependence.
  • Riemannian projection-based ZO-FL (Wang et al., 30 Jul 2025): projection onto curved feasible sets (e.g., matrix manifolds), with random perturbations in the ambient Euclidean space and corrections for statistical heterogeneity.
  • Fed-ZOE (Jang et al., 2024): applies random projection compression to local update vectors for communication-efficient over-the-air aggregation but uses first-order local training, offering a contrast to “full” zeroth-order protocols.
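To make the contrast with compression-oriented protocols concrete, the following sketch shows a generic random-projection compression of a local update vector. It assumes a Gaussian sketching matrix regenerated on both sides from a shared seed; this is a textbook construction used for illustration and is not claimed to be the specific Fed-ZOE scheme. The names `compress`, `decompress`, and `m` are hypothetical.

```python
import numpy as np

def compress(update, m, seed):
    """Project a d-dimensional update onto m random directions from a shared seed."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, update.size)) / np.sqrt(m)
    return A @ update    # m scalars are transmitted instead of d

def decompress(sketch, d, seed):
    """Linear reconstruction A^T s; unbiased because E[A^T A] = I for this scaling."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((sketch.size, d)) / np.sqrt(sketch.size)
    return A.T @ sketch

rng = np.random.default_rng(0)
d, m, seed = 1000, 50, 42
update_a = rng.standard_normal(d)
update_b = rng.standard_normal(d)

sketch_a = compress(update_a, m, seed)
sketch_b = compress(update_b, m, seed)

# Sketching is linear, so averaging sketches equals sketching the average,
# which is the property that makes superposition-based aggregation possible.
avg_sketch = 0.5 * (sketch_a + sketch_b)
recovered = decompress(avg_sketch, d, seed)
```

The reconstruction is noisy for `m` much smaller than `d`; the trade-off between uplink size and reconstruction error is what such protocols tune.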

5. Theoretical Guarantees and Bias-Variance Trade-Offs

Rigorous convergence analyses are available for projection-based zeroth-order federated SGD under various assumptions:

  • Nonconvex setting (Wu et al., 2024): Under $L$-smooth local objectives and bounded heterogeneity and sampling variance, with a sufficiently small smoothing radius $h$ and appropriate stepsizes, the expected squared norm of the global gradient admits an explicit upper bound, with more precise bounds incorporating the trade-off parameter and the subspace dimension. The two-point estimator remains unbiased for the smoothed gradient, with a second moment that depends on the sampling structure.

  • Convex case and high-probability bounds (Akhavan et al., 25 Sep 2025): For constraint sets and $\ell_1$-randomized estimators, the excess loss achieves rates that, up to logarithmic factors, match information-theoretic minimax lower bounds for federated zeroth-order optimization.
  • Riemannian manifolds (Wang et al., 30 Jul 2025): For projection-based zeroth-order methods on manifolds, convergence to a stationary point is established with an explicit rate, with query-complexity trade-offs dictated by the estimator batch size and geometric constants.

Bias-variance decompositions reveal trade-offs between exploration (full-space search, reducing bias) and exploitation (subspace-centric search, reducing variance). Moderate values of the mixing parameter empirically offer the best balance in projection-based finite-difference sampling (Wu et al., 2024).
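The unbiasedness statement can be made precise with a short derivation. The sketch below assumes directions $u \sim \mathcal{N}(0, \Sigma)$ for a generic covariance $\Sigma$ (a symbol introduced here for illustration) and an objective $f$ regular enough for Gaussian integration by parts; it is a standard identity rather than a result specific to the cited papers.

```latex
\begin{align*}
f_h(x) &:= \mathbb{E}_{u \sim \mathcal{N}(0,\Sigma)}\big[f(x + h u)\big], \\
\mathbb{E}\!\left[\frac{f(x + h u) - f(x - h u)}{2h}\, u\right]
  &= \frac{1}{h}\,\mathbb{E}\big[f(x + h u)\, u\big]
  && \text{(symmetry of } u \text{)} \\
  &= \frac{1}{h}\,\Sigma\,\mathbb{E}\big[h\,\nabla f(x + h u)\big]
  && \text{(Stein's lemma)} \\
  &= \Sigma\,\nabla f_h(x).
\end{align*}
```

With $\Sigma = I$ this reduces to the usual Gaussian-smoothing identity. A non-isotropic $\Sigma$ reweights the smoothed gradient toward the emphasized subspace, which is one way to see why the sampling structure enters both the bias and the second moment.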

6. Applications, Empirical Findings, and Protocol Comparisons

Extensive numerical validation has been conducted:

  • Classification benchmarks: Logistic regression, SVM, and MLP on MNIST, Fashion-MNIST, and RCV1 (under IID and non-IID splits) consistently show that projection-based methods with a moderate mixing parameter accelerate convergence (fewer function calls) relative to isotropic ZO variants (Wu et al., 2024).
  • Sparsity effects: On highly sparse tasks (e.g., RCV1), isotropic sampling can be competitive; on dense tasks, projection-based sampling confers clear advantages.
  • Manifold-constrained FL: Projection-based zeroth-order estimators accelerate convergence in kPCA and low-rank MLP training on Stiefel and low-rank manifolds, achieving comparable rates to first-order methods while reducing tangent-space computation (Wang et al., 30 Jul 2025).
  • Over-the-air FL: Fed-ZOE and related protocols apply projection-based compression to local model updates for substantial uplink reduction (e.g., $0.07\%$ of symbols relative to full-size communication) while maintaining baseline test accuracy (Jang et al., 2024).

A summary table of key empirical results from (Jang et al., 2024):

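| Method | Comm. load | CIFAR-10 | SVHN | Tiny-ImageNet | CIFAR-100 | Brain-CT |
| --- | --- | --- | --- | --- | --- | --- |
| Fed-OtA | 100% | 93.0% | 95.4% | 72.1% | 74.3% | 85.2% |
| LoRA-OtA | 10% | 91.8% | 94.7% | 69.0% | 71.5% | 83.3% |
| ZO-OtA | 100% | 88.5% | 92.1% | 65.2% | 68.0% | 80.5% |
| Fed-ZOE | 0.07% | 92.6% | 95.0% | 71.0% | 73.5% | 84.0% |

Comm. load is the share of uplink symbols, normalized to the Fed-OtA full uplink.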

7. Variants and Extensions

Variants of projection-based zeroth-order federated SGD address distinct constraints and system architectures:

  • Manifold constraints: Riemannian zeroth-order optimization with Euclidean perturbations and projection onto manifolds enables gradient-free FL under non-Euclidean model constraints (Wang et al., 30 Jul 2025).
  • High-probability guarantees: Utilizing $\ell_1$-sphere randomization yields tighter concentration properties and improved high-probability regret bounds in federated convex ZO-SGD (Akhavan et al., 25 Sep 2025).
  • Communication compression: Over-the-air protocols such as Fed-ZOE leverage projection-based compression, achieving both communication and computational reductions through low-dimensional random sketches of model updates (Jang et al., 2024).
  • Hybrid approaches: Some recent methods perform first-order updates locally and use projection-based zeroth-order compression for transmission, combining the computational advantages of first-order optimization with the bandwidth efficiency of zeroth-order sketching.
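As an illustration of the manifold-constrained variant listed above, the following sketch runs zeroth-order steps on the unit sphere, the simplest embedded manifold: the perturbation is drawn in the ambient Euclidean space, the surrogate is projected onto the tangent space, and the iterate is mapped back onto the manifold. The sphere, the objective `f(x) = -x^T A x`, and all function names are assumptions chosen for brevity; the cited work treats Stiefel and low-rank manifolds and includes corrections for heterogeneity that are not reproduced here.

```python
import numpy as np

def tangent_project(x, v):
    """Project v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x):
    """Map a point of the ambient space back onto the unit sphere."""
    return x / np.linalg.norm(x)

def zo_sphere_step(f, x, h, lr, rng):
    """One zeroth-order step: ambient perturbation, tangent projection, retraction."""
    u = rng.standard_normal(x.size)                      # Euclidean perturbation
    g = (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u    # two-point surrogate
    return retract(x - lr * tangent_project(x, g))

rng = np.random.default_rng(0)
d = 5
B = rng.standard_normal((d, d))
A = B @ B.T                        # symmetric positive semidefinite matrix
f = lambda x: -x @ A @ x           # minimizing f seeks a leading eigenvector of A

x = retract(rng.standard_normal(d))
for _ in range(200):
    x = zo_sphere_step(f, x, h=1e-3, lr=1e-3, rng=rng)
```

The retraction keeps every iterate feasible regardless of step size, so the constraint never has to be handled through penalties.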

A plausible implication is that ongoing research will increasingly hybridize projection-based zeroth-order techniques with first-order methods, especially when communication is the principal bottleneck.


References: (Wu et al., 2024, Akhavan et al., 25 Sep 2025, Wang et al., 30 Jul 2025, Jang et al., 2024)
