Projection-Based Zeroth-Order Fed SGD

Updated 23 June 2026

Projection-based zeroth-order federated SGD is a technique that approximates gradients using finite differences and projections onto subspaces derived from historical updates.
It leverages QR decomposition to construct promising subspaces, balancing exploration and exploitation through non-isotropic sampling in distributed nonconvex optimization.
The method achieves provable convergence and efficient communication, demonstrating its practical value on diverse tasks including CNN training and manifold-constrained optimization.

Projection-based zeroth-order federated stochastic gradient descent (SGD) refers to a class of optimization algorithms for federated learning that estimate gradients from function values (zeroth-order information) using randomized projections, often leveraging subspace structure to improve efficiency and convergence. These methods are of particular significance when explicit gradients are unavailable, and their development bridges zeroth-order optimization, projection techniques, and distributed stochastic optimization under data and system heterogeneity (Wu et al., 2024, Akhavan et al., 25 Sep 2025, Wang et al., 30 Jul 2025, Jang et al., 2024).

1. Problem Formulation and Zeroth-Order Oracle Model

The goal is typically federated minimization of a global objective:

$F(x) = \frac{1}{M} \sum_{i=1}^{M} f_i(x), \qquad x \in \mathbb{R}^d$

where $f_i$ is client $i$'s (potentially nonconvex) local objective, e.g., $f_i(x) = \mathbb{E}_{\xi \sim \mathcal{D}_i}[F(x; \xi)]$, with the two-argument $F(x; \xi)$ denoting the loss on sample $\xi$ (distinct from the global objective $F(x)$). In zeroth-order federated settings, clients lack access to $\nabla f_i$ and are limited to querying scalar function values, for example:

Two-point (central) finite difference: $f_i(x + h u) - f_i(x - h u)$
One-sided (forward) finite difference: $f_i(x + h u) - f_i(x)$

The typical gradient surrogate at $x$ using a two-point estimate for a unit vector $u$ (possibly randomized) is:

$g_i(x) = \frac{f_i(x + h u) - f_i(x - h u)}{2h}\ u$

The overall protocol entails distributed estimation and aggregation of such surrogates to update the global iterate $x$. Projection-based approaches modify the randomization mechanism to emphasize subspaces believed to contain significant descent directions (Wu et al., 2024).
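The two-point surrogate above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the quadratic objective, the helper names `two_point_estimate` and `random_unit_vector`, and the value of `h` are all assumptions chosen for the example.

```python
import numpy as np

def two_point_estimate(f, x, u, h=1e-4):
    """Two-point surrogate: (f(x + h u) - f(x - h u)) / (2 h) * u."""
    return (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u

def random_unit_vector(d, rng):
    """Draw a direction uniformly from the Euclidean unit sphere."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

# Hypothetical local objective f(x) = 0.5 * ||x - c||^2, whose gradient is x - c.
rng = np.random.default_rng(0)
d = 5
c = np.arange(d, dtype=float)
f = lambda x: 0.5 * float(np.sum((x - c) ** 2))

x = np.zeros(d)
u = random_unit_vector(d, rng)
g = two_point_estimate(f, x, u)

# For a quadratic, the central difference recovers the directional
# derivative exactly (up to floating-point error), so g = ((x - c) . u) u.
directional = float(np.dot(x - c, u))
```

Only two scalar evaluations of `f` are needed per direction; the surrogate carries gradient information along `u` alone, which is why the choice of sampling distribution matters.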

2. Projection Subspace Construction from Historical Trajectories

A central innovation in recent work is the use of non-isotropic sampling guided by the optimization trajectory history:

Trajectory matrix: At round $t$, form increments $\Delta_t = x_t - x_{t-1}$ of the global iterate. Collect the last $k$ increments into $S = [\Delta_{t-k+1}, \dots, \Delta_t] \in \mathbb{R}^{d \times k}$.
Basis via QR decomposition: Apply thin QR: $S = QR$ with $Q \in \mathbb{R}^{d \times k}$ ( $Q^\top Q = I_k$ ), $R \in \mathbb{R}^{k \times k}$ upper triangular. The columns of $Q$ span a "promising" subspace based on recent optimization progress.
Projectors: $P = Q Q^\top$ projects onto this subspace, $I - P$ onto its orthogonal complement.

This data-driven subspace is leveraged for sampling and regularization in gradient estimation, with the intent to enhance both exploitation of known good directions and exploration of new directions (Wu et al., 2024).
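The subspace construction can be sketched as follows, assuming the server stores recent global iterates as rows of an array. The function name `trajectory_subspace`, the window size `k`, and the random iterates are illustrative assumptions.

```python
import numpy as np

def trajectory_subspace(iterates, k):
    """Thin QR of the last k increments; returns Q (d x k) and R (k x k)."""
    X = np.asarray(iterates, dtype=float)    # shape (T + 1, d), one iterate per row
    increments = np.diff(X, axis=0)          # row t holds x_t - x_{t-1}
    S = increments[-k:].T                    # d x k trajectory matrix
    Q, R = np.linalg.qr(S, mode="reduced")   # thin QR: S = Q R
    return Q, R

# Hypothetical history of 6 iterates in dimension 8 (a random walk).
rng = np.random.default_rng(1)
d, k = 8, 3
iterates = np.cumsum(rng.standard_normal((6, d)), axis=0)

Q, R = trajectory_subspace(iterates, k)
P = Q @ Q.T                 # projector onto the span of recent increments
P_perp = np.eye(d) - P      # projector onto the orthogonal complement
```

Forming `P` explicitly costs $O(d^2)$ memory and is done here only for inspection; applying $Q(Q^\top v)$ gives the same projection at $O(dk)$ cost.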

3. Non-Isotropic Gradient Estimation via Projections

Projections inform the covariance structure for direction sampling in zeroth-order estimation:

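Sampling covariance: a mixture of the subspace projector and the identity, e.g. $\Sigma = \alpha P + (1 - \alpha) I_d$ for trade-off parameter $\alpha \in [0, 1]$, where $P = Q Q^\top$ projects onto the trajectory subspace spanned by the orthonormal columns of $Q \in \mathbb{R}^{d \times k}$ (the exact weighting in Wu et al., 2024 may differ).
Sampling directions: Sample $z_1 \sim \mathcal{N}(0, I_k)$, $z_2 \sim \mathcal{N}(0, I_d)$ independently, and set

$u = \sqrt{\alpha}\, Q z_1 + \sqrt{1 - \alpha}\, z_2, \qquad \text{so that } u \sim \mathcal{N}(0, \Sigma).$

Normalize if desired.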

This mechanism allows gradient estimates to concentrate on historically relevant subspaces when the trade-off parameter $\alpha$ is large, while still enabling search in the full space when $\alpha$ is small. The estimator remains unbiased for the smoothed gradient associated with the resulting Gaussian (Wu et al., 2024). For comparison, other projection-based federated zeroth-order methods exploit different randomness distributions, such as the uniform measure on the $\ell_1$-sphere for refined concentration properties (Akhavan et al., 25 Sep 2025), or tangent-space projections for Riemannian constraints (Wang et al., 30 Jul 2025).
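A minimal sketch of non-isotropic direction sampling follows. It assumes a covariance of the form $\alpha Q Q^\top + (1 - \alpha) I$ with trade-off parameter $\alpha \in [0, 1]$ and an orthonormal basis $Q$; this form matches the qualitative description above but is an assumption, and the weighting used by Wu et al. may differ. The name `sample_direction` is hypothetical.

```python
import numpy as np

def sample_direction(Q, alpha, rng, normalize=False):
    """Draw u ~ N(0, alpha * Q Q^T + (1 - alpha) * I) using O(d k) work."""
    d, k = Q.shape
    z_sub = rng.standard_normal(k)     # coefficients inside the subspace
    z_full = rng.standard_normal(d)    # isotropic full-space component
    u = np.sqrt(alpha) * (Q @ z_sub) + np.sqrt(1.0 - alpha) * z_full
    if normalize:
        u = u / np.linalg.norm(u)
    return u

# Hypothetical orthonormal basis of a 2-dimensional subspace of R^6.
rng = np.random.default_rng(2)
d, k, alpha = 6, 2, 0.7
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Monte Carlo check that the empirical covariance matches the assumed target.
samples = np.stack([sample_direction(Q, alpha, rng) for _ in range(20000)])
emp_cov = samples.T @ samples / len(samples)
target_cov = alpha * (Q @ Q.T) + (1.0 - alpha) * np.eye(d)
```

Setting `alpha = 0` recovers isotropic Gaussian sampling, while `alpha = 1` confines every direction's extra mass to the span of `Q`.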

4. Algorithmic Protocols and Computational Structure

The generic projection-based zeroth-order FedSGD protocol (Wu et al., 2024) comprises:

Model broadcast: Server shares the current global model $x_t$ with selected clients.
Subspace refresh: Every $\tau$ rounds, server computes the orthonormal basis $Q$ (thin QR of the recent increments) and shares it with clients.
Local updates: Each client constructs the sampling covariance $\Sigma$ from $Q$, runs $H$ local stochastic steps using projected zeroth-order directions, and returns its updated model $x_t^{(i)}$.
Aggregation: Server averages client models and records the increment for future trajectory-matrix construction.

Key computational overheads include the QR decomposition for the projection subspace ( $O(dk^2)$ per refresh, amortized over rounds) and an $O(dk)$ per-sample cost for subspace-based sampling, where $k$ is the subspace dimension. This overhead is negligible in practical settings where $k \ll d$.
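The four protocol steps can be put together in a toy simulation. This is a sketch under stated assumptions rather than the published algorithm: clients hold hypothetical quadratic objectives, all clients participate in every round, the covariance mixture is $\alpha Q Q^\top + (1 - \alpha) I$, and every name and hyperparameter (`run`, `refresh`, `local_steps`, `lr`, and so on) is illustrative.

```python
import numpy as np

def sample_direction(Q, alpha, d, rng):
    if Q is None:                                  # no history yet: isotropic
        return rng.standard_normal(d)
    z_sub = rng.standard_normal(Q.shape[1])
    return np.sqrt(alpha) * (Q @ z_sub) + np.sqrt(1.0 - alpha) * rng.standard_normal(d)

def local_update(f, x, Q, alpha, steps, lr, h, rng):
    """Run `steps` zeroth-order SGD steps from the broadcast model x."""
    x = x.copy()
    for _ in range(steps):
        u = sample_direction(Q, alpha, x.size, rng)
        g = (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u
        x -= lr * g
    return x

def run(clients, x0, rounds=300, local_steps=5, lr=0.02, h=1e-4,
        alpha=0.5, k=3, refresh=10, seed=0):
    rng = np.random.default_rng(seed)
    x, Q, increments = x0.copy(), None, []
    for t in range(rounds):
        # Subspace refresh: thin QR of the last k global increments.
        if t % refresh == 0 and len(increments) >= k:
            S = np.stack(increments[-k:], axis=1)  # d x k
            Q, _ = np.linalg.qr(S)
        # Broadcast x (and Q), then collect locally updated models.
        local_models = [local_update(f, x, Q, alpha, local_steps, lr, h, rng)
                        for f in clients]
        # Aggregation: average models and record the increment.
        x_new = np.mean(local_models, axis=0)
        increments.append(x_new - x)
        x = x_new
    return x

# Hypothetical heterogeneous clients: f_i(x) = 0.5 * ||x - c_i||^2.
d = 10
centers = 1.0 + 0.3 * np.random.default_rng(42).standard_normal((4, d))
clients = [lambda x, c=c: 0.5 * float(np.sum((x - c) ** 2)) for c in centers]
x_star = centers.mean(axis=0)      # minimizer of the averaged objective
x0 = np.zeros(d)
x_final = run(clients, x0)
```

Because each client follows its own random directions toward its own minimizer, the averaged model hovers near `x_star` rather than converging exactly; smaller step sizes or more clients shrink that neighborhood.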

Related projection-based zeroth-order federated protocols include:

FedZero (Akhavan et al., 25 Sep 2025): uses projection onto the constraint set $\Theta$ at each server update, with $\ell_1$-sphere-based randomization for improved dimension dependence.
Riemannian projection-based ZO-FL (Wang et al., 30 Jul 2025): projection onto curved feasible sets (e.g., matrix manifolds), with random perturbations in the ambient Euclidean space and corrections for statistical heterogeneity.
Fed-ZOE (Jang et al., 2024): applies random projection compression to local update vectors for communication-efficient over-the-air aggregation but uses first-order local training, offering a contrast to “full” zeroth-order protocols.
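To illustrate the idea of compressing an update vector by random projection, the sketch below uses a generic Gaussian sketching matrix regenerated on both sides from a shared seed. It is not Fed-ZOE's actual scheme: the functions `compress` and `decompress`, the seed-sharing convention, and the dimensions are assumptions made for the example.

```python
import numpy as np

def sketch_matrix(d, m, seed):
    """Gaussian d x m matrix with entries N(0, 1/m), so E[A A^T] = I_d."""
    return np.random.default_rng(seed).standard_normal((d, m)) / np.sqrt(m)

def compress(delta, m, seed):
    """Client side: send only the m-dimensional sketch A^T delta."""
    return sketch_matrix(delta.size, m, seed).T @ delta

def decompress(sketch, d, seed):
    """Server side: rebuild A from the shared seed and return A @ sketch."""
    return sketch_matrix(d, sketch.size, seed) @ sketch

# Hypothetical update vector of dimension 20 compressed to 5 numbers.
rng = np.random.default_rng(4)
d, m = 20, 5
delta = rng.standard_normal(d)
sketch = compress(delta, m, seed=123)
recon = decompress(sketch, d, seed=123)

# A single reconstruction is noisy, but it is unbiased: averaging over
# many independent sketching matrices approaches the original vector.
avg_recon = np.mean(
    [decompress(compress(delta, m, s), d, s) for s in range(4000)], axis=0
)
```

The uplink payload shrinks from `d` to `m` numbers per round, at the price of reconstruction variance that grows as `m` decreases.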

5. Theoretical Guarantees and Bias-Variance Trade-Offs

Rigorous convergence analyses are available for projection-based zeroth-order federated SGD under various assumptions:

Nonconvex setting (Wu et al., 2024): Under $L$-smooth local objectives and bounded heterogeneity and sampling variance, with sufficiently small smoothing radius $h$ and appropriate stepsizes, the expected squared norm of the global gradient, $\mathbb{E}\|\nabla F(x_t)\|^2$, admits an explicit upper bound (stated in Wu et al., 2024), with more precise bounds incorporating the mixing parameter $\alpha$ and the subspace dimension. The two-point estimator remains unbiased for the smoothed gradient, with the second moment depending on the estimator's parameters and the sampling structure.

Convex case and high-probability bounds (Akhavan et al., 25 Sep 2025): For constrained problems and $\ell_1$-randomized estimators, the excess loss achieves rates that, up to logarithmic factors, match information-theoretic minimax lower bounds for federated zeroth-order optimization.
Riemannian manifolds (Wang et al., 30 Jul 2025): For projection-based zeroth-order methods on manifolds, convergence to a stationary point is established, with query-complexity trade-offs dictated by the estimator batch size and geometric constants.

Bias-variance decompositions reveal trade-offs between exploration (full-space search, reducing bias) and exploitation (subspace-centric search, reducing variance). Moderate values of the mixing parameter $\alpha$ empirically offer the best balance in projection-based finite-difference sampling (Wu et al., 2024).
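The trade-off can be made concrete with a short calculation. It assumes, for illustration only, Gaussian directions $u \sim \mathcal{N}(0, \Sigma)$ with $\Sigma = \alpha P + (1 - \alpha) I$, where $P$ is the rank-$k$ projector onto the trajectory subspace and $\alpha \in [0, 1]$ is the mixing parameter; the covariance used in the cited work may be scaled differently. To first order in $h$, the two-point surrogate equals $(\nabla f_i(x)^\top u)\, u$, so:

```latex
\begin{aligned}
\mathbb{E}[g_i(x)]
  &\approx \mathbb{E}[u u^\top]\, \nabla f_i(x)
   = \Sigma\, \nabla f_i(x)
   = P\, \nabla f_i(x) + (1 - \alpha)\,(I - P)\, \nabla f_i(x), \\
\mathbb{E}\|u\|^2
  &= \operatorname{tr}(\Sigma)
   = \alpha k + (1 - \alpha)\, d .
\end{aligned}
```

Under this assumption, the gradient component inside the subspace is retained in full, while the component outside it is shrunk by $1 - \alpha$ (a bias that grows with $\alpha$). At the same time the expected squared length of the direction falls from $d$ at $\alpha = 0$ to $k$ at $\alpha = 1$, which lowers the variance of the estimate.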

6. Applications, Empirical Findings, and Protocol Comparisons

Extensive numerical validation has been conducted:

Classification benchmarks: Logistic regression, SVM, and MLP on MNIST, Fashion-MNIST, and RCV1 (under IID and non-IID splits) consistently show that projection-based methods (moderate mixing parameter $\alpha$, small subspace dimension $k$) accelerate convergence, measured in function calls, relative to isotropic zeroth-order variants (Wu et al., 2024).
Sparsity effects: On highly sparse tasks (e.g., RCV1), isotropic sampling can be competitive; on dense tasks, projection-based sampling confers clear advantages.
Manifold-constrained FL: Projection-based zeroth-order estimators accelerate convergence in kPCA and low-rank MLP training on Stiefel and low-rank manifolds, achieving comparable rates to first-order methods while reducing tangent-space computation (Wang et al., 30 Jul 2025).
Over-the-air FL: Fed-ZOE and related protocols apply projection-based compression to local model updates for substantial uplink reduction (e.g., $0.07\%$ of symbols relative to full-size communication) while staying close to baseline test accuracy (Jang et al., 2024).
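The manifold-constrained setting in the list above can be illustrated on the simplest case, the unit sphere, where kPCA with one component reduces to finding a leading eigenvector. The sketch perturbs in the ambient Euclidean space and projects the iterate back onto the sphere after each step. It is a toy single-client example under assumed names (`project_to_sphere`, `zo_projected_step`) and hyperparameters, not the algorithm of Wang et al.

```python
import numpy as np

def project_to_sphere(x):
    """Nearest point on the unit sphere (the projection for this toy manifold)."""
    return x / np.linalg.norm(x)

def zo_projected_step(f, x, lr, h, rng):
    u = rng.standard_normal(x.size)                    # ambient perturbation
    g = (f(x + h * u) - f(x - h * u)) / (2.0 * h) * u  # two-point surrogate
    return project_to_sphere(x - lr * g)               # project back onto the sphere

# Hypothetical objective: minimize -x^T A x over the unit sphere, i.e.
# maximize the Rayleigh quotient; the optimum is the leading eigenvector.
rng = np.random.default_rng(3)
d = 6
A = np.diag([5.0, 1.0, 1.0, 1.0, 1.0, 1.0])
f = lambda x: -float(x @ A @ x)

x = project_to_sphere(np.ones(d))
rq_init = float(x @ A @ x)          # Rayleigh quotient at the start
for _ in range(3000):
    x = zo_projected_step(f, x, lr=0.002, h=1e-4, rng=rng)
rq_final = float(x @ A @ x)         # should move toward the top eigenvalue 5
```

No tangent-space projection or gradient is computed here; only function values at ambient points and a normalization are needed per step.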

A summary table of key empirical results (test accuracy) from Jang et al. (2024):

Method	CIFAR-10	SVHN	Tiny-ImageNet	CIFAR-100	Brain-CT
Fed-OtA (100% symbols)	93.0%	95.4%	72.1%	74.3%	85.2%
LoRA-OtA (10%)	91.8%	94.7%	69.0%	71.5%	83.3%
ZO-OtA (100%)	88.5%	92.1%	65.2%	68.0%	80.5%
Fed-ZOE (0.07%)	92.6%	95.0%	71.0%	73.5%	84.0%

The percentage in parentheses is each method's communication load, normalized to the full Fed-OtA uplink.

7. Variants and Extensions

Variants of projection-based zeroth-order federated SGD address distinct constraints and system architectures:

Manifold constraints: Riemannian zeroth-order optimization with Euclidean perturbations and projection onto manifolds enables gradient-free FL under non-Euclidean model constraints (Wang et al., 30 Jul 2025).
High-probability guarantees: Utilizing $\ell_1$-sphere randomization yields tighter concentration properties and improved high-probability regret bounds in federated convex ZO-SGD (Akhavan et al., 25 Sep 2025).
Communication compression: Over-the-air protocols such as Fed-ZOE leverage projection-based compression, achieving both communication and computational reductions through low-dimensional random sketches of model updates (Jang et al., 2024).
Hybrid approaches: Some recent methods perform first-order updates locally and use projection-based zeroth-order compression for transmission, combining the computational advantages of first-order optimization with the bandwidth efficiency of zeroth-order sketching.
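The sphere-based randomization mentioned among the variants above can be sketched as follows. The example assumes the $\ell_1$-sphere and the estimator form common in the $\ell_1$-randomization literature, $\frac{d}{2h}\,(f(x + h\zeta) - f(x - h\zeta))\,\operatorname{sign}(\zeta)$ with $\zeta$ uniform on the $\ell_1$-sphere; the constants and details used in FedZero may differ, and the function names are hypothetical.

```python
import numpy as np

def sample_l1_sphere(d, rng):
    """Uniform draw from {z : ||z||_1 = 1} via normalized Laplace variables."""
    w = rng.laplace(size=d)
    return w / np.sum(np.abs(w))

def l1_two_point_estimate(f, x, h, rng):
    """Assumed l1-randomized surrogate: d/(2h) * (f(x+h z) - f(x-h z)) * sign(z)."""
    d = x.size
    zeta = sample_l1_sphere(d, rng)
    return d / (2.0 * h) * (f(x + h * zeta) - f(x - h * zeta)) * np.sign(zeta)

# Hypothetical linear objective f(x) = c . x, whose gradient is c everywhere.
rng = np.random.default_rng(5)
c = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
f = lambda x: float(c @ x)
x = np.zeros(c.size)

# For a linear function the estimator is unbiased, so the Monte Carlo
# average of many estimates should approach c.
estimates = np.stack([l1_two_point_estimate(f, x, 1e-3, rng) for _ in range(20000)])
mean_estimate = estimates.mean(axis=0)
```

Each estimate has entries of equal magnitude and differing sign, in contrast to the Gaussian or Euclidean-sphere surrogates, whose entries scale with the sampled direction itself.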

A plausible implication is that ongoing research will increasingly hybridize projection-based zeroth-order techniques with first-order methods, especially when communication is the principal bottleneck.

References: (Wu et al., 2024, Akhavan et al., 25 Sep 2025, Wang et al., 30 Jul 2025, Jang et al., 2024)

References (4)

Wu et al. (2024). A Historical Trajectory Assisted Optimization Method for Zeroth-Order Federated Learning.

Akhavan et al. (2025). High-Probability Analysis of Online and Federated Zero-Order Optimisation.

Wang et al. (2025). Federated Learning on Riemannian Manifolds: A Gradient-Free Projection-Based Approach.

Jang et al. (2024). Fed-ZOE: Communication-Efficient Over-the-Air Federated Learning via Zeroth-Order Estimation.
