
On-Manifold PGD

Updated 3 June 2026
  • On-Manifold PGD is an optimization technique that projects gradient steps onto a manifold, ensuring iterates remain feasible and convergence is improved.
  • The method integrates standard gradient updates with explicit or learned projections, achieving efficient performance in sparse recovery, estimation, and adversarial attacks.
  • Empirical and theoretical results confirm local linear convergence and robustness across applications ranging from signal processing to adversarial training.

On-Manifold Projected Gradient Descent (OM-PGD) refers to a family of optimization algorithms that enforce constraints intrinsic to a manifold structure, projecting iterates back onto a manifold after (or in combination with) standard gradient steps. By leveraging the geometry of the feasible set, OM-PGD enables efficient and theoretically grounded optimization in problems ranging from high-dimensional statistics and signal processing to adversarial robustness in deep learning and attacks on LLMs. The following sections survey the key algorithmic frameworks, mathematical formulations, application domains, and convergence guarantees underlying contemporary OM-PGD research.

1. Mathematical Formulation and General Principles

OM-PGD operates in optimization settings where feasible solutions are constrained to a non-linear subset (a manifold) of the ambient parameter space. Let $\mathcal{M}$ denote such a manifold (e.g., the set of unit-norm vectors, low-rank matrices, sparse vectors, or structured atomic measures). The canonical OM-PGD iteration takes the general form:

$$x^{(k+1)} = \Pi_{\mathcal{M}}\big(x^{(k)} - \alpha_k \nabla f(x^{(k)})\big)$$

where $f$ is the objective, $\nabla f$ is the gradient (interpreted in the Euclidean or Riemannian sense), $\alpha_k$ is a step size, and $\Pi_{\mathcal{M}}$ projects onto $\mathcal{M}$ with respect to a chosen metric.

The projection can be explicit (e.g., via closed-form normalization or combinatorial operations) or approximate (e.g., via learned neural projections, Nyström extension, or entropy regularization). The gradient may be computed with respect to the ambient space and then projected onto the tangent space of $\mathcal{M}$. Some OM-PGD variants incorporate problem-specific initialization methods, manifold-tangent directions, or two-phase perturbed subroutines for escaping non-optimal stationary points.
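As a minimal sketch of the iteration above, assume $\mathcal{M}$ is the unit sphere and $f(x) = x^\top A x$ for a symmetric matrix $A$. Both choices are illustrative and not taken from the cited works, and the helper names `project_sphere`, `tangent_project`, and `om_pgd` are hypothetical.

```python
import numpy as np

def project_sphere(x):
    """Closed-form projection onto the unit sphere (assumes x is nonzero)."""
    return x / np.linalg.norm(x)

def tangent_project(x, g):
    """Project an ambient gradient g onto the tangent space of the sphere at x."""
    return g - np.dot(x, g) * x

def om_pgd(grad_f, x0, step, iters):
    """Gradient step along the tangent direction, then projection onto the manifold."""
    x = project_sphere(x0)
    for _ in range(iters):
        g = tangent_project(x, grad_f(x))
        x = project_sphere(x - step * g)
    return x

# Illustrative objective f(x) = x^T A x; on the sphere its minimum is the
# smallest eigenvalue of A (here 1.0).
A = np.diag([1.0, 2.0, 5.0])
grad_f = lambda x: 2.0 * A @ x
x_star = om_pgd(grad_f, np.array([1.0, 1.0, 1.0]), step=0.1, iters=200)
f_star = x_star @ A @ x_star
```

In this toy case the projection is a single normalization; the applications surveyed below replace it with problem-specific operators.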

2. Implementation in High-Dimensional Estimation and Manifold-Constrained Models

In high-dimensional linear estimation, OM-PGD finds widespread use for empirical risk minimization constrained to structured sets such as sparsity or low-rank manifolds. For $\theta = [\beta; \mu] \in \mathbb{R}^{p+1}$, one solves:

$$\min_{\beta} \ \widehat L_n(\beta, \mu) = \frac{1}{2} \| y - X\beta - \mu \mathbf{1} \|_2^2 \quad \text{subject to} \ R(\beta) \leq R^*$$

where $R$ is a regularizer (e.g., a sparsity- or low-rank-promoting norm), and the constraint $R(\beta) \leq R^*$ defines the feasible set $\mathcal{M}$. The projection $\Pi_{\mathcal{M}}$ is the Euclidean projection onto this set.

The algorithm iterates:

  • $\beta^{(k+1)} = \Pi_{\mathcal{M}}\big(\beta^{(k)} - \alpha_k \nabla_{\beta} \widehat L_n(\beta^{(k)}, \mu^{(k)})\big)$
  • $\mu^{(k+1)} = \mu^{(k)} - \alpha_k \nabla_{\mu} \widehat L_n(\beta^{(k)}, \mu^{(k)})$

Statistical convergence rates match the minimax-optimal bounds, and the algorithm accommodates heavy-tailed (sub-exponential) data distributions, yielding robust and efficient solutions even in the high-dimensional regime (Sattar et al., 2019).
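The following sketch illustrates constrained least squares with an intercept under stated assumptions: the regularizer is taken to be the $\ell_1$ norm (one possible choice of $R$), the data are synthetic and noiseless, and the step size is the inverse of the gradient Lipschitz constant. The function names `project_l1_ball` and `pgd_linear` are hypothetical and do not come from Sattar et al.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {b : ||b||_1 <= radius} (sort-based)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    tau = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pgd_linear(X, y, radius, iters=2000):
    """Projected step on beta, plain gradient step on the intercept mu."""
    n, p = X.shape
    beta, mu = np.zeros(p), 0.0
    # Step size 1/L, with L the squared spectral norm of the design [X, 1].
    step = 1.0 / np.linalg.norm(np.hstack([X, np.ones((n, 1))]), 2) ** 2
    for _ in range(iters):
        r = X @ beta + mu - y                      # residual at the current iterate
        beta = project_l1_ball(beta - step * (X.T @ r), radius)
        mu = mu - step * r.sum()                   # intercept is unconstrained
    return beta, mu

# Synthetic, noiseless example (assumption: the l1 radius is known exactly).
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
mu_true = 0.7
y = X @ beta_true + mu_true
beta_hat, mu_hat = pgd_linear(X, y, radius=np.abs(beta_true).sum())
```

Swapping `project_l1_ball` for a different projection (for example, onto a low-rank set) changes the constraint without touching the rest of the loop.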

3. OM-PGD in Signal Processing: Off-the-Grid Sparse Recovery

For the superresolution task of off-the-grid spike recovery from Fourier measurements, OM-PGD addresses nonconvex optimization over the manifold of $s$-spike measures, that is, sums of $s$ spikes parametrized by their amplitudes and locations.

The procedure consists of:

  • Over-parametrized continuous OMP initialization, producing candidate spike locations and amplitudes.
  • Alternating gradient steps in parameter space (amplitudes and locations) and a merge-based projection operator that enforces the spike separation constraint.
  • The gradient of the data-fidelity term is computed in closed form with respect to both amplitudes and locations.

Crucially, with sufficient initial accuracy (ensured by the OMP initialization and measurement redundancy), local linear convergence to the ground truth is guaranteed provided the step size lies in an admissible interval determined by the gradient Lipschitz constant (Bénard et al., 2022).
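The gradient and merge steps can be sketched as follows. This is an illustration under assumptions, not the procedure of Bénard et al.: the measurements are taken at integer frequencies, amplitudes are real and nonzero, locations are not wrapped around the torus, the merge rule (sum amplitudes, average locations weighted by amplitude magnitude) is a hypothetical stand-in for the merge-based projection, and the step sizes are hand-picked for this toy instance.

```python
import numpy as np

def loss_and_grad(a, t, y, freqs):
    """Data fidelity 0.5*||F(a, t) - y||^2 with closed-form gradients."""
    E = np.exp(-2j * np.pi * np.outer(freqs, t))   # Fourier atoms, shape (m, s)
    r = E @ a - y                                   # residual
    loss = 0.5 * np.vdot(r, r).real
    grad_a = (E.conj().T @ r).real                  # derivative w.r.t. amplitudes
    dE = (-2j * np.pi * freqs)[:, None] * E         # derivative of atoms w.r.t. t
    grad_t = a * (dE.conj().T @ r).real             # derivative w.r.t. locations
    return loss, grad_a, grad_t

def merge_close(a, t, sep):
    """Hypothetical merge rule: fuse neighbouring spikes closer than sep."""
    order = np.argsort(t)
    a, t = a[order], t[order]
    out_a, out_t = [a[0]], [t[0]]
    for ai, ti in zip(a[1:], t[1:]):
        if ti - out_t[-1] < sep:
            w_old, w_new = abs(out_a[-1]), abs(ai)
            out_t[-1] = (w_old * out_t[-1] + w_new * ti) / (w_old + w_new)
            out_a[-1] = out_a[-1] + ai
        else:
            out_a.append(ai)
            out_t.append(ti)
    return np.array(out_a), np.array(out_t)

def spike_pgd(a, t, y, freqs, sep, step_a, step_t, iters):
    """Gradient step on (a, t) followed by the merge-based projection."""
    a, t = merge_close(a, t, sep)
    for _ in range(iters):
        _, ga, gt = loss_and_grad(a, t, y, freqs)
        a, t = merge_close(a - step_a * ga, t - step_t * gt, sep)
    return a, t

freqs = np.arange(-10, 11)
a_true, t_true = np.array([1.0, 2.0]), np.array([0.2, 0.6])
y = np.exp(-2j * np.pi * np.outer(freqs, t_true)) @ a_true
# Over-parametrized initialization: two candidates cover the first true spike.
a0, t0 = np.array([0.5, 0.5, 1.9]), np.array([0.199, 0.203, 0.603])
a_hat, t_hat = spike_pgd(a0, t0, y, freqs, sep=0.01,
                         step_a=0.02, step_t=2e-6, iters=300)
```

The location step is much smaller than the amplitude step because the curvature in the locations grows with the square of the highest measured frequency.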

4. Riemannian and Non-Euclidean Generalizations

For smooth manifolds endowed with Riemannian metrics, OM-PGD extends to optimization over matrix groups, spheres, or product manifolds. The unit-modulus least-squares (UMLS) problem is prototypical:

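$$\min_{x \in \mathbb{C}^{N}} \ \frac{1}{2} \| Ax - b \|_2^2 \quad \text{subject to} \ |x_i| = 1, \ i = 1, \dots, N$$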

The projection normalizes each pair of real and imaginary coordinates to unit norm, mapping back to the product of unit circles (one per coordinate). The tangent space and Riemannian Hessian structure are made explicit, enabling precise local linear convergence rates. Adaptive step-size variants, such as backtracking PGD and restarted Nesterov PGD, further accelerate practical convergence (Vu et al., 2022).
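A minimal fixed-step sketch of PGD for unit-modulus least squares is shown below, assuming the objective $\frac{1}{2}\|Ax - b\|_2^2$ over complex vectors with unit-modulus entries. The names `project_unit_modulus` and `umls_pgd`, the synthetic data, and the step size $1/\|A\|_2^2$ are assumptions for illustration; the backtracking and Nesterov variants mentioned above are not implemented.

```python
import numpy as np

def project_unit_modulus(z):
    """Normalize each complex entry to modulus one (entries assumed nonzero)."""
    return z / np.abs(z)

def umls_pgd(A, b, x0, iters=500):
    """Fixed-step PGD; returns the final iterate and the objective history."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1/L for 0.5*||Ax - b||^2
    x = project_unit_modulus(x0)
    history = []
    for _ in range(iters):
        r = A @ x - b
        history.append(0.5 * np.vdot(r, r).real)
        x = project_unit_modulus(x - step * (A.conj().T @ r))
    return x, history

# Synthetic instance: b is generated from a unit-modulus ground truth.
rng = np.random.default_rng(1)
m, n = 30, 8
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
x_true = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))
b = A @ x_true
x0 = x_true * np.exp(1j * 0.3 * rng.standard_normal(n))   # perturbed start
x_hat, history = umls_pgd(A, b, x0)
```

With this step size each iterate minimizes a quadratic upper bound of the objective over the feasible set, so the recorded objective values do not increase.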

Generalizing beyond smooth constraints, “Perturbed Riemannian Gradient Descent” (PRGD, an OM-PGD variant) operates with random tangent-space perturbations to escape strict saddle points. It provably attains $\epsilon$-second-order criticality with $O((\log d)^4 / \epsilon^2)$ gradient queries, where $d$ is the manifold dimension, matching the complexity of perturbed gradient descent in Euclidean space and making the approach suitable for large-scale tasks such as PCA and low-rank completion (Criscitiello et al., 2019).
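The saddle-escape idea can be illustrated with a simplified sketch on the unit sphere. It is not the algorithm of Criscitiello et al.: the escape phase here takes ordinary retraction steps on the manifold rather than steps in a fixed tangent space, and all thresholds, names (`riemannian_grad`, `retract`, `perturbed_rgd`), and the objective $f(x) = x^\top A x$ are assumptions.

```python
import numpy as np

def riemannian_grad(A, x):
    """Riemannian gradient of f(x) = x^T A x on the unit sphere."""
    g = 2.0 * A @ x
    return g - np.dot(x, g) * x

def retract(x, v):
    """Move along tangent vector v, then normalize back onto the sphere."""
    y = x + v
    return y / np.linalg.norm(y)

def perturbed_rgd(A, x0, step=0.1, max_iters=2000, grad_tol=1e-8,
                  radius=1e-3, escape_steps=100, f_thresh=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    f = lambda x: x @ A @ x
    x = x0 / np.linalg.norm(x0)
    it = 0
    while it < max_iters:
        g = riemannian_grad(A, x)
        if np.linalg.norm(g) >= grad_tol:
            x = retract(x, -step * g)              # ordinary descent step
            it += 1
            continue
        # Near-critical point: add a small random tangent perturbation.
        xi = rng.standard_normal(x.shape)
        xi -= np.dot(x, xi) * x
        y = retract(x, radius * xi / np.linalg.norm(xi))
        for _ in range(escape_steps):
            y = retract(y, -step * riemannian_grad(A, y))
        it += escape_steps
        if f(x) - f(y) < f_thresh:
            return x                               # no escape direction found
        x = y                                      # escaped a saddle
    return x

A = np.diag([1.0, 2.0, 3.0])
x_saddle = np.array([0.0, 1.0, 0.0])               # strict saddle, f = 2
x_out = perturbed_rgd(A, x_saddle)
```

Started exactly at the saddle, unperturbed Riemannian gradient descent would not move, since the Riemannian gradient vanishes there; the perturbation exposes the descent direction toward the smallest eigenvalue.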

5. OM-PGD for Adversarial Robustness and Data-Manifold Constrained Training

For adversarial robustness in neural networks, OM-PGD is applied to generate perturbations constrained to a learned class manifold $\mathcal{M}$, constructed via conformally invariant diffusion maps (CIDM). Manifold tangent spaces are approximated using the spectral exterior calculus (SEC), which provides an orthonormal basis of each tangent space. The OM-PGD attack projects input data onto $\mathcal{M}$, takes steps in the tangent direction of the classifier’s loss gradient, and reprojects onto the manifold via Nyström extension.

This strategy produces adversaries that remain on the semantic manifold, yielding human-interpretable misclassifications without the off-manifold artifacts common in standard norm-ball PGD. OM-PGD is also embedded into min-max robust training objectives, in which the inner maximization is solved on the manifold rather than in ambient pixel space. Empirical results demonstrate improved explainability and the ability to uncover class boundary weaknesses attributable to realistic variations, such as rare background effects (Mahler et al., 2023).
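The attack loop can be sketched on a toy problem in which the data manifold is known analytically. Everything here is a stand-in for the learned components: the unit sphere replaces the CIDM manifold, an SVD null-space basis replaces the SEC tangent basis, normalization replaces the Nyström reprojection, and a linear logistic classifier replaces the neural network. All function names are hypothetical.

```python
import numpy as np

def tangent_basis(x):
    """Orthonormal tangent basis of the unit sphere at x (columns).
    Stand-in for an SEC-estimated basis."""
    _, _, vt = np.linalg.svd(x[None, :])
    return vt[1:].T

def project_to_manifold(x):
    """Closed-form reprojection; stand-in for a Nystrom-based projection."""
    return x / np.linalg.norm(x)

def loss_and_grad(w, x, label):
    """Logistic loss of a linear classifier and its gradient w.r.t. the input."""
    margin = label * np.dot(w, x)
    loss = np.log1p(np.exp(-margin))
    grad = -label * w / (1.0 + np.exp(margin))
    return loss, grad

def on_manifold_attack(w, x, label, step=0.1, iters=50):
    """Ascend the loss along tangent directions, reprojecting after each step."""
    x = project_to_manifold(x)
    for _ in range(iters):
        _, g = loss_and_grad(w, x, label)
        T = tangent_basis(x)
        x = project_to_manifold(x + step * T @ (T.T @ g))
    return x

w = np.array([1.0, -2.0, 0.5])       # hypothetical classifier weights
x0 = np.array([1.0, 0.0, 0.0])       # on-manifold input, correctly classified
label = 1.0
loss_before, _ = loss_and_grad(w, x0, label)
x_adv = on_manifold_attack(w, x0, label)
loss_after, _ = loss_and_grad(w, x_adv, label)
```

The adversarial point stays on the sphere by construction, which is the toy analogue of remaining on the semantic manifold.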

6. OM-PGD in Discrete and Combinatorial Domains

In discrete optimization for adversarial prompt generation against LLMs, OM-PGD operates over a discrete simplex manifold (prompt matrices with one-hot encoding) relaxed to a continuous row-wise simplex. Iterations consist of:

  • Gradient steps via Adam or SGD within the relaxed space on the relaxed loss.
  • Row-wise simplex projections (as in Duchi et al.) to enforce normalization.
  • Additional Tsallis-entropy (Gini-index) projections to harden iterates toward low-entropy (nearly discrete) solutions.
  • Periodic one-hot discretization to evaluate and recover the best discrete prompts.
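The simplex projection and discretization steps can be sketched as follows, assuming a sort-based row-wise simplex projection in the style of Duchi et al. The linear cost matrix `C` is a hypothetical stand-in for the relaxed LLM loss, plain gradient steps replace Adam, and the Tsallis-entropy projection is omitted; the function names are illustrative.

```python
import numpy as np

def project_rows_to_simplex(P):
    """Euclidean projection of each row onto the probability simplex."""
    n, d = P.shape
    u = np.sort(P, axis=1)[:, ::-1]                    # rows sorted descending
    css = np.cumsum(u, axis=1) - 1.0
    k = np.arange(1, d + 1)
    cond = u - css / k > 0                             # True on a prefix of each row
    rho = d - 1 - np.argmax(cond[:, ::-1], axis=1)     # last index where cond holds
    tau = css[np.arange(n), rho] / (rho + 1.0)
    return np.maximum(P - tau[:, None], 0.0)

def discretize(P):
    """One-hot discretization: index of the highest-weight token per row."""
    return np.argmax(P, axis=1)

def relaxed_pgd(grad_fn, P0, step, iters):
    """Gradient step on the relaxed prompt matrix, then row-wise projection."""
    P = project_rows_to_simplex(P0)
    for _ in range(iters):
        P = project_rows_to_simplex(P - step * grad_fn(P))
    return P

# Toy relaxed loss <C, P> over 2 prompt positions and 3 tokens; its gradient is C.
C = np.array([[0.3, 0.1, 0.9],
              [0.5, 0.7, 0.2]])
P0 = np.full((2, 3), 1.0 / 3.0)
P_final = relaxed_pgd(lambda P: C, P0, step=0.5, iters=50)
tokens = discretize(P_final)
```

For this linear toy loss the iterates reach a vertex of each simplex on their own; with a nonlinear loss the relaxed solution can stay in the interior, which is where the entropy projection described above comes in.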

Controlling the continuous/discrete relaxation error via entropy projection is essential: removing it raises the attack cross-entropy from 0.078 to 0.092, a clear loss in attack efficiency. Compared with the combinatorial Greedy Coordinate Gradient (GCG) method, OM-PGD achieves an 87% attack success rate versus 83% for GCG and runs substantially faster in wall-clock terms (28.2 vs. 0.3 it/s over 25 prompts) (Geisler et al., 2024).

7. Convergence Properties, Initialization, and Practical Considerations

OM-PGD admits both global and local convergence guarantees under appropriate assumptions:

  • Local linear convergence is achievable when initialization lies within a basin of attraction around an isolated minimum and when the objective is locally strongly convex and smooth over the manifold. For off-the-grid superresolution and UMLS, explicit contraction rates and sufficient step-size intervals are established (Bénard et al., 2022, Vu et al., 2022).
  • In Riemannian settings, OM-PGD (as PRGD) escapes strict saddles efficiently, with complexity comparable to Euclidean space (Criscitiello et al., 2019).

Initialization quality is often crucial—over-parametrized OMP ensures each true component is covered in spike recovery, while pre-training with vanilla parameters may suffice for high-dimensional estimation. In adversarial optimization, relaxation schedules, entropy projections, and projection frequency are significant for both attack and defense efficacy (Geisler et al., 2024, Mahler et al., 2023).

The projection operator’s implementation is problem-dependent—merge-based heuristics (sparse spikes), closed-form normalization (unit modulus), simplex projection algorithms (combinatorial attacks), or learned/analytic Nyström maps (data manifold methods).


Key References:

(Bénard et al., 2022, Geisler et al., 2024, Sattar et al., 2019, Vu et al., 2022, Criscitiello et al., 2019, Mahler et al., 2023)
