On-Manifold PGD
- On-Manifold PGD is an optimization technique that projects gradient steps onto a manifold so that iterates remain feasible throughout the optimization.
- The method combines standard gradient updates with explicit or learned projections and has been applied to sparse recovery, high-dimensional estimation, and adversarial attacks.
- Theoretical and empirical results establish local linear convergence under suitable conditions and report robustness across applications ranging from signal processing to adversarial training.
On-Manifold Projected Gradient Descent (OM-PGD) refers to a family of optimization algorithms that enforce constraints intrinsic to a manifold structure, projecting iterates back onto a manifold after (or in combination with) standard gradient steps. By leveraging the geometry of the feasible set, OM-PGD enables efficient and theoretically grounded optimization in problems ranging from high-dimensional statistics and signal processing to adversarial robustness in deep learning and attacks on LLMs. The following sections survey the key algorithmic frameworks, mathematical formulations, application domains, and convergence guarantees underlying contemporary OM-PGD research.
1. Mathematical Formulation and General Principles
OM-PGD operates in optimization settings where feasible solutions are constrained to a non-linear subset (a manifold) of the ambient parameter space. Let $\mathcal{M}$ denote such a manifold (e.g., the set of unit-norm vectors, low-rank matrices, sparse vectors, or structured atomic measures). The canonical OM-PGD iteration takes the general form:
$$x_{k+1} = P_{\mathcal{M}}\big(x_k - \eta\, \nabla f(x_k)\big),$$
where $f$ is the objective, $\nabla f$ is the gradient (interpreted in the Euclidean or Riemannian sense), $\eta > 0$ is a step size, and $P_{\mathcal{M}}$ projects onto $\mathcal{M}$ with respect to a chosen metric.
The projection can be explicit (e.g., via closed-form normalization or combinatorial operations) or approximate (e.g., via learned neural projections, Nyström extension, or entropy regularization). The gradient may be computed with respect to the ambient space and then projected onto the tangent space of the manifold. Some OM-PGD variants incorporate problem-specific initialization methods, manifold-tangent directions, or two-phase perturbed subroutines for escaping non-optimal stationary points.
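The gradient-step-then-project pattern described above can be sketched in a few lines. This is a minimal illustration, assuming the manifold is the unit sphere (where projection is plain normalization) and a toy linear objective; the names `om_pgd`, `project_sphere`, and `tangent_project` are hypothetical and not taken from any of the cited papers.

```python
import numpy as np

def project_sphere(x):
    """Closed-form projection onto the unit sphere (normalization)."""
    return x / np.linalg.norm(x)

def tangent_project(x, g):
    """Remove the component of g that is normal to the sphere at x."""
    return g - np.dot(g, x) * x

def om_pgd(grad, project, x0, step, iters, tangent=None):
    """Gradient step followed by projection back onto the manifold.

    If `tangent` is given, the ambient gradient is first projected onto
    the tangent space at the current iterate.
    """
    x = project(x0)
    for _ in range(iters):
        g = grad(x)
        if tangent is not None:
            g = tangent(x, g)
        x = project(x - step * g)
    return x

# Toy problem (an assumption for illustration): minimize f(x) = c . x over
# the unit sphere; the minimizer is -c / ||c||.
c = np.array([1.0, -2.0, 2.0])
x_hat = om_pgd(lambda x: c, project_sphere, np.array([1.0, 1.0, 1.0]),
               step=0.2, iters=500, tangent=tangent_project)
```

Swapping `project_sphere` for another projection (for example onto a low-rank or sparse set) changes the manifold without touching the loop, which is the main reason to keep the projection as a pluggable argument.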
2. Implementation in High-Dimensional Estimation and Manifold-Constrained Models
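In high-dimensional linear estimation, OM-PGD finds widespread use for empirical risk minimization constrained to structured sets such as sparsity or low-rank manifolds. For a parameter $\theta \in \mathbb{R}^p$, an empirical loss $\mathcal{L}$ built from $n$ samples, and a constraint level $\tau$, one solves:
$$\min_{\theta}\ \mathcal{L}(\theta) \quad \text{subject to} \quad \theta \in \mathcal{K} = \{\theta : R(\theta) \le \tau\},$$
where $R$ is a regularizer (e.g., the $\ell_1$ or nuclear norm), and $\mathcal{K}$ defines the manifold. Projection $P_{\mathcal{K}}$ is the Euclidean projection onto this set.
The algorithm iterates:
- $\tilde{\theta}_{t+1} = \theta_t - \eta\, \nabla \mathcal{L}(\theta_t)$
- $\theta_{t+1} = P_{\mathcal{K}}(\tilde{\theta}_{t+1})$
Statistical convergence rates match the minimax-optimal bounds, and the algorithm accommodates heavy-tailed (sub-exponential) data distributions, yielding robust and efficient solutions even when $n < p$ (Sattar et al., 2019).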
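As a concrete instance of the two-step iteration above, the following sketch assumes a least-squares loss and an $\ell_1$-ball constraint whose radius is set to the $\ell_1$ norm of the true parameter (an oracle choice made only for illustration). The function names, problem sizes, and step-size rule are assumptions, not the setup of Sattar et al.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} via sorting."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]              # magnitudes, descending
    css = np.cumsum(u) - radius
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)              # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def pgd_least_squares(X, y, radius, iters):
    """PGD for (1/2n)||y - X theta||^2 subject to ||theta||_1 <= radius."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1/L with L = ||X||_2^2 / n
    theta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / n      # gradient step
        theta = project_l1_ball(theta - step * grad, radius)  # projection
    return theta

# Synthetic noiseless problem with fewer samples than parameters.
rng = np.random.default_rng(0)
n, p = 150, 300
theta_star = np.zeros(p)
theta_star[:5] = [3.0, -2.0, 1.5, -1.0, 2.0]
X = rng.standard_normal((n, p))
y = X @ theta_star
theta_hat = pgd_least_squares(X, y, radius=np.abs(theta_star).sum(), iters=3000)
rel_err = np.linalg.norm(theta_hat - theta_star) / np.linalg.norm(theta_star)
```

In practice the radius is unknown and would be tuned; the oracle value is used here only so that the constrained solution coincides with the sparse ground truth.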
3. OM-PGD in Signal Processing: Off-the-Grid Sparse Recovery
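For the superresolution task of off-the-grid spike recovery from Fourier measurements, OM-PGD addresses nonconvex optimization over the manifold of $s$ separated spikes with amplitudes $a_i$ and locations $t_i$:
$$\Sigma_{s,\epsilon} = \Big\{ \sum_{i=1}^{s} a_i\, \delta_{t_i} \;:\; \|t_i - t_j\| > \epsilon \ \text{for all } i \neq j \Big\}$$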
The procedure consists of:
- Over-parametrized continuous OMP initialization, producing candidate spike locations and amplitudes.
- Alternating gradient steps in parameter space (amplitudes and locations) and a merge-based projection operator that enforces the spike separation constraint.
- The gradient of the data fidelity term (the squared residual between predicted and observed measurements) is computed in closed form with respect to both amplitudes and locations.
Crucially, with sufficient initial accuracy (ensured by the OMP initialization and measurement redundancy), local linear convergence to the ground truth is guaranteed provided the step size lies in an interval determined by $L$, where $L$ is the gradient Lipschitz constant (Bénard et al., 2022).
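The two ingredients of the procedure above, a closed-form gradient and a merge step, can be sketched for a 1D toy model. The code assumes real amplitudes, measurements $y_k = \sum_i a_i e^{-2\pi i \omega_k t_i}$, and a simple greedy merge rule (sum the amplitudes, average the locations weighted by amplitude magnitude). The merge rule and all function names are illustrative assumptions and may differ from the operator used by Bénard et al.

```python
import numpy as np

def fidelity_and_grads(a, t, freqs, y):
    """0.5 * ||Phi(t) a - y||^2 and its gradients w.r.t. amplitudes and locations."""
    phi = np.exp(-2j * np.pi * np.outer(freqs, t))    # (m, s) Fourier atoms
    r = phi @ a - y                                    # measurement residual
    f = 0.5 * np.vdot(r, r).real
    grad_a = (phi.conj().T @ r).real
    dphi = (-2j * np.pi * freqs)[:, None] * phi        # derivative of atoms in t
    grad_t = a * (dphi.conj().T @ r).real
    return f, grad_a, grad_t

def merge_close_spikes(a, t, eps):
    """Greedy left-to-right merge of spikes closer than eps (heuristic)."""
    order = np.argsort(t)
    a, t = a[order], t[order]
    out_a, out_t, weight = [a[0]], [t[0]], [abs(a[0])]
    for ai, ti in zip(a[1:], t[1:]):
        if ti - out_t[-1] < eps:
            w = weight[-1] + abs(ai)
            if w > 0:
                out_t[-1] = (weight[-1] * out_t[-1] + abs(ai) * ti) / w
            else:
                out_t[-1] = 0.5 * (out_t[-1] + ti)
            out_a[-1] += ai
            weight[-1] = w
        else:
            out_a.append(ai)
            out_t.append(ti)
            weight.append(abs(ai))
    return np.array(out_a), np.array(out_t)

# Toy data: two true spikes observed at integer frequencies -5..5.
freqs = np.arange(-5.0, 6.0)
y = np.exp(-2j * np.pi * np.outer(freqs, [0.25, 0.65])) @ np.array([0.8, -0.4])
a, t = np.array([1.0, -0.5]), np.array([0.2, 0.7])
f0, grad_a, grad_t = fidelity_and_grads(a, t, freqs, y)

# Finite-difference check of the location gradient for the first spike.
h = 1e-6
e0 = np.array([h, 0.0])
fd_t0 = (fidelity_and_grads(a, t + e0, freqs, y)[0]
         - fidelity_and_grads(a, t - e0, freqs, y)[0]) / (2 * h)

merged_a, merged_t = merge_close_spikes(np.array([1.0, 1.0, 2.0]),
                                        np.array([0.1, 0.105, 0.5]), eps=0.05)
```

A full iteration would take a gradient step on `(a, t)` using `grad_a` and `grad_t` and then call `merge_close_spikes`, so that an over-parametrized initialization collapses toward the separated-spike set.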
4. Riemannian and Non-Euclidean Generalizations
For smooth manifolds endowed with Riemannian metrics, OM-PGD extends to optimization over matrix groups, spheres, or product manifolds. The unit-modulus least-squares (UMLS) problem is prototypical:
$$\min_{x \in \mathbb{C}^N}\ \tfrac{1}{2}\|Ax - b\|_2^2 \quad \text{subject to} \quad |x_i| = 1,\ i = 1, \dots, N$$
The projection $P$ normalizes each pair of real and imaginary coordinates to unit norm, mapping back to the product of $N$ circles. The tangent space and Riemannian Hessian structure are made explicit, enabling precise local linear convergence rates. Adaptive step-size variants, such as backtracking PGD and restarted Nesterov PGD, further accelerate practical convergence (Vu et al., 2022).
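A minimal fixed-step sketch of PGD for unit-modulus least squares follows. It assumes a complex Gaussian matrix, noiseless data, an initialization close to the true phases, and the step size $1/\|A\|_2^2$; these choices and the function names are illustrative assumptions and do not reproduce the adaptive variants of Vu et al.

```python
import numpy as np

def project_unit_modulus(z):
    """Entrywise projection onto the unit circle; zeros map to 1 by convention."""
    mag = np.abs(z)
    out = np.ones_like(z)
    nz = mag > 0
    out[nz] = z[nz] / mag[nz]
    return out

def umls_pgd(A, b, x0, iters):
    """Fixed-step PGD for 0.5 * ||A x - b||^2 subject to |x_i| = 1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L for this quadratic
    x = project_unit_modulus(x0)
    history = []                                # objective before each update
    for _ in range(iters):
        r = A @ x - b
        history.append(0.5 * np.vdot(r, r).real)
        x = project_unit_modulus(x - step * (A.conj().T @ r))
    return x, history

rng = np.random.default_rng(0)
m, N = 40, 10
A = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
x_star = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N))
b = A @ x_star
x0 = x_star * np.exp(1j * 0.05 * rng.standard_normal(N))   # small phase errors
x_hat, history = umls_pgd(A, b, x0, iters=300)
```

With a step of at most $1/L$ the recorded objective values are non-increasing, which gives a quick sanity check when experimenting with other step-size rules.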
Beyond local convergence, “Perturbed Riemannian Gradient Descent” (PRGD, an OM-PGD variant) uses random tangent-space perturbations to escape strict saddle points. It provably attains $\epsilon$-second-order criticality with $O((\log d)^4 / \epsilon^2)$ gradient queries, where $d$ is the manifold dimension, matching the complexity of perturbed gradient descent in Euclidean space and making the approach suitable for large-scale tasks such as PCA and low-rank completion (Criscitiello et al., 2019).
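The perturb-then-descend idea can be illustrated on the unit sphere with the objective $x^\top A x$, whose non-extremal eigenvectors are strict saddles. The sketch below is a simplified assumption-laden version: it retracts by normalization after every step instead of running the escape phase in the tangent space, and the thresholds, default parameters, and the name `prgd_sphere` are hypothetical choices rather than the algorithm as specified by Criscitiello et al.

```python
import numpy as np

def retract(x):
    """Retraction onto the unit sphere by normalization."""
    return x / np.linalg.norm(x)

def prgd_sphere(A, x0, step=0.1, grad_tol=1e-6, radius=1e-3,
                escape_steps=200, f_thresh=1e-8, max_iters=10_000, seed=0):
    """Simplified perturbed Riemannian gradient descent for min x^T A x, ||x|| = 1."""
    rng = np.random.default_rng(seed)
    f = lambda x: x @ A @ x
    rgrad = lambda x: 2.0 * (A @ x - (x @ A @ x) * x)   # Riemannian gradient
    x, it = retract(x0), 0
    while it < max_iters:
        g = rgrad(x)
        if np.linalg.norm(g) > grad_tol:
            x = retract(x - step * g)                   # ordinary descent step
            it += 1
            continue
        # Near-critical point: add a small random tangent perturbation.
        xi = rng.standard_normal(x.size)
        xi -= (xi @ x) * x
        xi *= radius / np.linalg.norm(xi)
        y = retract(x + xi)
        for _ in range(escape_steps):
            y = retract(y - step * rgrad(y))
        it += escape_steps
        if f(x) - f(y) < f_thresh:
            return x        # no sufficient decrease: treat x as second-order critical
        x = y               # escaped a saddle; continue from the new point
    return x

A = np.diag([1.0, 2.0, 3.0])
x_saddle = np.array([0.0, 1.0, 0.0])    # eigenvector for the middle eigenvalue
x_min = prgd_sphere(A, x_saddle)
```

Plain Riemannian gradient descent started at `x_saddle` would not move, because the gradient there is exactly zero; the perturbation is what lets the iterate leave the saddle.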
5. OM-PGD for Adversarial Robustness and Data-Manifold Constrained Training
For adversarial robustness in neural networks, OM-PGD is applied to generate perturbations constrained to a learned class manifold $\mathcal{M}$, constructed via conformally invariant diffusion maps (CIDM). Manifold tangent spaces are approximated using the spectral exterior calculus (SEC), providing an orthonormal basis of the tangent space $T_x\mathcal{M}$ at each point $x$. The OM-PGD attack projects input data onto $\mathcal{M}$, takes steps in the tangent direction of the classifier’s loss gradient, and reprojects onto the manifold via Nyström extension.
This strategy produces adversaries that remain on the semantic manifold, yielding human-interpretable misclassifications without the off-manifold artifacts common in standard norm-ball PGD. OM-PGD is also embedded into min-max robust training objectives, in which the inner maximization is solved on the learned manifold rather than in ambient pixel space. Empirical results demonstrate improved explainability and the ability to uncover class boundary weaknesses attributable to realistic variations, such as rare background effects (Mahler et al., 2023).
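The tangent-step-then-reproject structure of the attack can be mimicked on a toy manifold. The sketch below is only a stand-in under loud assumptions: local PCA over nearest neighbors replaces the SEC tangent estimate, snapping to the nearest sample replaces the Nyström reprojection, no CIDM embedding is built, and the "loss gradient" is a fixed made-up vector rather than a classifier gradient. All function names are hypothetical.

```python
import numpy as np

def local_tangent_basis(x, data, k, dim):
    """Orthonormal tangent basis at x from PCA of the k nearest samples."""
    nearest = np.argsort(np.sum((data - x) ** 2, axis=1))[:k]
    centered = data[nearest] - data[nearest].mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T                                  # shape (ambient, dim)

def nearest_sample(x, data):
    """Crude reprojection: snap to the closest known manifold sample."""
    return data[np.argmin(np.sum((data - x) ** 2, axis=1))]

def on_manifold_step(x, loss_grad, data, step, k=11, dim=1):
    """Ascent step along the tangent part of loss_grad, then reprojection."""
    basis = local_tangent_basis(x, data, k, dim)
    tangent_grad = basis @ (basis.T @ loss_grad)       # drop off-manifold part
    return nearest_sample(x + step * tangent_grad, data), tangent_grad

# Toy manifold: unit circle in the xy-plane of R^3, sampled at 400 points.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
data = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
x = data[0]                                            # the point (1, 0, 0)
loss_grad = np.array([0.3, 1.0, 2.0])                  # large off-manifold z part
x_adv, tangent_grad = on_manifold_step(x, loss_grad, data, step=0.1)
```

Here the ambient gradient points mostly off the circle, yet the resulting step moves along it; this is the qualitative behavior that distinguishes an on-manifold attack from a norm-ball attack.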
6. OM-PGD in Discrete and Combinatorial Domains
In discrete optimization for adversarial prompt generation against LLMs, OM-PGD operates over a discrete set of prompt matrices with one-hot rows, relaxed to a continuous domain in which each row lies on the probability simplex. Iterations consist of:
- Gradient steps via Adam or SGD within the relaxed domain on the relaxed loss.
- Row-wise simplex projections (as in Duchi et al.) to enforce normalization.
- Additional Tsallis-entropy (Gini-index) projections to harden iterates toward low-entropy (nearly discrete) solutions.
- Periodic one-hot discretization to evaluate and recover the best discrete prompts.
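The projection and discretization steps in this list can be sketched as follows. The code assumes plain gradient steps instead of Adam, a toy linear loss $\langle C, X \rangle$ in place of an LLM loss, and it omits the entropy projection entirely; the sort-based simplex projection follows the standard algorithm attributed to Duchi et al., while the function names and sizes are illustrative assumptions.

```python
import numpy as np

def project_rows_to_simplex(V):
    """Project each row of V onto the probability simplex (sort-based)."""
    n, d = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                    # each row descending
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, d + 1)
    cond = U - css / idx > 0
    rho = d - 1 - np.argmax(cond[:, ::-1], axis=1)     # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)

def relaxed_pgd(C, steps, lr):
    """PGD on the toy relaxed loss <C, X> over row-wise simplices."""
    length, vocab = C.shape
    X = np.full((length, vocab), 1.0 / vocab)          # uniform initialization
    for _ in range(steps):
        X = project_rows_to_simplex(X - lr * C)        # gradient of <C, X> is C
    return X

def discretize(X):
    """One-hot rows placed at each row's largest entry."""
    onehot = np.zeros_like(X)
    onehot[np.arange(X.shape[0]), X.argmax(axis=1)] = 1.0
    return onehot

rng = np.random.default_rng(0)
C = rng.standard_normal((4, 6))        # hypothetical per-token costs
X_relaxed = relaxed_pgd(C, steps=50, lr=0.1)
X_discrete = discretize(X_relaxed)
```

For this linear toy loss the discretized prompt simply selects the cheapest token in each position; with a real model loss the relaxed and discrete optima can differ, which is the gap the entropy projection is meant to control.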
Controlling the continuous/discrete relaxation error via entropy projection matters: removing it in an ablation increases the attack cross-entropy from 0.078 to 0.092, indicating a less effective attack. Compared with the combinatorial Greedy Coordinate Gradient (GCG) method, OM-PGD achieves an 87% attack success rate versus 83% for GCG and is reported to be nearly an order of magnitude faster in wall-clock time, with a throughput of 28.2 versus 0.3 it/s over 25 prompts (Geisler et al., 2024).
7. Convergence Properties, Initialization, and Practical Considerations
OM-PGD admits both global and local convergence guarantees under appropriate assumptions:
- Local linear convergence is achievable when initialization lies within a basin of attraction around an isolated minimum and when the objective is locally strongly convex and smooth over the manifold. For off-the-grid superresolution and UMLS, explicit contraction rates and sufficient step-size intervals are established (Bénard et al., 2022, Vu et al., 2022).
- In Riemannian settings, OM-PGD (as PRGD) escapes strict saddles efficiently, with complexity comparable to Euclidean space (Criscitiello et al., 2019).
Initialization quality is often crucial—over-parametrized OMP ensures each true component is covered in spike recovery, while pre-training with vanilla parameters may suffice for high-dimensional estimation. In adversarial optimization, relaxation schedules, entropy projections, and projection frequency are significant for both attack and defense efficacy (Geisler et al., 2024, Mahler et al., 2023).
The projection operator’s implementation is problem-dependent—merge-based heuristics (sparse spikes), closed-form normalization (unit modulus), simplex projection algorithms (combinatorial attacks), or learned/analytic Nyström maps (data manifold methods).
Key References:
(Bénard et al., 2022, Geisler et al., 2024, Sattar et al., 2019, Vu et al., 2022, Criscitiello et al., 2019, Mahler et al., 2023)