Constrained Gradient Method (CGM)
- Constrained Gradient Method (CGM) is an iterative algorithm that optimizes objectives over constraint sets by projecting gradients onto tangent spaces to maintain near-feasible iterates.
- Its modern variant, ODCGM, eliminates costly retraction steps by replacing them with orthogonal projections onto linear tangent subspaces, significantly reducing per-iteration computational overhead.
- The approach achieves near-optimal iteration complexity in both deterministic and stochastic regimes, ensuring efficient convergence in high-dimensional nonconvex problems.
A constrained gradient method (CGM) is an iterative procedure for optimizing an objective function over a feasible region defined by equality and/or inequality constraints. Classical gradient-based methods require explicit projections or retractions to ensure feasibility after each step, which often incurs significant computational overhead, especially when the feasible set is a smooth manifold or is defined by nontrivial functional constraints. Modern CGM variants exploit projections onto low-dimensional subspaces, convex cones, or velocity polytopes, yielding efficient algorithms for both convex and nonconvex constrained problems in high dimensions. In particular, recent developments such as the Orthogonal Directions Constrained Gradient Method (ODCGM) allow infeasible iterates that are continually pulled towards the constraint manifold, achieving near-optimal optimization complexity without relying on costly retraction operations (Schechtman et al., 2023).
1. Problem Formulation and Assumptions
CGM addresses problems of the form
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to } h(x) = 0,$$
where $f$ is generally non-convex and $h$ is continuously differentiable, with Jacobian $\mathrm{D}h(x)$ of full rank in a neighborhood containing the constraint set $\mathcal{M} = \{x : h(x) = 0\}$. For the special case of the Stiefel manifold, $\mathcal{M} = \{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}$ and $h(X) = X^\top X - I_p$.
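As a concrete illustration, the following sketch instantiates this template for the Stiefel case in Python; the quadratic objective, the problem sizes, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy instance of the template  min f(X)  s.t.  h(X) = 0  on the Stiefel
# manifold St(n, p) = {X in R^{n x p} : X^T X = I_p}.
n, p = 50, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # symmetric matrix defining the objective

def f(X):
    # f(X) = -1/2 tr(X^T A X); constrained minimizers span the top-p eigenvectors of A.
    return -0.5 * np.trace(X.T @ A @ X)

def h(X):
    # Constraint map whose zero set is the Stiefel manifold.
    return X.T @ X - np.eye(p)

X0 = rng.standard_normal((n, p))        # a generic (infeasible) starting point
print("f(X0) =", f(X0), " ||h(X0)||_F =", np.linalg.norm(h(X0)))
```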
The standard assumptions include:
- $f$ and $h$ have $L$-Lipschitz gradients on this neighborhood.
- There exists $\sigma > 0$ such that, for every $x$ in this neighborhood, $\mathrm{D}h(x)\,\mathrm{D}h(x)^\top$ is positive definite with smallest eigenvalue at least $\sigma^2$.
These structural properties ensure well-posedness for both the objective and the constraint geometry.
2. Algorithmic Structure of ODCGM
At each point $x$, define:
- The quadratic constraint penalty $\mathcal{N}(x) = \tfrac{1}{2}\|h(x)\|^2$, with gradient $\nabla \mathcal{N}(x) = \mathrm{D}h(x)^\top h(x)$.
- The extended tangent space $T_x = \ker \mathrm{D}h(x) = \{v : \mathrm{D}h(x)\,v = 0\}$, which is defined even at infeasible points.
- The projection of the gradient onto $T_x$, denoted $P_{T_x}\nabla f(x)$.
ODCGM constructs a step direction by projecting the negative gradient onto $T_x$ and adding the pull of the penalty gradient, followed by the update
$$x_{k+1} = x_k - \eta\,\big(P_{T_{x_k}}\nabla f(x_k) + \lambda\,\nabla \mathcal{N}(x_k)\big),$$
where $\eta > 0$ is a step size and $\lambda > 0$ weights the penalty. Unlike classical manifold optimization methods (e.g., trust-region, retraction-based), ODCGM does not enforce exact feasibility after each step. Instead, iterates remain infeasible but are consistently pulled toward the constraint manifold by the penalty-gradient component of the step.
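To make the update concrete, here is a minimal, self-contained sketch of an ODCGM-style step for generic equality constraints; the step size, penalty weight, and the sphere-constrained toy problem are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def odcgm_step(x, grad_f, h, jac_h, eta=0.1, lam=1.0):
    """One ODCGM-style step for min f(x) subject to h(x) = 0 (a sketch).

    The direction adds the orthogonal projection of grad f(x) onto the
    extended tangent space ker Dh(x) to the gradient of the penalty
    N(x) = 0.5 * ||h(x)||^2, which pulls the iterate back toward the
    constraint set.  No retraction is applied.
    """
    g = grad_f(x)
    J = jac_h(x)                                   # p x n Jacobian Dh(x)
    # P_{T_x} g = g - J^T (J J^T)^{-1} J g  (projection onto ker Dh(x)).
    proj_g = g - J.T @ np.linalg.solve(J @ J.T, J @ g)
    penalty_grad = J.T @ h(x)                      # gradient of 0.5 * ||h(x)||^2
    return x - eta * (proj_g + lam * penalty_grad)

# Tiny illustration: minimize c^T x subject to ||x||^2 = 1 (unit sphere).
c = np.array([1.0, 2.0, 2.0])
x = np.array([1.0, 0.0, 0.0])
for _ in range(300):
    x = odcgm_step(x,
                   grad_f=lambda x: c,
                   h=lambda x: np.array([x @ x - 1.0]),
                   jac_h=lambda x: 2.0 * x.reshape(1, -1))
print(x, np.linalg.norm(x))   # approaches -c / ||c||, with ||x|| close to 1
```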
Key properties include:
- The algorithm requires only the computation of the (linear) projection onto the tangent space, omitting any retraction or full projection onto $\mathcal{M}$.
- In regions where the constraint Jacobian has full rank, iterates contract towards $\mathcal{M}$ as the penalty $\mathcal{N}$ decreases.
3. Analysis of Convergence and Complexity
ODCGM achieves the following oracle complexities:
- Deterministic case: to obtain an $\varepsilon$-stationary point (formalized below), ODCGM needs $\mathcal{O}(\varepsilon^{-2})$ iterations.
- Stochastic case: the iteration complexity is $\mathcal{O}(\varepsilon^{-4})$.
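Here, a point $x$ counts as $\varepsilon$-stationary when both the projected gradient and the constraint violation are small; one standard formalization (the paper's precise measure may differ in norms and constants) is
$$\|P_{T_x}\nabla f(x)\| \le \varepsilon \quad \text{and} \quad \|h(x)\| \le \varepsilon.$$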
The gradient of the penalty acts as a regularizer, ensuring that the projected steps both decrease the objective and repel iterates from violating the constraints excessively. The convergence argument leverages the Lipschitz properties of $\nabla f$ and $\nabla h$, the spectral condition on $\mathrm{D}h\,\mathrm{D}h^\top$, and the constant pull exerted by the violated constraints.
A salient feature is that, under an appropriately chosen projection metric, the ODCGM framework recovers the landing algorithm of Ablin and Peyré (2022): their approach for optimization over the Stiefel manifold with orthogonality constraints fits as a special case (Schechtman et al., 2023).
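For the Stiefel case, the recovered update takes the familiar landing form: a skew-symmetric "relative gradient" component plus the gradient of the orthogonality penalty. The sketch below follows this published form; the step size, penalty weight, and the eigenvector toy problem are illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def landing_step(X, grad_f, eta=0.02, lam=1.0):
    """One landing-style step on the Stiefel manifold (the ODCGM special case,
    sketched from the published form of the landing field)."""
    G = grad_f(X)
    psi = 0.5 * (G @ X.T - X @ G.T)                 # skew-symmetric relative gradient
    n_grad = X @ (X.T @ X - np.eye(X.shape[1]))     # gradient of the orthogonality penalty
    return X - eta * (psi @ X + lam * n_grad)

# Toy use: approximate leading eigenvectors of a symmetric matrix (illustrative setup).
rng = np.random.default_rng(1)
n, p = 30, 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
A /= np.linalg.norm(A, 2)                           # normalize so a fixed step size is safe
X = np.linalg.qr(rng.standard_normal((n, p)))[0]
for _ in range(2000):
    X = landing_step(X, grad_f=lambda X: -A @ X)
print("orthogonality error:", np.linalg.norm(X.T @ X - np.eye(p)))  # remains small without retraction
```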
4. Comparison to Retraction-based and Projection Methods
Classical manifold optimization methods (e.g., Riemannian gradient descent, trust-region methods) require:
- Computation of a retraction after each step to return to the manifold (e.g., via QR decomposition for Stiefel; a retraction step is sketched at the end of this section).
- Explicitly feasible iterates at each stage.
In contrast, ODCGM is:
- Simpler to implement, since it replaces retractions with orthogonal projections onto the tangent space (vector subspace).
- Computationally cheaper per iteration, as no full retraction or projection is required.
- Infeasible during optimization, but exhibits contraction towards the manifold, so the limit points are guaranteed to lie close to or on $\mathcal{M}$.
This methodological simplification leads to broad applicability, especially in large-scale problems and high-dimensional settings.
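For contrast, one Riemannian gradient step with the standard QR retraction on the Stiefel manifold looks as follows; this is a textbook construction, sketched only to highlight the per-iteration factorization that ODCGM avoids.

```python
import numpy as np

def rgd_qr_step(X, grad_f, eta=0.02):
    """One retraction-based Riemannian gradient step on the Stiefel manifold."""
    G = grad_f(X)
    # Project the Euclidean gradient onto the tangent space at X.
    sym = 0.5 * (X.T @ G + G.T @ X)
    rgrad = G - X @ sym
    # Euclidean step followed by a QR retraction back onto the manifold.
    Q, R = np.linalg.qr(X - eta * rgrad)
    return Q * np.sign(np.diag(R))      # sign fix keeps the retraction well-defined

# Illustrative call: feasibility is restored exactly, at the price of a QR factorization.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 20)); A = (A + A.T) / 2
X = np.linalg.qr(rng.standard_normal((20, 3)))[0]
X = rgd_qr_step(X, grad_f=lambda X: -A @ X)
print("feasibility after retraction:", np.linalg.norm(X.T @ X - np.eye(3)))  # ~ machine precision
```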
5. Connections to Prior Work and Extensions
The theoretical scaffold of ODCGM extends analysis by Ablin & Peyré (2022), particularly in the context of optimization over the Stiefel manifold. By choosing appropriate tangent space metrics and projection operators, ODCGM reproduces the behavior of their landing algorithm (Schechtman et al., 2023).
A plausible implication is that the projection-based ODCGM paradigm can be generalized to other smooth manifolds where local tangent space projections are tractable. This framework is especially suitable for problems where the classical geodesic/parallel transport computations are expensive or intractable.
6. Numerical Behavior and Practical Implementation
Experiments demonstrate that ODCGM efficiently solves high-dimensional non-convex problems constrained to smooth manifolds, such as the Stiefel case. In the tested scenarios, ODCGM produced competitive or superior solutions relative to classical retraction-based algorithms, particularly as problem dimension grows.
The empirical efficiency stems from:
- Reduced per-iteration cost (no full retraction).
- Infeasible trajectory that still converges to the feasible set without explicit constraint restoration.
Near-optimal iteration complexity in both deterministic and stochastic regimes reinforces ODCGM's suitability for challenging constrained nonconvex problems, especially where the cost of enforcing explicit feasibility at every iteration is prohibitive.
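In practice it is convenient to monitor exactly the two quantities that an $\varepsilon$-stationarity criterion combines: the projected-gradient norm and the constraint violation. The loop below is a minimal diagnostic sketch on a sphere-constrained linear objective; the problem, step size, penalty weight, and tolerance are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Monitor an ODCGM-style run on  min c^T x  s.t.  ||x||^2 = 1  and stop once
# both the projected-gradient norm and the constraint violation are below eps.
c = np.array([1.0, 2.0, 2.0])
h = lambda x: np.array([x @ x - 1.0])
jac = lambda x: 2.0 * x.reshape(1, -1)
eta, lam, eps = 0.1, 1.0, 1e-6

x = np.array([1.0, 0.0, 0.0])
for k in range(10_000):
    J = jac(x)
    proj_g = c - J.T @ np.linalg.solve(J @ J.T, J @ c)   # P_{T_x} grad f(x), with grad f = c
    viol = h(x)
    if np.linalg.norm(proj_g) <= eps and np.linalg.norm(viol) <= eps:
        break
    x -= eta * (proj_g + lam * J.T @ viol)
print(f"stopped at iteration {k}: stationarity {np.linalg.norm(proj_g):.2e}, "
      f"violation {np.linalg.norm(viol):.2e}")
```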
7. Theoretical and Practical Significance
ODCGM establishes a new class of constrained gradient algorithms that offer:
- Simplicity of implementation (only vector space projections).
- Near-optimal complexity for general nonconvex, functional, and manifold-constrained problems.
- Rigorous guarantees of constraint satisfaction in the limit, despite infeasible intermediate iterates.
- Recoverability of specific prior algorithms (e.g., landing algorithm) through metric selection.
This methodological advance broadens the reach of first-order optimization for constrained nonconvex landscapes, while reducing the algorithmic and computational burden typically associated with manifold-based optimization (Schechtman et al., 2023).