Projected Gradient Descent with Constraints

Updated 1 August 2025
  • PGDC is a first-order iterative optimization method that alternates gradient updates with projections to maintain feasibility under various structural constraints.
  • Under mild assumptions, accumulation points are Bouligand stationary in both convex and nonconvex settings, and proximally stationary when gradients are locally Lipschitz; projection quality and step-size selection are central to these guarantees.
  • Widely used in applications like low-rank matrix recovery, sparse learning, and manifold optimization, PGDC balances computational efficiency with robust performance.

Projected Gradient Descent with Constraints (PGDC) refers to a class of first-order iterative optimization methods that operate over a feasible set by alternating between gradient-descent updates and projections onto the constraint set. The approach is widely applicable in convex and nonconvex settings across signal processing, statistics, machine learning, and engineering design, particularly when problem variables are required to satisfy explicit structural constraints such as norm bounds, low-rankness, sparsity, or feasibility in manifold or combinatorial sets.

1. Theoretical Foundations and General Framework

The canonical PGDC algorithm solves problems of the form

$$\min_{x \in C} f(x)$$

where $f$ is typically continuously differentiable (possibly nonconvex) and $C$ is a closed (not necessarily convex) constraint set. At each iteration,

$$x_{k+1} = \mathrm{Proj}_C\bigl(x_k - \alpha_k \nabla f(x_k)\bigr)$$

where $\alpha_k$ is a positive step size and $\mathrm{Proj}_C(\cdot)$ denotes the projection onto $C$ in the Euclidean norm. The projection operator plays a central role in enforcing feasibility and is defined as

$$\mathrm{Proj}_C(y) = \arg\min_{z \in C} \|z - y\|$$

for each candidate point $y$.

PGDC methods are versatile: for convex $C$ and smooth $f$, classical results guarantee convergence to a global minimizer; in nonconvex settings (where, for example, $C$ encodes a rank, sparsity, or manifold constraint), convergence to stationary points is often the best possible guarantee.
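
To make the iteration concrete, here is a minimal sketch in Python, assuming a smooth quadratic objective and a box constraint so that the projection is closed-form; the function names and parameter values are illustrative rather than taken from the cited works.

```python
import numpy as np

def projected_gradient_descent(grad_f, proj_C, x0, alpha=0.1, max_iter=500, tol=1e-8):
    """Canonical PGDC iteration: gradient step, then projection onto C."""
    x = x0.copy()
    for _ in range(max_iter):
        x_new = proj_C(x - alpha * grad_f(x))
        if np.linalg.norm(x_new - x) <= tol:   # fixed-point residual stopping rule
            return x_new
        x = x_new
    return x

# Illustrative instance: minimize ||x - b||^2 over the box [0, 1]^3.
b = np.array([1.5, -0.3, 0.7])
grad_f = lambda x: 2.0 * (x - b)               # gradient of the quadratic
proj_box = lambda y: np.clip(y, 0.0, 1.0)      # closed-form box projection
x_star = projected_gradient_descent(grad_f, proj_box, x0=np.zeros(3))
# x_star is approximately [1.0, 0.0, 0.7]
```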

Theoretical advances (Olikier et al., 4 Mar 2024) have clarified the types of stationarity PGDC achieves in general: under basic assumptions, all accumulation points are Bouligand stationary and, when the gradient is locally Lipschitz, even proximally stationary, a stronger guarantee than the Mordukhovich stationarity that sometimes appears in classical analyses.

2. Stationarity Concepts and Optimality Notions

Three nested optimality conditions are relevant in constrained nonconvex optimization:

  • Mordukhovich Stationarity: $-\nabla f(x) \in N_C(x)$, where $N_C(x)$ is the limiting (Mordukhovich) normal cone.
  • Bouligand Stationarity: $-\nabla f(x) \in \widehat{N}_C(x)$, where $\widehat{N}_C(x)$ is the regular (Bouligand) normal cone, the polar of the tangent cone. This condition is strictly stronger and reflects the geometric tangent properties of $C$ at $x$.
  • Proximal Stationarity: For some $\alpha > 0$, $x = \mathrm{Proj}_C(x - \alpha \nabla f(x))$, which is equivalent to $-\nabla f(x)$ belonging to the proximal normal cone.

PGDC is proven (Olikier et al., 4 Mar 2024) to produce sequences whose accumulation points satisfy at least Bouligand stationarity and, under local gradient Lipschitz conditions, are proximally stationary.

Such results depend critically on the geometry of $C$. For nonconvex and nonsmooth sets (e.g., sets of low-rank matrices), normal and tangent cones may have complex structure but can nevertheless be characterized explicitly in many applications, facilitating practical implementation.
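
In practice these notions double as stopping criteria: the fixed-point residual $\|x - \mathrm{Proj}_C(x - \alpha \nabla f(x))\|$ is zero exactly when $x$ satisfies the proximal stationarity condition above for that $\alpha$. A minimal sketch, using the low-rank set as a representative nonconvex example (helper names are illustrative):

```python
import numpy as np

def proj_rank(Y, r):
    """Metric projection onto {X : rank(X) <= r} via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = 0.0
    return (U * s) @ Vt

def stationarity_residual(X, grad, proj, alpha=1.0):
    """||X - Proj_C(X - alpha * grad(X))||: vanishes iff X is a fixed point
    of the projected-gradient map, i.e., proximally stationary for this alpha."""
    return np.linalg.norm(X - proj(X - alpha * grad(X)))

# Example: f(X) = 0.5 * ||X - A||_F^2 over rank <= 1 matrices.
A = np.arange(9.0).reshape(3, 3)
X = proj_rank(A, 1)                # best rank-1 approximation of A
res = stationarity_residual(X, lambda Z: Z - A, lambda Y: proj_rank(Y, 1))
# res is ~0: the best rank-1 approximation is proximally stationary here.
```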

3. Algorithmic Structure and Practical Implementation

At its core, each PGDC step consists of a descent step along the negative gradient followed by a projection:

$$y_{k+1} = x_k - \alpha_k \nabla f(x_k), \qquad x_{k+1} = \mathrm{Proj}_C(y_{k+1})$$

Key factors affecting implementation and performance include:

  • Step-size selection ($\alpha_k$): May be constant, diminishing, or obtained via line search for sufficient descent (e.g., Armijo or Wolfe schemes); a backtracking variant is sketched after this list.
  • Projection computation: For convex $C$, the projection is unique and often has an analytic or efficiently computable form. For nonconvex $C$, projections may involve combinatorial algorithms (e.g., hard-thresholding for sparsity, SVD truncation for rank, or specialized solvers for manifold constraints).
  • Nonmonotone strategies: In some settings, acceptance of successive iterates relies not on monotonic decrease but on generalized descent criteria (e.g., a nonmonotone sufficient decrease condition).
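
A sketch of the backtracking option mentioned in the first bullet, using an Armijo-type sufficient decrease test adapted to the projected step; the specific test and constants are one common choice, not the scheme of any cited paper.

```python
import numpy as np

def pgd_armijo(f, grad_f, proj_C, x0, alpha0=1.0, beta=0.5, sigma=1e-4, max_iter=200):
    """Projected gradient descent with Armijo backtracking on the step size."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad_f(x)
        alpha = alpha0
        while True:
            x_trial = proj_C(x - alpha * g)
            # Sufficient decrease measured along the projected displacement
            if f(x_trial) <= f(x) - (sigma / alpha) * np.linalg.norm(x_trial - x) ** 2:
                break
            alpha *= beta                  # shrink the step and retry
            if alpha < 1e-12:
                return x                   # no acceptable step: near-stationary
        x = x_trial
    return x
```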

These choices directly influence convergence speed, computational cost per iteration, and robustness.

4. Constraints Handling and Structured Feasible Sets

A major utility of PGDC is its ability to enforce feasibility with respect to diverse constraint sets:

  • Simple structure (boxes, balls, affine sets, the simplex): Projections are closed-form and computationally trivial.
  • Geometry-induced structure (manifolds, low-rank varieties, determinantal sets): Projections typically require numerical algorithms (e.g., SVD for rank projection, hard-thresholding for $\ell_0$ constraints); see the sketches after this list.
  • Combinatorial/discrete structure: Projections may be approximated by randomized or heuristic schemes (e.g., XOR sampling for combinatorial constraints (Ding et al., 2022)).
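
Sketches of two representative projections from the first two categories above, assuming the Euclidean norm throughout (function names are illustrative):

```python
import numpy as np

def proj_l2_ball(y, radius=1.0):
    """Closed-form projection onto the Euclidean ball of the given radius."""
    n = np.linalg.norm(y)
    return y if n <= radius else (radius / n) * y

def proj_sparse(y, k):
    """Projection onto {x : ||x||_0 <= k}: keep the k largest-magnitude
    entries and zero the rest (hard-thresholding)."""
    x = np.zeros_like(y)
    idx = np.argsort(np.abs(y))[-k:]       # indices of the k largest magnitudes
    x[idx] = y[idx]
    return x
```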

PGDC’s reliance on exact projections can be a practical challenge where the projection is computationally demanding, but theoretical refinements (Olikier et al., 4 Mar 2024) and extensions based on surrogate or approximate projections have broadened its applicability.

5. Convergence Properties and Guarantees

Convergence properties of PGDC are highly sensitive to the constraint geometry and function regularity:

  • For convex $C$ and $L$-smooth $f$, convergence to global minimizers is classical.
  • For nonconvex $C$, convergence is to Bouligand stationary points or, under stronger smoothness, to proximally stationary points (Olikier et al., 4 Mar 2024).
  • For constraint sets where the projection mapping is locally continuously differentiable (e.g., smooth manifolds), linear convergence rates are attainable locally (within a region of attraction), as analyzed via the linearization of the fixed-point mapping associated with the optimality condition (Vu et al., 2021).

This can be formalized as

$$x_{k+1} - x^* = H (x_k - x^*) + o(\|x_k - x^*\|)$$

where $H$ is the linearized iteration matrix, with spectral radius $\rho(H) < 1$ necessary for local geometric convergence.
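
A small numerical illustration of this local behavior: for a sphere-constrained quadratic, the error ratios $\|x_{k+1} - x^*\| / \|x_k - x^*\|$ settle near a constant below one, giving an empirical estimate of $\rho(H)$. The setup is illustrative and not drawn from the cited analyses.

```python
import numpy as np

# Minimize f(x) = 0.5 * x'Ax over the unit sphere; the solution is the
# eigenvector for the smallest eigenvalue of A (here, the first axis).
rng = np.random.default_rng(0)
A = np.diag([1.0, 2.0, 5.0])
x_star = np.array([1.0, 0.0, 0.0])
alpha = 0.15

x = rng.normal(size=3)
x /= np.linalg.norm(x)
x *= np.sign(x[0])                         # start in the hemisphere of x_star

errs = []
for _ in range(60):
    x = x - alpha * (A @ x)                # gradient step, grad f(x) = A x
    x /= np.linalg.norm(x)                 # projection onto the sphere
    errs.append(np.linalg.norm(x - x_star))

ratios = [e2 / e1 for e1, e2 in zip(errs, errs[1:]) if e1 > 1e-13]
# The ratios approach (1 - 2*alpha) / (1 - alpha) ~ 0.82, an estimate of rho(H).
```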

The global convergence landscape is more intricate outside the convex setting. Convergence to global minimizers can often only be guaranteed under additional assumptions (e.g., local convexity, invexity, or specific properties of the objective and the set $C$), with convergence to stationary points as the general outcome.

6. Applications and Illustrative Examples

PGDC finds wide application across domains including:

  • Low-rank optimization: Constraint sets defined by rank (e.g., matrix completion, robust PCA, structured PCA). Feasibility is enforced by projection onto determinantal varieties via truncated SVD (Olikier et al., 4 Mar 2024, Olikier et al., 2022); a matrix-completion sketch follows this list.
  • Sparsity-constrained learning: PGDC-type algorithms apply to $\ell_0$-constrained sparse recovery, with projection by hard-thresholding.
  • Manifold optimization: Constraints given by geometric manifolds (e.g., Stiefel, Grassmann, sphere), where projection uses analytic formulas for the nearest point (Vu et al., 2021).
  • Topology optimization in engineering: Extremely large-scale designs with bound and global nonlinear constraints, where improvements in projection algorithms (e.g., Schur complements for univariate constraints and vector-space decomposition for nonlinear constraints) further increase robustness and scalability (Barbeau et al., 10 Dec 2024).
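
As a worked low-rank example (referenced in the first bullet above), the following sketch applies PGDC to rank-constrained matrix completion, with the projection computed by truncated SVD. It is a generic illustration in the spirit of the cited work, not a reproduction of its algorithms.

```python
import numpy as np

def matrix_completion_pgd(M_obs, mask, r, alpha=1.0, iters=300):
    """Minimize 0.5 * ||mask * (X - M_obs)||_F^2 subject to rank(X) <= r."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        G = mask * (X - M_obs)             # gradient of the observed-entry loss
        U, s, Vt = np.linalg.svd(X - alpha * G, full_matrices=False)
        s[r:] = 0.0                        # projection onto {rank <= r}
        X = (U * s) @ Vt
    return X

# Usage: recover a random rank-2 matrix from ~60% of its entries.
rng = np.random.default_rng(1)
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))
mask = (rng.random((20, 20)) < 0.6).astype(float)
X_hat = matrix_completion_pgd(mask * M, mask, r=2)
rel_err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)   # typically small
```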

Table: Typical Projections in PGDC

| Constraint Type | Example of PGDC Projection | Computational Form |
|---|---|---|
| Box / ball | $[y]_j \leftarrow \min\{\max\{y_j, \ell_j\}, u_j\}$ | Closed-form, coordinatewise |
| Simplex | Euclidean projection onto $\{x \geq 0,\ \sum x = 1\}$ | Efficient $O(n \log n)$ algorithm |
| Rank ($\leq r$) | Truncated SVD $U \Sigma_r V^T$ | SVD with $r$ nonzero singular values |
| Manifold (unit norm) | $y / \|y\|$ | Normalization |
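
The $O(n \log n)$ simplex projection in the table is typically implemented with the standard sort-and-threshold scheme, sketched below in a minimal form without numerical safeguards.

```python
import numpy as np

def proj_simplex(y):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1};
    the sort dominates the O(n log n) cost."""
    u = np.sort(y)[::-1]                   # entries in decreasing order
    css = np.cumsum(u) - 1.0               # cumulative sums minus the target sum
    ks = np.arange(1, y.size + 1)
    rho = ks[u - css / ks > 0][-1]         # largest k with a positive gap
    tau = css[rho - 1] / rho               # threshold enforcing sum(x) = 1
    return np.maximum(y - tau, 0.0)

# Example: proj_simplex(np.array([0.5, 0.8, -0.2])) -> [0.35, 0.65, 0.0]
```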

Algorithmic improvements such as the incorporation of univariate constraints in the projection via Schur complements (Barbeau et al., 10 Dec 2024), vector-space decomposition for updates, bulk constraint manipulation, and step-size adaptation based on Lagrangian approximations address robustness and computational efficiency in challenging applications such as large-scale topology optimization.

7. Practical Considerations, Strengths, and Limitations

PGDC’s primary strengths include:

  • Simplicity and broad applicability
  • Explicit preservation of feasibility at each iteration
  • Systematic convergence to strong forms of first-order stationarity under mild assumptions (Olikier et al., 4 Mar 2024)

However, limitations arise:

  • Computational expense of projections for complex or nonconvex $C$
  • Potential for slow convergence near saddle points or in poorly conditioned problems
  • Sensitivity to step-size selection and to the quality (and existence) of projections

Recent research addresses these aspects via approximate or surrogate projections, nonmonotone acceptance criteria, adaptive step-size rules, and structured projection algorithms such as the Schur-complement and vector-space decomposition techniques discussed above.

Conclusion

PGDC stands as a cornerstone methodology in modern constrained optimization, combining strong theoretical guarantees with flexible implementation across diverse constraint sets. Its convergence to Bouligand (and, under further regularity, proximal) stationary points (Olikier et al., 4 Mar 2024) distinguishes it in general nonconvex scenarios. Ongoing work focuses on further improving its practical scalability, robustness, and acceleration, especially for large-scale and highly structured problems in signal processing, data science, machine learning, and engineering design.