
Simple Subgradient Descent Algorithm

Updated 20 August 2025
  • Simple subgradient descent is an iterative method that minimizes nondifferentiable convex functions by using subgradients instead of gradients.
  • It applies a basic update rule with diminishing step sizes that guarantees convergence of the objective values to the global optimum, even in high-dimensional settings.
  • The algorithm is widely used in facility location, engineering design, and computational statistics, in settings where nonsmoothness makes traditional gradient methods inapplicable.

A simple subgradient descent algorithm is a foundational iterative optimization method for minimizing nondifferentiable convex functions. It generalizes classical gradient descent by replacing the gradient with a subgradient, facilitating optimization over a broad class of nonsmooth objectives common in applications such as facility location, engineering design, and computational statistics. The method is distinguished by its elementary update rule, minimal informational requirements, and robust applicability in high-dimensional and structurally complex settings.

1. Subgradient and Subdifferential Fundamentals

For a convex function $f : \mathbb{R}^n \to \mathbb{R}$ that may be nondifferentiable, a vector $g \in \mathbb{R}^n$ is a subgradient at $x$ if

$$f(y) \geq f(x) + g^\top (y - x) \quad \text{for all } y \in \mathbb{R}^n.$$

The set $\partial f(x)$ of all such $g$, called the subdifferential, extends the gradient concept to nondifferentiable points. At any $x \in \operatorname{dom}(f)$ of a convex $f$ finite on $\mathbb{R}^n$, $\partial f(x)$ is nonempty, compact, and convex.
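A canonical one-dimensional example is the absolute value function: for $f(x) = |x|$,
$$\partial f(x) = \begin{cases} \{1\}, & x > 0, \\ [-1, 1], & x = 0, \\ \{-1\}, & x < 0, \end{cases}$$
and any $g \in [-1, 1]$ satisfies $|y| \geq g\,y$ for all $y$, confirming the subgradient inequality at the kink $x = 0$.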

In the context of classical convex models, this generalized derivative structure is crucial when the function possesses "kinks" or "ridges," as in absolute-deviation ($\ell_1$) norms, maximum functions, or the piecewise-linear cost functions typical of location-science and facility placement problems (Nam et al., 2013).

2. Iterative Update Rule and Step Size Conditions

The basic subgradient descent iteration at step $k$ is
$$x_{k+1} = x_k - \alpha_k g_k, \qquad g_k \in \partial f(x_k),$$
where $\alpha_k$ is the step size and $g_k$ is any chosen subgradient at $x_k$. The update requires neither differentiability nor uniqueness of the subgradient. Assuming $f$ is convex and bounded below, the core requirements for convergence of the sequence $\{x_k\}$ are on the step-size sequence:
$$\sum_{k=0}^\infty \alpha_k = \infty, \qquad \sum_{k=0}^\infty \alpha_k^2 < \infty,$$
which guarantee that the best objective value encountered approaches the infimum of $f$ (under additional technical conditions, the ergodic average of the iterates also converges).

A common practical rule is a diminishing step size such as $\alpha_k = a/(k+1)$ or $\alpha_k = a/\sqrt{k+1}$, with $a > 0$ tuned to the problem scale; the first satisfies both summability conditions exactly, while the second is the standard choice when targeting the $O(1/\sqrt{k})$ rate discussed below.
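As a concrete illustration, a minimal Python sketch of this scheme is given below; the function name `subgradient_descent`, the best-iterate tracking, and the example objective $f(x) = |x - 3|$ are illustrative choices, not taken from the cited source.

```python
import numpy as np

def subgradient_descent(f, subgrad, x0, a=1.0, iters=1000):
    """Minimal subgradient descent with diminishing steps a / sqrt(k + 1).

    f:       objective value, used only to track the best iterate
    subgrad: returns any element of the subdifferential at x
    """
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(iters):
        g = subgrad(x)                     # any valid subgradient at the current point
        x = x - (a / np.sqrt(k + 1)) * g   # basic update; may not decrease f at every step
        if f(x) < best_f:                  # so keep the best iterate seen so far
            best_x, best_f = x.copy(), f(x)
    return best_x, best_f

# Example: minimize f(x) = |x - 3|, which is nondifferentiable at its minimizer x = 3.
f = lambda x: np.abs(x - 3.0).item()
subgrad = lambda x: np.sign(x - 3.0)       # sign(0) = 0 lies in the subdifferential [-1, 1]
x_best, f_best = subgradient_descent(f, subgrad, np.array([10.0]), a=2.0, iters=500)
```

Because an individual step need not decrease $f$, tracking the best iterate seen so far (or an average of iterates, as discussed below) is standard practice.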

3. Applications to Facility Location and Nonsmooth Models

In location problems, such as generalized Fermat-Torricelli or smallest-enclosing-circle formulations, the cost function aggregates distances or deviations and is therefore inherently nondifferentiable (e.g., terms like $\|Ax - b\|_1$, sums of maxima, or absolute values). Direct application of gradient descent fails because the gradient does not exist at the nonsmooth points, while the subgradient approach provides a valid update direction at every iterate.

A prototypical step includes:

  • Compute a subgradient $g_k$; for $f(x) = \|Ax - b\|_1$, one valid choice is $g_k = A^\top \operatorname{sign}(Ax_k - b)$, with the sign taken componentwise.
  • Update with chosen αk\alpha_k per the scheme above.
  • Optionally, project onto the feasible domain if constraints are present; the base method extends directly with projection steps in constrained settings (see the sketch after this list).
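The following minimal sketch instantiates these three steps for $\min_x \|Ax - b\|_1$ subject to box constraints, where the Euclidean projection is a componentwise clip; the data, box bounds, and step-size constant are illustrative assumptions.

```python
import numpy as np

def l1_subgradient_step(A, b, x, alpha, lo=None, hi=None):
    """One prototypical step: subgradient of ||Ax - b||_1, update, optional box projection."""
    g = A.T @ np.sign(A @ x - b)        # componentwise sign gives a valid subgradient
    x_new = x - alpha * g               # basic subgradient update
    if lo is not None or hi is not None:
        x_new = np.clip(x_new, lo, hi)  # Euclidean projection onto the box [lo, hi]
    return x_new

# Illustrative synthetic data (assumed for the example, not from the source).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = A @ rng.standard_normal(5)
x = np.zeros(5)
for k in range(2000):
    x = l1_subgradient_step(A, b, x, alpha=0.5 / np.sqrt(k + 1), lo=-5.0, hi=5.0)
```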

Ergodic or iterate averaging has also been advocated to improve convergence in practice, especially when the sequence $\{x_k\}$ displays oscillatory behavior in unstructured nonsmooth regions.

4. Comparison to Gradient-Type Algorithms

Subgradient methods provide robust advantages in the presence of nondifferentiability:

  • Robustness: The method remains well defined at kinks where gradients do not exist; although a step along a negative subgradient need not decrease $f$, it brings the iterate closer to the optimal set for sufficiently small step sizes.
  • Simplicity and Generality: The method does not require smoothing, infimal convolutions, or any auxiliary approximation of nonsmooth regions.
  • Flexibility: The algorithm structure lends itself to integration with other strategies (e.g., cutting-plane, bundle, or projection methods) that are common in large-scale facility location.
  • Theoretical Guarantees: With convexity and appropriate diminishing stepsizes, convergence to the global optimum or an optimal set is reliably achieved.

Trade-offs are notable. For smooth objectives, classical gradient descent with constant step size exhibits linear (geometric) convergence under strong convexity, while subgradient descent is limited to a sublinear rate, $O(1/\sqrt{k})$ in the objective value gap for general convex functions.
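This sublinear rate follows from the standard bound for the subgradient method: if every subgradient satisfies $\|g\| \leq G$ and $\|x_1 - x^*\| \leq R$ for a minimizer $x^*$, then
$$\min_{1 \leq i \leq k} f(x_i) - f^* \;\leq\; \frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i},$$
and the constant choice $\alpha_i = R/(G\sqrt{k})$ over a fixed horizon $k$ yields a gap of $RG/\sqrt{k}$.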

5. Implementation and Computational Considerations

The practical implementation of the simple subgradient descent algorithm is straightforward:

  • At each iteration, evaluate any subgradient at the current iterate (fast for piecewise-linear or absolute-deviation terms).
  • Choose or update the step size per standard rules; for high-dimensional problems, step sizes may be scaled by norms to enforce stability.
  • If a constraint set is present, project onto it after the update.
  • Iterate averaging, $x^A_k = (1/k) \sum_{j=1}^k x_j$, may empirically stabilize convergence, especially for nonsmooth models (see the sketch after this list).
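The running average can be maintained incrementally, with no storage of past iterates; a minimal sketch under the same subgradient-oracle convention as above:

```python
import numpy as np

def averaged_subgradient_descent(subgrad, x0, a=1.0, iters=1000):
    """Subgradient descent that also maintains the running average of the iterates."""
    x = np.asarray(x0, dtype=float)
    x_avg = x.copy()
    for k in range(1, iters + 1):
        x = x - (a / np.sqrt(k)) * subgrad(x)  # basic diminishing-step update
        x_avg += (x - x_avg) / k               # incremental mean; no history stored
    return x, x_avg
```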

Resource requirements are minimal: beyond the current iterate, only the current subgradient and, optionally, a running average need to be stored. No higher-order derivatives, Lipschitz constants, or intricate parameter tuning are needed.

Limitations include a convergence rate that deteriorates relative to optimal first-order methods when the objective is smooth, and sensitivity of progress to the scaling of the step size in high-dimensional or ill-conditioned problems.

6. Pedagogical and Theoretical Value

Simple subgradient descent exemplifies the extension of first-order methods beyond the confines of smooth analysis. It offers:

  • A didactic introduction to generalized gradient concepts (subgradients and subdifferentials).
  • A basic vehicle for convergence proofs in nonsmooth optimization, providing clarity on diminishing stepsize requirements and monotonicity properties.
  • Direct application to real-world convex optimization problems with nondifferentiabilities, allowing students and practitioners to connect theoretical properties with numerically implementable methods.

By examining its limitations (sublinear convergence, nonuniqueness of the update direction) against the sophistication of bundle or smoothing methods, the simple subgradient descent algorithm occupies a central role in teaching and understanding convex optimization and its nondifferentiable extensions (Nam et al., 2013).


| Key Property | Gradient Descent | Simple Subgradient Descent |
| --- | --- | --- |
| Differentiability | Required | Not required |
| Convergence rate | Linear (strongly convex) | Sublinear ($O(1/\sqrt{k})$) |
| Direction selection | Unique (gradient) | Any subgradient in $\partial f(x)$ |
| Applicability | Smooth objectives | Nonsmooth convex objectives |
| Step-size choice | Constant/adaptive | Diminishing (e.g., $1/\sqrt{k}$) |

This table compares the simple subgradient method to its gradient-based counterpart, underscoring how modifications to the update rule enable extension to the nonsmooth convex regime.
