
Gradient Descent–Mirror Ascent Algorithm

Updated 25 November 2025
  • Gradient Descent–Mirror Ascent is a framework that combines gradient descent (primal progress) and mirror descent (dual progress) to optimize smooth, convex functions.
  • The algorithm employs a linear coupling of primal and dual steps, achieving accelerated convergence rates of O(1/K²) and extending naturally to non-Euclidean and composite settings.
  • Its practical implementation leverages adaptive line-search and restart heuristics, making it applicable to large-scale machine learning tasks such as sparse logistic regression.

The Gradient Descent–Mirror Ascent (GD–MA) algorithm, also termed the "Linear Coupling" algorithm, constitutes a framework that unifies gradient descent and mirror descent to optimize smooth, convex functions over closed convex sets. This method exploits the complementary strengths of gradient descent (primal progress) and mirror descent (dual progress), leveraging a linear combination of these updates to realize acceleration and generalization to non-Euclidean geometries and composite objectives (Allen-Zhu et al., 2014).

1. Problem Formulation and Geometry

The GD–MA framework addresses the problem

$$\min_{x \in Q} f(x)$$

where $Q \subseteq \mathbb{R}^n$ is a closed convex set and $f: Q \to \mathbb{R}$ is a convex, $L$-smooth function. The $L$-smoothness condition is expressed as

$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$$

for all $x, y \in Q$ and some norm $\|\cdot\|$. Convexity is assumed:

$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle$$

The geometry can be Euclidean or induced by a non-Euclidean mirror map $\psi: Q \rightarrow \mathbb{R}$, which is 1-strongly convex with respect to $\|\cdot\|$ and differentiable on the interior of $Q$. The associated Bregman divergence is

$$D(u, w) = \psi(u) - \psi(w) - \langle \nabla\psi(w), u - w \rangle$$

This abstraction allows the algorithm to operate over a broad class of geometries and constraints.
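
To make the geometric setup concrete, the following minimal sketch (not from the referenced paper; helper names such as `bregman` are illustrative) evaluates the Bregman divergence for two common mirror maps: the squared Euclidean norm, which recovers $D(u, w) = \tfrac{1}{2}\|u - w\|^2$, and the negative entropy on the probability simplex, which yields the KL divergence.

```python
import numpy as np

# Mirror map psi(x) = 1/2 ||x||_2^2  (Euclidean geometry).
def psi_euclidean(x):
    return 0.5 * np.dot(x, x)

def grad_psi_euclidean(x):
    return x

# Mirror map psi(x) = sum_i x_i log x_i  (negative entropy on the simplex).
def psi_entropy(x):
    return np.sum(x * np.log(x))

def grad_psi_entropy(x):
    return np.log(x) + 1.0

def bregman(psi, grad_psi, u, w):
    """Bregman divergence D(u, w) = psi(u) - psi(w) - <grad psi(w), u - w>."""
    return psi(u) - psi(w) - np.dot(grad_psi(w), u - w)

u = np.array([0.2, 0.3, 0.5])
w = np.array([0.4, 0.4, 0.2])

print(bregman(psi_euclidean, grad_psi_euclidean, u, w))  # = 1/2 ||u - w||^2
print(bregman(psi_entropy, grad_psi_entropy, u, w))      # = KL(u || w) on the simplex
```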

2. The Linear Coupling Algorithm

The GD–MA algorithm maintains two iterates: the "primal" point $x_k$ for gradient descent and the "dual" point $z_k$ for mirror descent. A convex combination produces the coupled point $y_k$:

$$y_k = \tau_k z_k + (1 - \tau_k) x_k$$

The iteration consists of:

  • Primal step (gradient descent):

$$x_{k+1} = \arg\min_{x \in Q} \left\{ \langle \nabla f(y_k), x - y_k \rangle + \frac{L}{2}\|x - y_k\|^2 \right\}$$

  • Dual step (mirror descent):

$$z_{k+1} = \arg\min_{z \in Q} \left\{ \langle \alpha_k \nabla f(y_k), z - z_k \rangle + D(z, z_k) \right\}$$

Choice of step parameters is essential. Common schedules are $\tau_k = \frac{2}{k+2}$ and $\alpha_k = \frac{k+1}{2L}$. The final output is the coupled point $y_K$.
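
A minimal sketch of the coupled iteration in the unconstrained Euclidean case is given below, assuming $Q = \mathbb{R}^n$ and the mirror map $\psi(x) = \tfrac{1}{2}\|x\|^2$, so that both argmins have closed forms ($x_{k+1} = y_k - \nabla f(y_k)/L$ and $z_{k+1} = z_k - \alpha_k \nabla f(y_k)$); the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def linear_coupling(grad_f, x0, L, K):
    """Linear coupling of gradient descent and mirror descent for an
    unconstrained, L-smooth convex objective, using the Euclidean mirror
    map psi(x) = 0.5 * ||x||^2 (so the mirror step is a plain gradient
    step with step size alpha_k)."""
    x = x0.copy()  # primal iterate x_k (gradient descent)
    z = x0.copy()  # dual iterate z_k (mirror descent)
    for k in range(K):
        tau = 2.0 / (k + 2)            # tau_k = 2 / (k + 2)
        alpha = (k + 1) / (2.0 * L)    # alpha_k = (k + 1) / (2L)
        y = tau * z + (1.0 - tau) * x  # coupled point y_k
        g = grad_f(y)
        x = y - g / L                  # primal step: minimizer of the quadratic model
        z = z - alpha * g              # dual step: Euclidean Bregman-proximal update
    tau = 2.0 / (K + 2)
    return tau * z + (1.0 - tau) * x   # coupled output y_K

# Example: minimize the smooth convex quadratic f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2          # smoothness constant: largest singular value squared
y_K = linear_coupling(lambda x: A.T @ (A @ x - b), np.zeros(20), L, K=500)
```

In a constrained or non-Euclidean setting, the two update lines would be replaced by the corresponding projected gradient step and Bregman-proximal mirror step.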

3. Connection to Nesterov’s Acceleration

The GD–MA framework reconstructs Nesterov’s accelerated gradient method by coupling step-sizes so that primal and dual updates share progress. A potential function encapsulates the global progress:

$$\Phi_k = A_k \bigl( f(y_k) - f(x^*) \bigr) + D(x^*, z_k), \qquad A_k = \sum_{i=0}^{k-1} \alpha_i$$

Each iteration contracts the potential:

$$\Phi_{k+1} \leq \Phi_k - \frac{1}{2}\alpha_k \|\nabla f(y_k)\|_*^2$$

This framework offers a cleaner and more general understanding than Nesterov’s original analysis, unifying acceleration principles for both Euclidean and mirror-proximal cases. With proper step selections, the method achieves the accelerated convergence rate $O(1/K^2)$.
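
As a sketch of how the contraction yields the rate (using the schedule $\alpha_i = \frac{i+1}{2L}$ from Section 2 and the convention $A_0 = 0$, so that $\Phi_0 = D(x^*, z_0)$), dropping the nonnegative gradient terms and telescoping gives

$$A_K \bigl( f(y_K) - f(x^*) \bigr) \leq \Phi_K \leq \Phi_0 = D(x^*, z_0), \qquad A_K = \sum_{i=0}^{K-1} \frac{i+1}{2L} = \frac{K(K+1)}{4L},$$

so that $f(y_K) - f(x^*) \leq \frac{4L\, D(x^*, z_0)}{K(K+1)} = O(1/K^2)$.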

4. Convergence Mechanism and Analytical Structure

At each iteration, the algorithm derives progress from both primal and dual sources:

  • Primal gain: The gradient descent step yields a decrease in $f(\cdot)$ proportional to $\|\nabla f(y_k)\|_*^2 / L$.
  • Dual gain: The mirror descent step reduces the Bregman divergence $D(x^*, z_k)$, balancing the coupling cross-term with an appropriate choice of $\tau_k$:

$$\tau_k = \frac{\alpha_k}{A_k + \alpha_k}$$

This calibration ensures that the quadratic remainder terms are controlled and the unified potential decreases monotonically. Summing over all iterations gives the non-asymptotic bound on suboptimality as $O(1/K^2)$.
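
As a quick consistency check (an illustrative snippet, not from the paper), the sketch below verifies numerically that this calibration, combined with the schedule $\alpha_i = \frac{i+1}{2L}$ from Section 2, reproduces $\tau_k = \frac{2}{k+2}$.

```python
# Consistency check: with alpha_i = (i + 1) / (2L) and A_k = sum_{i < k} alpha_i,
# the calibration tau_k = alpha_k / (A_k + alpha_k) reduces to tau_k = 2 / (k + 2).
L = 3.7  # any positive smoothness constant; it cancels out
for k in range(10):
    alphas = [(i + 1) / (2 * L) for i in range(k + 1)]
    A_k = sum(alphas[:k])                  # A_k = sum_{i=0}^{k-1} alpha_i
    tau_k = alphas[k] / (A_k + alphas[k])  # calibrated coupling weight
    assert abs(tau_k - 2 / (k + 2)) < 1e-12
```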

5. Extensions to General Norms, Composite Objectives, and Constraints

The linear coupling methodology extends to a range of scenarios:

  • Non-Euclidean geometries: Replace all prox-steps with those induced by an arbitrary strongly convex function; mirror map and associated Bregman divergence are chosen accordingly.
  • Composite objectives: For $f(x) = g(x) + h(x)$, where $g$ is smooth and $h$ is convex (possibly nonsmooth), use a composite gradient step for $x_{k+1}$, with $\nabla g(y_k)$ and $h(x)$ in the proximal term, retaining the mirror step for $z_{k+1}$ (see the sketch at the end of this section).
  • Constraints: Enforced via indicator functions in the primal step; the mirror step remains otherwise unchanged.

These adaptations preserve the main coupling argument and its global convergence properties.
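
As one illustration of the composite primal step mentioned above, the sketch below instantiates it for $h(x) = \lambda \|x\|_1$, whose proximal operator is soft-thresholding; this is a standard instantiation with illustrative function names, not code from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||x||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def composite_primal_step(y, grad_g_y, L, lam):
    """Composite gradient step for f = g + lam * ||.||_1:
    x_{k+1} = argmin_x { <grad g(y_k), x - y_k> + (L/2)||x - y_k||^2 + lam ||x||_1 },
    which equals soft-thresholding applied to the plain gradient step."""
    return soft_threshold(y - grad_g_y / L, lam / L)
```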

6. Implementation Recommendations and Applications

Parameter tuning is practical via adaptive line-search to estimate local smoothness $L$, dynamically adjusting $\alpha_k$ and $\tau_k$. Restart heuristics are recommended in the presence of strong convexity to achieve linear convergence rates. In large-scale machine learning, linear coupling directly extends to stochastic settings (SGD–SMD coupling), coordinate descent, and block-mirror frameworks. Empirical deployments include training generalized linear models and large-scale sparse logistic regression, benefiting from the algorithm’s flexibility and scalability (Allen-Zhu et al., 2014).
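
A minimal sketch of one such line-search heuristic follows, assuming the standard backtracking test against the quadratic upper bound; the function name and constants are illustrative choices, not prescribed by the source.

```python
import numpy as np

def estimate_smoothness(f, grad_f, y, L_init=1.0, growth=2.0, max_doublings=50):
    """Backtracking estimate of a local smoothness constant at y: increase L
    until the quadratic upper bound used by the primal step holds."""
    L = L_init
    g = grad_f(y)
    for _ in range(max_doublings):
        x = y - g / L  # candidate gradient step with the current estimate
        # Accept L once f(x) <= f(y) + <g, x - y> + (L/2) ||x - y||^2.
        if f(x) <= f(y) + g @ (x - y) + 0.5 * L * np.dot(x - y, x - y):
            return L
        L *= growth
    return L
```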

References

  1. Allen-Zhu, Z., and Orecchia, L. (2014). Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent. arXiv:1407.1537.
