Gradient Descent–Mirror Ascent Algorithm
- Gradient Descent–Mirror Ascent is a framework that combines gradient descent (primal progress) and mirror descent (dual progress) to optimize smooth, convex functions.
- The algorithm employs a linear coupling of primal and dual steps, achieving accelerated convergence rates of O(1/K²) and extending naturally to non-Euclidean and composite settings.
- Its practical implementation leverages adaptive line-search and restart heuristics, making it applicable to large-scale machine learning tasks such as sparse logistic regression.
The Gradient Descent–Mirror Ascent (GD–MA) algorithm, also termed the "Linear Coupling" algorithm, constitutes a framework that unifies gradient descent and mirror descent to optimize smooth, convex functions over closed convex sets. This method exploits the complementary strengths of gradient descent (primal progress) and mirror descent (dual progress), leveraging a linear combination of these updates to realize acceleration and generalization to non-Euclidean geometries and composite objectives (Allen-Zhu & Orecchia, 2014).
1. Problem Formulation and Geometry
The GD–MA framework addresses the problem
$$\min_{x \in Q} f(x),$$
where $Q$ is a closed convex set and $f$ is a convex, $L$-smooth function. The $L$-smoothness condition is expressed as
$$\|\nabla f(x) - \nabla f(y)\|_* \le L\,\|x - y\|$$
for all $x, y \in Q$ and some norm $\|\cdot\|$ (with dual norm $\|\cdot\|_*$). Convexity is assumed:
$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle \quad \text{for all } x, y \in Q.$$
The geometry can be Euclidean or induced by a non-Euclidean mirror map $w$, which is 1-strongly convex with respect to $\|\cdot\|$ and differentiable on the interior of $Q$. The associated Bregman divergence is
$$V_x(y) = w(y) - w(x) - \langle \nabla w(x),\, y - x \rangle.$$
This abstraction allows the algorithm to operate over a broad class of geometries and constraints.
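As a concrete illustration of the Bregman divergence for two common mirror maps (the Euclidean map and negative entropy on the probability simplex), here is a small sketch; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def bregman_divergence_euclidean(y, x):
    """Bregman divergence for w(x) = 0.5 * ||x||_2^2, which equals 0.5 * ||y - x||_2^2."""
    d = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return 0.5 * float(d @ d)

def bregman_divergence_entropy(y, x, eps=1e-12):
    """Bregman divergence V_x(y) induced by the negative-entropy mirror map
    w(x) = sum_i x_i log x_i on the simplex; for this choice V_x(y) = KL(y || x)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(y * (np.log(y + eps) - np.log(x + eps))))
```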
2. The Linear Coupling Algorithm
The GD–MA algorithm maintains two iterates: the "primal" point $y_k$ for gradient descent and the "dual" point $z_k$ for mirror descent. A convex combination produces the coupled point $x_{k+1}$:
$$x_{k+1} = \tau_k\, z_k + (1 - \tau_k)\, y_k.$$
The iteration consists of:
- Primal step (gradient descent): $y_{k+1} = \arg\min_{y \in Q} \left\{ \tfrac{L}{2}\|y - x_{k+1}\|^2 + \langle \nabla f(x_{k+1}),\, y - x_{k+1} \rangle \right\}$, which in the unconstrained Euclidean case reduces to $y_{k+1} = x_{k+1} - \tfrac{1}{L}\nabla f(x_{k+1})$.
- Dual step (mirror descent): $z_{k+1} = \arg\min_{z \in Q} \left\{ V_{z_k}(z) + \langle \alpha_{k+1} \nabla f(x_{k+1}),\, z - z_k \rangle \right\}$.
Choice of step parameters is essential. Common schedules are $\alpha_{k+1} = \frac{k+2}{2L}$ and $\tau_k = \frac{1}{\alpha_{k+1} L} = \frac{2}{k+2}$. The final output is the primal iterate $y_K$, for which the accelerated guarantee is stated.
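As a minimal sketch of this coupled iteration in the unconstrained Euclidean special case (where the mirror step reduces to a plain dual update), the snippet below follows the schedule above; the function names and the quadratic test problem are illustrative, not taken from the paper.

```python
import numpy as np

def linear_coupling(grad_f, x0, L, K):
    """Sketch of the GD-MA (linear coupling) iteration for an unconstrained,
    L-smooth convex objective with the Euclidean mirror map, so the mirror
    step is a plain dual update z <- z - alpha * grad."""
    y = np.array(x0, dtype=float)  # primal (gradient-descent) iterate y_k
    z = np.array(x0, dtype=float)  # dual (mirror-descent) iterate z_k
    for k in range(K):
        alpha = (k + 2) / (2.0 * L)      # alpha_{k+1} = (k+2) / (2L)
        tau = 1.0 / (alpha * L)          # tau_k = 2 / (k+2)
        x = tau * z + (1.0 - tau) * y    # coupled point x_{k+1}
        g = grad_f(x)
        y = x - g / L                    # primal step: gradient descent from x_{k+1}
        z = z - alpha * g                # dual step: Euclidean mirror descent
    return y                             # final primal iterate y_K

# Illustrative usage on the quadratic f(x) = 0.5 * ||A x - b||^2:
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2            # smoothness constant of this quadratic
x_hat = linear_coupling(lambda x: A.T @ (A @ x - b), np.zeros(20), L, K=200)
```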
3. Connection to Nesterov’s Acceleration
The GD–MA framework reconstructs Nesterov’s accelerated gradient method by coupling step-sizes so that primal and dual updates share progress. A potential function, combining the weighted suboptimality of $y_k$ with the Bregman distance from $z_k$ to a minimizer $x^*$, encapsulates the global progress:
$$\Phi_k = A_k \bigl(f(y_k) - f(x^*)\bigr) + V_{z_k}(x^*), \qquad \text{with } A_k = \sum_{i \le k} \alpha_i.$$
Each iteration contracts the potential: $\Phi_{k+1} \le \Phi_k$. This framework offers a cleaner and more general understanding than Nesterov’s original analysis, unifying acceleration principles for both Euclidean and mirror-proximal cases. With proper step-selections, the method achieves the accelerated convergence rate $f(y_K) - f(x^*) = O\!\left(L\, V_{x_0}(x^*) / K^2\right)$.
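The contraction telescopes into the stated rate; a brief sketch consistent with the notation above (a reconstruction under the step schedule from Section 2, not quoted from the paper):

```latex
% Telescoping \Phi_{k+1} \le \Phi_k from k = 0 to K-1, with A_0 = 0 so \Phi_0 = V_{x_0}(x^*):
A_K \bigl(f(y_K) - f(x^*)\bigr) \;\le\; \Phi_K \;\le\; \Phi_0 \;=\; V_{x_0}(x^*).
% Under the schedule \alpha_{k+1} = (k+2)/(2L), the accumulated weight satisfies A_K = \Theta(K^2/L), hence
f(y_K) - f(x^*) \;\le\; \frac{V_{x_0}(x^*)}{A_K} \;=\; O\!\left(\frac{L \, V_{x_0}(x^*)}{K^2}\right).
```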
4. Convergence Mechanism and Analytical Structure
At each iteration, the algorithm derives progress from both primal and dual sources:
- Primal gain: the gradient descent step yields a decrease in $f$ proportional to the squared (dual-norm) gradient: $f(x_{k+1}) - f(y_{k+1}) \ge \frac{1}{2L}\|\nabla f(x_{k+1})\|_*^2$.
- Dual gain: the mirror descent step reduces the Bregman divergence $V_{z_k}(x^*)$, balancing the coupling cross-term $\langle \nabla f(x_{k+1}),\, z_k - x^* \rangle$ with an appropriate choice of $\alpha_{k+1}$: $\alpha_{k+1}\langle \nabla f(x_{k+1}),\, z_k - x^* \rangle \le \alpha_{k+1}^2 L\,\bigl(f(x_{k+1}) - f(y_{k+1})\bigr) + V_{z_k}(x^*) - V_{z_{k+1}}(x^*)$.
This calibration ensures that the quadratic remainder terms are controlled and the unified potential decreases monotonically. Summing over all iterations gives the non-asymptotic bound on suboptimality, $f(y_K) - f(x^*) = O\!\left(L\, V_{x_0}(x^*) / K^2\right)$.
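As a quick numerical sanity check of these two progress bounds in the unconstrained Euclidean case (purely illustrative; the random quadratic and variable names are not from the paper):

```python
import numpy as np

# Check the primal gain  f(x) - f(y_new) >= ||g||^2 / (2L)  and the dual-gain
# (mirror-step) inequality on a random quadratic, with the Euclidean Bregman divergence.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
f = lambda v: 0.5 * np.sum((A @ v - b) ** 2)
grad = lambda v: A.T @ (A @ v - b)
L = np.linalg.norm(A, 2) ** 2
V = lambda z, u: 0.5 * np.sum((u - z) ** 2)      # Euclidean Bregman divergence V_z(u)

x, z, u = rng.standard_normal((3, 10))           # coupled point, dual point, comparator
alpha = 1.0 / L
g = grad(x)
y_new = x - g / L                                # primal (gradient) step
z_new = z - alpha * g                            # dual (mirror) step, Euclidean case
primal_gain = f(x) - f(y_new)
assert primal_gain >= np.sum(g ** 2) / (2 * L) - 1e-9
lhs = alpha * (g @ (z - u))
rhs = alpha ** 2 * L * primal_gain + V(z, u) - V(z_new, u)
assert lhs <= rhs + 1e-9
```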
5. Extensions to General Norms, Composite Objectives, and Constraints
The linear coupling methodology extends to a range of scenarios:
- Non-Euclidean geometries: Replace all prox-steps with those induced by an arbitrary strongly convex function; the mirror map $w$ and its associated Bregman divergence are chosen accordingly.
- Composite objectives: For $F(x) = f(x) + \psi(x)$, where $f$ is smooth and convex and $\psi$ is convex (possibly nonsmooth), use a composite gradient step for the primal update $y_{k+1}$, with $\nabla f(x_{k+1})$ and $\psi$ in the proximal term, retaining the mirror step for the dual update $z_{k+1}$ (see the sketch after this list).
- Constraints: Enforced via indicator functions in the primal step; the mirror step remains otherwise unchanged.
These adaptations preserve the main coupling argument and its global convergence properties.
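For the common composite case $\psi(x) = \lambda \|x\|_1$ with Euclidean geometry, the composite primal step is a proximal gradient (soft-thresholding) update; a minimal sketch with illustrative function names:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def composite_primal_step(x, grad_fx, L, lam):
    """Composite primal step for F = f + lam * ||.||_1 in the Euclidean setting:
    argmin_y  <grad f(x), y - x> + (L/2) * ||y - x||^2 + lam * ||y||_1."""
    return soft_threshold(x - grad_fx / L, lam / L)
```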
6. Implementation Recommendations and Applications
Parameter tuning is practical via adaptive line-search to estimate local smoothness , dynamically adjusting and . Restart heuristics are recommended in the presence of strong convexity to achieve linear convergence rates. In large-scale machine learning, linear coupling directly extends to stochastic settings (SGD–SMD coupling), coordinate descent, and block-mirror frameworks. Empirical deployments include training generalized linear models and large-scale sparse logistic regression, benefiting from the algorithm’s flexibility and scalability (Allen-Zhu et al., 2014).