Gradient Descent–Mirror Ascent Algorithm
- Gradient Descent–Mirror Ascent is a framework that combines gradient descent (primal progress) and mirror descent (dual progress) to optimize smooth, convex functions.
- The algorithm employs a linear coupling of primal and dual steps, achieving accelerated convergence rates of O(1/K²) and extending naturally to non-Euclidean and composite settings.
- Its practical implementation leverages adaptive line-search and restart heuristics, making it applicable to large-scale machine learning tasks such as sparse logistic regression.
The Gradient Descent–Mirror Ascent (GD–MA) algorithm, also termed the "Linear Coupling" algorithm, constitutes a framework that unifies gradient descent and mirror descent to optimize smooth, convex functions over closed convex sets. This method exploits the complementary strengths of gradient descent (primal progress) and mirror descent (dual progress), leveraging a linear combination of these updates to realize acceleration and generalization to non-Euclidean geometries and composite objectives (Allen-Zhu & Orecchia, 2014).
1. Problem Formulation and Geometry
The GD–MA framework addresses the problem
$$\min_{x \in Q} f(x),$$
where $Q$ is a closed convex set and $f$ is a convex, $L$-smooth function. The $L$-smoothness condition is expressed as
$$\|\nabla f(x) - \nabla f(y)\|_* \le L\,\|x - y\|$$
for all $x, y \in Q$ and some norm $\|\cdot\|$ (with dual norm $\|\cdot\|_*$). Convexity is assumed:
$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle \quad \text{for all } x, y \in Q.$$
The geometry can be Euclidean or induced by a non-Euclidean mirror map $w$, which is 1-strongly convex with respect to $\|\cdot\|$ and differentiable on the interior of $Q$. The associated Bregman divergence is
$$V_x(y) = w(y) - w(x) - \langle \nabla w(x),\, y - x \rangle.$$
This abstraction allows the algorithm to operate over a broad class of geometries and constraints.
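As a concrete illustration of the Bregman divergence for two common mirror maps (the Euclidean map and negative entropy on the probability simplex), here is a small sketch; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def bregman_divergence_euclidean(y, x):
    """Bregman divergence for w(x) = 0.5 * ||x||_2^2, which equals 0.5 * ||y - x||_2^2."""
    d = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    return 0.5 * float(d @ d)

def bregman_divergence_entropy(y, x, eps=1e-12):
    """Bregman divergence V_x(y) induced by the negative-entropy mirror map
    w(x) = sum_i x_i log x_i on the simplex; for this choice V_x(y) = KL(y || x)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(y * (np.log(y + eps) - np.log(x + eps))))
```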
2. The Linear Coupling Algorithm
The GD–MA algorithm maintains two iterates: the "primal" point $y_k$ for gradient descent and the "dual" point $z_k$ for mirror descent. A convex combination produces the coupled point $x_{k+1}$:
$$x_{k+1} = \tau_k\, z_k + (1 - \tau_k)\, y_k.$$
The iteration consists of:
- Primal step (gradient descent): $y_{k+1} = \arg\min_{y \in Q} \left\{ \tfrac{L}{2}\|y - x_{k+1}\|^2 + \langle \nabla f(x_{k+1}),\, y - x_{k+1} \rangle \right\}$, which in the unconstrained Euclidean case reduces to $y_{k+1} = x_{k+1} - \tfrac{1}{L}\nabla f(x_{k+1})$.
- Dual step (mirror descent): $z_{k+1} = \arg\min_{z \in Q} \left\{ V_{z_k}(z) + \langle \alpha_{k+1} \nabla f(x_{k+1}),\, z - z_k \rangle \right\}$.
Choice of step parameters is essential. Common schedules are $\alpha_{k+1} = \frac{k+2}{2L}$ and $\tau_k = \frac{1}{\alpha_{k+1} L} = \frac{2}{k+2}$. The final output is the primal iterate $y_K$, for which the accelerated guarantee is stated.
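As a minimal sketch of this coupled iteration in the unconstrained Euclidean special case (where the mirror step reduces to a plain dual update), the snippet below follows the schedule above; the function names and the quadratic test problem are illustrative, not taken from the paper.

```python
import numpy as np

def linear_coupling(grad_f, x0, L, K):
    """Sketch of the GD-MA (linear coupling) iteration for an unconstrained,
    L-smooth convex objective with the Euclidean mirror map, so the mirror
    step is a plain dual update z <- z - alpha * grad."""
    y = np.array(x0, dtype=float)  # primal (gradient-descent) iterate y_k
    z = np.array(x0, dtype=float)  # dual (mirror-descent) iterate z_k
    for k in range(K):
        alpha = (k + 2) / (2.0 * L)      # alpha_{k+1} = (k+2) / (2L)
        tau = 1.0 / (alpha * L)          # tau_k = 2 / (k+2)
        x = tau * z + (1.0 - tau) * y    # coupled point x_{k+1}
        g = grad_f(x)
        y = x - g / L                    # primal step: gradient descent from x_{k+1}
        z = z - alpha * g                # dual step: Euclidean mirror descent
    return y                             # final primal iterate y_K

# Illustrative usage on the quadratic f(x) = 0.5 * ||A x - b||^2:
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
L = np.linalg.norm(A, 2) ** 2            # smoothness constant of this quadratic
x_hat = linear_coupling(lambda x: A.T @ (A @ x - b), np.zeros(20), L, K=200)
```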
3. Connection to Nesterov’s Acceleration
The GD–MA framework reconstructs Nesterov’s accelerated gradient method by coupling step-sizes so that primal and dual updates share progress. A potential function, combining the weighted suboptimality of $y_k$ with the Bregman distance from $z_k$ to a minimizer $x^*$, encapsulates the global progress:
$$\Phi_k = A_k \bigl(f(y_k) - f(x^*)\bigr) + V_{z_k}(x^*), \qquad \text{with } A_k = \sum_{i \le k} \alpha_i.$$
Each iteration contracts the potential: $\Phi_{k+1} \le \Phi_k$. This framework offers a cleaner and more general understanding than Nesterov’s original analysis, unifying acceleration principles for both Euclidean and mirror-proximal cases. With proper step-selections, the method achieves the accelerated convergence rate $f(y_K) - f(x^*) = O\!\left(L\, V_{x_0}(x^*) / K^2\right)$.
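The contraction telescopes into the stated rate; a brief sketch consistent with the notation above (a reconstruction under the step schedule from Section 2, not quoted from the paper):

```latex
% Telescoping \Phi_{k+1} \le \Phi_k from k = 0 to K-1, with A_0 = 0 so \Phi_0 = V_{x_0}(x^*):
A_K \bigl(f(y_K) - f(x^*)\bigr) \;\le\; \Phi_K \;\le\; \Phi_0 \;=\; V_{x_0}(x^*).
% Under the schedule \alpha_{k+1} = (k+2)/(2L), the accumulated weight satisfies A_K = \Theta(K^2/L), hence
f(y_K) - f(x^*) \;\le\; \frac{V_{x_0}(x^*)}{A_K} \;=\; O\!\left(\frac{L \, V_{x_0}(x^*)}{K^2}\right).
```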
4. Convergence Mechanism and Analytical Structure
At each iteration, the algorithm derives progress from both primal and dual sources:
- Primal gain: the gradient descent step yields a decrease in $f$ proportional to the squared (dual-norm) gradient: $f(x_{k+1}) - f(y_{k+1}) \ge \frac{1}{2L}\|\nabla f(x_{k+1})\|_*^2$.
- Dual gain: the mirror descent step reduces the Bregman divergence $V_{z_k}(x^*)$, balancing the coupling cross-term $\langle \nabla f(x_{k+1}),\, z_k - x^* \rangle$ with an appropriate choice of $\alpha_{k+1}$: $\alpha_{k+1}\langle \nabla f(x_{k+1}),\, z_k - x^* \rangle \le \alpha_{k+1}^2 L\,\bigl(f(x_{k+1}) - f(y_{k+1})\bigr) + V_{z_k}(x^*) - V_{z_{k+1}}(x^*)$.
This calibration ensures that the quadratic remainder terms are controlled and the unified potential decreases monotonically. Summing over all iterations gives the non-asymptotic bound on suboptimality, $f(y_K) - f(x^*) = O\!\left(L\, V_{x_0}(x^*) / K^2\right)$.
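As a quick numerical sanity check of these two progress bounds in the unconstrained Euclidean case (purely illustrative; the random quadratic and variable names are not from the paper):

```python
import numpy as np

# Check the primal gain  f(x) - f(y_new) >= ||g||^2 / (2L)  and the dual-gain
# (mirror-step) inequality on a random quadratic, with the Euclidean Bregman divergence.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
f = lambda v: 0.5 * np.sum((A @ v - b) ** 2)
grad = lambda v: A.T @ (A @ v - b)
L = np.linalg.norm(A, 2) ** 2
V = lambda z, u: 0.5 * np.sum((u - z) ** 2)      # Euclidean Bregman divergence V_z(u)

x, z, u = rng.standard_normal((3, 10))           # coupled point, dual point, comparator
alpha = 1.0 / L
g = grad(x)
y_new = x - g / L                                # primal (gradient) step
z_new = z - alpha * g                            # dual (mirror) step, Euclidean case
primal_gain = f(x) - f(y_new)
assert primal_gain >= np.sum(g ** 2) / (2 * L) - 1e-9
lhs = alpha * (g @ (z - u))
rhs = alpha ** 2 * L * primal_gain + V(z, u) - V(z_new, u)
assert lhs <= rhs + 1e-9
```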
5. Extensions to General Norms, Composite Objectives, and Constraints
The linear coupling methodology extends to a range of scenarios:
- Non-Euclidean geometries: Replace all prox-steps with those induced by an arbitrary strongly convex function; the mirror map $w$ and its associated Bregman divergence are chosen accordingly.
- Composite objectives: For $F(x) = f(x) + \psi(x)$, where $f$ is smooth and convex and $\psi$ is convex (possibly nonsmooth), use a composite gradient step for the primal update $y_{k+1}$, with $\nabla f(x_{k+1})$ and $\psi$ in the proximal term, retaining the mirror step for the dual update $z_{k+1}$ (see the sketch after this list).
- Constraints: Enforced via indicator functions in the primal step; the mirror step remains otherwise unchanged.
These adaptations preserve the main coupling argument and its global convergence properties.
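For the common composite case $\psi(x) = \lambda \|x\|_1$ with Euclidean geometry, the composite primal step is a proximal gradient (soft-thresholding) update; a minimal sketch with illustrative function names:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def composite_primal_step(x, grad_fx, L, lam):
    """Composite primal step for F = f + lam * ||.||_1 in the Euclidean setting:
    argmin_y  <grad f(x), y - x> + (L/2) * ||y - x||^2 + lam * ||y||_1."""
    return soft_threshold(x - grad_fx / L, lam / L)
```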
6. Implementation Recommendations and Applications
Parameter tuning is practical via adaptive line-search to estimate local smoothness , dynamically adjusting and . Restart heuristics are recommended in the presence of strong convexity to achieve linear convergence rates. In large-scale machine learning, linear coupling directly extends to stochastic settings (SGD–SMD coupling), coordinate descent, and block-mirror frameworks. Empirical deployments include training generalized linear models and large-scale sparse logistic regression, benefiting from the algorithm’s flexibility and scalability (Allen-Zhu et al., 2014).