Mirror Descent with Bregman Divergence
- Mirror descent with Bregman divergence is a geometry-aware optimization framework that leverages strictly convex mirror maps to respect problem structures like sparsity and manifold constraints.
- The method utilizes dual gradient steps and Bregman projections to ensure robust convergence, achieving sublinear to linear rates depending on the convexity properties of the objective.
- It underpins a wide range of applications, from machine learning and statistics to control and reinforcement learning, and supports distributed and primal-dual optimization settings.
Mirror descent with Bregman divergence is a general first-order optimization framework that extends classical gradient descent by leveraging non-Euclidean geometries, as defined by strictly convex "mirror maps." The essence of mirror descent is the utilization of Bregman divergence—generated by a mirror map—to measure proximity and dictate update steps, enabling algorithms to respect problem structure such as sparsity, simplex constraints, and manifold geometry. This approach yields robust convergence guarantees over a wide family of domains, supports primal-dual and distributed settings, and underpins a vast array of applications in statistics, machine learning, control, and reinforcement learning.
1. Definition and General Framework
Let $\psi : \mathcal{D} \to \mathbb{R}$ be a strictly convex, differentiable function with open domain $\mathcal{D}$, referred to as the "mirror map." Given an optimization problem $\min_{x \in \mathcal{X}} f(x)$ over a closed convex set $\mathcal{X}$, the Bregman divergence associated with $\psi$ is defined as
$$D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle,$$
where $x \in \mathcal{X}$ and $y \in \mathcal{D}$. $D_\psi$ is nonnegative, convex in its first argument, and recovers the squared Euclidean distance for $\psi(x) = \tfrac12 \lVert x \rVert_2^2$. The mirror descent update is specified by:
- Dual step: $\nabla\psi(y_{t+1}) = \nabla\psi(x_t) - \eta_t \nabla f(x_t)$
- Primal projection: $x_{t+1} = \arg\min_{x \in \mathcal{X}} D_\psi(x, y_{t+1})$ or, equivalently, $x_{t+1} = \arg\min_{x \in \mathcal{X}} \{ \eta_t \langle \nabla f(x_t), x \rangle + D_\psi(x, x_t) \}$
This framework respects the geometry of $\mathcal{X}$ by encoding it into the choice of $\psi$, and supports a variety of optimization landscapes (Raskutti et al., 2013).
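As an illustrative sketch of the two-step update (the quadratic objective, step size, and Euclidean mirror map below are arbitrary choices for demonstration), a generic step can be written with pluggable mirror maps:

```python
import numpy as np

def mirror_descent_step(x, g, eta, grad_psi, grad_psi_inv):
    """One (unconstrained) mirror descent step: map the iterate to the
    dual space, take an additive gradient step, and map back."""
    dual = grad_psi(x) - eta * g          # dual step
    return grad_psi_inv(dual)             # mirror back to the primal

# With the Euclidean mirror map psi(x) = 0.5*||x||^2, grad_psi is the
# identity map and the step reduces to plain gradient descent.
grad_f = lambda x: 2.0 * (x - 3.0)        # f(x) = ||x - 3||^2 (toy choice)
x = np.array([0.0])
for _ in range(100):
    x = mirror_descent_step(x, grad_f(x), 0.1, lambda z: z, lambda z: z)
```

The iterate converges to the minimizer $x^* = 3$; swapping in a different `grad_psi`/`grad_psi_inv` pair changes the geometry without touching the algorithm's structure.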
2. Specialization to Probability Simplex: Entropic Mirror Descent
On the probability simplex $\Delta^{n-1} = \{x \in \mathbb{R}^n : x_i \ge 0,\ \sum_{i=1}^n x_i = 1\}$, the canonical mirror map is the negative entropy $\psi(x) = \sum_{i=1}^n x_i \log x_i$. For this choice,
$$D_\psi(x, y) = \sum_{i=1}^n x_i \log\frac{x_i}{y_i} - \sum_{i=1}^n x_i + \sum_{i=1}^n y_i,$$
which is simply the Kullback-Leibler divergence $D_{\mathrm{KL}}(x \,\|\, y)$ when $x, y \in \Delta^{n-1}$. The Bregman projection onto the simplex consists of normalization, and the mirror descent update becomes the exponentiated-gradient rule
$$x_{t+1,i} = \frac{x_{t,i}\, e^{-\eta_t [\nabla f(x_t)]_i}}{\sum_{j=1}^n x_{t,j}\, e^{-\eta_t [\nabla f(x_t)]_j}}.$$
This entropic form arises naturally in learning probability distributions, portfolio optimization, and boosting (Halder, 2018).
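A minimal sketch of the resulting exponentiated-gradient scheme (the cost vector `c` and step size are illustrative) for a linear objective over the simplex:

```python
import numpy as np

def entropic_md_step(x, g, eta):
    """Exponentiated-gradient update: under negative entropy, the
    Bregman projection onto the simplex is a simple normalization."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

# Minimize the linear objective f(x) = <c, x> over the simplex; the
# minimizer puts all mass on the smallest coordinate of c.
c = np.array([0.8, 0.3, 0.5])             # illustrative cost vector
x = np.full(3, 1.0 / 3.0)                 # start at the uniform distribution
for _ in range(200):
    x = entropic_md_step(x, c, eta=0.5)   # gradient of <c, x> is c
```

Note that the iterates stay nonnegative and sum to one by construction, so no explicit projection step is ever needed.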
3. Variational Principle and Fixed Point Structure
Mirror descent is variationally equivalent to minimizing a composite objective of the form
$$F(x) = D_{\mathrm{KL}}(x \,\|\, v) + \Phi(x),$$
where $D_{\mathrm{KL}}(x \,\|\, v)$ is the Kullback-Leibler divergence to a reference vector $v$ ("influence"), and $\Phi$ is the "extropy" (entropy of the complement). The fixed point of mirror descent (e.g., the DeGroot–Friedkin map) solves $x^* = \arg\min_{x \in \Delta^{n-1}} F(x)$.
Strict convexity ensures existence and uniqueness of $x^*$, and standard Lyapunov arguments establish convergence (Halder, 2018).
4. Links to Information Geometry and Natural Gradient
Mirror descent can be viewed as gradient descent in the dual Riemannian geometry, with the metric tensor given by the Hessian $\nabla^2\psi$. The Legendre transform $\psi^*$ defines the dual geometry, and the update in the dual coordinates $\mu = \nabla\psi(x)$ is the additive step
$$\mu_{t+1} = \mu_t - \eta_t \nabla f(x_t),$$
which, via the chain rule, becomes natural gradient descent—steepest descent on the dual manifold (Raskutti et al., 2013). In exponential families, mirror descent with negative entropy achieves asymptotic statistical efficiency, attaining the Cramér–Rao lower bound for parameter estimation.
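The dual-coordinate view can be checked numerically for the negative-entropy mirror map on the positive orthant, where $\nabla\psi(x) = \log x + 1$ has the closed-form inverse $\mu \mapsto e^{\mu - 1}$ (the point and gradient below are arbitrary example values):

```python
import numpy as np

# For psi(x) = sum(x*log(x)) on the positive orthant,
# grad_psi(x) = log(x) + 1, with inverse mu -> exp(mu - 1).
# An additive gradient step in the dual coordinates mu = grad_psi(x)
# is exactly a multiplicative step in the primal.
x = np.array([0.2, 0.5, 0.3])
g = np.array([1.0, -2.0, 0.5])            # illustrative gradient values
eta = 0.1

mu = np.log(x) + 1.0                      # move to dual coordinates
x_dual = np.exp((mu - eta * g) - 1.0)     # additive dual step, map back
x_mult = x * np.exp(-eta * g)             # equivalent multiplicative step
assert np.allclose(x_dual, x_mult)
```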
5. Convergence Analysis and Lyapunov Perspective
Mirror descent admits rigorous convergence guarantees:
- Sublinear $O(1/\sqrt{T})$ convergence for convex objectives with a suitably chosen constant step size
- Linear (geometric) rate $O(\rho^t)$, $\rho < 1$, for strongly convex objectives, where $\rho$ depends on the strong convexity of $f$ and $\psi$
- Along the mirror descent iterates, the Bregman divergence $D_\psi(x^*, x_t)$ to the minimizer serves as a Lyapunov function
- Integral Quadratic Constraint (IQC) analyses show that the Bregman Lyapunov function is a special case of Popov-criterion storage functions, enabling tight rates via matrix inequalities (Li et al., 2022, Li et al., 2023).
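A small numerical sketch of the Lyapunov property (the target $p$, step size, and iteration count are illustrative): minimizing $f(x) = \tfrac12\lVert x - p\rVert^2$ over the simplex with entropic mirror descent, the divergence $D_\psi(x^*, x_t) = \mathrm{KL}(p \,\|\, x_t)$ shrinks along the iterates:

```python
import numpy as np

def kl(p, q):
    """KL divergence between distributions on the simplex."""
    return float(np.sum(p * np.log(p / q)))

# Since p lies in the simplex, the constrained minimizer of
# f(x) = 0.5*||x - p||^2 is x* = p, and KL(p || x_t) acts as a
# Lyapunov function for the entropic mirror descent iterates.
p = np.array([0.5, 0.3, 0.2])             # illustrative target
x = np.full(3, 1.0 / 3.0)                 # start at the uniform distribution
eta = 0.5                                 # illustrative step size
lyapunov = [kl(p, x)]
for _ in range(500):
    w = x * np.exp(-eta * (x - p))        # exponentiated-gradient step
    x = w / w.sum()
    lyapunov.append(kl(p, x))
```

After the run, the Lyapunov value has decreased and the iterate has converged to $p$.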
6. Practical Applications and Specialized Algorithms
Mirror descent with Bregman divergence is foundational in:
- Composite, distributed, and online optimization, where geometry-aware updates outperform Euclidean approaches (Yuan et al., 2020, Chen et al., 2021)
- Policy optimization in reinforcement learning, where PMD-style updates guarantee finite-step optimality and adapt to geometry-inducing divergences (Lin et al., 2022)
- Stochastic control, both with vector-valued and measure-valued actions. Relative smoothness and strong convexity with respect to $\psi$ provide linear or exponential rates, depending on regularization (Sethi et al., 3 Jun 2025, Kerimkulov et al., 2024)
- Implicit regularization in separable data: choice of the mirror map directly affects margin bounds and learning behavior (Li et al., 2021)
- Optimization over curved manifolds and norm-constrained sets: dual-norm mirror descent and generalized logarithmic mirrors extend the method to non-Euclidean settings, often yielding closed-form projection-free updates (Nock et al., 2016, Cichocki, 8 Jun 2025)
- Statistical learning in exponential families, phase retrieval, optimal transport (Sinkhorn algorithm as a mirror descent with KL divergence) (Godeme et al., 2022, 2002.03758)
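As a sketch of the optimal-transport connection, Sinkhorn's algorithm alternates KL (Bregman) projections onto the row- and column-marginal constraints; the cost matrix, marginals, and regularization strength below are illustrative choices:

```python
import numpy as np

# Entropic optimal transport via Sinkhorn iterations, read as
# alternating KL (Bregman) projections onto the two marginal
# constraint sets.
rng = np.random.default_rng(0)
C = rng.random((4, 5))                    # illustrative cost matrix
r = np.full(4, 1.0 / 4.0)                 # row marginal
c = np.full(5, 1.0 / 5.0)                 # column marginal
eps = 0.5                                 # entropic regularization strength

K = np.exp(-C / eps)                      # Gibbs kernel
u = np.ones(4)
v = np.ones(5)
for _ in range(1000):
    u = r / (K @ v)                       # project onto the row constraint
    v = c / (K.T @ u)                     # project onto the column constraint
P = u[:, None] * K * v[None, :]           # entropic transport plan
```

At convergence the plan `P` matches both prescribed marginals.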
7. Algorithmic Templates and Implementation Considerations
The prototypical mirror descent algorithm is:

```
for t in range(1, T+1):
    g_t = ∇f(x_t)
    dual = ∇ψ(x_t) - η_t * g_t
    x_{t+1} = (∇ψ)^{-1}(dual)
```
- The efficiency of mirror descent depends on the choice of $\psi$ and the tractability of inverting $\nabla\psi$.
- For simplex domains, exponentiated-gradient and its generalizations (Tempesta logarithms, Tsallis/Kaniadakis mirrors) provide closed-form updates and enable domain adaptation via hyperparameters (Cichocki, 8 Jun 2025).
- For distributed and non-smooth optimization, Bregman damping and ergodic gap analysis yield sublinear ergodic convergence rates for saddle-point/constrained problems (Chen et al., 2021).
8. Theoretical and Empirical Insights
Mirror descent unifies proximal, primal-dual, and natural gradient frameworks. The geometry is entirely governed by the mirror map and its Bregman divergence, providing both interpretability and flexibility. Analysis via IQC and Lyapunov methods confirms the tightness of classical rates and allows systematic extension to advanced settings—stochastic, distributed, measure-valued, nonconvex—maintaining robust guarantees (Li et al., 2023, Fatkhullin et al., 2024). Proper tuning of the underlying mirror geometry yields optimal statistical efficiency and domain-adaptive regularization.
Summary Table: Core Components
| Component | Definition / Role | Classical Case |
|---|---|---|
| Mirror map $\psi$ | Strictly convex, differentiable potential encoding the geometry of $\mathcal{X}$ | $\psi(x) = \tfrac12 \lVert x \rVert_2^2$ |
| Bregman divergence $D_\psi(x,y)$ | $\psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle$ | Squared Euclidean distance |
| Dual step | $\nabla\psi(y_{t+1}) = \nabla\psi(x_t) - \eta_t \nabla f(x_t)$ | Additive (Euclidean) update |
| Primal projection | $x_{t+1} = \arg\min_{x \in \mathcal{X}} D_\psi(x, y_{t+1})$ | Standard Euclidean projection |
| Typical geometry | Simplex (entropy), orthant, manifold, dual-norm, measure space | Euclidean space $\mathbb{R}^n$ |
| Convergence rate | $O(1/\sqrt{T})$ (convex); linear (strongly convex); exponential (strong regularizer) | Same under Euclidean geometry |
The generality, geometry-awareness, and provable efficiency of mirror descent with Bregman divergence position it as a cornerstone method in modern convex, stochastic, distributed, and nonconvex optimization.