FTRL: Follow-the-Regularized-Leader
- Follow-the-Regularized-Leader (FTRL) is an online optimization framework that minimizes cumulative losses plus a regularization penalty, balancing exploitation and exploration.
- It unifies methods like mirror descent, dual averaging, and FTRL-Proximal by tailoring the regularization geometry and centering to achieve adaptive regret bounds.
- Modern FTRL variants employ adaptive learning rates and composite update strategies to deliver near-optimal performance in adversarial, bandit, deep learning, and constrained environments.
Follow-the-Regularized-Leader (FTRL) is a central paradigm in online convex optimization, online learning, and sequential decision making. It provides a framework for designing algorithms with optimal or near-optimal regret guarantees in adversarial, stochastic, and structured feedback settings. At its core, FTRL iteratively selects actions by minimizing cumulative observed losses plus a regularization penalty, thereby balancing exploitation of past information against the stability (or exploration) enforced by the regularizer. The framework unifies and generalizes numerous fundamental online algorithms, including mirror descent, dual averaging, and composite-objective methods, and it is tightly connected to both statistical learning theory and convex analysis.
1. General FTRL Update Rule and Instantiations
The canonical FTRL update at round $t$ selects $x_{t+1}$ via

$$x_{t+1} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \; \sum_{s=1}^{t} f_s(x) + \sum_{s=1}^{t} r_s(x) + \alpha_t \Psi(x),$$

where:
- $f_s$ is the convex loss at step $s$ (possibly linearized as $f_s(x) = \langle g_s, x \rangle$);
- $r_s$ is an incremental regularization term, commonly quadratic or based on Bregman divergences, and may be either proximal (centered at the iterate $x_s$) or centered at the origin;
- $\Psi$ is a nonsmooth penalty (e.g., inducing sparsity via $\ell_1$ regularization) weighted by $\alpha_t$.
This general template recovers:
- Regularized Dual Averaging (RDA): the cumulative regularizer $r_{1:t}$ is centered at zero; the algorithm accumulates losses (or their subgradients) and applies global regularization.
- Mirror Descent (MD) / FOBOS: $r_{1:t}$ arises as a sequence of Bregman divergences, each centered at the preceding iterate $x_s$; the regularizer is updated iteratively.
- FTRL-Proximal: $r_{1:t}$ is a sum of local regularizers, each proximal to (centered at) the most recent iterate, bridging the gap between RDA and MD.
Equivalence theorems (McMahan, 2010) establish that these algorithms arise as special cases of a single FTRL principle with different choices of regularization geometry and centering.
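As a concrete, minimal sketch of the template above (assuming linearized losses and a time-varying quadratic regularizer centered at the origin, i.e., the dual-averaging instantiation; the $1/\sqrt{t}$ learning-rate schedule and the random toy losses are illustrative choices, not prescribed by the framework):

```python
import numpy as np

def ftrl_quadratic(gradients, eta=None):
    """FTRL with linearized losses f_s(x) = <g_s, x> and the quadratic
    regularizer r_{1:t}(x) = ||x||^2 / (2 * eta_t), centered at the origin.

    With this choice the leader has a closed form,
        x_{t+1} = -eta_t * sum_{s <= t} g_s
    (the dual-averaging / lazy-gradient instantiation).  A projection onto a
    feasible set X would be composed with this step if X is bounded.
    """
    d = len(gradients[0])
    grad_sum = np.zeros(d)
    iterates = [np.zeros(d)]              # x_1 minimizes the regularizer alone
    for t, g in enumerate(gradients, start=1):
        grad_sum += g
        eta_t = eta if eta is not None else 1.0 / np.sqrt(t)   # illustrative schedule
        iterates.append(-eta_t * grad_sum)                      # closed-form leader
    return iterates

# Toy usage with random linear losses in R^3.
rng = np.random.default_rng(0)
grads = list(rng.normal(size=(100, 3)))
xs = ftrl_quadratic(grads)
print("final iterate:", np.round(xs[-1], 3))
```

With a quadratic regularizer the leader has a closed form; for general regularizers the same template is solved by a per-round convex minimization.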
2. Regret Analysis and Stability–Penalty Decomposition
The regret of FTRL algorithms is bounded as

$$\mathrm{Regret}_T(u) \;\le\; r_{1:T}(u) \;+\; \sum_{t=1}^{T}\bigl(h_{1:t}(x_t) - h_{1:t}(x_{t+1})\bigr),$$

where $h_{1:t} = \sum_{s=1}^{t}(f_s + r_s)$ collects cumulative losses and regularizer values. Central "Strong FTRL" lemmas (McMahan, 2014) allow the per-round regret to be decomposed into:
- A stability term (variation in action selection);
- A penalty term (regularization cost at the comparator).
This decomposition exposes the crucial role of stability (how rapidly the parameter estimates move) versus penalty (how rapidly the regularizer grows with cumulative loss). The best regret bounds require matching these components, which dictates the learning rate schedule and adaptive regularizer scaling.
Regret guarantees can depend on the geometry of the regularizer and may be tuned dynamically to observed gradients (as in AdaGrad-style approaches) for data-dependent performance.
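For intuition, here is the standard worked calculation for linear losses with a fixed quadratic regularizer $r(x) = \lVert x\rVert^2/(2\eta)$, showing how matching the two terms fixes the learning rate; the gradient bound $G$ and comparator radius $R$ are assumptions introduced only for this example.

```latex
% Penalty-stability trade-off for linear losses with r(x) = \|x\|^2/(2\eta),
% assuming \|g_t\| \le G and \|u\| \le R:
\[
\mathrm{Regret}_T(u)
\;\le\; \underbrace{\frac{\lVert u\rVert^2}{2\eta}}_{\text{penalty}}
\;+\; \underbrace{\sum_{t=1}^{T} \eta\,\lVert g_t\rVert^2}_{\text{stability}}
\;\le\; \frac{R^2}{2\eta} + \eta\, G^2 T .
\]
% Matching the two terms with \eta = R/(G\sqrt{2T}) yields the familiar rate
\[
\mathrm{Regret}_T(u) \;\le\; R\,G\,\sqrt{2T} \;=\; O(\sqrt{T}).
\]
```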
3. Adaptive Learning Rates and Generalizations
Recent research (Ito et al., 1 Mar 2024, Tsuchiya et al., 2023, Tsuchiya et al., 30 May 2024) has placed particular focus on adaptive learning rates. Adaptive FTRL frameworks select the learning rate $\eta_t$ (or, equivalently, the regularization scaling $\beta_t = 1/\eta_t$) to minimize regret bounds arising from the stability and penalty decomposition.
Key strategies include:
- Stability-Penalty Matching (SPM): Choosing $\eta_t$ so that, in each round, the stability contribution $\eta_t z_t$ and the penalty contribution (roughly $(\eta_{t+1}^{-1} - \eta_t^{-1})\,h_{t+1}$) are of the same order, matching the contributions of stability and penalty (Ito et al., 1 Mar 2024); a schematic version is sketched after this list.
- Stability-Penalty-Bias (SPB) Matching: For indirect feedback scenarios (partial monitoring, graph bandits), the bias term due to forced exploration is included, and the learning rate is chosen to balance all three sources.
- Stability-Penalty-Adaptive (SPA) Learning Rate: The learning rate depends jointly on the observed stability $z_t$ and penalty $h_t$, yielding regret of order $O\bigl(\sqrt{h_{T+1}\sum_{t=1}^{T} z_t}\bigr)$ (Tsuchiya et al., 2023).
These approaches deliver best-of-both-worlds (BOBW) regret bounds: minimax $O(\sqrt{T})$ rates in adversarial regimes; problem-dependent (e.g., logarithmic) rates in stochastic settings; and, for "hard" partial-feedback problems, the optimal $T^{2/3}$ rate.
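The following schematic Python sketch illustrates the spirit of these schedules rather than the exact rules from the cited papers; the per-round quantities z_t (stability proxy) and h_t (penalty proxy) are assumed to be observable or boundable by the learner.

```python
import math

def spa_style_learning_rates(z, h):
    """Schematic stability-penalty-adaptive learning-rate schedule (illustrative only).

    z[t]: per-round stability proxy observed at round t
    h[t]: per-round penalty proxy (e.g., regularizer value at the comparator)

    The rule keeps eta_t ~ sqrt(h_t / (1 + sum_{s<t} z_s)), so the cumulative
    stability term (sum_t eta_t * z_t) and the penalty term (h_T / eta_T)
    grow at the same rate, mimicking the matching principle.
    """
    etas = []
    z_cum = 0.0
    for z_t, h_t in zip(z, h):
        eta_t = math.sqrt(h_t / (1.0 + z_cum))
        etas.append(eta_t)
        z_cum += z_t
    return etas

# Toy usage: constant stability and penalty recover an eta_t ~ 1/sqrt(t) schedule.
etas = spa_style_learning_rates(z=[1.0] * 10, h=[math.log(4)] * 10)
print([round(e, 3) for e in etas])
```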
4. Regularizer Design and Universality
The choice of regularizer has a profound impact on the dimension- and environment-dependent regret rates:
- In finite-armed bandit problems, entropy-based (Shannon or Tsallis) regularizers enable optimal adversarial and stochastic regret bounds (Jin et al., 2023).
- For online linear optimization, the optimal regularizer can be problem-dependent (Gatmiry et al., 22 Oct 2024); constructing a regularizer tailored to the geometry of the action and loss set yields regret within universal constant factors of the minimax rate, improving earlier existential bounds.
- In online semidefinite programming and matrix problems, the log-determinant regularizer provides local strong convexity adapted to loss directions, particularly benefiting problems with sparse losses (Moridomi et al., 2017).
Notably, FTPL (Follow-the-Perturbed-Leader) with carefully chosen noise distributions (e.g., Fréchet-type) can be shown via discrete choice theory to correspond to FTRL with certain regularizers, such as Tsallis entropy (Lee et al., 8 Mar 2024), providing a unified understanding of regularization through randomized smoothing.
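To make the role of the regularizer concrete, the sketch below computes the FTRL distribution over a finite arm set under the $1/2$-Tsallis entropy regularizer, finding the normalization multiplier by bisection; the learning rate and cumulative loss estimates passed in are placeholder inputs.

```python
import numpy as np

def tsallis_ftrl_distribution(cum_losses, eta, tol=1e-10):
    """FTRL leader over the probability simplex with regularizer
    Psi(p) = -(1/eta) * sum_i sqrt(p_i)   (1/2-Tsallis entropy).

    First-order conditions give p_i = 1 / (2*eta*(L_i + lam))**2, where the
    multiplier lam > -min_i L_i is chosen so that sum_i p_i = 1.  The sum is
    strictly decreasing in lam, so bisection recovers it.
    """
    L = np.asarray(cum_losses, dtype=float)
    lo = -L.min() + 1e-12                                   # here sum_i p_i >> 1
    hi = -L.min() + np.sqrt(len(L)) / (2 * eta) + 1.0       # here sum_i p_i < 1
    probs = lambda lam: 1.0 / (2.0 * eta * (L + lam)) ** 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probs(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    return p / p.sum()                                      # tiny renormalization for safety

# Toy usage: three arms; most mass goes to the arm with the smallest cumulative loss.
p = tsallis_ftrl_distribution(cum_losses=[2.0, 5.0, 9.0], eta=0.5)
print(np.round(p, 3), p.sum())
```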
5. Implicit and Composite Updates, and Extensions
FTRL encompasses implicit update methods, where the current loss is not linearly approximated but instead optimized exactly (where tractable). Implicit FTRL and generalized implicit FTRL (Chen et al., 2023) leverage Fenchel–Young inequalities as tighter surrogates than first-order approximations, providing improved or robust regret guarantees without increasing computational overhead in quadratic/proximal settings.
Composite-objective FTRL supports nonsmooth composite regularization for, e.g., sparse learning under constraints, and theoretical insights show that methods which handle cumulative nonsmooth penalties in closed form (RDA, FTRL-Proximal) yield greater model sparsity than those which only approximate previous regularization via subgradients (FOBOS) (McMahan, 2010).
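The effect of handling the cumulative $\ell_1$ penalty in closed form can be illustrated with the widely used per-coordinate FTRL-Proximal update for linear models; this is a sketch following the standard per-coordinate formulation, and the hyperparameters (alpha, beta, l1, l2) and the logistic-loss toy data are illustrative assumptions.

```python
import numpy as np

class FTRLProximalL1:
    """Per-coordinate FTRL-Proximal with L1/L2 regularization (sketch).

    Maintains z (shifted gradient sums) and n (sums of squared gradients).
    The closed-form leader soft-thresholds z, so coordinates with
    |z_i| <= l1 are set exactly to zero -- the mechanism behind sparsity.
    """

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=0.1):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)
        self.n = np.zeros(dim)

    def weights(self):
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        denom = (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / denom
        return w

    def update(self, g):
        """Incorporate the gradient g of the current loss at the current weights."""
        w = self.weights()
        sigma = (np.sqrt(self.n + g ** 2) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w          # proximal centering at the latest iterate
        self.n += g ** 2

# Toy usage (hypothetical setup): logistic-loss gradients with 3 informative features.
rng = np.random.default_rng(0)
model = FTRLProximalL1(dim=20)
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]
for _ in range(500):
    x = rng.normal(size=20)
    y = 1.0 if x @ true_w + 0.1 * rng.normal() > 0 else 0.0
    p = 1.0 / (1.0 + np.exp(-x @ model.weights()))
    model.update((p - y) * x)            # gradient of the log-loss for this example
print("nonzero coordinates:", np.count_nonzero(model.weights()))
```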
6. Applications: Bandits, Games, Constraints, and Deep Learning
FTRL's adaptability enables near-optimal algorithms in a wide array of problems:
- Bandits & Partial Monitoring: Through the use of hybrid regularizers and adaptive learning rates, FTRL-based algorithms achieve BOBW bounds in finite-armed, linear, semi-bandit, and structured partial-feedback environments (Ito et al., 1 Mar 2024, Kong et al., 2023, Tsuchiya et al., 30 May 2024).
- Zero-Sum and Quantum Games: FTRL (and quantum analogs) furnish no-regret learning dynamics with constant or vanishing regret, Poincaré recurrence, and explicit trade-offs between stability and observability in adversarial (e.g., GAN, min-max) scenarios (Abe et al., 2022, Lotidis et al., 2023, Feng et al., 15 Jun 2024); a minimal self-play sketch follows this list.
- Time-Varying Constraints: Penalized-FTRL extends FTRL to settings with dynamic constraints, where cumulative constraint violation is explicitly penalized, yielding $O(\sqrt{T})$ regret and $O(\sqrt{T})$ cumulative violation under general conditions (Leith et al., 2022).
- Deep Learning Optimizers: The Adam optimizer is revealed as a dynamic (discounted) FTRL online learner, providing a theoretical explanation for its empirical success and for the role of momentum and adaptive scaling in achieving low dynamic regret (Ahn et al., 2 Feb 2024).
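As a small, self-contained illustration of FTRL dynamics in games, the sketch below runs entropy-regularized FTRL (equivalently, multiplicative weights) in self-play on a toy 2x2 zero-sum matrix game; the payoff matrix, horizon, and fixed learning rate are arbitrary illustrative choices, and only the time-averaged strategies are expected to approximate the minimax equilibrium (the individual iterates typically cycle around it).

```python
import numpy as np

def entropy_ftrl(cum_loss, eta):
    """FTRL leader with Shannon-entropy regularizer over the simplex:
    argmin_p <cum_loss, p> + (1/eta) * sum_i p_i log p_i  =  softmax(-eta * cum_loss)."""
    w = np.exp(-eta * (cum_loss - cum_loss.min()))   # shift for numerical stability
    return w / w.sum()

def selfplay(A, T=10000, eta=0.05):
    """Both players run entropy-regularized FTRL against each other in the
    zero-sum game with payoff matrix A (row player maximizes x^T A y)."""
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(T):
        x, y = entropy_ftrl(Lx, eta), entropy_ftrl(Ly, eta)
        Lx += -A @ y          # row player's loss is the negative payoff
        Ly += A.T @ x         # column player's loss is the payoff
        x_avg += x / T
        y_avg += y / T
    return x_avg, y_avg

# Toy usage: a 2x2 zero-sum game whose unique equilibrium is x* = y* = (0.4, 0.6);
# the time-averaged strategies should land close to it (how close depends on T, eta).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = selfplay(A)
print(np.round(x_bar, 3), np.round(y_bar, 3))
```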
7. Computational, Structural, and Theoretical Developments
Advanced FTRL implementations address challenges of computation, structural adaptivity, and universality:
- Efficient Implementation: Bisection schemes in FTPL allow FTRL-equivalent policies to be run efficiently, even at large scale, with comparable regret (Li et al., 30 Sep 2024).
- Selection of Optimal Regularizers: In low dimensions, one can algorithmically solve for (nearly) optimal regularizers given action/loss geometry (Gatmiry et al., 22 Oct 2024), though this is generally computationally intractable in high dimensions (deciding strong convexity of a candidate regularizer is NP-hard).
- Adversarial Robustness: Mutation-driven FTRL variants offer last-iterate convergence properties in competitive and adversarial games, outperforming standard FTRL in stability and convergence (Abe et al., 2022).
- Limitations: There are regimes where dualities between FTRL and FTPL break down, especially in the presence of symmetric perturbations in multi-armed settings (Lee et al., 26 Aug 2025), highlighting the necessity for ongoing theoretical refinement.
In summary, FTRL is a foundational online optimization framework whose adaptability to geometry, regularization, learning rate, and feedback structure enables both minimax and instance-optimal guarantees across online convex optimization, bandit, and adversarial learning domains. Modern research leverages FTRL's analytic tractability and unifying power to design algorithms that are computationally efficient, robust to nonstationarity, and provably adaptive to complex sequential environments.