FTRL: Follow the Regularized Leader

Updated 26 January 2026

FTRL is a framework for online convex optimization that chooses each decision by minimizing the sum of past observed losses plus a regularizer that provides stability and adaptivity.
It employs adaptive learning rates and problem-specific regularizers to balance the stability and penalty terms of its regret bound, yielding optimal regret guarantees in various settings.
FTRL's versatility is evident in its applications to bandit problems, online linear optimization, and control, where it handles both adversarial and stochastic regimes.

Follow the Regularized Leader (FTRL) Framework

The Follow the Regularized Leader (FTRL) framework is a foundational paradigm in online convex optimization (OCO), online learning, and repeated games. FTRL produces a sequence of decisions by minimizing the sum of observed (or estimated) losses and a regularizer chosen for stability, curvature, or problem-specific geometric considerations. When instantiated appropriately (with tailored regularizers, step sizes, and update rules), FTRL attains optimal regret rates in online linear optimization, bandits, Blackwell approachability, constrained OCO, and partial monitoring, including best-of-both-worlds performance in stochastic and adversarial regimes.

1. FTRL: Core Update Rule and General Properties

FTRL considers a decision set $W \subset \mathbb{R}^n$ (typically convex and compact). At each round $t$, the learner selects $w_t \in W$, observes a convex loss $f_t$, and incurs cost $f_t(w_t)$. The cumulative performance is measured by regret relative to any comparator $w^* \in W$:

$\mathrm{Regret}_T(w^*) = \sum_{t=1}^T f_t(w_t) - \sum_{t=1}^T f_t(w^*)$

The generic FTRL update is:

$w_t = \arg\min_{w\in W} \left\{ \sum_{s=1}^{t-1} f_s(w) + R_t(w) \right\}$

where $R_t$ is a strictly convex, possibly time-dependent regularizer. In gradient-based or bandit settings, the exact losses $f_s$ may be replaced by linearizations or loss estimates.

When the loss is linearized with $g_s \in \partial f_s(w_s)$, the update reads:

$w_t = \arg\min_{w\in W} \left\{ \sum_{s=1}^{t-1} \langle g_s, w \rangle + R_t(w) \right\}$

Learning rates may be absorbed into the regularizer as a scaling factor.
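To make the linearized update concrete, the following is a minimal sketch under stated assumptions: the regularizer is $R(w) = \|w\|_2^2/(2\eta)$ with a fixed $\eta$, and the decision set is a Euclidean ball. The function name `ftrl_quadratic_ball` and its parameters are illustrative and do not come from any cited work. In this special case the argmin has a closed form: the unconstrained minimizer $-\eta \sum_s g_s$ followed by Euclidean projection onto the ball.

```python
import numpy as np

def ftrl_quadratic_ball(gradients, eta, radius):
    """Linearized FTRL with R(w) = ||w||^2 / (2*eta) over the ball ||w|| <= radius.

    `gradients` is a (T, n) float array; row t holds g_t. Returns the (T, n)
    array of iterates w_1, ..., w_T.
    """
    grad_sum = np.zeros(gradients.shape[1])
    iterates = []
    for g in gradients:
        # argmin_w <grad_sum, w> + ||w||^2 / (2*eta) over the ball equals the
        # projection of the unconstrained minimizer -eta * grad_sum.
        w = -eta * grad_sum
        norm = np.linalg.norm(w)
        if norm > radius:
            w = w * (radius / norm)
        iterates.append(w)
        # g_t is only revealed after w_t has been played.
        grad_sum = grad_sum + g
    return np.array(iterates)
```

Because the iterate is recomputed from the full gradient sum each round, the projection never accumulates: this is the lazy (dual-averaging) flavor of projected gradient descent.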

Classical choices for $R_t$ yield known methods: quadratic regularizers give online gradient descent (in its lazy-projection form when the decision set is constrained); negative entropy yields the Hedge/Exp3 algorithms; log-determinant and Burg entropy generate matrix and vector FTRL variants with geometry-adapted behavior (Moridomi et al., 2017).
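As an illustration of the entropy case, the sketch below assumes full-information losses on the probability simplex and the regularizer $R(w) = \frac{1}{\eta}\sum_i w_i \log w_i$ with a fixed $\eta$; the function name `hedge_weights` is illustrative. Under these assumptions the FTRL minimizer is the softmax of the negated, scaled cumulative losses, which is the Hedge (exponential weights) distribution.

```python
import numpy as np

def hedge_weights(cum_losses, eta):
    """FTRL over the simplex with R(w) = (1/eta) * sum_i w_i * log(w_i).

    `cum_losses[i]` is the cumulative loss of expert i so far.
    """
    cum_losses = np.asarray(cum_losses, dtype=float)
    # Subtracting the minimum leaves the distribution unchanged and keeps
    # the exponentials in a numerically safe range.
    shifted = cum_losses - cum_losses.min()
    unnormalized = np.exp(-eta * shifted)
    return unnormalized / unnormalized.sum()
```

For example, with cumulative losses `[0, log 3]` and `eta = 1`, the unnormalized weights are `1` and `1/3`, so the distribution is `[0.75, 0.25]`.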

Standard regret guarantees assume that $R_t$ is strongly convex with respect to a suitable norm. Adaptive or data-driven regularizers (e.g., AdaGrad-style) can yield improved, often per-coordinate, regret bounds in sparse and structured settings (McMahan, 2014).
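A data-driven regularizer can be sketched as follows. This is an illustrative AdaGrad-style variant, not the exact algorithm of the cited work: it assumes an unconstrained domain and the per-coordinate regularizer $R_t(w) = \frac{1}{2\alpha}\sum_i \sqrt{\sum_{s<t} g_{s,i}^2}\, w_i^2$, for which the linearized FTRL minimizer is available coordinate by coordinate. The name `adagrad_ftrl` and the parameters `alpha` and `eps` are assumptions made for this example.

```python
import numpy as np

def adagrad_ftrl(gradients, alpha, eps=1e-12):
    """Unconstrained linearized FTRL with a per-coordinate quadratic regularizer.

    R_t(w) = sum_i sqrt(sum_{s<t} g_{s,i}^2) * w_i^2 / (2 * alpha),
    so the minimizer is w_{t,i} = -alpha * G_i / sqrt(S_i), where G_i is the
    gradient sum and S_i the squared-gradient sum of coordinate i.
    """
    dim = gradients.shape[1]
    grad_sum = np.zeros(dim)
    sq_sum = np.zeros(dim)
    iterates = []
    for g in gradients:
        scale = np.sqrt(sq_sum)
        # Coordinates that have not yet seen a nonzero gradient stay at zero;
        # np.maximum guards the division for those coordinates.
        w = np.where(scale > eps, -alpha * grad_sum / np.maximum(scale, eps), 0.0)
        iterates.append(w)
        grad_sum = grad_sum + g
        sq_sum = sq_sum + g * g
    return np.array(iterates)
```

Each coordinate gets its own effective step size, so rarely active coordinates are not slowed down by frequently active ones, which is the source of the per-coordinate bounds mentioned above.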

2. Regret Decomposition, Stability, and Penalty Terms

The regret of FTRL decomposes naturally into terms capturing the trade-off between stability (sensitivity to prior loss observations) and penalty (complexity control via regularization). Standard analysis yields:

$\mathrm{Regret}_T \leq \sum_{t=1}^T \frac{z_t}{\beta_t} + \beta_1 h_1 + \sum_{t=2}^T (\beta_t - \beta_{t-1}) h_t$

where $\beta_t = 1/\eta_t$ is the inverse learning rate, $z_t$ quantifies local "variance" or curvature (typically $\|g_t\|_*^2$), and $h_t$ bounds the maximum Bregman divergence (Ito et al., 2024, McMahan, 2014). The first sum is the stability term; the remaining terms form the penalty. In bandit and partial monitoring problems, additional "bias" terms arise due to importance weighting and forced exploration (Tsuchiya et al., 2024).
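As a quick sanity check on this bound, consider the simplest case of a constant inverse learning rate with uniform bounds. The constants $z$ and $h$ below are illustrative assumptions, not quantities taken from the cited analyses. If $\beta_t = \beta$, $z_t \leq z$, and $h_t \leq h$ for all $t$, the last sum vanishes and

$\mathrm{Regret}_T \leq \frac{zT}{\beta} + \beta h$

Minimizing the right-hand side over $\beta > 0$ gives $\beta = \sqrt{zT/h}$, at which point both terms equal $\sqrt{zhT}$, so

$\mathrm{Regret}_T \leq 2\sqrt{zhT}$

This recovers the familiar $O(\sqrt{T})$ rate. The optimal tuning equalizes the stability and penalty terms, but it requires knowing $T$, $z$, and $h$ in advance, which is what motivates adaptive choices of $\beta_t$.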

Explicitly partitioning the bound into stability and penalty terms enables fine-grained adaptation of the learning rate $\eta_t$ and regularizer $R_t$, as well as competitive ratio analyses and best-of-both-worlds tuning (Ito et al., 2024, Tsuchiya et al., 2023).

3. Adaptive Learning Rates and Matching Principles

Learning rate adaptation in FTRL is critical for balancing exploration, stability, and penalty. Several advanced strategies include:

Stability-Penalty Matching (SPM): Choose $\eta_t$ such that the stability and penalty terms are locally matched, yielding explicit relations

$\eta_1 z_1 = \frac{1}{\eta_1} h_1,\quad \eta_t z_t = \left(\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}}\right) h_t$

with recursive closed-form solutions for $\eta_t$ (Ito et al., 2024).
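The matching conditions can be solved explicitly. Writing $\beta_t = 1/\eta_t$, the condition $\eta_t z_t = (1/\eta_t - 1/\eta_{t-1}) h_t$ becomes the quadratic $\beta_t^2 - \beta_{t-1}\beta_t - z_t/h_t = 0$, whose positive root gives $\beta_t$; the first-round condition is the same equation with $\beta_0 = 0$. The sketch below implements this recursion under the simplifying assumption that $z_t$ and $h_t$ are known when $\beta_t$ is chosen (the cited work handles the case where they must be estimated). The function name `spm_betas` is illustrative.

```python
import math

def spm_betas(z, h):
    """Solve z_t / beta_t = (beta_t - beta_{t-1}) * h_t for each t, with beta_0 = 0.

    `z` and `h` are sequences of positive stability and penalty coefficients.
    Returns the list of inverse learning rates beta_1, ..., beta_T.
    """
    betas = []
    prev = 0.0
    for z_t, h_t in zip(z, h):
        # Positive root of beta^2 - prev * beta - z_t / h_t = 0.
        beta = 0.5 * (prev + math.sqrt(prev * prev + 4.0 * z_t / h_t))
        betas.append(beta)
        prev = beta
    return betas
```

With constant $z_t = z$ and $h_t = h$, the recursion gives $zt/h \leq \beta_t^2 \leq 2zt/h$, so $\eta_t$ decays on the order of $1/\sqrt{t}$ without knowledge of the horizon.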

Stability-Penalty-Bias (SPB) Matching: For problems with minimax regret scaling as $\Theta(T^{2/3})$ , as in partial monitoring, combine the three components and set

$(\beta_t - \beta_{t-1}) h_t = 2 \sqrt{\frac{z_t+u_t}{\beta_t}}$

yielding regret $O(T^{2/3})$ in the adversarial case and $O(\log T)$ in the stochastic case (Tsuchiya et al., 2024).

Stability-Penalty-Adaptive (SPA) Learning Rate: The master learning rate $\eta_t = 1/\beta_t$ is recursively updated via

$\beta_{t+1} = \beta_t + c_1 z_t / \sqrt{c_2 + \tau h_1 + \sum_{s=1}^{t-1} z_s h_{s+1}}$

allowing simultaneous adaptation to sparsity and to game-dependent quantities (Tsuchiya et al., 2023).

Designing learning rates to match these decompositions enables robust, instance-optimal performance in both adversarial and stochastic regimes. Best-of-both-worlds regret bounds are attained without prior knowledge of the regime or its parameters (Ito et al., 2024, Tsuchiya et al., 2023, Tsuchiya et al., 2024).

4. Regularizer Choice and Geometric Adaptivity

The regularizer $R$ in FTRL encodes geometric information crucial for optimal regret and problem adaptivity:

Euclidean/Quadratic: $R(w)=\frac{1}{2}\|w\|_2^2$, optimal for isotropic losses; it leads to standard OGD-style regret.
Entropy-based: Negative Shannon or Tsallis entropy for simplex or probability vector decisions. Tsallis entropy with parameter $\alpha$ is especially effective in bandit settings for best-of-both-worlds performance and variance self-bounding (Jin et al., 2023).
Log-determinant: In matrix settings, $R(X) = -\alpha \log \det(X+\varepsilon I)$ attains minimax-optimal regret rates for PSD decision spaces subject to sparsity and trace constraints, exploiting strong convexity "with respect to the loss set" (Moridomi et al., 2017).
Self-concordant barriers: Used in adversarial linear bandits, they yield local norm control and dimension-minimax regret $O(d \sqrt{n \ln n})$ (Lévy et al., 28 Oct 2025).
Problem-specific constructed barriers: Given any centrally symmetric convex action and loss sets, it is possible to construct, offline, a regularizer guaranteeing regret within a constant factor of the minimax rate for that geometry (Gatmiry et al., 2024).

Regularizer construction may require expensive preprocessing (e.g., dimension-exponential convex programming), but provides universality and often resolves previously open dimension-dependent optimality questions (Gatmiry et al., 2024).
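To illustrate how an entropy-type regularizer from the list above shapes the iterate, the following sketch computes the FTRL distribution for a Tsallis-type regularizer with $\alpha = 1/2$. It assumes the specific form $R(w) = -\frac{2}{\eta}\sum_i \sqrt{w_i}$ on the simplex, chosen here for convenience; scaling conventions vary across papers. Stationarity gives $w_i = 1/(\eta (L_i + \lambda))^2$, where $L_i$ is the cumulative loss and $\lambda$ is a normalization multiplier, found below by bisection. The function name `tsallis_half_weights` is illustrative.

```python
import numpy as np

def tsallis_half_weights(cum_losses, eta, iters=100):
    """FTRL over the simplex with R(w) = -(2/eta) * sum_i sqrt(w_i)."""
    losses = np.asarray(cum_losses, dtype=float)
    k = len(losses)
    # Shifting all losses by a constant does not change the minimizer.
    shifted = losses - losses.min()

    def weights(lam):
        # Stationarity condition: shifted_i + lam = 1 / (eta * sqrt(w_i)).
        return 1.0 / (eta * (shifted + lam)) ** 2

    # Total mass is decreasing in lam. At lam = 1/eta the best arm alone has
    # mass 1; at lam = sqrt(k)/eta every arm has mass at most 1/k.
    lo, hi = 1.0 / eta, np.sqrt(k) / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weights(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    w = weights(0.5 * (lo + hi))
    return w / w.sum()
```

Unlike the exponential decay of entropy-based weights, the mass on a suboptimal arm here decays only polynomially in its loss gap, which keeps more exploration on trailing arms.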

5. Extensions: Approachability, Implicit and Optimistic FTRL

FTRL generalizes to sophisticated setups:

Blackwell’s Approachability: FTRL is used to minimize support-function distances to convex cones, reducing approachability to FTRL over generators of the polar cone with strongly convex regularization. This refines classic $O(1/\sqrt{T})$ convergence in repeated games, obtaining explicit dependence on geometry and allowing arbitrary distance-like quantities (Euclidean, $\ell_p$ , entropic distances) (Kwon, 2020).
Generalized Implicit FTRL: By replacing linearization with Fenchel-Young inequalities and surrogate loss conjugates, implicit and Mirror-Prox style algorithms are encompassed, strictly improving worst-case regret when the chosen surrogate is tighter than the linearized loss (Chen et al., 2023).
Optimistic and Dynamic FTRL: Injecting predictions or hints about future losses in the update yields so-called Optimistic FTRL (OFTRL), which interpolates between $O(\sqrt{T})$ static regret (adversarial, no prediction) and $O(1)$ regret with perfect future information (Mhaisen et al., 2024, Mhaisen et al., 28 May 2025). Extensions such as history-pruning synchronize dual and primal iterates, restoring minimax optimal dynamic regret rates (Mhaisen et al., 28 May 2025).
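The role of hints in the optimistic variant can be sketched in a few lines. The example assumes linear losses, an unconstrained domain, and the fixed quadratic regularizer $R(w) = \|w\|_2^2/(2\eta)$, so the update is available in closed form; the function name `oftrl_quadratic` and the `hints` argument are illustrative and not taken from the cited papers.

```python
import numpy as np

def oftrl_quadratic(gradients, hints, eta):
    """Unconstrained optimistic FTRL with R(w) = ||w||^2 / (2*eta).

    `hints[t]` is a prediction of `gradients[t]` that is available before
    the round-t decision is made. Passing all-zero hints recovers plain FTRL.
    """
    grad_sum = np.zeros(gradients.shape[1])
    iterates = []
    for g, hint in zip(gradients, hints):
        # The hint stands in for the not-yet-observed gradient g_t:
        # w_t = argmin_w <grad_sum + hint, w> + ||w||^2 / (2*eta).
        w = -eta * (grad_sum + hint)
        iterates.append(w)
        grad_sum = grad_sum + g
    return np.array(iterates)
```

With perfect hints (`hints == gradients`), the round-$t$ iterate coincides with what plain FTRL would only play at round $t+1$, that is, the learner already accounts for the loss it is about to receive. This is the mechanism behind the improved regret under accurate predictions.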

In constrained and composite optimization, FTRL supports time-varying and penalized constraints, and analysis extends to Lagrangian primal-dual schemes for OCO with moving feasible sets, attaining $O(\sqrt{T})$ regret and violation under mild Slater conditions (Anderson et al., 2022, Leith et al., 2022).

6. Applications: Online Bandits, Linear Optimization, Control

FTRL is the central machinery behind state-of-the-art algorithms across online learning subdomains:

Multi-Armed, Graph, and Linear Bandits: FTRL, with regularization by negative entropy or Tsallis entropy and carefully tuned learning rates, attains adversarial minimax rates $O(\sqrt{KT})$ and stochastic rates $O(\sum_k (\log T)/\Delta_k)$, and unifies analysis for best-of-both-worlds settings, even with non-unique optimal arms (Ito et al., 2024, Jin et al., 2023).
Partial Monitoring: Adaptive FTRL with problem-adapted regularizers and rates matches all known worst-case rates ($\Theta(T^{2/3})$), while the SPA learning rate additionally delivers game-dependent and sparsity-robust guarantees (Tsuchiya et al., 2023, Tsuchiya et al., 2024).
Online Linear Optimization: FTRL with a geometry-adapted regularizer, constructed as in Gatmiry et al. (2024), guarantees regret within a constant factor of the minimax rate for any convex centrally symmetric action/loss sets.
Non-Stochastic Control & Prediction: Optimistic FTRL provides policy regret guarantees that interpolate from $O(1)$ to $O(\sqrt{T})$ as prediction quality worsens, and has been applied to disturbance-action control in dynamic systems (Mhaisen et al., 2024).
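For the bandit case in the list above, the sketch below shows the basic pattern of running entropy-regularized FTRL on importance-weighted loss estimates, in the style of Exp3. It is a minimal illustration with a fixed learning rate, not one of the adaptive best-of-both-worlds algorithms from the cited papers; the names `exp_weights`, `exp3`, and `loss_fn` are assumptions made for this example.

```python
import numpy as np

def exp_weights(cum_losses, eta):
    """Negative-entropy FTRL step: softmax of -eta * cumulative losses."""
    shifted = cum_losses - cum_losses.min()
    w = np.exp(-eta * shifted)
    return w / w.sum()

def exp3(loss_fn, num_arms, horizon, eta, seed=0):
    """Run Exp3-style FTRL; `loss_fn(t, arm)` returns a loss in [0, 1].

    Returns the sampling distribution after the final round.
    """
    rng = np.random.default_rng(seed)
    est_cum_loss = np.zeros(num_arms)
    for t in range(horizon):
        probs = exp_weights(est_cum_loss, eta)
        arm = rng.choice(num_arms, p=probs)
        # Only the loss of the played arm is observed.
        loss = loss_fn(t, arm)
        # Importance weighting: dividing by the sampling probability makes
        # the estimate unbiased for every arm (unplayed arms get 0).
        est_cum_loss[arm] += loss / probs[arm]
    return exp_weights(est_cum_loss, eta)
```

A typical fixed tuning for $K$ arms and horizon $T$ is $\eta$ on the order of $\sqrt{\ln K / (KT)}$; replacing `exp_weights` with a Tsallis-entropy step and using a decaying learning rate leads toward the best-of-both-worlds variants discussed above.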

FTRL’s unifying capacity is further highlighted by its connections to Follow-the-Perturbed-Leader (FTPL): for any separable strictly convex regularizer, a corresponding ambiguity-based FTPL algorithm exists with computational complexity $O(K \log(1/\epsilon))$ per round, matching FTRL's distributions and regret guarantees but with far greater efficiency (Li et al., 2024).

7. Algorithmic and Analytical Innovations

FTRL can now be viewed as a meta-algorithmic template. Major innovations include:

Learning Rate and Penalty Adaptation: Fine-grained tuning based on problem- and instance-dependent stability and curvature decompositions achieves robust optimality across regimes (Ito et al., 2024, Tsuchiya et al., 2023, Tsuchiya et al., 2024).
Implicit and Composite Loss Handling: The generalized implicit FTRL framework allows for incorporation of Fenchel-type conjugate surrogates directly, yielding improved regret and encompassing implicit OMD, Mirror-Prox, and aProx (Chen et al., 2023).
Geometric Regularization Construction: Explicit construction of minimax-optimal regularizers for arbitrary convex geometry using Gaussian smoothing and quasi-quadratic interpolation, with dimension-exponential but offline computation (Gatmiry et al., 2024).
Dynamic and Pruned History Handling: Pruning approaches ensure that the FTRL state remains synchronized with primal iterates, preventing excessive inertia and enabling true adaptability in dynamic settings (Mhaisen et al., 28 May 2025).

FTRL thus serves as both a unifying analytical tool and a practical algorithmic engine for online learning and OCO. Refinements of its regularizers and learning rates continue to drive advances in regret minimization, approachability, and adaptive online decision-making.