Online Convex Optimization
- Online Convex Optimization is a framework for sequential decision-making where a learner minimizes convex losses and regret over time in uncertain environments.
- Key algorithmic paradigms, including OGD, FTRL, OMD, and ONS, offer varied update rules and optimal regret bounds tailored to different convexity and feedback settings.
- Practical applications span online learning, adaptive control, and resource allocation, improving strategies in both static and dynamic optimization contexts.
Online convex optimization (OCO) is a foundational paradigm for sequential decision-making under uncertainty, unifying adversarial, stochastic, and dynamic optimization procedures where the objective is convex. It models interactions over rounds between a learner, who selects decisions from a convex feasible set, and an environment, which sequentially reveals convex loss functions. Performance is benchmarked by regret relative to a prescribed competitor class, typically the best fixed decision in hindsight, though stronger comparators—such as those encompassing time-varying benchmarks or policies—are also considered.
1. Formal Model and Core Principles
In OCO, the learner operates over a closed convex set $\mathcal{X} \subseteq \mathbb{R}^d$ and iterates as follows for $t = 1, \dots, T$:
- Select $x_t \in \mathcal{X}$.
- Observe the convex loss $f_t : \mathcal{X} \to \mathbb{R}$ after committing to $x_t$.
- Incur loss $f_t(x_t)$.
Regret with respect to any $u \in \mathcal{X}$ is defined as $\mathrm{Regret}_T(u) = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u)$, with the standard metric being static regret (taking the minimum over $u \in \mathcal{X}$ of the comparator's cumulative loss) or, in adversarial/dynamic settings, more general (dynamic/pathwise) regret.
Common assumptions are that each $f_t$ is convex (possibly $G$-Lipschitz: $\|\nabla f_t(x)\| \le G$ for all $x \in \mathcal{X}$), and that $\mathcal{X}$ is bounded with diameter $D = \max_{x, y \in \mathcal{X}} \|x - y\|$. Subdifferentials and projections onto $\mathcal{X}$ are assumed efficiently computable.
This model subsumes and extends classical online learning with expert advice (where $\mathcal{X}$ is the probability simplex and each $f_t$ is linear), bandit settings (where only the scalar value $f_t(x_t)$ is revealed), and many variants with additional structure or constraints.
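This interaction can be phrased as a short simulation loop. The following is a minimal sketch, assuming scalar quadratic losses $f_t(x) = (x - z_t)^2$ on the interval $[-1, 1]$ and a projected-gradient learner; all of these choices are illustrative, not part of the formal model:

```python
import numpy as np

# Minimal OCO protocol simulation. The quadratic losses f_t(x) = (x - z_t)^2,
# the interval domain [-1, 1], and the OGD learner are illustrative assumptions.
rng = np.random.default_rng(0)
T = 1000
targets = rng.uniform(-1.0, 1.0, size=T)      # environment's loss parameters

def f(x, z):                                  # convex loss revealed at round t
    return (x - z) ** 2

def grad(x, z):                               # its gradient in x
    return 2.0 * (x - z)

x, cum_loss, G, D = 0.0, 0.0, 4.0, 2.0        # iterate, loss, Lipschitz/diameter bounds
for t, z in enumerate(targets, start=1):
    cum_loss += f(x, z)                       # select x_t, then incur f_t(x_t)
    eta = D / (G * np.sqrt(t))                # standard O(sqrt(T))-regret step size
    x = float(np.clip(x - eta * grad(x, z), -1.0, 1.0))  # projected gradient step

# Static regret: compare to the best fixed decision in hindsight (grid search).
best_fixed = min(sum(f(u, z) for z in targets) for u in np.linspace(-1.0, 1.0, 201))
print(f"static regret ~ {cum_loss - best_fixed:.2f}; sqrt(T) ~ {np.sqrt(T):.0f}")
```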
2. Canonical Algorithms and Regret Bounds
The OCO literature is anchored on several algorithmic paradigms, each tailored to different geometric, statistical, or computational objectives:
| Method | Per-round Update (summary) | Regret Rate |
|---|---|---|
| Online Gradient Descent (OGD) | $x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \eta_t \nabla f_t(x_t)\big)$ | $O(DG\sqrt{T})$ (convex) |
| Follow-The-Regularized-Leader (FTRL) | $x_{t+1} = \arg\min_{x \in \mathcal{X}} \sum_{s \le t} f_s(x) + R(x)$; mirrors OGD with suitable $R$ | $O(\sqrt{T})$ |
| Online Mirror Descent (OMD) | Mirror/proximal step in a Bregman-divergence geometry | $O(\sqrt{T})$ (geometry-dependent constants) |
| Online Newton Step (ONS) | Second-order update, exp-concave losses | $O\big(\tfrac{d}{\alpha}\log T\big)$ for $\alpha$-exp-concave |
| Hedge/EG (Expert advice) | Entropic regularization ($R$ = negative entropy) | $O(\sqrt{T \log N})$ ($N$ experts) |
| Bandit OCO | Non-constructive, gradient estimation with partial feedback | $\tilde{O}(\sqrt{T})$ (general), $\tilde{O}(\sqrt{T})$ (linear) |
| Frank-Wolfe/Conditional Gradient | Linear optimization instead of projection | $O(T^{3/4})$ (general), projection-free |
Key regret bounds:
- For convex $f_t$, OGD/OMD/FTRL with appropriate step size $\eta_t \propto 1/\sqrt{t}$ yields regret $O(DG\sqrt{T})$ (a minimal OGD implementation follows this list).
- If each $f_t$ is $\mu$-strongly convex, step size $\eta_t = 1/(\mu t)$ yields $O\big(\tfrac{G^2}{\mu}\log T\big)$ regret.
- ONS achieves $O\big(\tfrac{d}{\alpha}\log T\big)$ when the $f_t$ are $\alpha$-exp-concave.
- Projection-free methods (Frank-Wolfe) do not reach the optimal $O(\sqrt{T})$ regret in all cases, but are useful for complex feasible sets $\mathcal{X}$.
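A minimal OGD sketch over the Euclidean unit ball; the ball domain, the oracle interface, and the constants are illustrative assumptions:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

def ogd(grad_oracle, T, d, D=2.0, G=1.0):
    """Online Gradient Descent: x_{t+1} = Pi_X(x_t - eta_t * grad f_t(x_t)),
    with eta_t = D / (G * sqrt(t)), giving O(D*G*sqrt(T)) static regret."""
    x = np.zeros(d)
    for t in range(1, T + 1):
        g = grad_oracle(t, x)                  # gradient revealed after playing x_t
        x = project_ball(x - (D / (G * np.sqrt(t))) * g)
    return x

# Usage: linear losses f_t(x) = <c_t, x> with random c_t (illustrative).
rng = np.random.default_rng(0)
x_final = ogd(lambda t, x: rng.normal(size=3), T=1000, d=3)
```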
Mirror descent in the entropic geometry yields Hedge/Multiplicative Weights when $\mathcal{X}$ is the simplex, matching classical bounds in online combinatorial settings; a sketch of the resulting update follows.
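A minimal Hedge sketch, assuming expert losses in $[0,1]$ and the standard fixed-horizon step-size tuning (both assumptions for illustration):

```python
import numpy as np

def hedge(loss_matrix, eta=None):
    """Hedge / Multiplicative Weights over N experts.
    loss_matrix: (T, N) array; row t holds each expert's loss in [0, 1]."""
    T, N = loss_matrix.shape
    if eta is None:
        eta = np.sqrt(8.0 * np.log(N) / T)      # tuning for O(sqrt(T log N)) regret
    w = np.ones(N) / N                          # uniform prior on the simplex
    total = 0.0
    for losses in loss_matrix:
        total += w @ losses                     # expected loss of the mixture play
        w *= np.exp(-eta * losses)              # multiplicative (entropic OMD) update
        w /= w.sum()                            # renormalize onto the simplex
    return total, w

# Usage: 5 experts with Bernoulli losses (illustrative).
rng = np.random.default_rng(0)
total, weights = hedge((rng.random((1000, 5)) < 0.5).astype(float))
```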
3. Structural Regularities and Algorithmic Adaptation
OCO frameworks have evolved to leverage structural properties of loss sequences and the ambient geometry:
Strong Convexity and Exp-Concavity
For $\mu$-strongly convex or $\alpha$-exp-concave losses, static regret drops to $O\big(\tfrac{G^2}{\mu}\log T\big)$ or $O\big(\tfrac{d}{\alpha}\log T\big)$ respectively, as realized by variable-rate OGD or ONS (an ONS sketch follows).
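A minimal ONS sketch. The unit-ball domain, the oracle interface, and the simplified Euclidean projection are assumptions; the exact method projects in the norm induced by $A_t$:

```python
import numpy as np

def ons(grad_oracle, T, d, alpha=1.0, G=1.0, D=2.0):
    """Online Newton Step for alpha-exp-concave losses, O((d/alpha) log T) regret."""
    gamma = 0.5 * min(1.0 / (4.0 * G * D), alpha)    # standard ONS parameter
    A = (1.0 / (gamma ** 2 * D ** 2)) * np.eye(d)    # regularized curvature matrix
    x = np.zeros(d)
    for t in range(1, T + 1):
        g = grad_oracle(t, x)
        A += np.outer(g, g)                            # rank-one curvature update
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)  # Newton-style step
        n = np.linalg.norm(x)
        if n > 1.0:
            x = x / n  # simplified Euclidean projection; exact ONS instead solves
                       # argmin_{y in X} (y - x)^T A (y - x)
    return x
```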
Adaptive Bounds and Per-Coordinate Rates
FTPRL (McMahan et al., 2010) adapts regularizer strength per-coordinate, yielding regret bounds of the form
$$\mathrm{Regret}_T = O\Big(\sum_{i=1}^{d} D_i \sqrt{\sum_{t=1}^{T} g_{t,i}^2}\Big)$$
(where $D_i$ = width of $\mathcal{X}$ in coordinate $i$ and $g_{t,i}$ = the $i$-th coordinate of the round-$t$ gradient), which can be much tighter than a global $O(DG\sqrt{T})$ bound when losses are sparse or anisotropic; a per-coordinate sketch follows.
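The same per-coordinate adaptation underlies diagonal AdaGrad-style updates; a minimal sketch on a box domain (the box geometry and the oracle interface are assumptions):

```python
import numpy as np

def adagrad_diag(grad_oracle, T, d, width=1.0):
    """Per-coordinate adaptive steps: eta_{t,i} ~ D_i / sqrt(sum_s g_{s,i}^2),
    on the box [-width, width]^d (so D_i = 2 * width for every i)."""
    x = np.zeros(d)
    sq_sum = np.full(d, 1e-12)                   # running sums of squared gradients
    for t in range(1, T + 1):
        g = grad_oracle(t, x)
        sq_sum += g * g
        eta = (2.0 * width) / np.sqrt(sq_sum)    # larger steps on quiet coordinates
        x = np.clip(x - eta * g, -width, width)  # coordinate-wise box projection
    return x
```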
Variation-Based Regret
For environments with temporal smoothness, dynamic regret bounds scale as
$$O\big(\sqrt{1 + V_T}\big),$$
where $V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \|\nabla f_t(x) - \nabla f_{t-1}(x)\|^2$ is the (gradient) variation over time (Yang et al., 2011). This line mirrors advances in dynamic tracking and provides $o(\sqrt{T})$ regret whenever the environment drifts slowly.
Dynamic and Universal Regret
Regret against moving comparators (dynamic regret) provably scales with the path-length of the comparator sequence (Gokcesu et al., 2019): $\mathrm{Regret}_T(u_1, \dots, u_T) = O\big(\sqrt{T(1 + P_T)}\big)$, where $P_T = \sum_{t=2}^{T} \|u_t - u_{t-1}\|$. Parameter-free, universal algorithms simultaneously achieve optimal regret for all $P_T$.
Contaminated and Non-Stationary Regimes
Recent advances consider contaminated OCO (Kamijima et al., 28 Apr 2024): if $k$ out of $T$ rounds violate strong convexity (or exp-concavity), the optimal regret interpolates between $O(\log T)$ (pure strongly convex/exp-concave) and $O(\sqrt{T})$ (fully general) as $k$ grows from $0$ to $T$.
4. Generalizations and Advanced Frameworks
OCO admits numerous sophisticated generalizations:
Time-Varying and Stochastic Constraints
Incorporating time-varying or stochastic constraints, algorithms leveraging virtual queues and drift-Lyapunov analysis achieve $O(1/\epsilon^2)$ convergence time to $\epsilon$-feasibility and near-optimality under mild Slater-type conditions (Neely et al., 2017). The sketch below illustrates the virtual-queue pattern.
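A minimal drift-plus-penalty sketch, assuming a single constraint $g(x) \le 0$, gradient oracles for $f_t$ and $g$, and an unconstrained primal step; the interface and constants are illustrative, not the specific algorithm of the cited work:

```python
import numpy as np

def constrained_oco(grad_f, g, grad_g, T, d, V=10.0, eta=0.05):
    """Drift-plus-penalty sketch for OCO with one constraint g(x) <= 0.
    Q is a virtual queue tracking cumulative constraint violation."""
    x, Q = np.zeros(d), 0.0
    for t in range(1, T + 1):
        # Primal step: descend on the penalized objective V * f_t + Q * g.
        x = x - eta * (V * grad_f(t, x) + Q * grad_g(x))
        # Virtual-queue (dual) step: accumulate violation, never below zero.
        Q = max(Q + g(x), 0.0)
    return x, Q
```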
OCO with Unbounded Memory
OCO has been extended to settings where the loss at time $t$ depends on the entire history $x_1, \dots, x_t$. The $p$-effective memory capacity $H_p$ quantifies the influence of past decisions (Kumar et al., 2022), leading to tight regret rates of order $O(\sqrt{H_p T})$. This framework subsumes finite-memory and discounted-memory OCO, and yields sharper regret bounds for online control and performative prediction.
Stochastic, Bandit, and Decentralized Models
- In the zeroth-order ("bandit") setting, only the loss value $f_t(x_t)$ is accessible per round. Quantum algorithms have enabled $O(\sqrt{T})$ regret (removing classical dimension dependence) for general convex and $O(\log T)$ regret for strongly convex loss sequences (He et al., 2020); the classical one-point estimation baseline is sketched after this list.
- Decentralized OCO considers interacting networked agents receiving local losses. Algorithms based on accelerated gossip achieve near-optimal regret of order $\sqrt{T}$ (convex) and $\log T$ (strongly convex), matching lower bounds up to logarithmic factors in the number of agents $n$, the time horizon $T$, and the spectral gap $\rho$ of the communication graph (Wan et al., 14 Feb 2024).
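As a classical point of reference, the one-point gradient estimator of Flaxman et al. turns bandit feedback into a stochastic gradient; a minimal sketch on the unit ball (domain, constants, and oracle interface are illustrative assumptions):

```python
import numpy as np

def bandit_ogd(f_value, T, d, delta=0.1, eta=0.01):
    """One-point gradient estimation (Flaxman et al. style) on the unit ball:
    g_hat = (d / delta) * f(x + delta * u) * u, u uniform on the unit sphere."""
    rng = np.random.default_rng(0)
    x = np.zeros(d)
    for t in range(1, T + 1):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                   # uniform direction on the sphere
        y = x + delta * u                        # perturbed play; only f_t(y) observed
        g_hat = (d / delta) * f_value(t, y) * u  # unbiased grad of the smoothed loss
        x = x - eta * g_hat
        n, cap = np.linalg.norm(x), 1.0 - delta  # shrink so probes remain feasible
        if n > cap:
            x *= cap / n
    return x
```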
5. Projection-Free and Oracle-Efficient Approaches
Scenarios with complex feasible sets motivate projection-free OCO algorithms:
- Frank–Wolfe type algorithms avoid the cost of Euclidean projections, requiring instead linear optimization oracle calls, but trade optimal regret for per-iteration efficiency (a sketch follows this list).
- Recent advancements (Mhammedi, 2021; Gatmiry et al., 2023) develop wrappers and barrier-based Newton methods that turn any OCO algorithm on a Euclidean ball into a projection-free variant for general convex domains, using only a membership oracle. Such methods yield optimal regret from membership calls alone, and are computationally favorable in high-dimensional settings where projections or linear optimization are expensive.
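A minimal online conditional gradient sketch in the Hazan–Kale pattern; the oracle interface and tuning constants are assumptions:

```python
import numpy as np

def online_frank_wolfe(grad_oracle, lin_oracle, T, d, eta=None):
    """Online conditional gradient: one linear-oracle call per round, no
    projections; tuning follows the O(T^(3/4))-regret pattern.
    lin_oracle(c) returns argmin_{v in X} <c, v>."""
    if eta is None:
        eta = T ** -0.75                         # aggregation weight
    x0 = lin_oracle(np.zeros(d))                 # any feasible starting point
    x, g_sum = x0.copy(), np.zeros(d)
    for t in range(1, T + 1):
        g_sum += grad_oracle(t, x)               # running gradient aggregate
        c = eta * g_sum + 2.0 * (x - x0)         # gradient of the surrogate at x
        v = lin_oracle(c)                        # linear optimization step
        sigma = min(1.0, 2.0 / np.sqrt(t))       # step toward the oracle vertex
        x = (1.0 - sigma) * x + sigma * v        # convex combination stays in X
    return x
```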
6. Practical Applications and Specialized Models
OCO frameworks underpin online learning, control, network resource allocation, and adaptive signal processing:
- Resource allocation under non-stationarity benefits from discounted/forgetting-factor OCO, trading off static and dynamic regret in environments with varying temporal smoothness (Yuan, 2020); one simple instantiation is sketched after this list.
- Predictive OCO integrates forecasts (e.g., estimated future gradients), leading to strictly improved dynamic regret without additional environmental assumptions (Lesage-Landry et al., 2019).
- Coordinate descent variants (Lin et al., 2022) extend OCO to high-dimensional scenarios where only partial, block-wise updates are computationally feasible, still guaranteeing optimal regret rates and efficient scaling.
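One simple instantiation of the forgetting-factor idea replaces the raw gradient with a geometrically discounted average; a minimal sketch on the unit ball (the discount factor, step size, and oracle interface are assumptions, not the cited method):

```python
import numpy as np

def discounted_ogd(grad_oracle, T, d, gamma=0.95, eta=0.1, radius=1.0):
    """Forgetting-factor sketch: step on a geometrically discounted gradient
    average so stale rounds fade, trading static for dynamic regret."""
    x, g_avg = np.zeros(d), np.zeros(d)
    for t in range(1, T + 1):
        g_avg = gamma * g_avg + (1.0 - gamma) * grad_oracle(t, x)
        x = x - eta * g_avg                      # step on the discounted average
        n = np.linalg.norm(x)
        if n > radius:
            x *= radius / n                      # Euclidean projection onto the ball
    return x
```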
7. Open Directions and Frontiers
Active research directions in OCO include:
- Tightening regret bounds in contaminated, partially informative, or hybrid adversarial/stochastic environments.
- Minimizing oracle complexity (projections, membership, linear oracles) in nontrivial geometries and high-dimensional domains.
- Advanced dynamic regret with path-dependent or memory-dependent loss, extending the theory of predictor-adaptive and universal methods.
- Decentralized, federated, and privacy-preserving OCO in distributed networks with communication constraints.
- Quantum and bandit-feedback OCO bridging the gap between information-theoretic and computational lower bounds.
These developments continue to reinforce OCO as the central lens for designing and analyzing algorithms in adversarial and adaptive decision-making systems across machine learning, control, statistics, and operations research.