
Online Convex Optimization

Updated 15 November 2025
  • Online Convex Optimization is a framework for sequential decision-making where a learner minimizes convex losses and regret over time in uncertain environments.
  • Key algorithmic paradigms, including OGD, FTRL, OMD, and ONS, offer varied update rules and optimal regret bounds tailored to different convexity and feedback settings.
  • Practical applications span online learning, adaptive control, and resource allocation, improving strategies in both static and dynamic optimization contexts.

Online convex optimization (OCO) is a foundational paradigm for sequential decision-making under uncertainty, unifying adversarial, stochastic, and dynamic optimization procedures where the objective is convex. It models interactions over rounds between a learner, who selects decisions from a convex feasible set, and an environment, which sequentially reveals convex loss functions. Performance is benchmarked by regret relative to a prescribed competitor class, typically the best fixed decision in hindsight, though stronger comparators—such as those encompassing time-varying benchmarks or policies—are also considered.

1. Formal Model and Core Principles

In OCO, the learner operates over a closed convex set $\mathcal{K} \subset \mathbb{R}^d$ and iterates as follows for $t = 1, \ldots, T$:

  • Select $x_t \in \mathcal{K}$.
  • Observe the convex loss $f_t : \mathcal{K} \to \mathbb{R}$ after committing to $x_t$.
  • Incur the loss $f_t(x_t)$.

Regret with respect to any $u \in \mathcal{K}$ is defined as $\mathrm{Regret}_T(u) = \sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(u)$; the standard metric is static regret (measured against the best fixed $u$ in hindsight, i.e., the minimizer of $\sum_t f_t(u)$) or, in adversarial/dynamic settings, more general (dynamic/pathwise) regret.

Common assumptions are that each $f_t$ is convex (possibly $G$-Lipschitz: $\|\nabla f_t(x)\| \leq G$) and that $\mathcal{K}$ is bounded with diameter $D$. Subdifferentials and projections onto $\mathcal{K}$ are assumed to be efficiently computable.
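The protocol and its regret bookkeeping can be written compactly. The sketch below is a minimal illustration assuming a user-supplied Euclidean projection `project` onto $\mathcal{K}$ and hypothetical `pick_loss` / `learner_step` callables (the names are ours, not from any cited paper).

```python
import numpy as np

def oco_protocol(T, x0, project, pick_loss, learner_step):
    """Generic OCO loop: choose x_t, then observe the round's loss, then update.

    project:      Euclidean projection onto the feasible set K (assumed given).
    pick_loss:    environment; called after x_t is committed, returns (f_t, grad_f_t).
    learner_step: update rule mapping (x_t, grad_f_t(x_t), t) to the next pre-projection point.
    """
    x = project(np.asarray(x0, dtype=float))
    cumulative_loss = 0.0
    losses = []  # keep each f_t so regret against any fixed u can be computed afterwards
    for t in range(1, T + 1):
        f_t, grad_f_t = pick_loss(t, x)        # loss revealed only after x_t is chosen
        cumulative_loss += f_t(x)
        losses.append(f_t)
        x = project(learner_step(x, grad_f_t(x), t))
    return cumulative_loss, losses

def static_regret(cumulative_loss, losses, u):
    """Regret_T(u) = sum_t f_t(x_t) - sum_t f_t(u)."""
    return cumulative_loss - sum(f(u) for f in losses)
```

Any of the concrete algorithms in Section 2 can be obtained by supplying the corresponding `learner_step`.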

This model subsumes and extends classical online learning with expert advice (where $\mathcal{K}$ is the simplex and $f_t$ is linear), bandit settings (where only $f_t(x_t)$ is revealed), and many variants with additional structure or constraints.

2. Canonical Algorithms and Regret Bounds

The OCO literature is anchored on several algorithmic paradigms, each tailored to different geometric, statistical, or computational objectives:

| Method | Per-round update (summary) | Regret rate |
| --- | --- | --- |
| Online Gradient Descent (OGD) | $x_{t+1} = \Pi_{\mathcal{K}}\left(x_t - \eta_t \nabla f_t(x_t)\right)$ | $O(GD\sqrt{T})$ |
| Follow-The-Regularized-Leader (FTRL) | $x_{t+1} = \arg\min_{x\in\mathcal{K}}\{\sum_{s=1}^t f_s(x) + R(x)\}$ | $O(GD\sqrt{T})$; mirrors OGD with suitable $R$ |
| Online Mirror Descent (OMD) | Mirror/proximal step in a $\psi$-divergence geometry | $O(G_{\psi} D_{\psi}\sqrt{T})$ |
| Online Newton Step (ONS) | Second-order update for exp-concave $f_t$ | $O((d/\alpha)\log T)$ for $\alpha$-exp-concave |
| Hedge/EG (expert advice) | Entropic regularization ($\psi$ = negative entropy) | $O(\sqrt{T\log n})$ ($n$ experts) |
| Bandit OCO | Randomized gradient estimation with partial feedback | $O(T^{3/4})$ (general), $O(\sqrt{Tn})$ (linear) |
| Frank–Wolfe / Conditional Gradient | Linear optimization oracle instead of projection | $O(T^{3/4})$ (general); projection-free |
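The OGD row above amounts to a projected gradient step per round. A minimal sketch over a Euclidean ball is given below; the ball domain, its radius, and the step-size schedule $\eta_t = D/(G\sqrt{t})$ are illustrative choices rather than requirements.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def ogd(grad_fns, d, radius, G):
    """Online Gradient Descent over a Euclidean ball (diameter D = 2 * radius).

    grad_fns: sequence of gradient oracles; grad_fns[t-1](x) returns the gradient of f_t at x.
    Uses the standard schedule eta_t = D / (G * sqrt(t)), giving O(G * D * sqrt(T)) static regret.
    """
    D = 2.0 * radius
    x = np.zeros(d)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        eta_t = D / (G * np.sqrt(t))
        x = project_ball(x - eta_t * grad(x), radius)
    return iterates
```

For $\alpha$-strongly convex losses, the same loop with $\eta_t = 1/(\alpha t)$ yields the logarithmic rate quoted below.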

Key regret bounds:

  • For convex $f_t$, OGD/OMD/FTRL with an appropriate step size $\eta_t = \Theta(1/\sqrt{T})$ yields regret $O(GD\sqrt{T})$.
  • If the $f_t$ are $\alpha$-strongly convex, the step size $\eta_t = 1/(\alpha t)$ yields $O((G^2/\alpha)\log T)$ regret.
  • ONS achieves $O((d/\alpha)\log T)$ when the $f_t$ are $\alpha$-exp-concave.
  • Projection-free methods (Frank–Wolfe) do not reach the optimal $O(\sqrt{T})$ regret in all cases, but are useful for complex $\mathcal{K}$.

Mirror descent in the entropic geometry yields Hedge/Multiplicative Weights when $\mathcal{K}$ is the simplex, matching classical bounds in online combinatorial settings.
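As a concrete instance, the sketch below runs Hedge over $n$ experts with linear losses; the learning rate $\eta = \sqrt{\log n / T}$ and the $[0,1]$ loss range are standard illustrative assumptions.

```python
import numpy as np

def hedge(loss_matrix, eta=None):
    """Hedge / multiplicative weights over n experts.

    loss_matrix: array of shape (T, n); row t holds the experts' losses at round t,
                 assumed to lie in [0, 1].
    Returns the list of probability vectors played.
    """
    loss_matrix = np.asarray(loss_matrix, dtype=float)
    T, n = loss_matrix.shape
    if eta is None:
        eta = np.sqrt(np.log(n) / T)      # gives O(sqrt(T log n)) regret for bounded losses
    log_w = np.zeros(n)
    plays = []
    for t in range(T):
        p = np.exp(log_w - log_w.max())   # normalize in log-space for numerical stability
        p /= p.sum()
        plays.append(p)
        log_w -= eta * loss_matrix[t]     # exponential-weights update
    return plays
```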

3. Structural Regularities and Algorithmic Adaptation

OCO frameworks have evolved to leverage structural properties of loss sequences and the ambient geometry:

Strong Convexity and Exp-Concavity

For $\alpha$-strongly convex or $\alpha$-exp-concave losses, static regret drops to $O((G^2/\alpha)\log T)$ or $O((d/\alpha)\log T)$ respectively, as realized by variable-rate OGD or ONS.
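An unconstrained sketch of the ONS update is shown below; it maintains $A_t^{-1}$ via rank-one (Sherman–Morrison) updates and omits the generalized $A_t$-norm projection onto $\mathcal{K}$ used by the full algorithm. The choices of $\gamma$ and the initialization are illustrative.

```python
import numpy as np

def online_newton_step(grad_fns, d, gamma, eps=1.0):
    """Online Newton Step, unconstrained sketch (the A_t-norm projection onto K is omitted).

    gamma: step parameter; for alpha-exp-concave, G-Lipschitz losses on a set of diameter D,
           a common choice is of order min(1/(G*D), alpha).
    eps:   initial multiple of the identity in A_0.
    """
    x = np.zeros(d)
    A_inv = np.eye(d) / eps                          # A_0 = eps * I, stored as its inverse
    iterates = []
    for grad in grad_fns:
        iterates.append(x.copy())
        g = grad(x)
        Ag = A_inv @ g
        A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)   # Sherman-Morrison update of (A + g g^T)^{-1}
        x = x - (1.0 / gamma) * (A_inv @ g)          # Newton-style step with the updated A_t
    return iterates
```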

Adaptive Bounds and Per-Coordinate Rates

FTPRL (McMahan et al., 2010) adapts regularizer strength per-coordinate, yielding regret bounds

$$\mathrm{Regret}_T \le 2 \sum_{i=1}^n D_i \sqrt{\sum_{t=1}^T g_{t,i}^2},$$

(where $D_i$ is the feasible width in coordinate $i$), which can be much tighter than a global bound when losses are sparse or anisotropic.
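The sketch below illustrates per-coordinate adaptivity with an AdaGrad-style diagonal variant of OGD, a related but distinct algorithm; the box domain $\prod_i [-D_i/2, D_i/2]$ and the scaling $\eta = 1$ are assumptions made only for the example, not the FTPRL construction itself.

```python
import numpy as np

def percoordinate_ogd(grad_fns, widths, eta=1.0, eps=1e-12):
    """Diagonal-adaptive OGD: the step size in coordinate i shrinks with sum_t g_{t,i}^2.

    widths: per-coordinate widths D_i of an assumed box domain prod_i [-D_i/2, D_i/2].
    """
    half = np.asarray(widths, dtype=float) / 2.0
    x = np.zeros(len(half))
    sq_grad_sums = np.zeros(len(half))
    iterates = []
    for grad in grad_fns:
        iterates.append(x.copy())
        g = grad(x)
        sq_grad_sums += g ** 2
        step = eta * half / (np.sqrt(sq_grad_sums) + eps)   # per-coordinate learning rates
        x = np.clip(x - step * g, -half, half)              # projection onto the box
    return iterates
```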

Variation-Based Regret

For environments with temporal smoothness, dynamic regret bounds scale as

$$O\!\left( \max\{L, \sqrt{\mathrm{VAR}_T^s}\} \right)$$

where $\mathrm{VAR}_T^s$ is the (gradient) variation over time (Yang et al., 2011). This line of work mirrors advances in dynamic tracking and provides $o(\sqrt{T})$ regret whenever the environment drifts slowly.

Dynamic and Universal Regret

Regret against moving comparators (dynamic regret) provably scales with the path length $P = \sum_{t=2}^T \|u_t - u_{t-1}\|$ of the comparator sequence (Gokcesu et al., 2019): $R_T^d(\mathbf{u}) = O\big(\sqrt{D(P+D)}\,G_T\big)$, where $G_T = (\sum_t \|g_t\|^2)^{1/2}$. Parameter-free, universal algorithms simultaneously achieve the optimal $O(\sqrt{D(P+D)}\,G_T)$ regret for all $P$.

Contaminated and Non-Stationary Regimes

Recent advances consider contaminated OCO (Kamijima et al., 28 Apr 2024): if $k$ out of $T$ rounds violate strong convexity (or exp-concavity), the optimal regret interpolates between $O(\log T)$ (purely strongly convex/exp-concave) and $O(\sqrt{T})$ (fully general), as $O(\log T + \sqrt{k})$.

4. Generalizations and Advanced Frameworks

OCO admits numerous sophisticated generalizations:

Time-Varying and Stochastic Constraints

Incorporating time-varying or stochastic constraints, algorithms leveraging virtual queues and drift-Lyapunov analysis achieve $O(1/\epsilon^2)$ convergence time to $\epsilon$-feasibility and near-optimality under mild Slater-type conditions (Neely et al., 2017).
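A minimal sketch of the virtual-queue idea for a single time-varying constraint $g_t(x) \le 0$ is shown below; the linearized primal step and the parameters `V` and `eta` are illustrative assumptions, not the exact construction of the cited work.

```python
import numpy as np

def virtual_queue_oco(grad_f_fns, g_fns, grad_g_fns, project, d, V=10.0, eta=0.1):
    """Primal-dual OCO with a virtual queue tracking constraint violation.

    grad_f_fns[t](x): gradient of the round-t loss.
    g_fns[t](x), grad_g_fns[t](x): value and gradient of the round-t constraint g_t(x) <= 0.
    project: Euclidean projection onto the (simple) base set.
    """
    x = project(np.zeros(d))
    Q = 0.0                                    # virtual queue; grows with constraint violation
    iterates = []
    for grad_f, g, grad_g in zip(grad_f_fns, g_fns, grad_g_fns):
        iterates.append(x.copy())
        # Drift-plus-penalty style step: trade off loss against queue-weighted constraint.
        direction = V * grad_f(x) + Q * grad_g(x)
        x = project(x - eta * direction)
        Q = max(Q + g(x), 0.0)                 # accumulate violation, never negative
    return iterates
```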

OCO with Unbounded Memory

OCO has been extended to settings where the loss at time $t$ depends on the entire history $(x_1, \ldots, x_t)$. The $p$-effective memory capacity $H_p$ quantifies the influence of past decisions (Kumar et al., 2022), leading to tight regret rates $O(\sqrt{H_p T})$. This framework subsumes finite-memory and discounted-memory OCO, and yields sharper regret bounds for online control and performative prediction.

Stochastic, Bandit, and Decentralized Models

  • In the zeroth-order ("bandit") setting, only the value $f_t(x_t)$ is accessible per round; a sketch of the classical one-point gradient estimator appears after this list. Quantum algorithms have enabled $O(\sqrt{T})$ regret (removing the classical dimension dependence) for general convex losses and $O(\log T)$ for strongly convex loss sequences (He et al., 2020).
  • Decentralized OCO considers interacting networked agents receiving local losses. Algorithms based on accelerated gossip achieve near-optimal regret scaling as $\tilde{O}(n\rho^{-1/4}\sqrt{T})$ (convex) and $\tilde{O}(n\rho^{-1/2}\log T)$ (strongly convex), matching lower bounds up to logarithmic factors in the number of agents $n$, time horizon $T$, and spectral gap $\rho$ (Wan et al., 14 Feb 2024).
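The sketch below combines the classical one-point spherical gradient estimator with a gradient step over a Euclidean ball; the perturbation radius `delta`, step size `eta`, and the ball domain are illustrative assumptions (this is the generic classical estimator, not the quantum method cited above).

```python
import numpy as np

def bandit_ogd(value_fns, d, radius, delta=0.1, eta=0.01, rng=None):
    """Bandit OCO with the one-point spherical gradient estimator (requires delta < radius).

    value_fns: sequence of oracles; value_fns[t-1](y) returns only the scalar loss f_t(y).
    The iterate is kept within radius - delta so the perturbed query x + delta*u stays feasible.
    """
    rng = np.random.default_rng() if rng is None else rng
    inner = radius - delta
    x = np.zeros(d)
    plays = []
    for value in value_fns:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                 # uniform random direction on the unit sphere
        y = x + delta * u                      # point actually played and queried
        plays.append(y)
        g_hat = (d / delta) * value(y) * u     # one-point estimate of the smoothed gradient
        x = x - eta * g_hat
        norm = np.linalg.norm(x)
        if norm > inner:
            x *= inner / norm                  # project back into the shrunken ball
    return plays
```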

5. Projection-Free and Oracle-Efficient Approaches

Scenarios with complex feasible sets K\mathcal{K} motivate projection-free OCO algorithms:

  • Frank–Wolfe type algorithms avoid the cost of Euclidean projections, requiring instead calls to a linear optimization oracle, but trade optimal regret for per-iteration efficiency; a one-step sketch appears after this list.
  • Recent advancements (Mhammedi, 2021, Gatmiry et al., 2023) develop wrappers and barrier-based Newton methods that turn any OCO algorithm on a Euclidean ball into a projection-free variant for general convex domains, using only a membership oracle. Such methods yield optimal $O(\sqrt{T}\,\mathrm{polylog}(T))$ regret with only $O(T\log T)$ membership calls, and are computationally favorable in high-dimensional settings where projections or linear optimization are expensive.
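The sketch below takes one conditional-gradient step per round on a quadratic surrogate of the linearized history, in the spirit of online Frank–Wolfe; the surrogate, the step-size schedule, and the parameter `eta` are illustrative assumptions rather than the exact scheme of any cited paper.

```python
import numpy as np

def online_frank_wolfe(grad_fns, linear_oracle, x0, eta=0.1):
    """Projection-free OCO in the spirit of online Frank-Wolfe.

    linear_oracle(c) returns argmin_{v in K} <c, v>  (the only access to K).
    Each round takes one conditional-gradient step on the surrogate
    F_t(x) = eta * <sum_{s<=t} g_s, x> + ||x - x0||^2.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    grad_sum = np.zeros_like(x0)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(x.copy())
        grad_sum += grad(x)
        surrogate_grad = eta * grad_sum + 2.0 * (x - x0)   # gradient of F_t at x
        v = linear_oracle(surrogate_grad)                  # linear optimization over K
        sigma = min(1.0, 2.0 / np.sqrt(t))                 # illustrative step-size schedule
        x = (1.0 - sigma) * x + sigma * v                  # convex combination stays in K
    return iterates
```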

6. Practical Applications and Specialized Models

OCO frameworks underpin online learning, control, network resource allocation, and adaptive signal processing:

  • Resource allocation under non-stationarity benefits from discounted/forgetting-factor OCO, trading off static and dynamic regret in environments with varying temporal smoothness (Yuan, 2020); a sketch of the forgetting-factor idea appears after this list.
  • Predictive OCO integrates forecasts (e.g., estimated future gradients), leading to strictly improved dynamic regret without additional environmental assumptions (Lesage-Landry et al., 2019).
  • Coordinate descent variants (Lin et al., 2022) extend OCO to high-dimensional scenarios where only partial, block-wise updates are computationally feasible, still guaranteeing optimal regret rates and efficient scaling.
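As a minimal illustration of the forgetting-factor idea, the sketch below runs linearized FTRL with geometrically discounted gradients over a Euclidean ball; the discount `gamma`, scaling `eta`, and ball domain are assumptions for the example, not the construction of the cited work.

```python
import numpy as np

def discounted_ftrl(grad_fns, d, radius, gamma=0.95, eta=0.1):
    """Linearized FTRL with a forgetting factor: older gradients are down-weighted,
    so the iterate tracks recent losses more closely in non-stationary environments.

    Plays x_{t+1} proportional to -eta * sum_s gamma^{t-s} g_s, projected onto the ball.
    """
    x = np.zeros(d)
    discounted_grad_sum = np.zeros(d)
    iterates = []
    for grad in grad_fns:
        iterates.append(x.copy())
        discounted_grad_sum = gamma * discounted_grad_sum + grad(x)
        x = -eta * discounted_grad_sum
        norm = np.linalg.norm(x)
        if norm > radius:
            x *= radius / norm                 # projection onto the ball
    return iterates
```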

7. Open Directions and Frontiers

Active research directions in OCO include:

  • Tightening regret bounds in contaminated, partially informative, or hybrid adversarial/stochastic environments.
  • Minimizing oracle complexity (projections, membership, linear oracles) in nontrivial geometries and high-dimensional domains.
  • Advanced dynamic regret with path-dependent or memory-dependent loss, extending the theory of predictor-adaptive and universal methods.
  • Decentralized, federated, and privacy-preserving OCO in distributed networks with communication constraints.
  • Quantum and bandit-feedback OCO bridging the gap between information-theoretic and computational lower bounds.

These developments continue to reinforce OCO as the central lens for designing and analyzing algorithms in adversarial and adaptive decision-making systems across machine learning, control, statistics, and operations research.
