Online eXp-concave Optimization (OXO)
- OXO is a framework for sequential decision-making under exp-concave losses, whose strong curvature enables logarithmic or even constant regret guarantees.
- It employs quasi-Newton methods and projection-free techniques to efficiently address full-information, bandit, delayed, and quantum feedback settings.
- The approach underpins adaptive learning, online regression, and portfolio optimization, offering robust theoretical guarantees and scalable performance.
Online eXp-concave Optimization (OXO) is a central framework for sequential decision-making under exp-concave loss functions. Exp-concavity, which imposes strong local curvature on losses, enables algorithms to attain logarithmic or even constant regret with optimal computational efficiency. OXO encompasses full-information, bandit, delayed, and quantum feedback settings, and underlies a significant portion of expert advice, online regression, and adaptive learning literature.
1. Foundations: Exp-concavity and Regret Guarantees
Let $\mathcal{K} \subset \mathbb{R}^d$ be a convex decision set, typically with diameter $D$. The learner plays $T$ rounds, choosing $x_t \in \mathcal{K}$ and incurring loss $f_t(x_t)$, where each $f_t$ is $\alpha$-exp-concave. Formally, $f$ is $\alpha$-exp-concave if $\exp(-\alpha f(\cdot))$ is concave, or equivalently,

$$\nabla^2 f(x) \succeq \alpha \, \nabla f(x)\,\nabla f(x)^{\top}$$

for all $x \in \mathcal{K}$ (He et al., 2024).
Exp-concavity includes many losses ubiquitous in statistical learning (e.g., squared, logistic, and log-loss) (Mahdavi et al., 2014). It leads to sharp regret bounds: $O(\tfrac{d}{\alpha}\log T)$ for Online Newton Step (ONS), $O(\tfrac{1}{\alpha}\log N)$ over $N$ experts in certain expert settings, and $O(\tfrac{d\log T}{T})$ excess risk via online-to-batch conversion (Mahdavi et al., 2014).
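As a quick illustration of the definition above, the sketch below numerically checks the Hessian condition $\nabla^2 f \succeq \alpha\,\nabla f\nabla f^\top$ for the squared loss on a bounded domain; the residual bound $C$, the sampled box, and the resulting $\alpha = 1/(2C^2)$ are illustrative choices rather than values taken from the cited works.

```python
# Numerical sanity check (illustrative): the squared loss f(x) = (<a, x> - y)^2
# satisfies  hess f(x) >= alpha * grad f(x) grad f(x)^T  with alpha = 1 / (2 C^2)
# whenever the residual |<a, x> - y| stays bounded by C on the domain.
import numpy as np

rng = np.random.default_rng(0)
d, C = 5, 3.0                       # dimension and assumed residual bound on the domain
a, y = rng.normal(size=d), 0.5
alpha = 1.0 / (2 * C ** 2)

ok = True
for _ in range(1000):
    x = rng.uniform(-1, 1, size=d)              # sample points from a box
    r = a @ x - y                               # residual
    if abs(r) > C:
        continue                                # outside the assumed domain bound
    g = 2 * r * a                               # gradient of the squared loss
    H = 2 * np.outer(a, a)                      # Hessian of the squared loss
    lam_min = np.linalg.eigvalsh(H - alpha * np.outer(g, g)).min()
    ok &= bool(lam_min >= -1e-9)
print("alpha-exp-concavity verified on sampled points:", ok)
```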
2. Algorithmic Theory: Online Newton Step and Quasi-Newton Variants
The canonical OXO algorithm is Online Newton Step:
- Maintain a positive-definite matrix $A_t = \epsilon I_d + \sum_{s=1}^{t} \nabla f_s(x_s)\nabla f_s(x_s)^{\top}$ (an empirical Hessian estimate).
- Update via the quasi-Newton direction: $x_{t+1} = \Pi_{\mathcal{K}}^{A_t}\!\big(x_t - \tfrac{1}{\gamma} A_t^{-1}\nabla f_t(x_t)\big)$, with $\Pi_{\mathcal{K}}^{A_t}(y) = \arg\min_{x \in \mathcal{K}} (x-y)^{\top} A_t (x-y)$ the generalized (Mahalanobis) projection.
Per-round complexity is $O(d^2)$ (Sherman–Morrison update), but the Mahalanobis projections may cost on the order of $d^3$ or more arithmetic operations per round when the domain is nontrivial (Wang et al., 29 Dec 2025). Runtime reduction is possible via LightONS, hysteresis-based projection amortization, self-concordant barriers, or frequent-directions sketching, yielding near-$O(d^2)$ amortized per-round complexity while maintaining $O(d\log T)$ regret (Wang et al., 29 Dec 2025; Mhammedi et al., 2022).
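To make the update concrete, here is a minimal ONS sketch on the Euclidean ball, with illustrative hyperparameters ($\gamma$, $\varepsilon$, radius $R$) and a bisection-based generalized projection; it is a toy instantiation under these assumptions, not the verbatim procedure of the cited papers.

```python
# Minimal Online Newton Step sketch on the ball {||x|| <= R}, run on a toy stream
# of squared losses (which are exp-concave on a bounded domain).
import numpy as np

def mahalanobis_proj_ball(y, A, R, iters=60):
    """argmin_{||x|| <= R} (x - y)^T A (x - y), via bisection on the KKT multiplier."""
    if np.linalg.norm(y) <= R:
        return y
    lo, hi = 0.0, 1e8
    x = y
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        x = np.linalg.solve(A + lam * np.eye(len(y)), A @ y)
        lo, hi = (lam, hi) if np.linalg.norm(x) > R else (lo, lam)
    return x

rng = np.random.default_rng(1)
d, T, R = 3, 500, 1.0
gamma, eps = 0.25, 1.0                      # illustrative; theory ties these to alpha, G, D
A = eps * np.eye(d)                         # positive-definite curvature accumulator
x = np.zeros(d)
x_star = np.array([0.5, -0.3, 0.2])         # hidden comparator generating the toy stream

cumulative = 0.0
for t in range(T):
    a_t = rng.normal(size=d)
    y_t = a_t @ x_star + 0.05 * rng.normal()
    cumulative += (a_t @ x - y_t) ** 2       # loss suffered this round
    g = 2 * (a_t @ x - y_t) * a_t            # gradient at the played point
    A += np.outer(g, g)                      # rank-one (Sherman-Morrison-friendly) update
    y_next = x - (1.0 / gamma) * np.linalg.solve(A, g)   # quasi-Newton step
    x = mahalanobis_proj_ball(y_next, A, R)  # generalized projection back to the ball
print("average loss over T rounds:", cumulative / T)
```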
Recent quantum quasi-Newton approaches extend this theory to bandit and low-information feedback, leveraging quantum gradient estimation. These methods utilize a quantum zeroth-order oracle and achieve $O(d\log T)$ regret with only $O(1)$ queries per round, a factor-$d$ improvement in query complexity over optimal classical OXO under multi-point bandit feedback (He et al., 2024).
3. Regret Analysis, Adaptivity, and Dynamic Regret
For static regret, OXO delivers the minimax $\Theta(\tfrac{d}{\alpha}\log T)$ rate for full-information exp-concave losses (Mahdavi et al., 2014, Wang et al., 29 Dec 2025). Under delayed feedback, adaptive regularization schedules yield regret bounds that degrade gracefully with the delay parameters (e.g., the maximum and cumulative delays) (Qiu et al., 9 Jun 2025).
Dynamic regret, which compares against time-varying comparator sequences, can be sharply minimized. A strongly adaptive improper learner combining Follow-the-Leading-History with ONS attains $\tilde{O}\big(n^{1/3}C_n^{2/3} \vee \log n\big)$ regret (up to factors polynomial in $d$), where $C_n$ is the total path variation of comparators (Baby et al., 2021). This matches known lower bounds for exp-concave dynamic regret and improves over the $\tilde{O}\big(\sqrt{n(1+C_n)}\big)$ rates available for general convex losses (a sketch of the FLH mixing step appears after the table below).
Table: Representative regret rates under different OXO settings
| Feedback Regime | Algorithmic Technique | Regret Bound |
|---|---|---|
| Full Information | ONS, LightONS, OQNS | $O(\tfrac{d}{\alpha}\log T)$ |
| Delayed Feedback | Delayed ONS, Adaptive FTRL | $O(d\log T)$ plus delay-dependent terms |
| Bandit Multi-Point | Quantum Quasi-Newton | $O(d\log T)$ with $O(1)$ queries per round |
| Dynamic Comparator | FLH + ONS | $\tilde{O}(n^{1/3}C_n^{2/3} \vee \log n)$ |
| Stochastic Excess Risk | Online-to-batch LightONS/OQNS | $O(\tfrac{d\log T}{T})$ |
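The strongly adaptive learner referenced above mixes base learners started at different times. Below is a compact sketch of that Follow-the-Leading-History mixing step under simplifying assumptions: the base-learner class `OGD1D` is a hypothetical stand-in for ONS, the weight schedule is simplified, and the pruning that keeps only $O(\log t)$ experts active is omitted.

```python
# Toy Follow-the-Leading-History (FLH): one fresh base learner per round, mixed with
# exponential weights; weighted averaging is valid for exp-concave losses (Jensen on
# exp(-alpha * f)).
import numpy as np

class FLH:
    def __init__(self, make_base, alpha):
        self.make_base, self.alpha = make_base, alpha
        self.experts, self.weights = [], np.zeros(0)

    def step(self, loss_fn):
        """Play one round against loss_fn (a callable mapping a prediction to its loss)."""
        if not self.experts:                                  # round 1: seed the first expert
            self.experts, self.weights = [self.make_base()], np.array([1.0])
        preds = np.array([e.predict() for e in self.experts])
        w = self.weights / self.weights.sum()
        play = w @ preds                                      # weighted-average prediction
        losses = np.array([loss_fn(p) for p in preds])
        self.weights = self.weights * np.exp(-self.alpha * losses)  # exponential-weight update
        for e in self.experts:
            e.update(loss_fn)                                 # every expert sees the same feedback
        t = len(self.experts) + 1
        total = self.weights.sum()
        self.experts.append(self.make_base())                 # fresh expert for recent history
        # simplified schedule: the new expert starts with a 1/t share of the current mass
        self.weights = np.append((1.0 - 1.0 / t) * self.weights, total / t)
        return play

class OGD1D:
    """Hypothetical 1-D base learner (stand-in for ONS) with a fixed step size."""
    def __init__(self, lr=0.1):
        self.x, self.lr = 0.0, lr
    def predict(self):
        return self.x
    def update(self, loss_fn):
        eps = 1e-4                                            # finite-difference gradient estimate
        grad = (loss_fn(self.x + eps) - loss_fn(self.x - eps)) / (2 * eps)
        self.x -= self.lr * grad

learner = FLH(make_base=OGD1D, alpha=0.5)
for t in range(200):
    target = 0.2 if t < 100 else 0.8                          # the comparator drifts mid-stream
    play = learner.step(lambda p, m=target: (p - m) ** 2)
print("play after the drift:", play)                          # ends up near the new comparator 0.8
```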
4. Efficient and Projection-Free OXO
Computational complexity of OXO can be prohibitive in high dimensions or over structured domains. Projection-free OXO, which relies on a linear optimization oracle (LOO) rather than projection, ensures tractability for complex feasible sets (Garber et al., 2023). By combining block-based infeasible Newton steps with Frank–Wolfe approximate projections, sublinear regret bounds are attainable in the worst case, with improved rates when the loss gradients span a low-dimensional subspace.
LightONS (Wang et al., 29 Dec 2025) and barrier-based OQNS (Mhammedi et al., 2022) further amortize expensive projections (or eliminate them entirely via self-concordance), restoring near-optimal statistical and runtime efficiency even for general convex sets.
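As a sketch of the projection-free idea, the routine below replaces an exact Mahalanobis projection with a few Frank–Wolfe iterations that touch the feasible set only through a linear optimization oracle; the oracle `loo`, the $\ell_1$-ball example, and the iteration budget `K` are illustrative assumptions, not the cited algorithms.

```python
# Approximate generalized projection via Frank-Wolfe, using only LOO calls.
import numpy as np

def fw_approx_projection(y, A, loo, K=25, x0=None):
    """Approximately minimize (x - y)^T A (x - y) over the set encoded by `loo`,
    where loo(c) returns argmin_{v in set} <c, v>."""
    x = loo(np.zeros_like(y)) if x0 is None else x0
    for k in range(K):
        grad = 2.0 * A @ (x - y)            # gradient of the Mahalanobis distance
        v = loo(grad)                       # single LOO call per iteration
        step = 2.0 / (k + 2.0)              # standard Frank-Wolfe step size
        x = (1.0 - step) * x + step * v     # convex combination keeps x feasible
    return x

def l1_ball_loo(c, r=1.0):
    """Linear optimization oracle for the l1 ball of radius r."""
    i = int(np.argmax(np.abs(c)))
    v = np.zeros_like(c, dtype=float)
    v[i] = -r * np.sign(c[i]) if c[i] != 0 else r
    return v

# Example: approximately project y onto the l1 unit ball in the metric given by A.
A = np.diag([1.0, 4.0, 0.5])
y = np.array([1.5, -0.2, 0.8])
print(fw_approx_projection(y, A, l1_ball_loo))
```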
5. Universal and Adaptive OXO: Problem-dependent Regret
Recent advances introduce universal and adaptive OXO schemes that automatically interpolate between convex ($O(\sqrt{T})$), exp-concave ($O(d\log T)$), and strongly convex ($O(\log T)$) rates, while further adapting to problem-dependent quantities such as the cumulative gradient variation $V_T$ (Zhao et al., 25 Nov 2025). The UniGrad framework (UniGrad.Correct, UniGrad.Bregman, UniGrad++) achieves $O(d\log V_T)$ regret for exp-concave losses, $O(\log V_T)$ for strongly convex losses, and $O(\sqrt{V_T})$ for convex losses, all without prior knowledge of curvature. The meta-algorithms run $O(\log T)$ base learners (each an ONS or OGD variant), or query just one gradient per round via surrogate loss construction, and maintain computational efficiency.
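To illustrate the one-gradient-per-round mechanism, the sketch below builds MetaGrad-style quadratic surrogates from a single gradient and a geometric grid of candidate curvature levels; the exact surrogates and weightings used by UniGrad differ, so this is only an assumed illustration of the general idea.

```python
# One gradient per round: every base learner, indexed by a tilt eta, receives a
# quadratic surrogate built from the single gradient queried at the meta play x_t.
import numpy as np

def make_surrogate(g_t, x_t, eta):
    """Per-round surrogate handed to the base learner with tilt parameter eta."""
    def ell(x):
        inner = float(g_t @ (x - x_t))
        return eta * inner + (eta * inner) ** 2      # linear term plus a curvature term
    return ell

# A geometric grid of candidate tilts means only O(log T) base learners are needed.
T, G, D = 10_000, 1.0, 1.0                  # horizon, gradient bound, diameter (assumed)
etas = [1.0 / (5.0 * G * D * 2.0 ** i) for i in range(int(np.ceil(np.log2(T))))]

# One round: a single gradient is measured, then every base learner gets its surrogate.
x_t = np.zeros(3)
g_t = np.array([0.3, -0.1, 0.2])             # the only gradient queried this round
surrogates = [make_surrogate(g_t, x_t, eta) for eta in etas]
print(len(surrogates), "base learners, example surrogate value:", surrogates[0](np.ones(3)))
```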
6. Extensions: Composite Losses, Bandits, Banach Spaces
Composite loss structures—proper composite and mixable losses—can be transformed into exp-concave surrogates using explicit reparameterizations (geometric or calculus-based links) (Kamalaruban et al., 2018). This enables constant regret OXO (Weighted Average/Exponential Weights), crucial for online prediction with expert advice, mixability, and aggregation algorithms.
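A minimal sketch of the Weighted Average / Exponential Weights forecaster for an exp-concave loss: with log-loss (which is $1$-exp-concave) over $N$ fixed experts, the weighted-average prediction coincides with the Bayesian mixture, so regret to the best expert stays bounded by $\log N$ regardless of $T$. The expert set and the Bernoulli(0.7) data stream below are toy assumptions.

```python
# Exponential weights with log-loss: constant (in T) regret of at most log N.
import numpy as np

rng = np.random.default_rng(2)
N, T, alpha = 10, 2000, 1.0
experts = np.linspace(0.05, 0.95, N)          # each expert always predicts a fixed probability
cum_loss = np.zeros(N)                        # cumulative log-loss of every expert
total = 0.0

for _ in range(T):
    w = np.exp(-alpha * (cum_loss - cum_loss.min()))   # exponential weights (shifted for stability)
    w /= w.sum()
    p = float(w @ experts)                    # weighted-average (mixture) prediction
    y = rng.random() < 0.7                    # outcome of this round
    total += -np.log(p if y else 1.0 - p)     # learner's log-loss
    cum_loss += -np.log(np.where(y, experts, 1.0 - experts))

print("learner regret vs. best expert:", total - cum_loss.min(), "<= log N =", np.log(N))
```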
OXO forms the basis for parameter-free online learning in Banach spaces via coin-betting and ONS reductions (Cutkosky et al., 2018). One attains optimal parameter-free regret bounds under arbitrary norms, merging statistical theory with computational feasibility.
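The coin-betting reduction can be sketched in one dimension: the Krichevsky–Trofimov bet fraction below requires no learning-rate tuning. Gradients are assumed to lie in $[-1, 1]$, and the toy loss $|x - 0.4|$ and horizon are illustrative; the Banach-space constructions in the cited work generalize this through norm-based reductions combined with ONS.

```python
# One-dimensional coin-betting (KT bettor): the play is a data-dependent fraction of
# the current wealth, with no step size anywhere.
import numpy as np

wealth, sum_c, t = 1.0, 0.0, 0     # initial wealth, sum of past coin outcomes, round count
target = 0.4
for _ in range(5000):
    x = (sum_c / (t + 1)) * wealth          # KT bet: fraction of current wealth
    g = float(np.sign(x - target))          # subgradient of |x - target|, already in [-1, 1]
    c = -g                                  # "coin outcome" fed to the bettor
    wealth += c * x                         # wealth update from the bet
    sum_c += c
    t += 1
print("final play:", x, "(hovers near the minimizer", target, "with no step size to tune)")
```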
Quantum OXO leverages quantum gradient estimation to attain optimal regret with substantially reduced query complexity, suggesting additional quantum advantages may arise in higher-order bandit settings or in reducing the linear dependence on dimension (He et al., 2024).
7. Applications and Open Problems
OXO is central to online regression, portfolio optimization, adaptive filtering, and bandit learning. For exp-concave loss classes, optimal rates underpin locally adaptive nonparametric regression, TV-denoising, stochastic bandits, and game-theoretic equilibrium computation.
Current open problems include:
- Dimension reduction in quantum OXO algorithms while preserving near-optimal regret.
- Generalization of quasi-Newton and low-rank updates for optimizing regret–runtime tradeoffs.
- Full-information Hessian estimation from few quantum queries, and bandit optimization under single-point or partial feedback (He et al., 2024).
- Extensions to memory-efficient sketching and generalized linear bandits (Wang et al., 29 Dec 2025).
- Robust OXO mechanisms for fully improper/stochastic feedback and universal gradient-scale adaptivity (Zhao et al., 25 Nov 2025).
OXO thus encapsulates both theoretical minimax-optimality and scalable algorithmic design across the spectrum of online learning modalities.