Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

Published 25 May 2026 in cs.LG, math.OC, and stat.ML | (2605.26373v1)

Abstract: We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $Θ(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $Ω(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove a $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.


Authors (5)

Summary

The paper demonstrates that OGD attains the optimal O(√T) regret in the online hidden-convex setting by leveraging algorithmic equivalence with OMD.
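It introduces a Hessian compatibility condition, which requires the inverse metric induced by the reparameterization to match the Hessian of a convex regularizer. It also shows that without this condition OGD faces a geometric barrier and can suffer linear regret.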
The study also extends the analysis to one-point bandit feedback. It shows that with only a single loss value per round, bandit OGD with spherical smoothing achieves O(T^(3/4)) expected regret.

Optimal Regret and Algorithmic Equivalence in Online Learning with Hidden-Convex Losses

Problem Setting and Motivation

The paper addresses adversarial online learning with losses that exhibit hidden convexity. These are losses $\ell_t$ that are nonconvex in their native parameterization but become convex after a smooth, nonlinear bijection $q$, so that $\ell_t = h_t \circ q$ for convex $h_t$. Such structure appears throughout modern optimization, including deep learning, reinforcement learning, and nonconvex games, where objectives are nonconvex in the original parameters but convex in suitably transformed coordinates. The central question is whether first-order online algorithms can recover the optimal regret guarantees of online convex optimization (OCO) in this hidden-convex setting, without oracle access to or knowledge of the reparameterization. Prior work established only an $\mathcal{O}(T^{2/3})$ regret bound for OGD in this setting [ghai-lu-hazan22]. That left two questions open: whether the optimal $\Theta(\sqrt{T})$ rate from OCO is attainable, and what geometric assumptions are necessary.
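As a concrete illustration, consider a hypothetical one-dimensional instance of our own choosing (not taken from the paper). The map $q(x) = x^3 + x$ is a smooth bijection with $q'(x) = 3x^2 + 1 > 0$, and $h(u) = (u - 3)^2$ is convex. Even so, $\ell = h \circ q$ has negative curvature at $x = 0.5$. The sketch below checks both curvatures by central finite differences.

```python
# Hypothetical hidden-convex instance (our example, not the paper's):
# q(x) = x**3 + x is a smooth bijection of R (q'(x) = 3x^2 + 1 > 0),
# and h(u) = (u - 3)**2 is convex in u.

def q(x: float) -> float:
    return x**3 + x

def h(u: float) -> float:
    return (u - 3.0) ** 2

def ell(x: float) -> float:
    """Hidden-convex loss ell = h o q, viewed in the original coordinate x."""
    return h(q(x))

def second_derivative(f, x: float, step: float = 1e-4) -> float:
    """Central finite-difference estimate of f''(x)."""
    return (f(x + step) - 2.0 * f(x) + f(x - step)) / step**2

if __name__ == "__main__":
    # Exact value: ell''(0.5) = 2*q'(0.5)**2 + 2*(q(0.5) - 3)*q''(0.5) = 6.125 - 14.25 = -8.125
    print(second_derivative(ell, 0.5))   # approximately -8.125: ell is not convex in x
    print(second_derivative(h, q(0.5)))  # approximately 2.0: h is convex in u = q(x)
```

Any first-order method that sees only $\ell$ has to cope with this negative curvature, even though the problem is convex in $u$.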

Main Contributions

1. Sharp Regret Bounds for OGD under Exact Gradient Feedback

The authors prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret for adversarial online learning with hidden-convex losses. This holds provided the reparameterization $q$ and the sequence of convex losses $h_t$ satisfy regularity assumptions and a geometric compatibility condition. The result matches the optimal worst-case rate for OCO and improves upon the previous $\mathcal{O}(T^{2/3})$ guarantee.

The core technical advance is a sharper discrete-time algorithmic equivalence argument between OGD in the original space and OMD in the reparameterized convex space. The analysis shows that OGD, mapped through $q$, closely tracks OMD on the convex losses $h_t$. The discretization errors are controlled at order $\mathcal{O}(\eta^2)$ per step, a sharper per-step bound than in the earlier analysis. Using a perturbed OMD analysis, the authors show that these errors do not change the order of the regret, which yields $\mathcal{O}(\sqrt{T})$ performance.
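The following back-of-the-envelope accounting suggests why per-step coupling errors of order $\eta^2$ leave the $\sqrt{T}$ rate intact. It is a heuristic, not the paper's lemma, and rests on these assumptions of ours:

- the OGD iterate, viewed in mirror coordinates, equals the OMD update plus a perturbation $e_t$ with $\|e_t\| \le C\eta^2$;
- such a perturbation enters the standard OMD inequality through a term $\frac{1}{\eta}\langle e_t, u_t - u^\star\rangle$;
- $D$ bounds the regularizer range, $G$ the gradient norms, and $R$ the diameter of the comparator set.

```latex
\mathrm{Regret}_T
  \;\lesssim\; \underbrace{\frac{D}{\eta}}_{\text{OMD: regularizer}}
  \;+\; \underbrace{\eta\, G^2 T}_{\text{OMD: gradients}}
  \;+\; \underbrace{\frac{1}{\eta}\sum_{t=1}^{T} \|e_t\|\, R}_{\text{coupling}}
  \;\le\; \frac{D}{\eta} + \eta\, G^2 T + C R\, \eta\, T,
\qquad
\eta \propto \frac{1}{\sqrt{T}} \;\Longrightarrow\; \mathrm{Regret}_T = \mathcal{O}(\sqrt{T}).
```

Under these assumptions the coupling term has the same $\eta T$ shape as the usual gradient term. The standard step size $\eta \propto 1/\sqrt{T}$ therefore balances all three terms.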

2. Refined Geometric Condition: Hessian Compatibility

Prior results required a "diagonal-Jacobian" assumption on the reparameterization $q$ to ensure algorithmic equivalence. The present work replaces this sufficient condition with a necessary-and-sufficient "Hessian compatibility" condition: the inverse metric that $q$ induces in the transformed coordinates must coincide with the Hessian of some convex regularizer. This characterization expands the class of admissible parameterizations, capturing a broader range of hidden-convexity transformations.
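A first-order comparison of the two updates shows where a Hessian condition comes from. This is a heuristic in notation the summary does not fix. We write $u = q(x)$, $J(x)$ for the (assumed square and invertible) Jacobian of $q$, and $\Phi$ for a candidate mirror map. These symbols are ours, and the paper's conventions may differ.

```latex
\begin{aligned}
\text{OGD in } x:&\quad x_{t+1} = x_t - \eta\, J(x_t)^{\top}\nabla h_t(u_t)
  &&\Longrightarrow\quad u_{t+1} = u_t - \eta\, J(x_t)J(x_t)^{\top}\nabla h_t(u_t) + \mathcal{O}(\eta^2),\\
\text{OMD in } u:&\quad \nabla\Phi(u_{t+1}) = \nabla\Phi(u_t) - \eta\,\nabla h_t(u_t)
  &&\Longrightarrow\quad u_{t+1} = u_t - \eta\,\big[\nabla^2\Phi(u_t)\big]^{-1}\nabla h_t(u_t) + \mathcal{O}(\eta^2),\\
\text{match for all } \nabla h_t:&\quad \nabla^2\Phi(u) = \big(J J^{\top}\big)^{-1}\Big|_{x = q^{-1}(u)} .
\end{aligned}
```

With a diagonal Jacobian, $(JJ^\top)^{-1}$ is diagonal and its $i$-th entry depends only on $u_i$, so a separable convex $\Phi$ always exists. In this notation, the weaker requirement is only that $(JJ^\top)^{-1}$ be a Hessian field, which admits non-diagonal reparameterizations as well.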

3. Impossibility Result: Linear Regret without Compatibility

The paper establishes that the Hessian compatibility condition is essential for OGD to achieve sublinear regret. Suppose it is violated, i.e., the inverse metric induced in the transformed space is not the Hessian of any convex regularizer. Then adversarial losses can be constructed so that OGD incurs $\Omega(T)$ regret, even under strong smoothness and positive-definiteness assumptions. The proof rests on a geometric intuition: without compatibility, OGD trajectories can cycle nontrivially in the transformed space, accumulating regret proportional to the number of cycles.

4. Extension to Bandit Feedback

The analysis is extended to the one-point bandit setting, where only a single function value, not a gradient, is observed per round. Using spherical smoothing to build one-point gradient estimates, the authors prove that bandit OGD achieves $\mathcal{O}(T^{3/4})$ expected regret against oblivious adversaries. This matches the classical rate of bandit OGD on convex losses [flaxman-kalai-mcmahan05], showing that hidden convexity does not degrade this rate.
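Spherical smoothing rests on the classical one-point estimator of Flaxman, Kalai and McMahan, which works in three steps:

1. Play $x + \delta u$, with $u$ uniform on the unit sphere.
2. Observe a single loss value.
3. Return $(d/\delta)\, f(x + \delta u)\, u$, an unbiased estimate of the gradient of the $\delta$-smoothed loss.

Below is a minimal pure-Python sketch. The Euclidean-ball domain, the shrinkage factor `alpha`, and all function names are hypothetical choices of ours. The sketch runs plain projected OGD in the original coordinates and does not reproduce the paper's exact algorithm.

```python
import math
import random

def sample_unit_sphere(d: int, rng: random.Random) -> list[float]:
    """Uniform direction on the unit sphere in R^d (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]

def one_point_gradient(f, x: list[float], delta: float, rng: random.Random) -> list[float]:
    """One-point estimate (d / delta) * f(x + delta * u) * u from a single loss value.

    Its expectation is the gradient of the smoothed loss E_v[f(x + delta * v)],
    with v uniform in the unit ball."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])  # the only feedback observed
    return [(d / delta) * value * ui for ui in u]

def project_to_ball(y: list[float], radius: float) -> list[float]:
    """Euclidean projection onto the centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in y))
    return list(y) if norm <= radius else [c * radius / norm for c in y]

def bandit_ogd_step(f, x, eta, delta, alpha, radius, rng):
    """One bandit OGD step on the shrunk domain (1 - alpha) * ball(radius).

    The queried point x + delta * u stays inside ball(radius) whenever
    delta <= alpha * radius (hypothetical ball domain, our assumption)."""
    g = one_point_gradient(f, x, delta, rng)
    return project_to_ball([xi - eta * gi for xi, gi in zip(x, g)], (1.0 - alpha) * radius)

if __name__ == "__main__":
    rng = random.Random(0)
    c = [0.3, 0.1]
    f = lambda z: sum((zi - ci) ** 2 for zi, ci in zip(z, c))  # quadratic test loss
    x = [0.5, -0.2]
    n = 100_000
    avg = [0.0, 0.0]
    for _ in range(n):
        g = one_point_gradient(f, x, delta=0.1, rng=rng)
        avg = [a + gi / n for a, gi in zip(avg, g)]
    print(avg)  # close to the true gradient 2 * (x - c) = [0.4, -0.6]
```

For a quadratic loss, smoothing only adds a constant, so the averaged estimate approaches the exact gradient. The price of one-point feedback is variance: the estimate scales like $d/\delta$, which is what forces the slower $T^{3/4}$ rate.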

Technical Approach

The regret analysis leverages algorithmic equivalence between OGD (in the original space) and OMD (in the transformed convex space). The Hessian compatibility condition is formalized as follows. The inverse metric that the reparameterization $q$ induces in the transformed coordinates must equal the Hessian of a convex regularizer at every point of the transformed domain, so that this inverse metric is a Hessian field. Examples and constructive verifications are provided for various parameterizations, including affine maps and rank-one nonlinear mixing.
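Whether a matrix field is a "Hessian field" can be decided by a standard integrability test. On a convex domain, a smooth symmetric matrix field $M(u)$ equals $\nabla^2 \Phi(u)$ for some $\Phi$ exactly when $\partial_k M_{ij} = \partial_j M_{ik}$ for all $i, j, k$. The function $\Phi$ is convex when, in addition, $M(u) \succeq 0$. The sketch below checks the integrability condition by finite differences. The three test fields are hypothetical examples of ours, not the paper's constructions. Which matrix built from $q$ must pass the test depends on the paper's conventions, which are not reproduced here. For a diagonal Jacobian, both $JJ^\top$ and its inverse pass.

```python
import math

def hessian_field_defect(M, u, h=1e-5):
    """Largest violation of d_k M_ij = d_j M_ik over all i, j, k at the point u.

    M maps a point (list of floats) to a symmetric n x n matrix (list of lists).
    A value near zero means M is locally the Hessian of some function; being the
    Hessian of a convex regularizer additionally needs M(u) positive semidefinite.
    """
    n = len(u)

    def partial(k):
        # Central difference of every entry of M along coordinate k.
        up = [ui + (h if idx == k else 0.0) for idx, ui in enumerate(u)]
        dn = [ui - (h if idx == k else 0.0) for idx, ui in enumerate(u)]
        m_plus, m_minus = M(up), M(dn)
        return [[(m_plus[i][j] - m_minus[i][j]) / (2.0 * h) for j in range(n)]
                for i in range(n)]

    dM = [partial(k) for k in range(n)]  # dM[k][i][j] approximates d_k M_ij
    return max(abs(dM[k][i][j] - dM[j][i][k])
               for i in range(n) for j in range(n) for k in range(n))


def coupled_hessian(u):
    # Hessian of the convex function exp(u1 + u2) + (u1**2 + u2**2) / 2.
    s = math.exp(u[0] + u[1])
    return [[s + 1.0, s], [s, s + 1.0]]


def sinh_reparam_matrix(u):
    # (J J^T)^{-1} at x = q^{-1}(u) for q(x) = (sinh x1, sinh x2), a map with a
    # diagonal Jacobian J = diag(cosh x_i); each entry equals 1 / (1 + u_i**2).
    return [[1.0 / (1.0 + u[0] ** 2), 0.0], [0.0, 1.0 / (1.0 + u[1] ** 2)]]


def non_integrable(u):
    # Symmetric and positive definite for |u1| < 1, but d_2 M_11 = 0 while d_1 M_12 = 1.
    return [[1.0, u[0]], [u[0], 1.0]]


if __name__ == "__main__":
    point = [0.2, -0.1]
    for field in (coupled_hessian, sinh_reparam_matrix, non_integrable):
        print(field.__name__, hessian_field_defect(field, point))
```

The first two fields pass up to rounding error. The third fails with a defect of about 1, so no regularizer, convex or not, has it as a Hessian, even though it is positive definite near the origin.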

In discrete time, the analysis bounds the difference between OGD and OMD iterates via first-order optimality, strong convexity, and error terms induced by discretization and Taylor expansion. The coupling error is shown to scale as $\mathcal{O}(\eta^2)$ per step.
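In one dimension, any positive inverse metric is automatically a Hessian field, so compatibility holds for free. That makes 1D a convenient place to see the per-step coupling numerically. The sketch below reuses our hypothetical instance $q(x) = x^3 + x$, $h(u) = (u-3)^2$ and picks the mirror map with $\Phi''(u) = 1/q'(x)^2$ at $u = q(x)$. Integrating gives $\Phi'(q(x)) = \arctan(\sqrt{3}\,x)/\sqrt{3}$. The sketch then compares one OGD step with one OMD step from the same point. The instance and helper names are illustrative assumptions, not the paper's construction.

```python
import math

# Hypothetical 1D instance (our choice): q(x) = x**3 + x, h(u) = (u - 3)**2.
def q(x): return x**3 + x
def dq(x): return 3.0 * x**2 + 1.0
def dh(u): return 2.0 * (u - 3.0)

# Mirror map chosen so that Phi''(u) = 1 / q'(x)**2 with u = q(x).
# Then psi(x) := Phi'(q(x)) has psi'(x) = 1 / q'(x), i.e. psi(x) = atan(sqrt(3) x) / sqrt(3).
SQ3 = math.sqrt(3.0)
def psi(x): return math.atan(SQ3 * x) / SQ3
def psi_inv(y): return math.tan(SQ3 * y) / SQ3  # valid while |sqrt(3) * y| < pi / 2

def ogd_step(x, eta):
    """Gradient step on ell = h o q in the original coordinate x."""
    return x - eta * dh(q(x)) * dq(x)

def omd_step(x, eta):
    """Mirror step Phi'(u_next) = Phi'(u) - eta * h'(u), tracked through x = q^{-1}(u)."""
    return psi_inv(psi(x) - eta * dh(q(x)))

def one_step_gap(x, eta):
    """|u_OGD - u_OMD| after one step from the same point, measured in u = q(x)."""
    return abs(q(ogd_step(x, eta)) - q(omd_step(x, eta)))

if __name__ == "__main__":
    for eta in (1e-3, 5e-4, 2.5e-4):
        print(eta, one_step_gap(0.5, eta))
    # Each halving of eta shrinks the gap by roughly a factor of 4 (second order in eta).
```

The two updates agree to first order, so the gap is driven by curvature terms such as $q''$. A Taylor expansion gives a leading gap of about $\eta^2 g^2 q'^2 q''/2$, where $g = h'(q(x))$, and the roughly fourfold shrinkage per halving of $\eta$ is consistent with that.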

For the impossibility result, adversarial losses are constructed so that the OGD dynamics in the transformed space have nonzero circulation, resulting in regret proportional to the number of iterations.

In the bandit feedback setting, the regret decomposition isolates domain shrinkage, smoothing, gradient estimation bias, and perturbed OMD regret, with each term controlled via classical arguments and structural properties of the transformation.
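The classical accounting for one-point bandit OGD (Flaxman, Kalai and McMahan) shows where $T^{3/4}$ comes from. This sketch uses placeholder constants of our own:

- $d$ is the dimension, and $|f| \le C$;
- $L$ is a Lipschitz constant and $D$ the domain diameter;
- the estimator norm is bounded by $dC/\delta$, and the shrinkage is $\alpha \propto \delta$.

The paper's decomposition also carries the OGD-OMD coupling term, which is omitted here.

```latex
\mathbb{E}[\mathrm{Regret}_T]
  \;\lesssim\; \underbrace{\frac{D^2}{\eta}}_{\text{OGD on smoothed loss}}
  + \underbrace{\eta\, T \Big(\frac{dC}{\delta}\Big)^{2}}_{\text{estimator magnitude}}
  + \underbrace{L\,\delta\, T}_{\text{smoothing + shrinkage}},
\qquad
\delta \propto T^{-1/4},\;\; \eta \propto T^{-3/4}
\;\Longrightarrow\;
\mathbb{E}[\mathrm{Regret}_T] = \mathcal{O}(T^{3/4}).
```

With these choices each term is of order $T^{3/4}$. A smaller $\delta$ reduces smoothing bias but inflates the estimator variance through $1/\delta^2$, and this trade-off caps the rate.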

Implications and Future Directions

The results answer open questions posed by [ghai-lu-hazan22]. Hidden convexity combined with Hessian compatibility suffices for the optimal $\mathcal{O}(\sqrt{T})$ regret under exact gradient feedback, and for the classical $\mathcal{O}(T^{3/4})$ rate under one-point bandit feedback. Practically, this provides strong guarantees for first-order online methods in structured nonconvex settings, potentially relevant to neural network training, reinforcement learning with transformed utilities, and nonconvex games.

Theoretically, the necessity of compatibility highlights a geometric barrier for algorithmic equivalence. Expanding the transformation class, or making OGD robust to incompatibility, remains open. Future research may explore whether algorithms other than OGD can achieve sublinear regret without this structure, or whether refined gradient estimators can improve on the $\mathcal{O}(T^{3/4})$ bandit rate.

Conclusion

This work establishes that online gradient descent achieves $\mathcal{O}(\sqrt{T})$ regret for hidden-convex losses under a precise geometric compatibility condition. This improves on the prior $\mathcal{O}(T^{2/3})$ bound and matches the optimal worst-case rate for adversarial online convex optimization. The Hessian compatibility condition is necessary and sufficient for the algorithmic equivalence between OGD and OMD, and without it OGD can suffer linear regret. The findings extend to one-point bandit feedback, with $\mathcal{O}(T^{3/4})$ expected regret. These results deepen the understanding of algorithmic equivalence and hidden convexity in online nonconvex optimization (2605.26373).