Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback
Published 25 May 2026 in cs.LG, math.OC, and stat.ML | (2605.26373v1)
Abstract: We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $\Theta(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $\Omega(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove an $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.
The paper demonstrates that OGD attains the optimal O(√T) regret in the online hidden-convex setting by leveraging algorithmic equivalence with OMD.
It introduces the Hessian compatibility condition, a necessary-and-sufficient geometric requirement that the inverse metric match the Hessian of a convex regularizer, and shows that OGD can suffer linear regret when the condition fails.
The study also extends the analysis to bandit feedback, showing that despite limited gradient information, the algorithm achieves O(T^(3/4)) expected regret.
Optimal Regret and Algorithmic Equivalence in Online Learning with Hidden-Convex Losses
Problem Setting and Motivation
The paper addresses adversarial online learning with losses that exhibit hidden convexity, i.e., losses ℓ_t that are nonconvex in their native parameterization but become convex after a smooth, nonlinear bijection q. Such structure is pervasive in modern optimization, including deep learning, reinforcement learning, and nonconvex games, where objectives are nonconvex in the original parameters but convex in suitably transformed coordinates. The central question is whether first-order online algorithms can recover the optimal regret guarantees of online convex optimization (OCO) in this more general hidden-convex setting, without oracle access to or knowledge of the reparameterization. Prior work established only an O(T^(2/3)) regret bound for OGD in this setting [ghai-lu-hazan22], leaving open whether the optimal Θ(√T) rate from OCO is attainable and what geometric assumptions are necessary.
Main Contributions
1. Sharp Regret Bounds for OGD under Exact Gradient Feedback
The authors prove that OGD achieves O(√T) regret for adversarial online learning with hidden-convex losses, provided the reparameterization q and the sequence of convex losses h_t satisfy regularity assumptions and a geometric compatibility condition. This result matches the optimal worst-case rate for OCO and improves upon the previous O(T^(2/3)) guarantee.
The core technical advance is a sharper discrete-time algorithmic equivalence argument between OGD in the original space and OMD in the reparameterized convex space. The analysis shows that the OGD iterates, mapped through q, closely track OMD on the convex losses, with discretization errors controlled at order O(η^2) per step, tighter than in the prior analysis. Using a perturbed-OMD analysis, the authors show that these errors do not affect the order of the regret, which yields O(√T) performance.
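The tracking claim can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's construction: it takes the one-dimensional reparameterization u = q(x) = x^2/4 on x > 0, for which the matching OMD regularizer is the negative entropy R(u) = u log u - u (so OMD is exponentiated gradient), ignores projections, and treats the gradient g of the convex loss at u as a fixed number. The function names are hypothetical.

```python
import math


def ogd_step_in_u(u, g, eta):
    """One OGD step taken in x, then reported in u = q(x) = x**2 / 4 (x > 0)."""
    x = 2.0 * math.sqrt(u)            # x = q^{-1}(u)
    grad_x = (x / 2.0) * g            # chain rule: q'(x) * h'(u), with q'(x) = x / 2
    x_next = x - eta * grad_x
    return x_next ** 2 / 4.0          # map the new iterate back through q


def omd_step(u, g, eta):
    """One OMD step with R(u) = u log u - u, i.e. exponentiated gradient."""
    return u * math.exp(-eta * g)


def one_step_gap(eta, u=1.5, g=0.8):
    """Distance between the two updates after a single step from the same point."""
    return abs(ogd_step_in_u(u, g, eta) - omd_step(u, g, eta))


if __name__ == "__main__":
    for eta in (0.04, 0.02, 0.01, 0.005):
        print(f"eta={eta:<6} gap={one_step_gap(eta):.3e}")
```

In this example the two updates are u(1 - ηg/2)^2 and u·exp(-ηg), which agree through first order in η, so halving η should shrink the one-step gap by roughly a factor of four.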
2. Hessian Compatibility Condition
Prior results required a "diagonal-Jacobian" assumption on q to ensure algorithmic equivalence. The present work replaces this requirement with a necessary-and-sufficient "Hessian compatibility" condition: the inverse metric in transformed coordinates must coincide with the Hessian of a convex regularizer. This characterization expands the class of admissible reparameterizations, capturing a broader range of hidden-convexity transformations.
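A rough numerical check of this kind of condition can be sketched as follows. It relies on a standard fact rather than on anything stated in the source: on a simply connected domain, a smooth symmetric matrix field M is the Hessian of some function exactly when ∂_k M_ij = ∂_j M_ik for all indices. The helper name and both example fields are hypothetical; the second field is chosen only because it is positive definite yet fails the test.

```python
import itertools


def compat_defect(M, u, h=1e-5):
    """Max |d_k M_ij - d_j M_ik| over all index triples, via central differences."""
    d = len(u)

    def dM(i, j, k):
        # Partial derivative of entry M_ij with respect to coordinate u_k.
        up, dn = list(u), list(u)
        up[k] += h
        dn[k] -= h
        return (M(up)[i][j] - M(dn)[i][j]) / (2.0 * h)

    return max(
        abs(dM(i, j, k) - dM(i, k, j))
        for i, j, k in itertools.product(range(d), repeat=3)
    )


def entropy_hessian(u):
    """Hessian of sum_i (u_i log u_i - u_i): a genuine Hessian field."""
    return [[1.0 / u[0], 0.0], [0.0, 1.0 / u[1]]]


def twisted_field(u):
    """Positive definite, but d_2 M_11 = 2*u_2 while d_1 M_12 = 0 (hypothetical)."""
    return [[1.0 + u[1] ** 2, 0.0], [0.0, 1.0]]


if __name__ == "__main__":
    point = [1.5, 0.7]
    print("entropy defect:", compat_defect(entropy_hessian, point))
    print("twisted defect:", compat_defect(twisted_field, point))
```

A defect near zero at sampled points is only evidence of compatibility, not a proof; a clearly nonzero defect at any point rules it out.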
3. Impossibility Result: Linear Regret without Compatibility
The paper establishes that the Hessian compatibility condition is essential for OGD to achieve sublinear regret. If it is violated, i.e., the induced metric in the transformed space is not the Hessian of any convex regularizer, then a smooth reparameterization and adversarial losses can be constructed so that OGD incurs Ω(T) regret, even under strong smoothness and positive-definiteness. The proof leverages geometric intuition: without compatibility, OGD trajectories can cycle nontrivially in the transformed space, accumulating regret proportional to the number of cycles.
4. Extension to Bandit Feedback
The analysis is extended to the one-point bandit setting, where only function values, not gradients, are observed in each round. By adapting spherical smoothing for gradient estimation, the authors prove that bandit OGD achieves O(T^(3/4)) expected regret against oblivious adversaries. This matches the classical bandit OCO rate [flaxman-kalai-mcmahan05] for convex losses, showing that hidden convexity does not worsen the rate attained by this method under bandit feedback.
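For readers unfamiliar with spherical smoothing, a minimal sketch of the classical one-point gradient estimator follows. It shows the standard convex-case estimator (d/δ)·ℓ(x + δv)·v with v uniform on the unit sphere, not the paper's full bandit algorithm; the function names, the smoothing radius `delta`, and the sample count are assumptions made for illustration.

```python
import math
import random


def unit_sphere_sample(d, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in z))
    return [c / norm for c in z]


def one_point_gradient(loss, x, delta, rng):
    """Single-evaluation estimate (d / delta) * loss(x + delta * v) * v."""
    d = len(x)
    v = unit_sphere_sample(d, rng)
    value = loss([xi + delta * vi for xi, vi in zip(x, v)])  # the only query
    return [(d / delta) * value * vi for vi in v]


def mean_estimate(loss, x, delta, n, seed=0):
    """Average n independent one-point estimates (for inspection only)."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n):
        g = one_point_gradient(loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n for t in total]


if __name__ == "__main__":
    linear = lambda x: 1.0 * x[0] + 2.0 * x[1]
    print(mean_estimate(linear, [0.3, -0.2], delta=0.5, n=20000))
```

For a linear loss the estimator is unbiased for the true gradient, so the printed average should land near (1, 2). Each individual estimate has norm of order d/δ, which is the source of the slower bandit rate.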
Technical Approach
The regret analysis rests on the algorithmic equivalence between OGD (in the original space) and OMD (in the transformed convex space). Hessian compatibility is formalized as follows: the inverse metric induced by q in the transformed coordinates must equal, at every point of the transformed domain, the Hessian of a single convex regularizer, so that it is a Hessian field. Examples and constructive verifications are provided for various parameterizations, including affine and rank-one nonlinear mixing.
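One way to write the condition out is sketched below. The notation is assumed for illustration and is not taken from the paper: u = q(x) denotes the transformed coordinates, J_q the Jacobian of q, M the inverse metric, and R the regularizer. The equivalence with symmetric partial derivatives is the standard integrability criterion on a simply connected domain, with M symmetric positive definite.

```latex
% Assumed notation (illustrative): u = q(x), J_q = Jacobian of q,
% M = inverse metric in u-coordinates, R = convex regularizer.
\begin{align*}
M(u) &= \bigl[\, J_q(x)\, J_q(x)^{\top} \bigr]^{-1}, \qquad x = q^{-1}(u), \\
M(u) &= \nabla^2 R(u) \ \text{for some convex } R
  \;\iff\; \partial_k M_{ij}(u) = \partial_j M_{ik}(u) \quad \text{for all } i, j, k .
\end{align*}
% One-dimensional example: q(x) = x^2/4 on x > 0 gives J_q(x) = x/2, hence
% J_q J_q^T = x^2/4 = u, M(u) = 1/u, and R(u) = u \log u - u.
```

In one dimension every positive M is a Hessian, so the condition only bites in dimension two and higher.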
In discrete time, the analysis bounds the difference between OGD and OMD iterates via first-order optimality, strong convexity, and error terms induced by discretization and Taylor expansion. The per-step coupling error is shown to scale as O(η^2).
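A back-of-the-envelope accounting suggests why a per-step error of this order is harmless. The constants are assumptions introduced here, not symbols from the source: D bounds the initial Bregman divergence to the comparator, G bounds gradient norms, and C is a Lipschitz constant of the Bregman divergence in its second argument, so that perturbing an OMD iterate by ε_t changes the telescoping potential by at most C·ε_t.

```latex
\begin{align*}
\mathrm{Regret}_T
 &\;\lesssim\; \underbrace{\frac{D}{\eta} + \eta\, G^2\, T}_{\text{exact OMD}}
 \;+\; \underbrace{\frac{C}{\eta} \sum_{t=1}^{T} \varepsilon_t}_{\text{coupling error}},
 \qquad \varepsilon_t = O(\eta^2), \\
 &\;\lesssim\; \frac{D}{\eta} + \eta\,(G^2 + C)\, T
 \;=\; O\bigl(\sqrt{T}\bigr) \quad \text{for } \eta \propto 1/\sqrt{T} .
\end{align*}
```

By the same accounting, a per-step error of order η^(3/2) would leave a term of order T·√η, which balances against D/η at η ∝ T^(-2/3) and gives T^(2/3). This is consistent with the earlier rate, though it is an illustrative calculation rather than a restatement of either proof.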
For the impossibility result, adversarial losses are constructed so that the OGD dynamics in the transformed space have nonzero circulation, resulting in regret proportional to the number of iterations.
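To illustrate what nonzero circulation means for a metric field that fails compatibility, consider the following hypothetical two-dimensional example. It is not the paper's construction. If M were a Hessian, each of its rows would be a gradient field and every closed-loop integral of that row would vanish; here the first row integrates to a nonzero value around the unit square.

```latex
% Hypothetical example for illustration only.
\begin{align*}
M(u) &= \begin{pmatrix} 1 + u_2^2 & 0 \\ 0 & 1 \end{pmatrix},
 \qquad \partial_2 M_{11} = 2u_2 \;\neq\; 0 = \partial_1 M_{12}, \\
\oint_{\partial [0,1]^2} M_{1\cdot}(u) \cdot du
 &= \int_0^1 \bigl(1 + 0^2\bigr)\, du_1 \;+\; \int_1^0 \bigl(1 + 1^2\bigr)\, du_1
 \;=\; 1 - 2 \;=\; -1 .
\end{align*}
```

The two vertical edges contribute nothing because the first row has zero second component. A nonzero loop integral of this kind is the sort of path dependence an adversary could exploit repeatedly.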
In the bandit feedback setting, the regret decomposition isolates domain shrinkage, smoothing, gradient estimation bias, and perturbed OMD regret, with each term controlled via classical arguments and structural properties of the transformation.
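For orientation, the classical convex-case accounting in the style of [flaxman-kalai-mcmahan05] is sketched below. It is not the paper's decomposition, which carries additional terms for the hidden-convex coupling. The symbols are assumptions: d is the dimension, δ the smoothing radius, B a bound on the loss magnitude (so each one-point estimate has norm at most dB/δ), L a Lipschitz constant, and D a diameter bound.

```latex
\begin{align*}
\mathbb{E}\bigl[\mathrm{Regret}_T\bigr]
 &\;\lesssim\; \underbrace{\frac{D^2}{\eta} + \eta\, T \Bigl(\frac{dB}{\delta}\Bigr)^{2}}_{\text{OGD on smoothed losses}}
 \;+\; \underbrace{L\, \delta\, T}_{\text{smoothing and shrinkage}} \\
 &\;=\; O\bigl(T^{3/4}\bigr)
 \quad \text{for } \delta \propto T^{-1/4},\ \eta \propto T^{-3/4} .
\end{align*}
```

With these choices all three terms are of order T^(3/4), which is where the classical rate comes from.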
Implications and Future Directions
The results formally answer open questions posed by [ghai-lu-hazan22], demonstrating that hidden convexity combined with Hessian compatibility suffices for optimal O(√T) regret under gradient feedback and for the classical O(T^(3/4)) expected regret under one-point bandit feedback. Practically, this provides strong guarantees for first-order online methods in structured nonconvex optimization regimes, potentially relevant for neural network training, reinforcement learning with transformed utilities, and nonconvex games.
Theoretically, the necessity of compatibility highlights a geometric barrier for algorithmic equivalence; enlarging the class of admissible transformations, or making OGD robust to incompatibility, remains open. Future research may explore whether algorithms other than OGD can achieve sublinear regret absent this structure, or whether rates better than O(T^(3/4)) are achievable under bandit feedback with refined estimators.
Conclusion
This work establishes that online gradient descent achieves O(√T) regret for hidden-convex losses under a precise geometric compatibility condition, improving on the prior O(T^(2/3)) bound and matching the optimal worst-case rate for online convex optimization. The Hessian compatibility condition is necessary and sufficient for the algorithmic equivalence, and OGD can suffer linear regret when it fails. The findings extend to one-point bandit feedback with an O(T^(3/4)) expected regret bound. These results deepen the understanding of algorithmic equivalence and hidden convexity in online nonconvex optimization (2605.26373).