Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints (2505.06709v1)
Abstract: We revisit the Online Convex Optimization problem with adversarial constraints (COCO) where, in each round, a learner is presented with a convex cost function and a convex constraint function, both of which may be chosen adversarially. The learner selects actions from a convex decision set in an online fashion, with the goal of minimizing both regret and the cumulative constraint violation (CCV) over a horizon of $T$ rounds. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. In this paper, we present a surprising improvement that achieves a significantly smaller CCV by trading it off with regret. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $\tilde{O}(\sqrt{dT}+ T\beta)$ regret and $\tilde{O}(dT{1-\beta})$ CCV, where $d$ is the dimension of the decision set and $\beta \in [0,1]$ is a tunable parameter. We achieve this result by first considering the special case of $\textsf{Constrained Expert}$ problem where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose an efficient policy for the $\textsf{Constrained Expert}$ problem, that attains $O(\sqrt{T\ln N}+T{\beta})$ regret and $\tilde{O}(T{1-\beta} \ln N)$ CCV, where $N$ is the number of experts. The original problem is then reduced to the $\textsf{Constrained Expert}$ problem via a covering argument. Finally, with an additional smoothness assumption, we propose an efficient gradient-based policy attaining $O(T{\max(\frac{1}{2},\beta)})$ regret and $\tilde{O}(T{1-\beta})$ CCV.