Proximal Regret: A Unified Framework
- Proximal Regret is a performance metric that evaluates online decisions by comparing them to locally optimal proximal deviations in adaptive control and optimization.
- It underpins optimal dynamic regret bounds in convex, nonconvex, and stochastic settings, achieving rates like O(√T) for convex losses and O(log T) under strong convexity.
- Its application in online games and adaptive systems leads to refined equilibrium concepts, such as proximal correlated equilibrium, enhancing algorithmic stability and convergence.
Proximal regret is a refined notion of performance evaluation for online algorithms and adaptive control policies that leverage proximal operations for regularization or constraint handling. It measures the discrepancy between the sequence of online decisions and the sequence that would be obtained by applying optimal local “proximal” deviations, and it arises in a wide spectrum of online convex and nonconvex optimization, game-theoretic learning, adaptive control, and estimation contexts. Proximal regret unifies and generalizes several classical and emerging regret notions, providing both sharper convergence guarantees and deeper insights into algorithmic equilibrium properties.
1. Formal Definitions and Taxonomy
Proximal regret is defined in relation to (generalized) proximal operators. For a function $g$, the proximal operator is
$$\operatorname{prox}_g(x) \;=\; \arg\min_{y}\Big\{\, g(y) + \tfrac{1}{2}\|y - x\|^2 \,\Big\}.$$
Given a sequence of online decisions $x_1,\dots,x_T$ and convex loss functions $f_1,\dots,f_T$, the $g$-proximal regret is
$$\mathrm{ProxReg}_g(T) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t\big(\operatorname{prox}_g(x_t)\big).$$
Over a family $\mathcal{G}$ of weakly-convex functions, the $\mathcal{G}$-proximal regret is
$$\mathrm{ProxReg}_{\mathcal{G}}(T) \;=\; \sup_{g \in \mathcal{G}} \mathrm{ProxReg}_g(T).$$
This construction interpolates between external regret (comparing to a single fixed comparator $x^\star$ for all rounds) and swap regret (comparing to an arbitrary deviation map $\phi$). When $\mathcal{G}$ contains only indicator functions of single points, the proximal operator maps every decision to that point, and proximal regret recovers external regret; when the induced deviations range over all mappings $\phi$, it recovers swap regret. The concept expresses the "local" cost of not applying any proximal deviation at each round and thus strictly refines external regret.
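These definitions can be checked numerically. The sketch below computes the $g$-proximal regret of an online-gradient-descent decision sequence on scalar quadratic losses; the quadratic deviation function $g(y) = \frac{\lambda}{2}(y - c)^2$ is chosen only because its prox has a closed form, and the loss stream and constants are illustrative assumptions, not taken from the source.

```python
import numpy as np

def prox_quad(x, lam, c):
    # Closed-form prox of g(y) = (lam/2) * (y - c)^2:
    # argmin_y { g(y) + 0.5*(y - x)^2 } = (x + lam*c) / (1 + lam)
    return (x + lam * c) / (1.0 + lam)

def proximal_regret(xs, losses, prox):
    # Reg_g(T) = sum_t [ f_t(x_t) - f_t(prox_g(x_t)) ]
    return sum(f(x) - f(prox(x)) for x, f in zip(xs, losses))

# Toy stream: f_t(x) = (x - a_t)^2, decisions produced by online gradient descent.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
losses = [lambda x, at=at: (x - at) ** 2 for at in a]  # bind a_t per round
xs, x = [], 0.0
for t, at in enumerate(a, start=1):
    xs.append(x)
    x -= (1.0 / np.sqrt(t)) * 2.0 * (x - at)  # OGD step, decaying step size

r_identity = proximal_regret(xs, losses, lambda x: prox_quad(x, 0.0, 0.0))
r_pull = proximal_regret(xs, losses, lambda x: prox_quad(x, 1.0, a.mean()))
print(r_identity, r_pull)  # lam = 0 makes the prox the identity, so regret is 0
```

Setting $\lambda = 0$ makes the prox the identity map, under which the regret is exactly zero; nontrivial deviation functions probe whether any local proximal move would have improved the play.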
2. Proximal Regret in Online Convex and Composite Optimization
Proximal regret has a direct operationalization in online convex and composite optimization settings. Let $f_t(x) = \ell_t(x) + r_t(x)$ be a composite loss, with $\ell_t$ convex (possibly nonsmooth) and $r_t$ a (possibly nondifferentiable) convex time-varying regularizer. The dynamic regret against a comparator sequence $u_1,\dots,u_T$ is
$$\mathrm{Reg}^d_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u_t).$$
Most analyses introduce a path-variation metric to quantify the difficulty of tracking a fast-moving sequence:
$$P_T \;=\; \sum_{t=2}^{T} \|u_t - u_{t-1}\|.$$
The Proximal Online Gradient (POG) and Proximal Mirror Descent algorithms, which alternate gradient steps with Euclidean or mirror proximal steps, achieve dynamic regret rates that are optimal in $T$ and $P_T$. Specifically, with appropriately chosen, decaying step sizes:
- For convex (nonsmooth) losses, $\mathrm{Reg}^d_T = O\big(\sqrt{T\,(1+P_T)}\big)$.
- For $\mu$-strongly-convex losses, $\mathrm{Reg}^d_T = O(1 + P_T)$ (Hou et al., 2023, Zhao et al., 2018).
Table: Proximal Dynamic Regret Rates
| Problem Setting | Proximal Dynamic Regret Bound |
|---|---|
| Convex, variable regularizer | $O\big(\sqrt{T\,(1+P_T)}\big)$ |
| Strongly convex | $O(1 + P_T)$ |
These rates, proven minimax-optimal, show that proximal methods are not only computationally efficient but also optimal for dynamic regret.
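A minimal sketch of a proximal online gradient loop, assuming composite losses $f_t(x) = \frac{1}{2}\|x - b_t\|^2 + \tau\|x\|_1$ with a slowly drifting target sequence $b_t$; the $\ell_1$ regularizer is chosen because its prox is the soft-thresholding operator in closed form, and the drift model and step size are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1: the proximal step for an l1 regularizer
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Time-varying composite losses f_t(x) = 0.5*||x - b_t||^2 + tau*||x||_1,
# with a slowly drifting sequence of targets b_t.
rng = np.random.default_rng(1)
T, d, tau, eta = 200, 5, 0.05, 0.5
b = np.cumsum(0.01 * rng.normal(size=(T, d)), axis=0)

x = np.zeros(d)
dyn_regret = 0.0
for t in range(T):
    f = lambda z: 0.5 * np.sum((z - b[t]) ** 2) + tau * np.sum(np.abs(z))
    u = soft_threshold(b[t], tau)          # exact per-round minimizer u_t
    dyn_regret += f(x) - f(u)
    x = soft_threshold(x - eta * (x - b[t]), eta * tau)  # POG step

print(dyn_regret / T)  # small average dynamic regret on a slow path
```

Because the per-round minimizer $u_t$ has a closed form here, the loop measures dynamic regret directly; shrinking the drift shrinks the path length $P_T$ and, as the bounds above predict, the accumulated regret.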
3. Proximal Regret in Online Games and Equilibrium Theory
In online convex games, proximal regret gives rise to new solution concepts and equilibrium refinements. Cai et al. (Cai et al., 3 Nov 2025) introduce proximal correlated equilibrium (PCE), a tractable refinement of coarse correlated equilibrium. If every player employs a no-proximal-regret algorithm against the class of weakly-convex deviations, the empirical play converges to PCE: a distribution $\sigma$ over joint action profiles under which no player $i$ can reduce its expected cost by applying a proximal deviation,
$$\mathbb{E}_{x \sim \sigma}\big[\, f_i(x_i, x_{-i}) - f_i\big(\operatorname{prox}_g(x_i), x_{-i}\big) \,\big] \;\le\; 0 \qquad \text{for all players } i \text{ and all } g \in \mathcal{G}.$$
Notably:
- Online Gradient Descent (OGD) minimizes proximal regret at the optimal $O(\sqrt{T})$ rate.
- OGD convergence to an approximate PCE is certified at rate $O(1/\sqrt{T})$ in the empirical distribution of play.
- The PCE concept interpolates between Nash (or correlated) equilibrium and more general equilibrium concepts such as gradient equilibrium and semicoarse correlated equilibrium (Cai et al., 3 Nov 2025).
This reveals that classical algorithms like OGD and Mirror Descent minimize a strictly stronger regret than external regret, with algorithmic convergence to refined equilibria realized via proximal operations.
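The equilibrium-convergence phenomenon can be illustrated with a toy case: projected OGD self-play in a two-player zero-sum bilinear game (an illustrative assumption; the PCE results above concern general convex games). The time-averaged play approaches the game's Nash equilibrium at the origin.

```python
import numpy as np

# OGD self-play in a toy two-player zero-sum bilinear game
# f_1(x, y) = x*y = -f_2(x, y), strategies constrained to [-1, 1].
T, eta = 2000, 0.1
x, y = 0.5, 0.5
xs, ys = [], []
for _ in range(T):
    xs.append(x)
    ys.append(y)
    # simultaneous projected gradient steps (projection = clipping);
    # the tuple assignment evaluates both right-hand sides before updating
    x, y = float(np.clip(x - eta * y, -1, 1)), float(np.clip(y + eta * x, -1, 1))

xbar, ybar = float(np.mean(xs)), float(np.mean(ys))
print(xbar, ybar)  # time-averaged play is close to the Nash equilibrium (0, 0)
```

The individual iterates cycle rather than converge, but the empirical distribution of play concentrates its mean near the equilibrium, mirroring how no-proximal-regret dynamics certify convergence in the empirical distribution rather than in the last iterate.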
4. Proximal Regret Beyond Convexity: Nonconvex, Stochastic, and Constraint Settings
Extensions of proximal regret to nonconvex and stochastic domains rely on proximal methods that regularize the objective through quadratic or Bregman penalties. In nonconvex online optimization with long-term constraints, proximal methods of multipliers (e.g., OPMM) allow for the definition and control of KKT-based regret metrics, such as:
- Lagrangian gradient violation regret: average norm of aggregated KKT residuals.
- Constraint violation regret: time-averaged constraint violations.
- Complementarity residual regret: time-averaged deviations from complementary slackness.

These nonconvex, constrained settings admit sublinear rates for both the KKT-violation and complementarity-residual regrets (Zhang et al., 2022). In convex stochastic programming, the stochastic approximation proximal method of multipliers (PMMSopt) achieves sublinear optimality-gap and constraint-violation regrets for expectation-constrained problems, both in expectation and with high probability (Zhang et al., 2019).
In adaptive control, recursive proximal updates for parameter estimation ensure contractive parameter error maps under finite excitation, yielding finite regret proportional only to the duration of sufficient excitation. This approach relaxes persistent-excitation requirements and tightly couples model adaptation with regret performance (Karapetyan et al., 2 Apr 2024).
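A minimal sketch of such a recursive proximal estimator, assuming a scalar-output linear regression model with noiseless measurements (both assumptions for illustration): each round solves the per-step least-squares cost plus a proximal term exactly in closed form, and the parameter error contracts whenever the regressors excite new directions.

```python
import numpy as np

def prox_ls_step(theta, phi, y, eta):
    # Exact proximal-point update for the per-round least-squares cost:
    #   argmin_t 0.5*(y - phi@t)^2 + (1/(2*eta))*||t - theta||^2
    r = y - phi @ theta
    return theta + (eta * r / (1.0 + eta * (phi @ phi))) * phi

rng = np.random.default_rng(2)
theta_true = np.array([1.0, -2.0])   # unknown parameters (illustrative)
theta = np.zeros(2)
for _ in range(300):
    phi = rng.normal(size=2)   # exciting regressors (assumption)
    y = phi @ theta_true       # noiseless measurement (assumption)
    theta = prox_ls_step(theta, phi, y, eta=1.0)

err = float(np.linalg.norm(theta - theta_true))
print(err)  # parameter error contracts under sufficient excitation
```

Each update shrinks the error component along the current regressor direction by the factor $1/(1+\eta\|\phi_t\|^2)$, so error decay, and hence regret, is tied directly to how long the data remain exciting.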
5. Proximal Regret in Time-Varying and Inexact Settings
In online settings with time-varying objectives and inexact computation, proximal regret admits tight characterizations through path variation and error accumulation. Letting $u_t = \arg\min_x f_t(x)$ and defining the path length $P_T = \sum_{t=2}^{T}\|u_t - u_{t-1}\|$ and the cumulative error $E_T$ incurred by inexact gradient and proximal steps, the regret decomposes as
$$\mathrm{Reg}_T \;=\; O\big(1 + P_T + E_T\big)$$
under strong convexity and suitable stepsize choices (Choi et al., 2023, Ajalloeian et al., 2019, Dixit et al., 2018). These bounds, free of $O(\sqrt{T})$ terms, illustrate that regret remains controlled even for non-smooth, time-varying, and inexactly solved subproblems.
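A small numerical sketch of this behavior, assuming strongly convex quadratic tracking losses, a drifting minimizer, and bounded gradient inexactness (all illustrative): the tracking error settles at a level proportional to the per-step drift plus the error magnitude, with no growth in $T$.

```python
import numpy as np

# Strongly convex tracking losses f_t(x) = 0.5*||x - b_t||^2 with a drifting
# minimizer b_t and a gradient oracle corrupted by bounded noise eps.
rng = np.random.default_rng(3)
T, eta, drift, eps = 500, 0.8, 0.02, 0.01
b, x = np.zeros(3), np.zeros(3)
track_err = []
for _ in range(T):
    b = b + drift * rng.normal(size=3)       # minimizer path u_t = b_t
    g = (x - b) + eps * rng.normal(size=3)   # inexact gradient
    x = x - eta * g                          # online (proximal) gradient step
    track_err.append(float(np.linalg.norm(x - b)))

print(max(track_err[50:]))  # settles at O(drift + eps); no growth with T
```

Strong convexity makes each step a contraction toward the current minimizer, so drift and gradient error accumulate only geometrically, which is exactly why the regret bound above involves $P_T$ and $E_T$ but no $\sqrt{T}$ term.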
In receding-horizon smoothed online convex optimization (SOCO), alternating proximal steps with prediction windows of length $W$ yields dynamic regret that decays exponentially in the prediction horizon:
$$\mathrm{Reg}^d_T \;=\; O\big(\rho^{W}\,(1 + C_T)\big), \qquad \rho \in (0,1),$$
with $C_T$ the path length of the per-stage cost minimizers, extending applicability to non-smooth and general proximal losses (Senapati et al., 2022).
6. Practical Implications and Limitations
The proximal regret paradigm underlies several practical advances:
- It enables the design of online algorithms with provable optimality guarantees on dynamic regret and equilibrium convergence properties.
- It supports time-varying, nonsmooth, or composite objectives and constraints, which is crucial in dynamic learning, adaptive estimation, control, and online games.
- In stochastic and nonconvex settings, it provides the structural framework for sublinear regret bounds in both expectation and high probability, linking statistical learning to robust algorithmic performance.
- In multi-player and adversarial environments, it provides principled, tractable solution concepts beyond classical Nash or correlated equilibria.
However, several limitations and reliability concerns persist:
- Dynamic Nash equilibrium regret, despite being bounded by the duality gap, may not guarantee that online iterates remain near instantaneous equilibria (Meng et al., 5 Jul 2024).
- In nonconvex or constrained domains, minimizing proximal regret is generally harder, and rates degrade relative to unconstrained convex settings (Zhang et al., 2022, Zhang et al., 2019).
- Tuning of stepsizes, proximal parameters, and regularization strength must carefully balance statistical efficiency, computational tractability, and model stability.
Thus, proximal regret is a central organizing principle in modern online optimization and game-theoretic learning, bridging first-order optimization, dynamic control, and equilibrium computation under a unified, minimax framework. Ongoing work aims to extend these results to more general classes of deviations, stochastic feedback, and partial-information regimes, further amplifying their significance in algorithmic and systems research.