Lower bounds on learning rates under partial safety and no-WER

Derive nontrivial lower bounds on the achievable learning rate (for example, on the order of regret as a function of the time horizon T) for any online learning algorithm that simultaneously satisfies partial safety, meaning that no adaptive environment can extract nearly the entire surplus from the learner in the long run, and no-weak-external-regret (no-WER), meaning vanishing regret against stationary environments.

Background

The paper introduces partial safety as a robustness criterion for online learning algorithms in strategic environments with adverse selection, and shows that standard no-external-regret algorithms and common no-weak-external-regret methods are unsafe. The authors design the Explore-Exploit-Punish (EEP) and Explore-Signal-Exploit-Punish (ESEP) algorithms, both of which achieve no-weak-external-regret and partial safety; ESEP additionally attains welfare efficiency.

While these constructions demonstrate feasibility, the authors note that characterizing fundamental limits—specifically, lower bounds on learning rates achievable under the joint constraints of partial safety and no-WER—remains unresolved. Establishing such lower bounds would clarify the trade-offs between efficiency in learning and robustness against adaptive exploitation.
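
One way to make the shape of the question concrete is sketched below. The notation is assumed for illustration and is not taken from the paper: a_t is the learner's action in round t, s_t is the environment's move, u is the learner's per-round payoff, \sigma ranges over stationary environment strategies, and f is a hypothetical rate function standing in for the unknown bound. The formal definition of partial safety is left to the paper.

```latex
% Assumed notation (not the paper's): regret against a stationary environment \sigma
R_T(\sigma) \;=\; \max_{a}\; \mathbb{E}\!\left[\sum_{t=1}^{T} u(a, s_t)\right]
            \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} u(a_t, s_t)\right],
\qquad s_t \sim \sigma .

% no-WER: average regret vanishes against every stationary environment
\lim_{T \to \infty} \frac{R_T(\sigma)}{T} \;=\; 0
\quad \text{for all stationary } \sigma .

% Shape of the lower bound sought (f and c are placeholders, not known results):
% for every algorithm that is both partially safe and no-WER,
\sup_{\sigma}\, R_T(\sigma) \;\ge\; c \, f(T)
\quad \text{for all sufficiently large } T, \qquad f(T) = o(T) .
```

Under this reading, the open problem amounts to identifying the fastest-growing f for which such an inequality holds, which would show how much learning speed must be given up to obtain partial safety.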

References

Additionally, deriving lower bounds on the learning rate for algorithms that satisfy both partial safety and no-WER remains an open challenge.

— Robust Online Learning with Private Information (2505.05341 - Okumura, 8 May 2025) in Section: Concluding remarks
