
Online Learning with Unknown Constraints (2403.04033v1)

Published 6 Mar 2024 in cs.LG, cs.AI, math.ST, stat.ML, and stat.TH

Abstract: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression oracle to estimate the unknown safety constraint, and converts the predictions of an online learning oracle to predictions that adhere to the unknown safety constraint. On the theoretical side, our algorithm's regret can be bounded by the regret of the online regression and online learning oracles, the eluder dimension of the model class containing the unknown safety constraint, and a novel complexity measure that captures the difficulty of safe learning. We complement our result with an asymptotic lower bound that shows that the aforementioned complexity measure is necessary. When the constraints are linear, we instantiate our result to provide a concrete algorithm with $\sqrt{T}$ regret using a scaling transformation that balances optimistic exploration with pessimistic constraint satisfaction.
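To make the idea of pessimistic constraint satisfaction in the linear case concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a single unknown linear constraint of the form <a, x> <= b with b > 0, assumes the origin is a known safe action, and assumes a confidence set around an estimate a_hat described by a radius beta and a diagonal inverse covariance v_inv_diag. All function and parameter names are hypothetical. The sketch covers only the pessimistic shrinking step; the optimistic exploration side mentioned in the abstract is not modeled.

```python
import math


def ucb_constraint(x, a_hat, beta, v_inv_diag):
    """Upper confidence bound on the unknown constraint value <a, x>.

    Assumption: the true a lies in an ellipsoid around a_hat, so the
    worst case is the estimate plus beta times a weighted norm of x.
    """
    mean = sum(ai * xi for ai, xi in zip(a_hat, x))
    width = beta * math.sqrt(sum(vi * xi * xi for vi, xi in zip(v_inv_diag, x)))
    return mean + width


def scale_to_safe(x, a_hat, beta, v_inv_diag, b):
    """Shrink x toward the origin until its pessimistic constraint value is at most b.

    The bound above is positively homogeneous in x, so scaling x by
    gamma scales the bound by gamma; gamma = b / ucb makes it equal b.
    """
    if b <= 0:
        raise ValueError("this sketch assumes b > 0 so that the origin is safe")
    ucb = ucb_constraint(x, a_hat, beta, v_inv_diag)
    if ucb <= b:
        return 1.0, list(x)  # already safe under the pessimistic estimate
    gamma = b / ucb
    return gamma, [gamma * xi for xi in x]


# Hypothetical numbers: the proposed action looks unsafe, so it is shrunk.
gamma, safe_x = scale_to_safe(
    x=[2.0, 0.0], a_hat=[1.0, 0.0], beta=1.0, v_inv_diag=[0.25, 0.25], b=1.0
)
```

In this toy setting the scaled action sits exactly on the boundary of the pessimistic safe set. As the confidence width shrinks with more observations, the scaling factor moves toward 1 and less of the proposed action is given up.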

Authors (2)
  1. Karthik Sridharan (58 papers)
  2. Seung Won Wilson Yoo (1 paper)
