
Online Convex Optimization with Unbounded Memory (2210.09903v5)

Published 18 Oct 2022 in cs.LG, math.OC, and stat.ML

Abstract: Online convex optimization (OCO) is a widely used framework in online learning. In each round, the learner chooses a decision in a convex set and an adversary chooses a convex loss function, and then the learner suffers the loss associated with their current decision. However, in many applications the learner's loss depends not only on the current decision but also on the entire history of decisions until that point. The OCO framework and its existing generalizations do not capture this, and they can only be applied to many settings of interest after a long series of approximation arguments. They also leave open the question of whether the dependence on memory is tight, because there are no non-trivial lower bounds. In this work we introduce a generalization of the OCO framework, "Online Convex Optimization with Unbounded Memory", that captures long-term dependence on past decisions. We introduce the notion of $p$-effective memory capacity, $H_p$, that quantifies the maximum influence of past decisions on present losses. We prove an $O(\sqrt{H_p T})$ upper bound on the policy regret and a matching (worst-case) lower bound. As a special case, we prove the first non-trivial lower bound for OCO with finite memory (Anava et al., 2015), which could be of independent interest, and also improve existing upper bounds. We demonstrate the broad applicability of our framework by using it to derive regret bounds, and to improve and simplify existing regret bound derivations, for a variety of online learning problems including online linear control and an online variant of performative prediction.
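The setting described in the abstract can be sketched in code for the simpler finite-memory special case: the round-$t$ loss depends on the last $m$ decisions, and the learner runs projected online gradient descent. This is a minimal illustrative sketch only; the quadratic loss, the box constraint set, and all names are hypothetical choices for illustration, not the paper's algorithm or bounds.

```python
import numpy as np

# Illustrative sketch of OCO with finite memory m: the round-t loss
# depends on the last m decisions. The learner takes a gradient step
# with respect to its most recent decision and projects onto a convex
# set (here, a box). All specifics are hypothetical.

rng = np.random.default_rng(0)
d, m, T = 3, 4, 200          # dimension, memory length, horizon
eta = 0.1                    # step size

def loss(history, target):
    """Quadratic loss on the average of the last m decisions."""
    avg = np.mean(history, axis=0)
    return 0.5 * np.sum((avg - target) ** 2)

def grad_wrt_current(history, target):
    """Gradient of the loss with respect to the most recent decision."""
    avg = np.mean(history, axis=0)
    return (avg - target) / len(history)

x = np.zeros(d)
history = [x.copy() for _ in range(m)]
total_loss = 0.0
for t in range(T):
    target = rng.normal(size=d) * 0.1 + 1.0   # adversary's round-t parameter
    total_loss += loss(history, target)
    g = grad_wrt_current(history, target)
    x = np.clip(x - eta * g, -2.0, 2.0)       # gradient step + projection
    history = history[1:] + [x.copy()]        # slide the memory window

print(f"average loss: {total_loss / T:.3f}")
```

The unbounded-memory framework of the paper generalizes this by letting the loss depend on the entire history, with the $p$-effective memory capacity $H_p$ controlling how much past decisions can influence present losses.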

References (62)
  1. Regret bounds for the adaptive control of linear quadratic systems. In Sham M. Kakade and Ulrike von Luxburg, editors, COLT 2011 - The 24th Annual Conference on Learning Theory, June 9-11, 2011, Budapest, Hungary, 2011.
  2. Competing in the dark: An efficient algorithm for bandit linear optimization. In Rocco A. Servedio and Tong Zhang, editors, 21st Annual Conference on Learning Theory - COLT 2008, Helsinki, Finland, July 9-12, 2008, pages 263–274. Omnipress, 2008.
  3. Reinforcement learning: Theory and algorithms. 2019a.
  4. Online control with adversarial disturbances. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 of Proceedings of Machine Learning Research, pages 111–119. PMLR, 2019b.
  5. Logarithmic regret for online control. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 10175–10184, 2019c.
  6. A rewriting system for convex optimization problems. Journal of Control and Decision, 5(1):42–60, 2018.
  7. Online learning over a finite action set with limited switching. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Conference On Learning Theory, COLT 2018, Stockholm, Sweden, 6-9 July 2018, 2018.
  8. Algorithms for hiring and outsourcing in the online labor market. In Yike Guo and Faisal Farooq, editors, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 1109–1118. ACM, 2018.
  9. Online learning for adversaries with memory: Price of past mistakes. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 784–792, 2015.
  10. System level synthesis. Annual Reviews in Control, 47:364–393, 2019.
  11. Online linear optimization and adaptive routing. J. Comput. Syst. Sci., 74(1):97–114, 2008.
  12. Pricing in ride-share platforms: A queueing-theoretic approach. Available at SSRN 2568258, 2015.
  13. Online learning with dynamics: A minimax perspective. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  14. Comparison between linear and nonlinear control strategies for variable speed wind turbines. Control Engineering Practice, 18:1357–1368, 2010.
  15. Performative prediction in a stateful world. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event, volume 151 of Proceedings of Machine Learning Research, pages 6045–6061. PMLR, 2022.
  16. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning, 5:1–122, 2012.
  17. Kernel-based methods for bandit convex optimization. J. ACM, 68(4):25:1–25:35, 2021.
  18. Bandit linear control. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  19. Minimax regret of switching-constrained online convex optimization: No phase transition. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  20. Thomas M Cover. Universal portfolios. Mathematical finance, 1(1):1–29, 1991.
  21. Regret bounds for robust adaptive control of the linear quadratic regulator. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 4192–4201, 2018.
  22. Online bandit learning against an adaptive adversary: from regret to policy regret. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress, 2012.
  23. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research, 17(83):1–5, 2016.
  24. Modeling curbside parking as a network of finite capacity queues. IEEE Trans. Intell. Transp. Syst., 21(3):1011–1022, 2020.
  25. Logarithmic regret for adversarial online control. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, 2020.
  26. Online learning with non-convex losses and non-stationary regret. In Amos J. Storkey and Fernando Pérez-Cruz, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, volume 84 of Proceedings of Machine Learning Research, pages 235–243. PMLR, 2018.
  27. Gene H. Golub and Charles F. Van Loan. Matrix Computations, Third Edition. Johns Hopkins University Press, 1996.
  28. Non-stochastic control with bandit feedback. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  29. Elad Hazan. Introduction to online convex optimization. CoRR, abs/1909.05207, 2019.
  30. An optimal algorithm for bandit convex optimization. CoRR, abs/1603.04350, 2016. URL http://arxiv.org/abs/1603.04350.
  31. The nonstochastic control problem. In Aryeh Kontorovich and Gergely Neu, editors, Algorithmic Learning Theory, ALT 2020, 8-11 February 2020, San Diego, CA, USA, volume 117 of Proceedings of Machine Learning Research, pages 408–421. PMLR, 2020.
  32. John J Horton. Online labor markets. In International workshop on internet and network economics, pages 515–522. Springer, 2010.
  33. Regret minimization with performative feedback. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato, editors, International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, 2022.
  34. Anna Karlin. Lecture notes for cse 522: Algorithms and uncertainty, 2017. URL https://courses.cs.washington.edu/courses/cse522/17sp/lectures/Lecture9.pdf.
  35. Vladimír Kučera. Stability of discrete linear feedback systems. IFAC Proceedings Volumes, 8(1):573–578, 1975.
  36. Algebraic Riccati Equations. Clarendon Press, 1995.
  37. Online optimal control with affine constraints. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 8527–8537. AAAI Press, 2021.
  38. Online adaptive controller selection in time-varying systems: No-regret via contractive perturbations. arXiv preprint arXiv:2210.12320, 2022.
  39. The weighted majority algorithm. In 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, North Carolina, USA, 30 October - 1 November 1989, pages 256–261. IEEE Computer Society, 1989.
  40. The weighted majority algorithm. Inf. Comput., 108(2):212–261, 1994.
  41. To predict and serve? Significance, 13(5):14–19, 2016.
  42. Stochastic optimization for performative prediction. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  43. Outside the echo chamber: Optimizing the performative risk. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 7710–7720. PMLR, 2021.
  44. Online control of unknown time-varying dynamical systems. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 2021.
  45. Francesco Orabona. A modern introduction to online learning. CoRR, abs/1912.13213, 2019.
  46. Performative prediction. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 7599–7609. PMLR, 2020.
  47. SFpark: Pricing parking by demand. In Parking and the City, pages 344–353. Routledge, 2018.
  48. Stochastic contextual bandits with long horizon rewards. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Washington, DC, USA, February 7-14, 2023. AAAI Press, 2023.
  49. Decision-dependent risk minimization in geometrically decaying dynamic environments. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, February 22 - March 1, 2022, pages 8081–8088. AAAI Press, 2022.
  50. Shai Shalev-Shwartz. Online learning and online convex optimization. Found. Trends Mach. Learn., 4(2):107–194, 2012.
  51. Online learning meets optimization in the dual. In Gábor Lugosi and Hans Ulrich Simon, editors, Learning Theory, 19th Annual Conference on Learning Theory, COLT 2006, Pittsburgh, PA, USA, June 22-25, 2006, Proceedings, 2006.
  52. Online optimization with memory and competitive control. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  53. Naive exploration is optimal for online LQR. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, 2020.
  54. Aleksandrs Slivkins. Introduction to multi-armed bandits. Found. Trends Mach. Learn., 12:1–286, 2019.
  55. Online non-convex learning: Following the perturbed leader is optimal. In Aryeh Kontorovich and Gergely Neu, editors, Algorithmic Learning Theory, ALT 2020, 8-11 February 2020, San Diego, CA, USA, volume 117 of Proceedings of Machine Learning Research, pages 845–861. PMLR, 2020.
  56. Reinforcement learning: An introduction. MIT press, 2018.
  57. A system-level approach to controller synthesis. IEEE Trans. Autom. Control., 64(10):4079–4093, 2019.
  58. Modern Wiener-Hopf design of optimal controllers, Part II: The multivariable case. IEEE Transactions on Automatic Control, 21(3):319–338, 1976.
  59. Online bandit learning for a special class of non-convex losses. In Blai Bonet and Sven Koenig, editors, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 3158–3164. AAAI Press, 2015.
  60. Bandit convex optimization in non-stationary environments. J. Mach. Learn. Res., 22:125:1–125:45, 2021.
  61. Non-stationary online learning with memory and non-stochastic control. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event, volume 151 of Proceedings of Machine Learning Research, pages 2101–2133. PMLR, 2022.
  62. Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Tom Fawcett and Nina Mishra, editors, Proceedings of the 20th International Conference on Machine Learning, ICML 2003, August 21-24 2003, Washington, DC, USA, 2003.