Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making (2302.05430v2)

Published 10 Feb 2023 in stat.ML and cs.LG

Abstract: Smoothed online learning has emerged as a popular framework to mitigate the substantial loss in statistical and computational complexity that arises when one moves from classical to adversarial learning. Unfortunately, for some spaces, it has been shown that efficient algorithms suffer an exponentially worse regret than that which is minimax optimal, even when the learner has access to an optimization oracle over the space. To mitigate that exponential dependence, this work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space, and shows that an instantiation of Follow-the-Perturbed-Leader can attain low regret with the number of calls to the optimization oracle scaling optimally with respect to average regret. We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions, which has many applications in fields as diverse as econometrics and robotics.
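
To make the algorithmic template concrete, below is a minimal sketch of generic Follow-the-Perturbed-Leader (FTPL) driven by an optimization oracle, the scheme the abstract says the paper instantiates. It is illustrative only: the names (`ftpl`, `decision_space`, `loss_fns`, `eta`) are not from the paper, the finite argmin stands in for the paper's optimization oracle over a piecewise continuous decision space, and the perturbation scale would be set by the paper's generalized-bracketing analysis rather than chosen by hand.

```python
import numpy as np

def ftpl(decision_space, loss_fns, eta=1.0, rng=None):
    """Minimal Follow-the-Perturbed-Leader sketch (hypothetical names).

    decision_space: finite list of candidate decisions, standing in for
        an optimization oracle over a continuous space.
    loss_fns: per-round loss functions, revealed one at a time.
    eta: perturbation scale (larger -> more exploration).
    """
    rng = np.random.default_rng() if rng is None else rng
    cumulative = np.zeros(len(decision_space))
    plays, total_loss = [], 0.0
    for loss in loss_fns:
        # Perturb the cumulative losses and ask the "oracle" (here, an
        # argmin over the candidate set) for the perturbed leader.
        noise = rng.exponential(scale=eta, size=len(decision_space))
        choice = int(np.argmin(cumulative - noise))
        plays.append(decision_space[choice])
        # The adversary reveals this round's loss; update cumulative losses.
        round_losses = np.array([loss(x) for x in decision_space])
        total_loss += round_losses[choice]
        cumulative += round_losses
    return plays, total_loss

# Toy usage: tracking a slowly drifting threshold on [0, 1] under absolute loss.
candidates = list(np.linspace(0.0, 1.0, 101))
losses = [lambda x, t=t: abs(x - (0.3 + 0.001 * t)) for t in range(500)]
plays, total = ftpl(candidates, losses, eta=5.0)
```

In the paper's setting, the single oracle call per round is what keeps the method oracle-efficient; the smoothness constraint on the adversary (captured through generalized bracketing numbers) is what makes the perturbed leader's regret scale with the effective, rather than worst-case, size of the space.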
