Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms (2311.18736v2)

Published 30 Nov 2023 in eess.SY, cs.AI, cs.CE, cs.LG, cs.SY, and math.OC

Abstract: We introduce controlgym, a library of thirty-six industrial control settings and ten infinite-dimensional partial differential equation (PDE)-based control problems. Integrated within the OpenAI Gym/Gymnasium (Gym) framework, controlgym allows direct application of standard reinforcement learning (RL) algorithms from libraries such as stable-baselines3. Our control environments complement those in Gym with continuous, unbounded action and observation spaces, motivated by real-world control applications. Moreover, the PDE control environments uniquely allow users to extend the state dimensionality of the system to infinity while preserving the intrinsic dynamics. This feature is crucial for evaluating the scalability of RL algorithms for control. This project serves the learning for dynamics & control (L4DC) community and aims to explore key questions: the convergence of RL algorithms in learning control policies; the stability and robustness of learning-based controllers; and the scalability of RL algorithms to high- and potentially infinite-dimensional systems. We open-source the controlgym project at https://github.com/xiangyuan-zhang/controlgym.
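
Because controlgym environments follow the Gym/Gymnasium interface, a standard RL library can be applied with little to no glue code. The sketch below illustrates that workflow with stable-baselines3's PPO; the `controlgym.make` constructor and the "ks" environment id are assumptions made for illustration, not documented API, so consult the linked repository for the actual interface.

```python
# A minimal sketch of the workflow the abstract describes: a Gym/Gymnasium-compatible
# controlgym environment driven by an off-the-shelf stable-baselines3 agent.
# Caveat: `controlgym.make(...)` and the "ks" environment id are assumptions made
# for illustration; see https://github.com/xiangyuan-zhang/controlgym for the real API.
import controlgym                     # assumed package name, per the GitHub project
from stable_baselines3 import PPO     # standard stable-baselines3 algorithm

# Assumed: the PDE environments expose continuous, unbounded Box action/observation
# spaces and follow the Gymnasium reset/step API, so standard agents plug in directly.
env = controlgym.make("ks")           # hypothetical Kuramoto-Sivashinsky environment id

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)   # train a feedback policy with PPO

# Roll out the learned controller for one episode.
obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```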

Authors (5)
  1. Xiangyuan Zhang (10 papers)
  2. Weichao Mao (11 papers)
  3. Saviz Mowlavi (16 papers)
  4. Mouhacine Benosman (28 papers)
  5. Tamer Başar (200 papers)
Citations (2)
