Reinforcement Learning with Elastic Time Steps (2402.14961v4)
Abstract: Traditional Reinforcement Learning (RL) policies are typically implemented with fixed control rates, often disregarding the impact of control rate selection. This can lead to inefficiencies, since the optimal control rate varies with task requirements. We propose the Multi-Objective Soft Elastic Actor-Critic (MOSEAC), an off-policy actor-critic algorithm that uses elastic time steps to dynamically adjust the control frequency. This approach minimizes computational resources by selecting the lowest viable frequency. We show theoretically that MOSEAC converges and produces stable policies, and we validate our findings in a real-time 3D racing game. MOSEAC significantly outperformed other variable-time-step approaches in terms of energy efficiency and task effectiveness. Additionally, MOSEAC demonstrated faster and more stable training, showcasing its potential for real-world RL applications in robotics.
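To illustrate the elastic-time-step idea described in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes the policy outputs both a control command and a step duration, that the environment exposes a `step(control, dt)` interface, and that the task reward is shaped by a per-decision cost and an elapsed-time cost; the names (`elastic_step`, `ALPHA_T`, `ALPHA_EPS`, `DT_MIN`, `DT_MAX`) and the exact weighting are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper):
DT_MIN, DT_MAX = 0.02, 0.5   # allowed range of elastic time steps, in seconds
ALPHA_T = 0.1                # penalty per second of elapsed time (favors fast completion)
ALPHA_EPS = 0.05             # fixed penalty per decision (favors fewer, longer steps)


def elastic_step(env, policy, obs):
    """Run one variable-duration decision step and return the shaped reward.

    `policy(obs)` is assumed to return a (control, duration) pair, and
    `env.step(control, dt)` is assumed to advance the simulation by `dt`
    seconds under the given control and return (next_obs, task_reward, done).
    """
    control, dt_raw = policy(obs)
    dt = float(np.clip(dt_raw, DT_MIN, DT_MAX))        # keep the duration in a safe range
    next_obs, task_reward, done = env.step(control, dt)
    # Multi-objective shaping: task reward minus time and per-decision costs.
    shaped_reward = task_reward - ALPHA_T * dt - ALPHA_EPS
    return next_obs, shaped_reward, done, dt
```

Under this kind of shaping, the fixed per-decision cost is what pushes the agent toward the lowest control frequency that still solves the task, while the elapsed-time term keeps it from stalling; the trade-off between the two is what the multi-objective formulation in MOSEAC is meant to balance.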