Deployable Reinforcement Learning with Variable Control Rate (2401.09286v2)
Abstract: Deploying controllers trained with Reinforcement Learning (RL) on real robots can be challenging: RL relies on agents' policies being modeled as Markov Decision Processes (MDPs), which assume an inherently discrete passage of time. As a result, nearly all RL-based control systems employ a fixed-rate control strategy, with a period (or time step) typically chosen based on the developer's experience or on specific characteristics of the application environment. Unfortunately, to guarantee stability the system must be controlled at the highest, worst-case frequency, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware. Adhering to the principles of reactive programming, we posit that applying control actions only when necessary enables the use of simpler hardware and helps reduce energy consumption. We challenge the fixed-frequency assumption by proposing a variant of RL with a variable control rate, in which the policy decides both the action the agent should take and the duration of the time step associated with that action. In this new setting, we extend Soft Actor-Critic (SAC) to compute the optimal policy under a variable control rate, introducing the Soft Elastic Actor-Critic (SEAC) algorithm. We demonstrate the efficacy of SEAC in a proof-of-concept simulation driving an agent with Newtonian kinematics. Our experiments show higher average returns, shorter task completion times, and reduced computational resources compared to fixed-rate policies.
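The core idea above is that the policy head emits one extra output: the duration of the time step over which the chosen action is held. The sketch below illustrates this with a toy NumPy policy network; the network shape, the `d_min`/`d_max` bounds, and the tanh squashing of the duration are illustrative assumptions, not the authors' SEAC implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_policy(obs_dim, act_dim, hidden=32):
    # One hidden layer; the output layer has act_dim + 1 units:
    # act_dim for the action and 1 for the (unbounded) time-step duration.
    return {
        "W1": rng.normal(0.0, 0.1, (obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, act_dim + 1)),
        "b2": np.zeros(act_dim + 1),
    }

def act(params, obs, d_min=0.02, d_max=0.5):
    """Return (action, duration): a tanh-squashed action plus a
    control period mapped into [d_min, d_max] seconds."""
    h = np.tanh(obs @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    action = np.tanh(out[:-1])                                   # bounded in [-1, 1]
    duration = d_min + (d_max - d_min) * (np.tanh(out[-1]) + 1) / 2
    return action, duration

params = init_policy(obs_dim=4, act_dim=2)
action, duration = act(params, np.ones(4))
# The environment would now apply `action` and advance simulated
# time by `duration` before querying the policy again.
```

A fixed-rate controller is recovered as the special case `d_min == d_max`; letting the policy choose the period is what allows it to act less often when the system does not need correction.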