Reduced Policy Optimization for Continuous Control with Hard Constraints (2310.09574v2)
Abstract: Recent advances in constrained reinforcement learning (RL) have endowed RL with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly when the hard constraints are non-convex. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose reduced policy optimization (RPO), an algorithm that combines RL with GRG to address general hard constraints. Following the GRG method, RPO partitions actions into basic and nonbasic actions and outputs the basic actions via a policy network. RPO then computes the nonbasic actions by solving the equations given by the equality constraints, using the obtained basic actions. The policy network is updated by implicitly differentiating the nonbasic actions with respect to the basic actions. Additionally, we introduce an action projection procedure based on the reduced gradient and apply a modified Lagrangian relaxation technique to ensure that inequality constraints are satisfied. To the best of our knowledge, RPO is the first attempt to introduce GRG to RL as a way of efficiently handling both equality and inequality hard constraints. Because RL environments with complex hard constraints are currently lacking, we develop three new benchmarks: two robotic manipulation tasks and a smart grid operation control task. On these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
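The basic/nonbasic split and the implicit-differentiation step described in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the toy unit-sphere equality constraint `h`, the Newton solver, the function names, and the use of a fixed array in place of a policy network output. Inequality constraints, the projection procedure, and the Lagrangian relaxation are not covered.

```python
import numpy as np

# Toy equality constraint (an assumption for illustration):
# the full action (a_b, a_n) must lie on the unit sphere, h(a_b, a_n) = 0.
def h(a_b, a_n):
    return np.array([a_b @ a_b + a_n @ a_n - 1.0])

def dh_da_b(a_b, a_n):
    """Jacobian of h w.r.t. the basic actions, shape (1, dim_b)."""
    return 2.0 * a_b[None, :]

def dh_da_n(a_b, a_n):
    """Jacobian of h w.r.t. the nonbasic actions, shape (1, 1)."""
    return 2.0 * a_n[None, :]

def solve_nonbasic(a_b, a_n_init, tol=1e-10, max_iter=50):
    """Newton iterations on a_n so that h(a_b, a_n) = 0 for fixed a_b."""
    a_n = np.asarray(a_n_init, dtype=float).copy()
    for _ in range(max_iter):
        residual = h(a_b, a_n)
        if np.linalg.norm(residual) < tol:
            break
        a_n = a_n - np.linalg.solve(dh_da_n(a_b, a_n), residual)
    return a_n

def reduced_gradient(grad_b, grad_n, a_b, a_n):
    """Total derivative of an objective w.r.t. a_b, with a_n tied to a_b by h = 0.

    Implicit function theorem: da_n/da_b = -(dh/da_n)^{-1} (dh/da_b).
    """
    dn_db = -np.linalg.solve(dh_da_n(a_b, a_n), dh_da_b(a_b, a_n))
    return grad_b + dn_db.T @ grad_n

# Stand-in for the policy network output (the basic actions).
a_b = np.array([0.6, 0.0])
# Nonbasic action recovered from the equality constraint.
a_n = solve_nonbasic(a_b, a_n_init=np.array([1.0]))

# Toy objective J(a) = a_b[0] + 2 * a_n[0], with its partial gradients.
grad_b = np.array([1.0, 0.0])
grad_n = np.array([2.0])
g = reduced_gradient(grad_b, grad_n, a_b, a_n)
print(a_n, g)
```

In this sketch the gradient `g` is what would be backpropagated into the policy parameters through the basic actions; the nonbasic actions never need their own network output because the equality constraint determines them.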
Authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi