
LTL-Constrained Policy Optimization with Cycle Experience Replay (2404.11578v2)

Published 17 Apr 2024 in cs.LG, cs.AI, and cs.FL

Abstract: Linear Temporal Logic (LTL) offers a precise means for constraining the behavior of reinforcement learning agents. However, in many tasks, LTL alone is insufficient for task specification; LTL-constrained policy optimization, where the goal is to optimize a scalar reward under LTL constraints, is needed. Prior methods for this constrained problem are restricted to finite state spaces. In this work, we present Cycle Experience Replay (CyclER), a reward-shaping approach to this problem that allows continuous state and action spaces and the use of function approximation. CyclER guides a policy towards satisfaction by encouraging partial behaviors compliant with the LTL constraint, using the structure of the constraint. In doing so, it addresses the optimization challenges stemming from the sparse nature of LTL satisfaction. We evaluate CyclER in three continuous control domains. On these tasks, CyclER outperforms existing reward-shaping methods at finding performant and LTL-satisfying policies.
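The abstract compresses a standard recipe: compile the LTL constraint into an automaton, track the automaton state alongside the environment state, and shape the reward so that partial progress through the automaton is rewarded instead of only full satisfaction. The sketch below is a minimal, hypothetical illustration of that recipe, not the authors' CyclER implementation: the two-state automaton (for "eventually goal"), the toy chain environment, and the one-time bonus on entering an accepting state are all simplifying assumptions made for this example.

```python
# Hypothetical sketch of automaton-tracking reward shaping for an LTL
# constraint. NOT the authors' CyclER code; automaton, environment, and
# shaping rule below are illustrative assumptions only.

# Toy automaton for "eventually goal": state 0 = goal not yet seen,
# state 1 = accepting sink reached once 'goal' holds.
TRANSITIONS = {
    (0, "goal"): 1,
    (0, "none"): 0,
    (1, "goal"): 1,
    (1, "none"): 1,
}
ACCEPTING = {1}


class ChainEnv:
    """1-D chain of n cells; the proposition 'goal' holds at the last cell."""

    def __init__(self, n=5):
        self.n, self.pos = n, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action in {-1, +1}
        self.pos = min(max(self.pos + action, 0), self.n - 1)
        label = "goal" if self.pos == self.n - 1 else "none"
        return self.pos, -0.01, False, {"label": label}  # small step cost


def shaped_step(env, q, action, bonus=0.5):
    """One environment step that advances the automaton state q and adds
    a shaping bonus when the run first enters an accepting state. This
    stands in for rewarding 'partial behaviors compliant with the
    constraint' in the abstract's sense."""
    obs, reward, done, info = env.step(action)
    q_next = TRANSITIONS[(q, info["label"])]
    if q_next in ACCEPTING and q not in ACCEPTING:
        reward += bonus
    return obs, reward, done, q_next


if __name__ == "__main__":
    env = ChainEnv()
    obs, q, total = env.reset(), 0, 0.0
    for _ in range(10):
        obs, r, done, q = shaped_step(env, q, +1)  # always move right
        total += r
    print(f"shaped return: {total:.2f}, automaton state: {q}")
```

The one-time entry bonus above is the simplest possible progress signal. The paper's method presumably derives its denser signal from cycles in the constraint's automaton (hence the name Cycle Experience Replay), but that mechanism is beyond what the abstract states, so it is not reproduced here.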

Authors (6)
  1. Ameesh Shah (9 papers)
  2. Cameron Voloshin (6 papers)
  3. Chenxi Yang (14 papers)
  4. Abhinav Verma (12 papers)
  5. Swarat Chaudhuri (61 papers)
  6. Sanjit A. Seshia (105 papers)
Citations (1)