Model-based Reinforcement Learning for Parameterized Action Spaces (2404.03037v3)
Published 3 Apr 2024 in cs.LG and cs.AI
Abstract: We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. Through the lens of Lipschitz continuity, we theoretically bound the gap between the value of the trajectory generated during planning and that of the optimal trajectory. Our empirical results on several standard benchmarks show that our algorithm achieves superior sample efficiency and asymptotic performance compared to state-of-the-art PAMDP methods.
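The planning component described in the abstract — sampling parameterized actions (a discrete type plus a continuous parameter vector), rolling them out through a learned dynamics model, and weighting candidates by exponentiated returns in MPPI fashion — can be illustrated with a minimal sketch. This is not the paper's implementation: the `dynamics` and `reward` callables, the Gaussian parameter proposal, and the weighted-vote selection of the discrete type are all illustrative assumptions standing in for the learned model and the paper's modified MPPI update.

```python
import numpy as np

def mppi_parameterized(dynamics, reward, s0, horizon=5, n_samples=512,
                       n_types=3, param_dim=2, temperature=1.0, rng=None):
    """One MPPI-style planning step over parameterized actions.

    `dynamics(s, k, x)` and `reward(s, k, x)` are hypothetical stand-ins
    for a learned model; each action is a discrete type `k` plus a
    continuous parameter vector `x`.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate action sequences: types uniformly, parameters from N(0, I).
    types = rng.integers(0, n_types, size=(n_samples, horizon))
    params = rng.standard_normal((n_samples, horizon, param_dim))
    # Evaluate each candidate sequence by rolling it out through the model.
    returns = np.zeros(n_samples)
    for i in range(n_samples):
        s = s0
        for t in range(horizon):
            returns[i] += reward(s, types[i, t], params[i, t])
            s = dynamics(s, types[i, t], params[i, t])
    # Exponentiated-return (softmax) weights, as in standard MPPI.
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    # First action to execute: weighted vote over discrete types,
    # weighted mean over continuous parameters.
    k_best = int(np.argmax(np.bincount(types[:, 0], weights=w, minlength=n_types)))
    x_best = np.average(params[:, 0], axis=0, weights=w)
    return k_best, x_best
```

With a toy model whose reward favors one discrete type, the weighted vote recovers that type while the continuous parameters are averaged under the same weights; the paper's actual planner additionally refines the sampling distribution over iterations.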
Authors:
- Renhao Zhang
- Haotian Fu
- Yilin Miao
- George Konidaris