Dynamic Neighborhood Construction for Structured Large Discrete Action Spaces (2305.19891v4)

Published 31 May 2023 in cs.LG and cs.AI

Abstract: Large discrete action spaces (LDAS) remain a central challenge in reinforcement learning. Existing solution approaches can handle unstructured LDAS with up to a few million actions. However, many real-world applications in logistics, production, and transportation systems have combinatorial action spaces, whose size grows well beyond millions of actions, even on small instances. Fortunately, such action spaces exhibit structure, e.g., equally spaced discrete resource units. With this work, we focus on handling structured LDAS (SLDAS) with sizes that cannot be handled by current benchmarks: we propose Dynamic Neighborhood Construction (DNC), a novel exploitation paradigm for SLDAS. We present a scalable neighborhood exploration heuristic that utilizes this paradigm and efficiently explores the discrete neighborhood around the continuous proxy action in structured action spaces with up to $10^{73}$ actions. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches designed for large discrete action spaces across two distinct environments. Our results show that DNC matches or outperforms state-of-the-art approaches while being computationally more efficient. Furthermore, our method scales to action spaces that so far remained computationally intractable for existing methodologies.
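The core idea sketched in the abstract is to let the policy output a continuous proxy action and then search the discrete neighborhood of that proxy on the structured action grid, scoring candidates with the critic. The following is a minimal, hedged sketch of such a neighborhood search; the function and parameter names (`dynamic_neighborhood_search`, `q_value`, `step_sizes`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dynamic_neighborhood_search(proxy_action, q_value, step_sizes, bounds, max_iters=50):
    """Illustrative sketch: explore the discrete neighborhood around a
    continuous proxy action in a structured (equally spaced) action space.

    proxy_action : continuous action proposed by the actor (1-D array)
    q_value      : callable mapping a discrete action vector to a scalar value estimate
    step_sizes   : grid spacing per dimension (1-D array)
    bounds       : (low, high) arrays bounding each dimension
    """
    low, high = bounds
    # Round the continuous proxy onto the structured grid and clip to bounds.
    current = np.clip(np.round(proxy_action / step_sizes) * step_sizes, low, high)
    best_val = q_value(current)

    for _ in range(max_iters):
        improved = False
        # Perturb one dimension at a time by +/- one grid step: the local neighborhood.
        for dim in range(current.size):
            for delta in (-step_sizes[dim], step_sizes[dim]):
                cand = current.copy()
                cand[dim] = np.clip(cand[dim] + delta, low[dim], high[dim])
                val = q_value(cand)
                if val > best_val:
                    current, best_val, improved = cand, val, True
        if not improved:
            # Local optimum of the critic within the discrete neighborhood.
            break
    return current, best_val
```

This greedy variant stops at the first local optimum; a fuller treatment would also need a mechanism to escape such optima (e.g., a simulated-annealing-style acceptance rule), which this sketch deliberately omits.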
