Stochastic Q-learning for Large Discrete Action Spaces (2405.10310v1)
Abstract: In complex environments with large discrete action spaces, effective decision-making is critical in reinforcement learning (RL). Value-based RL approaches such as Q-learning are widely used, but they carry a computational burden: each iteration requires maximizing a value function over all actions. This burden becomes particularly challenging in large-scale problems and when deep neural networks serve as function approximators. In this paper, we present stochastic value-based RL approaches that, in each iteration, do not optimize over the entire set of $n$ actions. Instead, they consider only a variable stochastic set of a sublinear number of actions, possibly as small as $\mathcal{O}(\log(n))$. The presented stochastic value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this stochastic approach into both value-function updates and action selection. We establish the theoretical convergence of Stochastic Q-learning and provide an analysis of stochastic maximization. Moreover, through empirical validation, we show that the proposed approaches outperform the baseline methods across diverse environments, including different control problems, achieving near-optimal average returns in significantly reduced time.
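The core idea in the abstract is to replace the exact maximum over all $n$ actions with a maximum over a small random subset, in both the Q-value update and action selection. For $n = 2^{20} = 1{,}048{,}576$ actions, a subset of size $\lceil \log_2 n \rceil = 20$ means each maximization evaluates 20 Q-values instead of over a million. The block below is a minimal tabular sketch under stated assumptions. Subsets are drawn uniformly without replacement with size $\lceil \log_2 n \rceil$, and exploration is plain ε-greedy. The names `subset_size`, `stoch_argmax`, `stoch_max`, and `StochasticQLearner` are hypothetical. The paper's actual procedure (for example, how candidate sets are constructed, or the details of StochDQN and StochDDQN) may differ.

```python
import math
import random
from collections import defaultdict


def subset_size(n):
    """Assumed number of actions sampled per maximization: ceil(log2 n), at least 1."""
    return max(1, math.ceil(math.log2(n)))


def stoch_argmax(q_values, rng=random):
    """Approximate argmax: evaluate only a random subset of the n actions."""
    n = len(q_values)
    k = min(n, subset_size(n))
    subset = rng.sample(range(n), k)  # uniform sample, without replacement
    return max(subset, key=lambda a: q_values[a])


def stoch_max(q_values, rng=random):
    """Approximate max over actions, using the same random-subset rule."""
    return q_values[stoch_argmax(q_values, rng)]


class StochasticQLearner:
    """Tabular Q-learning with both max operations replaced by stochastic ones (sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self.n = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Q-table: state -> list of n action values, initialised to zero
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, state):
        # Epsilon-greedy, where the greedy step is a stochastic argmax
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        return stoch_argmax(self.q[state], self.rng)

    def update(self, s, a, r, s_next, done):
        # TD target bootstraps from stoch_max instead of a full max over all actions
        if done:
            target = r
        else:
            target = r + self.gamma * stoch_max(self.q[s_next], self.rng)
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

Because each stochastic maximization compares only a few actions (3 of 8 when $n = 8$), the greedy choice can miss the true best action. In this sketch, that occasional miss is the price paid for $\mathcal{O}(\log n)$ rather than $\mathcal{O}(n)$ work per maximization.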
Authors: Fares Fourati, Vaneet Aggarwal, Mohamed-Slim Alouini