Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes (2310.11684v3)

Published 18 Oct 2023 in cs.LG, cs.AI, and quant-ph

Abstract: This paper investigates the potential of quantum acceleration in addressing infinite horizon Markov Decision Processes (MDPs) to enhance average reward outcomes. We introduce an innovative quantum framework for the agent's engagement with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$, a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts.
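The gap between the two regret bounds traces back to how estimation error shrinks with the number of samples. The sketch below is illustrative only and is not the paper's algorithm. It compares the sample count that Hoeffding's inequality requires for a classical empirical mean of [0, 1]-bounded rewards with the roughly 1/ε query scaling reported for quantum mean estimation. The function names are hypothetical, and the constant factor in `quantum_queries` is an assumption set to 1 so that only the scaling is visible.

```python
import math


def classical_samples(eps: float, delta: float) -> int:
    """Samples sufficient for |empirical mean - true mean| <= eps with
    probability >= 1 - delta, for i.i.d. variables bounded in [0, 1].
    Follows from Hoeffding: P(error >= eps) <= 2 * exp(-2 * n * eps**2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def quantum_queries(eps: float, delta: float) -> int:
    """Order-of-magnitude query count for quantum mean estimation,
    scaling as (1 / eps) * log(1 / delta).
    ASSUMPTION: the leading constant is set to 1 purely for illustration."""
    return math.ceil(math.log(1.0 / delta) / eps)


if __name__ == "__main__":
    delta = 0.05
    for eps in (0.1, 0.01, 0.001):
        print(eps, classical_samples(eps, delta), quantum_queries(eps, delta))
```

Shrinking ε by a factor of 10 multiplies the classical count by about 100 but the quantum count by only about 10. This quadratic saving in estimation cost is the ingredient the abstract credits for the improved regret bound.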

