On Representation Complexity of Model-based and Model-free Reinforcement Learning (2310.01706v2)

Published 3 Oct 2023 in cs.LG

Abstract: We study the representation complexity of model-based and model-free reinforcement learning (RL) through the lens of circuit complexity. We prove that there exists a broad class of MDPs whose underlying transition and reward functions can be represented by constant-depth circuits of polynomial size, while the optimal $Q$-function incurs exponential circuit complexity under the same constant-depth restriction. By drawing attention to approximation errors and building connections to complexity theory, our theory provides unique insight into why model-based algorithms usually enjoy better sample complexity than model-free algorithms, from a novel representation complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as the $Q$-function, appear complex. We empirically corroborate this by comparing the approximation errors of the transition kernel, reward function, and optimal $Q$-function in various MuJoCo environments; the errors for the transition kernel and reward function are consistently lower than those for the optimal $Q$-function. To the best of our knowledge, this is the first work to study the circuit complexity of RL, and it provides a rigorous framework for future research.
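
Stated schematically (our notation, not the paper's; the formal statement and exact rate are given in the paper): writing $\mathrm{CC}_{O(1)}(f)$ for the minimum size of a constant-depth circuit computing $f$, the claimed separation, for a family of MDPs with transition kernels $P_n$, reward functions $r_n$, and optimal action-value functions $Q_n^*$ over $n$-bit encoded state-action pairs, takes the form

$$\mathrm{CC}_{O(1)}(P_n) \le \mathrm{poly}(n), \qquad \mathrm{CC}_{O(1)}(r_n) \le \mathrm{poly}(n), \qquad \mathrm{CC}_{O(1)}(Q_n^*) \ge \exp\!\big(\Omega(n^{c})\big)$$

for some constant $c > 0$.

On the empirical side, the paper compares how well networks fit the transition kernel, the reward function, and the optimal $Q$-function. The sketch below is not the authors' code; it only illustrates one way such a comparison could be run on an offline dataset. The loader load_dataset and the $Q^*$ targets (e.g. taken from the critic of a separately trained agent) are hypothetical placeholders.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

def fit_and_score(X, y, hidden=(256, 256), seed=0):
    """Fit an MLP of fixed capacity and return held-out mean-squared error."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=seed)
    model.fit(X_tr, y_tr)
    return float(np.mean((model.predict(X_te) - y_te) ** 2))

# Hypothetical loader: states s, actions a, next states s_next, rewards r,
# and Q-targets q_star estimated by a separately trained critic (placeholder API).
s, a, s_next, r, q_star = load_dataset("HalfCheetah")
sa = np.concatenate([s, a], axis=1)

errors = {
    "transition (model-based target)": fit_and_score(sa, s_next),
    "reward (model-based target)": fit_and_score(sa, r),
    "Q* (model-free target)": fit_and_score(sa, q_star),
}
print(errors)  # the paper reports the Q* error is consistently the largest

For the comparison to be meaningful across quantities with different scales, the targets would in practice be normalized (or the errors reported relative to target variance), and the fitted models kept at matched capacity.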
