On Representation Complexity of Model-based and Model-free Reinforcement Learning (2310.01706v2)
Abstract: We study the representation complexity of model-based and model-free reinforcement learning (RL) through the lens of circuit complexity. We prove that there exists a broad class of MDPs whose underlying transition and reward functions can be represented by constant-depth, polynomial-size circuits, while the optimal $Q$-function requires exponential size when restricted to constant-depth circuits. By drawing attention to approximation errors and building connections to complexity theory, our theory provides unique insight into why model-based algorithms usually enjoy better sample complexity than model-free algorithms, from a novel representation-complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as the $Q$-function, appear complex. We empirically corroborate our theory by comparing the approximation errors of the transition kernel, reward function, and optimal $Q$-function in various MuJoCo environments; the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal $Q$-function. To the best of our knowledge, this work is the first to study the circuit complexity of RL, and it provides a rigorous framework for future research.
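To make the empirical comparison concrete, below is a minimal sketch, not the authors' code, of one way to carry it out: fit identically sized MLPs to the transition, the reward, and a Monte-Carlo return target (a crude stand-in for the optimal $Q$-function, which is unavailable in MuJoCo tasks) on random-policy data, then compare held-out errors. The environment name `HalfCheetah-v4`, the network sizes, and the sample counts are illustrative assumptions; the sketch assumes `gymnasium[mujoco]` and PyTorch are installed.

```python
# A minimal sketch (not the paper's code) of the comparison described above:
# fit identical MLPs to the transition, reward, and a Monte-Carlo return
# target on random-policy data from one MuJoCo task, then compare held-out
# errors. All hyperparameters are illustrative, and the MC return is only
# a crude stand-in for the optimal Q-function.
import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

def collect(env, steps=20_000, gamma=0.99):
    """Roll out a random policy; return (s, a, r, s', return-to-go) arrays."""
    S, A, R, S2, G = [], [], [], [], []
    s, _ = env.reset(seed=0)
    ep_start = 0
    for t in range(steps):
        a = env.action_space.sample()
        s2, r, term, trunc, _ = env.step(a)
        S.append(s); A.append(a); R.append(r); S2.append(s2)
        s = s2
        if term or trunc or t == steps - 1:
            # Discounted return-to-go for the finished (or cut-off) episode;
            # truncation bias is ignored for simplicity in this sketch.
            g, ep_G = 0.0, []
            for rr in reversed(R[ep_start:]):
                g = rr + gamma * g
                ep_G.append(g)
            G.extend(reversed(ep_G))
            ep_start = len(R)
            s, _ = env.reset()
    to32 = lambda xs: np.asarray(xs, dtype=np.float32)
    return to32(S), to32(A), to32(R), to32(S2), to32(G)

def fit_and_eval(x, y, hidden=256, epochs=100):
    """Train a two-hidden-layer MLP on (x, y); return held-out MSE.
    Targets are standardized so errors are comparable across quantities."""
    n = len(x); split = int(0.8 * n)
    y = np.asarray(y, np.float32).reshape(n, -1)
    y = (y - y[:split].mean(0)) / (y[:split].std(0) + 1e-8)
    x, y = torch.as_tensor(x), torch.as_tensor(y)
    net = nn.Sequential(nn.Linear(x.shape[1], hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, y.shape[1]))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(epochs):  # full-batch training suffices for a sketch
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x[:split]), y[:split])
        loss.backward(); opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(net(x[split:]), y[split:]).item()

env = gym.make("HalfCheetah-v4")
S, A, R, S2, G = collect(env)
SA = np.concatenate([S, A], axis=1)  # inputs are (state, action) pairs
print("transition error:", fit_and_eval(SA, S2))
print("reward error:    ", fit_and_eval(SA, R))
print("return (Q proxy):", fit_and_eval(SA, G))
```

Standardizing each target puts the three held-out errors on a roughly comparable scale; under the abstract's reported finding, the return/$Q$ proxy should be the hardest of the three to fit.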