Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency (2205.13476v2)

Published 26 May 2022 in cs.LG, cs.AI, cs.SY, eess.SY, and stat.ML

Abstract: Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges. (i) It often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon. (ii) The observation and state spaces are often continuous, which induces a sample complexity that scales exponentially with the extrinsic dimension. Addressing these challenges requires learning a minimal but sufficient representation of the observation and state histories by exploiting the structure of the POMDP. To this end, we propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy. (i) For each step, ETC learns to represent the state with a low-dimensional feature, which factorizes the transition kernel. (ii) Across multiple steps, ETC learns to represent the full history with a low-dimensional embedding, which assembles the per-step features. We integrate (i) and (ii) in a unified framework that allows a variety of estimators (including maximum likelihood estimators and generative adversarial networks). For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank). Here $\epsilon$ is the optimality gap. To the best of our knowledge, ETC is the first sample-efficient algorithm that bridges representation learning and policy optimization in POMDPs with infinite observation and state spaces.
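To make the low-rank assumption concrete, here is a minimal sketch of a rank-$d$ factorization of the transition kernel, written in the form commonly used for low-rank MDPs; the paper's exact POMDP-specific assumption may differ, and the feature maps $\phi_h$ and $\mu_h$ below are notation introduced only for illustration:

$$\mathbb{P}_h(s' \mid s, a) \;=\; \langle \phi_h(s, a),\, \mu_h(s') \rangle \;=\; \sum_{i=1}^{d} \phi_{h,i}(s, a)\, \mu_{h,i}(s'),$$

where $d$ is the intrinsic dimension (the rank). Under such a structure, the sample complexity can scale polynomially in $d$ and the horizon rather than in the ambient dimension of the continuous observation and state spaces, which is the regime the $O(1/\epsilon^2)$ bound above refers to.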
