
Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning (2402.02429v2)

Published 4 Feb 2024 in cs.LG

Abstract: As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among these, context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
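
To make the central objective concrete, below is a minimal sketch of one common way to implement a lower bound on the mutual information $I(Z; M)$ between a task and its latent representation: an InfoNCE-style contrastive loss over context encodings. This is an illustrative assumption, not the paper's exact algorithm; the function name, tensor shapes, and the choice of InfoNCE are hypothetical stand-ins for whichever bound a given COMRL method adopts.

```python
# Illustrative sketch (assumed setup, not the paper's exact objective):
# an InfoNCE-style contrastive estimator, a standard lower bound on mutual
# information, applied to task identity M and latent representation Z.
import torch
import torch.nn.functional as F

def infonce_task_mi_lower_bound(z_anchor, z_positive, temperature=0.1):
    """Contrastive lower bound on I(Z; M).

    z_anchor, z_positive: [num_tasks, latent_dim] encodings of two context
    batches drawn from the same task; the other rows in the batch serve as
    negatives (contexts from different tasks).
    """
    z_a = F.normalize(z_anchor, dim=-1)
    z_p = F.normalize(z_positive, dim=-1)
    logits = z_a @ z_p.t() / temperature              # [num_tasks, num_tasks]
    labels = torch.arange(z_a.size(0), device=z_a.device)
    # Minimizing this cross-entropy over negatives maximizes the InfoNCE bound.
    loss = F.cross_entropy(logits, labels)
    return -loss  # value of the (negative-loss) lower bound on I(Z; M)
```

Under the paper's framing, different COMRL methods correspond to different approximate bounds on the same $I(Z; M)$ objective; the contrastive form above is only one such choice.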

Authors (7)
  1. Lanqing Li (21 papers)
  2. Hai Zhang (69 papers)
  3. Xinyu Zhang (296 papers)
  4. Shatong Zhu (2 papers)
  5. Junqiao Zhao (32 papers)
  6. Pheng-Ann Heng (196 papers)
  7. Yang Yu (385 papers)
Citations (5)
