Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning (2402.02429v2)
Abstract: As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among its variants, context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms essentially optimize the same mutual information objective between the task variable $M$ and its latent representation $Z$, by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context-shift scenarios, data qualities, and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
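One standard way to instantiate a self-supervised lower bound on $I(Z; M)$ is an InfoNCE-style contrastive objective (Oord et al., 2018): context sets drawn from the same task are treated as positive pairs, and contexts from other tasks in the batch act as negatives. The sketch below is a minimal illustration under that assumption, not the paper's actual algorithm; the `ContextEncoder` architecture, the mean-pooling choice, and all dimensions and hyperparameters are hypothetical.

```python
# Minimal sketch of a self-supervised InfoNCE lower bound on I(Z; M).
# Contexts from the same task form positives; cross-task pairs are negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Maps a context set of (s, a, r, s') transitions to a task embedding z."""
    def __init__(self, transition_dim: int, latent_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (n_tasks, n_transitions, transition_dim).
        # Mean-pooling yields a permutation-invariant task embedding.
        return self.net(context).mean(dim=1)

def info_nce_lower_bound(z_a: torch.Tensor, z_b: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: rows of z_a and z_b sharing an index encode the same
    task (positives on the diagonal); all other pairings are negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # (n_tasks, n_tasks) similarity
    labels = torch.arange(z_a.size(0))        # positive pair on the diagonal
    # Minimizing this cross-entropy maximizes a lower bound on I(Z; M).
    return F.cross_entropy(logits, labels)

# Usage: two disjoint context sets per task give two views of the same M.
encoder = ContextEncoder(transition_dim=20, latent_dim=8)
ctx_view1 = torch.randn(16, 32, 20)  # 16 tasks, 32 transitions each
ctx_view2 = torch.randn(16, 32, 20)
loss = info_nce_lower_bound(encoder(ctx_view1), encoder(ctx_view2))
loss.backward()
```

Mean-pooling over transitions is one common design choice for permutation-invariant context encoders in context-based meta-RL; a recurrent or attention-based aggregator would serve equally well in this sketch.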
Authors: Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Junqiao Zhao, Pheng-Ann Heng, Yang Yu