PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning (2312.15863v1)

Published 26 Dec 2023 in cs.LG, cs.AI, cs.RO, cs.SY, and eess.SY

Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on \emph{the environmental perception} by processing the observation at the patch level, whereas the deciding one pays attention to \emph{the decision-making} by conditioning on the history of the desired returns, the perceiver's outputs, and the actions. Such a network design is generally applicable to many deep RL settings, e.g., both online and offline RL algorithms under environments with image observations, proprioception observations, or hybrid image-language observations. Extensive experiments show that PDiT not only achieves superior performance to strong baselines in different settings but also extracts explainable feature representations. Our code is available at \url{https://github.com/maohangyu/PDiT}.
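Per the abstract, the deciding Transformer conditions on an interleaved history of desired returns, the perceiver's outputs, and past actions. A minimal sketch of how such a token sequence might be assembled is below; the function name, token tags, and layout are illustrative assumptions, not the authors' exact implementation (see the linked repository for that):

```python
# Sketch only: interleave (return-to-go, percept, action) triples into one
# token sequence for a decision-style Transformer, in the spirit of PDiT's
# deciding network. All names here are hypothetical.

def build_decision_sequence(returns_to_go, percepts, actions):
    """Interleave per-timestep tokens as R_1, s_1, a_1, R_2, s_2, a_2, ...

    returns_to_go: list of scalars (desired cumulative reward from step t)
    percepts:      list of feature tokens from the perceiving Transformer
    actions:       list of action tokens; the final action may be absent at
                   inference time, since it is what the model must predict
    """
    sequence = []
    for t, (rtg, percept) in enumerate(zip(returns_to_go, percepts)):
        sequence.append(("return", rtg))
        sequence.append(("percept", percept))
        if t < len(actions):  # last action missing when acting online
            sequence.append(("action", actions[t]))
    return sequence
```

At inference, the sequence ends on a percept token, and the deciding Transformer would read out the next action from that position.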
