Transformers in Reinforcement Learning: A Survey (2307.05979v1)

Published 12 Jul 2023 in cs.LG, cs.AI, and cs.CV

Abstract: Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.
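
To make the survey's core idea concrete, here is a minimal sketch of treating RL as sequence modeling with a causal transformer, in the style of Decision Transformer (Chen et al., 2021): return-to-go, state, and action tokens are interleaved, and each action is predicted from the hidden state at the preceding state token. All class names, dimensions, and hyperparameters below are illustrative assumptions, not code from the survey.

```python
# Minimal, illustrative sketch (PyTorch): RL as sequence modeling with a
# causal transformer, Decision-Transformer style. Names and hyperparameters
# are assumptions for illustration, not the survey's implementation.
import torch
import torch.nn as nn

class SequenceRLPolicy(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3,
                 n_heads=4, max_len=64):
        super().__init__()
        # Separate embeddings for the three token types at each timestep.
        self.embed_rtg = nn.Linear(1, d_model)            # return-to-go
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.pos = nn.Embedding(3 * max_len, d_model)     # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        # Interleave tokens per step as (R_t, s_t, a_t) -> (B, 3T, d_model).
        tokens = torch.stack(
            [self.embed_rtg(rtg),
             self.embed_state(states),
             self.embed_action(actions)], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos(torch.arange(3 * T, device=tokens.device))
        # Causal mask so each token attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.encoder(tokens, mask=mask.to(tokens.device))
        # Predict a_t from the hidden state at each state token (indices 1, 4, 7, ...).
        return self.predict_action(h[:, 1::3])           # (B, T, act_dim)

# Toy training step: regress onto offline actions (continuous-control case).
model = SequenceRLPolicy(state_dim=17, act_dim=6)
rtg = torch.randn(8, 10, 1)
s, a = torch.randn(8, 10, 17), torch.randn(8, 10, 6)
loss = nn.functional.mse_loss(model(rtg, s, a), a)
loss.backward()
```

At evaluation time, a model of this kind is conditioned on a target return and rolled out autoregressively, appending each predicted action to the context before predicting the next one.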

Authors (5)
  1. Pranav Agarwal (9 papers)
  2. Aamer Abdul Rahman (3 papers)
  3. Pierre-Luc St-Charles (7 papers)
  4. Simon J. D. Prince (4 papers)
  5. Samira Ebrahimi Kahou (50 papers)
Citations (17)
