Spatially-Aware Transformer for Embodied Agents (2402.15160v3)

Published 23 Feb 2024 in cs.LG and cs.AI

Abstract: Episodic memory plays a crucial role in various cognitive processes, such as the ability to mentally recall past events. While cognitive science emphasizes the significance of spatial context in the formation and retrieval of episodic memory, the current primary approach to implementing episodic memory in AI systems is through transformers that store temporally ordered experiences, which overlooks the spatial dimension. As a result, it is unclear how the underlying structure can be extended to incorporate the spatial axis in addition to temporal order, and what benefits doing so would bring. To address this, this paper explores the use of Spatially-Aware Transformer models that incorporate spatial information. These models enable the creation of place-centric episodic memory that considers both temporal and spatial dimensions. Adopting this approach, we demonstrate that memory utilization efficiency can be improved, leading to enhanced accuracy on various place-centric downstream tasks. Additionally, we propose the Adaptive Memory Allocator, a reinforcement-learning-based memory management method that aims to optimize memory utilization efficiency. Our experiments demonstrate the advantages of the proposed model in various environments and across multiple downstream tasks, including prediction, generation, reasoning, and reinforcement learning. The source code for our models and experiments will be available at https://github.com/junmokane/spatially-aware-transformer.
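
The sketch below illustrates the two ideas named in the abstract: a place-centric episodic memory that buckets experiences by location rather than storing one long temporal stream, and a transformer that attends over retrieved memories with both time and place embeddings added to each token. It is a minimal reconstruction based only on the abstract, not the authors' implementation: the names `PlaceCentricMemory` and `SpatiallyAwareEncoder`, the per-place FIFO eviction, and the additive embedding scheme are all assumptions for illustration, and the Adaptive Memory Allocator (which the paper trains with reinforcement learning) is not shown; here allocation is a fixed per-place budget.

```python
# Minimal sketch of place-centric episodic memory plus a spatially-aware
# transformer encoder. Illustrative only; class names, the per-place FIFO
# policy, and the embedding scheme are assumptions, not the paper's API.
from collections import defaultdict, deque

import torch
import torch.nn as nn


class PlaceCentricMemory:
    """Stores observation embeddings bucketed by place, FIFO within each place."""

    def __init__(self, slots_per_place: int):
        self.slots_per_place = slots_per_place
        # deque(maxlen=...) evicts the oldest entry of a place when full.
        self.buffers = defaultdict(lambda: deque(maxlen=slots_per_place))

    def write(self, place_id: int, step: int, obs_emb: torch.Tensor) -> None:
        self.buffers[place_id].append((step, obs_emb))

    def read(self, place_id: int):
        # Return (time steps, stacked embeddings) recorded at the queried place.
        entries = list(self.buffers[place_id])
        if not entries:
            return [], torch.empty(0)
        steps, embs = zip(*entries)
        return list(steps), torch.stack(embs)


class SpatiallyAwareEncoder(nn.Module):
    """Transformer encoder over memory tokens with added time and place embeddings."""

    def __init__(self, dim=64, n_places=16, max_steps=512, n_heads=4, n_layers=2):
        super().__init__()
        self.time_emb = nn.Embedding(max_steps, dim)
        self.place_emb = nn.Embedding(n_places, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, obs_embs, steps, place_ids):
        # obs_embs: (B, T, dim); steps, place_ids: (B, T) integer tensors.
        tokens = obs_embs + self.time_emb(steps) + self.place_emb(place_ids)
        return self.encoder(tokens)


if __name__ == "__main__":
    dim = 64
    memory = PlaceCentricMemory(slots_per_place=8)
    # A short trajectory that revisits place 0 at several time steps.
    for t, place in enumerate([0, 0, 1, 2, 0]):
        memory.write(place, t, torch.randn(dim))

    steps, embs = memory.read(place_id=0)
    model = SpatiallyAwareEncoder(dim=dim)
    out = model(
        embs.unsqueeze(0),
        torch.tensor([steps]),
        torch.zeros(1, len(steps), dtype=torch.long),
    )
    print(out.shape)  # torch.Size([1, 3, 64])
```

A place-centric read like `memory.read(place_id=0)` is what would back the paper's place-centric downstream tasks (prediction, generation, reasoning at a queried location); swapping the fixed per-place budget for a learned allocation policy is where the Adaptive Memory Allocator would plug in.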
