Where Do We Go from Here? Multi-scale Allocentric Relational Inference from Natural Spatial Descriptions (2402.16364v2)

Published 26 Feb 2024 in cs.CL, cs.LG, and cs.MM

Abstract: When communicating routes in natural language, the spatial knowledge a speaker has acquired is crucial both for geographic information retrieval (GIR) and for spatial cognition research. However, NLP navigation studies often overlook the impact of such acquired knowledge on textual descriptions. Current navigation studies concentrate on egocentric local descriptions (e.g., 'it will be on your right') that require reasoning over the agent's local perception. These instructions are typically given as a sequence of steps, with each action-step explicitly mentioning and being followed by a landmark that the agent can use to verify they are on the right path (e.g., 'turn right and then you will see...'). In contrast, descriptions based on knowledge acquired through a map provide a complete view of the environment and capture its overall structure. These instructions (e.g., 'it is south of Central Park and a block north of a police station') are typically non-sequential, containing allocentric relations with multiple spatial relations and implicit actions, and lacking any explicit verification. This paper introduces the Rendezvous (RVS) task and dataset, which includes 10,404 examples of English geospatial instructions for reaching a target location using map knowledge. Our analysis reveals that RVS exhibits a richer use of allocentric spatial relations and requires resolving more spatial relations simultaneously compared to previous text-based navigation benchmarks.
