Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People (2407.08219v1)

Published 11 Jul 2024 in cs.CL and cs.HC

Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios, such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instances. Through a sighted user study, we demonstrate that large pretrained LLMs can produce correct and useful instructions perceived as beneficial for BLV users. We also conduct a survey and interviews with four BLV users, which yield useful insights into their preferences for different instructions depending on the scenario.
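
The abstract describes prompting large pretrained (multimodal) LLMs with an image of the scene and a user goal to produce navigation guidance. Below is a minimal sketch of what such a query might look like; the paper does not specify this interface, so the OpenAI client, the "gpt-4o" model name, the navigation_instruction helper, and the wording of the system prompt are all assumptions for illustration rather than the authors' actual pipeline.

```python
# Minimal sketch (not the authors' pipeline): ask a vision-capable LLM for a
# contextually-relevant navigation instruction given an image and a goal.
# Assumptions: the OpenAI Python SDK is installed, OPENAI_API_KEY is set, and
# "gpt-4o" stands in for whichever pretrained model is actually evaluated.
import base64
from openai import OpenAI

client = OpenAI()

def navigation_instruction(image_path: str, goal: str) -> str:
    """Return a short, non-visual navigation instruction grounded in the image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You assist blind and low-vision users. Give concise, "
                    "step-by-step directions using distances, clock-face "
                    "bearings, and tactile landmarks rather than colors."
                ),
            },
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"Goal: {goal}. How do I get there from here?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            },
        ],
    )
    return response.choices[0].message.content

# Hypothetical usage for the kitchen-search scenario mentioned in the abstract:
# print(navigation_instruction("kitchen.jpg", "find the microwave"))
```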

