Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People (2407.08219v1)
Published 11 Jul 2024 in cs.CL and cs.HC
Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instances. Through a sighted user study, we demonstrate that large pretrained language models can produce correct and useful instructions perceived as beneficial for BLV users. We also conduct a survey and interview with 4 BLV users and gather useful insights into their preferences for different instructions depending on the scenario.
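The abstract describes grounded instruction generation with a pretrained language model but gives no implementation details. The sketch below is one hedged illustration of how scene facts and a user goal might be composed into a prompt for such a model; the `SceneObject` fields, the prompt wording, the `gpt-4o` model name, and the use of the OpenAI chat API are assumptions made for illustration, not the authors' pipeline.

```python
# Illustrative sketch only: scene facts (object labels, directions, distances)
# are assumed to come from upstream perception (e.g., an object detector and a
# depth estimator); model choice and prompt wording are placeholders.
from dataclasses import dataclass
from openai import OpenAI  # pip install openai

@dataclass
class SceneObject:
    label: str         # e.g., "microwave"
    direction: str     # e.g., "at your 1 o'clock", "directly ahead"
    distance_m: float  # estimated distance in meters

def build_prompt(goal: str, objects: list[SceneObject]) -> str:
    """Compose perception output and the user's goal into a text prompt."""
    scene = "\n".join(
        f"- {o.label}: {o.direction}, about {o.distance_m:.1f} m away"
        for o in objects
    )
    return (
        "You are assisting a blind or low-vision user.\n"
        f"Goal: {goal}\n"
        f"Objects detected in the scene:\n{scene}\n"
        "Give one short, step-by-step navigation instruction using "
        "non-visual cues (clock directions, distances, landmarks)."
    )

def generate_instruction(goal: str, objects: list[SceneObject]) -> str:
    """Query a chat model for a contextually-relevant instruction."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": build_prompt(goal, objects)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    scene = [
        SceneObject("kettle", "at your 1 o'clock", 1.2),
        SceneObject("counter edge", "directly ahead", 0.8),
    ]
    print(generate_instruction("find the kettle", scene))
```

Phrasing the scene as non-visual, egocentric cues (clock directions and distances) mirrors the kind of contextually-relevant guidance the study evaluates, though the actual prompt design and perception stack used in the paper may differ.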