"It's Kind of Context Dependent": Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios (2403.10792v1)

Published 16 Mar 2024 in cs.HC

Abstract: While audio description (AD) is the standard approach for making videos accessible to blind and low vision (BLV) people, existing AD guidelines do not consider BLV users' varied preferences across viewing scenarios. These scenarios range from how-to videos on YouTube, where users seek to learn new skills, to historical dramas on Netflix, where a user's goal is entertainment. Additionally, the increase in video watching on mobile devices provides an opportunity to integrate nonverbal output modalities (e.g., audio cues, tactile elements, and visual enhancements). Through a formative survey and 15 semi-structured interviews, we identified BLV people's video accessibility preferences across diverse scenarios. For example, participants valued action and equipment details for how-to videos, tactile graphics for learning scenarios, and 3D models for fantastical content. We define a six-dimensional video accessibility design space to guide future innovation and discuss how to move from "one-size-fits-all" paradigms to scenario-specific approaches.


Summary

  • The paper reveals that BLV users’ video accessibility needs vary significantly with viewing context.
  • It uses surveys and interviews to critique one-size-fits-all audio description methods across media types.
  • The research introduces a six-dimensional design framework and discusses generative AI's role and limitations.

Understanding Video Accessibility Preferences for Blind and Low Vision Users

The paper "'It's Kind of Context Dependent': Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios" presents a comprehensive examination of how blind and low vision (BLV) individuals consume video content and what accommodations they need to make it accessible. The paper recognizes the limitations of traditional audio description (AD) approaches and advocates for more nuanced, scenario-specific methods to accommodate the diverse needs of BLV audiences across various video consumption contexts.

Overview of Research Objectives and Methodology

The research aims to elucidate BLV users' video accessibility preferences across diverse scenarios, addressing the inadequacies of the "one-size-fits-all" AD model traditionally applied across media. The authors conducted a formative survey followed by 15 semi-structured interviews with BLV participants to gather qualitative data on their experiences and preferences. The paper identifies significant gaps in the accessibility of varied video types, such as educational, how-to, and entertainment content, across platforms like YouTube, Netflix, and social networking sites.

Key Findings and Scenario-Specific Preferences

The paper reveals that BLV users' accessibility needs and preferences vary greatly with the video type and viewing context. In educational settings, for example, users prioritize detailed descriptions of visual aids and settings to support comprehension. In contrast, entertainment content such as music videos and dramas calls for emphasis on characters, settings, and visual effects to preserve immersion. For fast-paced short-form content, participants particularly valued control over when accessible content is delivered, such as through extended descriptions or prologues.

Emerging Design Space for Video Accessibility

The researchers propose a novel six-dimensional design space for video accessibility, with the following dimensions (sketched as a configuration object after the list):

  1. Level of Detail: This ranges from minimal to extreme, depending on user preference for verbose descriptions.
  2. Alteration of Video Time: Varied description durations can alter the pacing of video content for optimal user comprehension.
  3. Level of Augmentation: This involves the degree to which videos are enhanced post-production with accessibility measures.
  4. Modality of Presentation: Beyond spoken descriptions, modalities include visual enhancements, Braille, tactile models, and audio cues.
  5. Synchronicity of Accessible Content: Timing of access features, potentially before, during, or after video consumption, is essential.
  6. Tone and Style of Approach: This varies based on scenario, calling for narrative styles that align with user goals and content type.
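
To make the design space concrete, the sketch below encodes the six dimensions as a configuration object. This is purely illustrative: the type names, enum values, and the example preset are assumptions introduced here for exposition, not terminology or an implementation from the paper.

```python
# A minimal sketch of the six-dimensional design space as a configuration
# object. All names, enum values, and the preset below are illustrative
# assumptions, not fixed by the paper.
from dataclasses import dataclass, field
from enum import Enum


class Detail(Enum):
    MINIMAL = 1
    MODERATE = 2
    EXTREME = 3


class TimeAlteration(Enum):
    NONE = "fit descriptions into existing pauses"
    EXTENDED = "pause the video for longer descriptions"


class Synchronicity(Enum):
    BEFORE = "prologue / audio introduction"
    DURING = "inline description"
    AFTER = "epilogue / follow-up summary"


@dataclass
class AccessibilityConfig:
    """One point in the six-dimensional video accessibility design space."""
    level_of_detail: Detail
    time_alteration: TimeAlteration
    augmentation: str                  # degree/kind of post-production enhancement
    modalities: list[str] = field(default_factory=lambda: ["speech"])
    synchronicity: Synchronicity = Synchronicity.DURING
    tone_and_style: str = "neutral"


# Example: a hypothetical preset for a how-to video, where participants
# valued action and equipment details and tolerated extended pauses.
how_to_preset = AccessibilityConfig(
    level_of_detail=Detail.EXTREME,
    time_alteration=TimeAlteration.EXTENDED,
    augmentation="post-production audio cues",
    modalities=["speech", "tactile graphics"],
    synchronicity=Synchronicity.DURING,
    tone_and_style="instructional",
)
```

A scenario-specific preset like how_to_preset captures the paper's core argument: rather than one fixed AD style, each viewing scenario selects its own point in the design space.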

Consideration of Generative AI in Video Accessibility

The paper astutely considers the role of generative AI technologies in expanding video accessibility options. By automating certain aspects of content description and enhancement, these technologies hold potential for broadening the scope of personalized accessibility accommodations. However, the authors caution against unregulated AI deployments due to potential biases and ethical considerations, underscoring the need for meticulous dataset curation and robust quality evaluation.
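
To ground this discussion, here is a minimal, hypothetical sketch of scenario-conditioned description generation with a general-purpose vision-language model. The paper does not prescribe this pipeline; the model choice, prompt wording, and function name are assumptions, and the OpenAI Python SDK is used only as a familiar stand-in.

```python
# Hypothetical sketch: generating a scenario-tuned description for one video
# frame. Nothing here is the authors' method; it illustrates how design-space
# dimensions could become generation parameters.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def describe_frame(image_url: str, scenario: str, detail: str) -> str:
    """Ask a vision-language model for a description tuned to a viewing scenario."""
    prompt = (
        f"Write an audio description for a frame from a {scenario} video. "
        f"Level of detail: {detail}. "
        "For how-to content, prioritize actions and equipment; "
        "for entertainment, prioritize characters, setting, and mood."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content


# Usage (hypothetical URL): a verbose description for a cooking tutorial.
# print(describe_frame("https://example.com/frame.jpg", "how-to", "extreme"))
```

Note how the scenario and level of detail, two dimensions of the design space above, become prompt parameters; the quality evaluation and bias auditing the authors call for would still be needed before such output reached BLV users.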

Implications for Future Developments

This research carries significant implications for the future of video accessibility. There is a compelling case for incorporating user-centered design into the creation of accessibility features, acknowledging not only BLV individuals' varied preferences across scenarios but also the rapid evolution of content platforms and viewing technologies. The paper highlights the necessity of adopting innovative approaches, such as integrating tactile and auditory feedback, to satisfy the diverse needs of BLV users in an increasingly digital landscape.

In conclusion, the paper provides a well-founded argument for a shift from uniform AD approaches to more flexible, context-aware video accessibility strategies. The proposed design space offers a valuable framework for researchers and practitioners aiming to develop more inclusive and personalized media experiences for BLV audiences. As technology becomes further intertwined with media consumption, these insights will be increasingly important in guiding both practical applications and academic work in human-centered computing and accessibility research.
