"It's Kind of Context Dependent": Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios (2403.10792v1)
Abstract: While audio description (AD) is the standard approach for making videos accessible to blind and low vision (BLV) people, existing AD guidelines do not consider BLV users' varied preferences across viewing scenarios. These scenarios range from how-to videos on YouTube, where users seek to learn new skills, to historical dramas on Netflix, where a user's goal is entertainment. Additionally, the increase in video watching on mobile devices provides an opportunity to integrate nonverbal output modalities (e.g., audio cues, tactile elements, and visual enhancements). Through a formative survey and 15 semi-structured interviews, we identified BLV people's video accessibility preferences across diverse scenarios. For example, participants valued action and equipment details for how-to videos, tactile graphics for learning scenarios, and 3D models for fantastical content. We define a six-dimensional video accessibility design space to guide future innovation and discuss how to move from "one-size-fits-all" paradigms to scenario-specific approaches.