Context-Aware Image Descriptions for Web Accessibility (2409.03054v1)
Abstract: Blind and low vision (BLV) internet users access images on the web via text descriptions. New vision-to-language models such as GPT-V, Gemini, and LLaVa can now provide detailed image descriptions on demand. While prior research and guidelines state that BLV audiences' information preferences depend on the context of the image, existing tools for accessing vision-to-language models provide only context-free image descriptions: they describe the image alone, without considering the surrounding webpage context. To explore how to integrate image context into image descriptions, we designed a Chrome Extension that automatically extracts webpage context to inform GPT-4V-generated image descriptions. We gathered feedback from 12 BLV participants in a user study comparing typical context-free image descriptions to context-aware image descriptions, and we further evaluated our context-aware image descriptions in a technical evaluation. In the user evaluation, BLV participants frequently preferred context-aware descriptions to context-free descriptions and rated them significantly higher in quality, imaginability, relevance, and plausibility. All participants said they wanted to use context-aware descriptions in the future and highlighted their potential for online shopping, social media, news, and personal interest blogs.