
AXNav: Replaying Accessibility Tests from Natural Language (2310.02424v3)

Published 3 Oct 2023 in cs.HC and cs.AI

Abstract: Developers and quality assurance testers often rely on manual testing to test accessibility features throughout the product lifecycle. Unfortunately, manual testing can be tedious, often has an overwhelming scope, and can be difficult to schedule amongst other development milestones. Recently, LLMs have been used for a variety of tasks, including automation of UIs; however, to our knowledge, no one has yet explored their use in controlling assistive technologies for the purposes of supporting accessibility testing. In this paper, we explore the requirements of a natural-language-based accessibility testing workflow, starting with a formative study. From this, we build a system that takes as input a manual accessibility test (e.g., "Search for a show in VoiceOver") and uses an LLM combined with pixel-based UI Understanding models to execute the test and produce a chaptered, navigable video. In each video, to help QA testers, we apply heuristics to detect and flag accessibility issues (e.g., text size not increasing with Large Text enabled, VoiceOver navigation loops). We evaluate this system through a 10-participant user study with accessibility QA professionals, who indicated that the tool would be very useful in their current work and that it performed tests similarly to how they would manually test the features. The study also reveals insights for future work on using LLMs for accessibility testing.

Authors (6)
  1. Maryam Taeb
  2. Amanda Swearngin
  3. Eldon Schoop
  4. Ruijia Cheng
  5. Yue Jiang
  6. Jeffrey Nichols
