Generating Automatic Feedback on UI Mockups with Large Language Models (2403.13139v1)
Abstract: Feedback on user interface (UI) mockups is crucial in design. However, human feedback is not always readily available. We explore the potential of using LLMs for automatic feedback. Specifically, we focus on applying GPT-4 to automate heuristic evaluation, which currently entails a human expert assessing a UI's compliance with a set of design guidelines. We implemented a Figma plugin that takes in a UI design and a set of written heuristics, and renders automatically-generated feedback as constructive suggestions. We assessed performance on 51 UIs using three sets of guidelines, compared GPT-4-generated design suggestions with those from human experts, and conducted a study with 12 expert designers to understand fit with existing practice. We found that GPT-4-based feedback is useful for catching subtle errors, improving text, and considering UI semantics, but feedback also decreased in utility over iterations. Participants described several uses for this plugin despite its imperfect suggestions.
Authors: Peitong Duan, Jeremy Warner, Yang Li, Bjoern Hartmann