
UIClip: A Data-driven Model for Assessing User Interface Design (2404.12500v1)

Published 18 Apr 2024 in cs.HC, cs.CL, and cs.CV

Abstract: User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In our paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on the dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
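The abstract reports that UIClip achieved the highest agreement with rankings derived from human designers, but it does not say how agreement is computed. As a minimal sketch under assumptions, one common way to measure this is pairwise agreement: the fraction of UI pairs that the model's scores order the same way as the human ranking. The function name, the score values, and the rank values below are hypothetical and do not come from the paper.

```python
from itertools import combinations


def pairwise_agreement(model_scores, human_ranks):
    """Fraction of UI pairs ordered the same way by the model and by humans.

    model_scores: one score per UI, higher means better (assumed convention).
    human_ranks: one rank per UI, lower means better (1 = best).
    Pairs tied in the human ranking are skipped. A tie in the model scores
    is treated as the model preferring the second UI of the pair.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(model_scores)), 2):
        if human_ranks[i] == human_ranks[j]:
            continue  # no human preference for this pair
        total += 1
        human_prefers_i = human_ranks[i] < human_ranks[j]
        model_prefers_i = model_scores[i] > model_scores[j]
        if human_prefers_i == model_prefers_i:
            agree += 1
    return agree / total if total else 0.0


# Hypothetical scores for four UIs and a hypothetical human ranking.
scores = [0.31, 0.27, 0.12, 0.29]
ranks = [1, 2, 4, 3]
print(pairwise_agreement(scores, ranks))
```

In this made-up example the model disagrees with the human ranking on one of the six pairs (the second and fourth UIs), so the agreement is 5/6. The paper's own evaluation may use a different metric.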

Authors (6)
  1. Jason Wu
  2. Yi-Hao Peng
  3. Amanda Li
  4. Amanda Swearngin
  5. Jeffrey P. Bigham
  6. Jeffrey Nichols