UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface (2403.04984v1)

Published 8 Mar 2024 in cs.SE

Abstract: Texts, widgets, and images on a UI page do not work separately. Instead, they are partitioned into groups to achieve certain interaction functions or convey visual information. Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and their groups vary in appearance and function. To address this, we propose semantic component groups, which pack adjacent text and non-text elements with similar semantics. In contrast to task-oriented grouping methods, our semantic component groups can be adopted for multiple UI-related software tasks, such as retrieving UI perceptual groups, improving code structure for automatic UI-to-code generation, and generating accessibility data for screen readers. To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD, which extends the SOTA deformable-DETR by incorporating UI element color representation and a learned prior on group distribution. The model is trained on our dataset of 1988 mobile GUI screenshots from more than 200 apps on both the iOS and Android platforms. The evaluation shows that UISCGD performs 6.1% better than the best baseline algorithm and 5.4% better than deformable-DETR, on which it is based.

Authors (7)
  1. Shuhong Xiao (9 papers)
  2. Yunnong Chen (11 papers)
  3. Yaxuan Song (10 papers)
  4. Liuqing Chen (16 papers)
  5. Lingyun Sun (38 papers)
  6. Yankun Zhen (6 papers)
  7. Yanfang Chang (4 papers)