UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface (2403.04984v1)
Abstract: Texts, widgets, and images on a UI page do not work separately. Instead, they are partitioned into groups that provide a particular interaction function or convey visual information. Existing studies on UI element grouping mainly focus on a single UI-related software engineering task, and their groups vary in appearance and function. We therefore propose semantic component groups, which pack adjacent text and non-text elements with similar semantics. In contrast to those task-oriented grouping methods, our semantic component groups can be adopted for multiple UI-related software tasks, such as retrieving UI perceptual groups, improving code structure for automatic UI-to-code generation, and generating accessibility data for screen readers. To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD, which extends the state-of-the-art deformable-DETR by incorporating a UI element color representation and a learned prior on group distribution. The model is trained on our UI screenshot dataset of 1988 mobile GUIs from more than 200 apps on both the iOS and Android platforms. The evaluation shows that UISCGD outperforms the best baseline algorithm by 6.1% and deformable-DETR, on which it is based, by 5.4%.
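To make the notion of a group that "packs" adjacent elements concrete, here is a minimal sketch of one way a group's bounding box could be derived from its member elements, for example when preparing detection targets. The `Box` type, the `enclosing_box` helper, and the sample coordinates are all illustrative assumptions; the abstract does not describe the annotation format or how group boxes are built.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical representation)."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom


def enclosing_box(elements: Iterable[Box]) -> Box:
    """Return the smallest box that contains every element box."""
    elements = list(elements)
    if not elements:
        raise ValueError("a group needs at least one element")
    return Box(
        min(b.x1 for b in elements),
        min(b.y1 for b in elements),
        max(b.x2 for b in elements),
        max(b.y2 for b in elements),
    )


# Made-up list item: an icon (non-text) next to a title and a subtitle (text).
icon = Box(16, 100, 56, 140)
title = Box(72, 100, 300, 120)
subtitle = Box(72, 124, 260, 140)

group = enclosing_box([icon, title, subtitle])
print(group)
```

In this sketch a group is fully described by the union of its members' extents, which is the kind of box an object detector such as deformable-DETR would be asked to predict; whether UISCGD's labels are produced this way is an assumption.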
Authors: Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, Yanfang Chang