AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding (2404.01240v1)
Abstract: Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers overcome the unique challenges associated with the platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been shown to achieve lower-than-expected code coverage, particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is so-called tarpits: complex screens that are difficult to navigate. In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that recognizes the most common types of UI screen designs. After recognizing a given screen, it applies a set of flexible and generalizable heuristics to navigate that screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior tarpit-avoidance approaches by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.
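The classify-then-navigate idea described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Screen` type, the category names (`login`, `ad`), the action strings, and the keyword-based `classify_screen` function are hypothetical stand-ins, and the keyword rules are not the multi-modal neural classifier the paper describes. The sketch only shows the control flow of recognizing a screen category and dispatching to a category-specific heuristic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Screen:
    """Hypothetical snapshot of a UI screen: visible text plus widget labels."""
    text: str
    widgets: List[str]


def classify_screen(screen: Screen) -> str:
    """Stand-in for a learned classifier (keyword rules, NOT a neural model)."""
    lowered = screen.text.lower()
    if "password" in lowered or "sign in" in lowered:
        return "login"
    if "advertisement" in lowered:
        return "ad"
    return "other"


def navigate_login(screen: Screen) -> List[str]:
    # Hypothetical heuristic: fill each credential field, then submit.
    actions = [f"type:{w}" for w in screen.widgets if w in ("username", "password")]
    return actions + ["tap:submit"]


def navigate_ad(screen: Screen) -> List[str]:
    # Hypothetical heuristic: dismiss the ad if a close control is present.
    return ["tap:close"] if "close" in screen.widgets else ["press:back"]


def explore_default(screen: Screen) -> List[str]:
    # Fallback: hand control back to the underlying input-generation tool.
    return ["delegate:aig_tool"]


# Category name -> navigation heuristic (both sides are illustrative).
HEURISTICS: Dict[str, Callable[[Screen], List[str]]] = {
    "login": navigate_login,
    "ad": navigate_ad,
}


def plan_actions(screen: Screen) -> List[str]:
    """Classify the screen, then apply the heuristic registered for that category."""
    category = classify_screen(screen)
    return HEURISTICS.get(category, explore_default)(screen)
```

Keeping the heuristics in a lookup table means a new screen category only needs a new classifier label and one registered function; unrecognized screens fall through to the default explorer.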
Authors: Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran