Seeing is Believing: Vision-driven Non-crash Functional Bug Detection for Mobile Apps (2407.03037v2)
Abstract: Mobile app GUI (Graphical User Interface) pages now contain rich visual information, with the visual semantics of each page helping users understand the application logic. However, this complex visual and functional logic presents new challenges to software testing. Existing automated GUI testing methods, constrained by the lack of reliable testing oracles, are limited to detecting crash bugs with obvious abnormal signals. Consequently, many non-crash functional bugs, ranging from unexpected behaviors to logical errors, often evade detection by current techniques. While these non-crash functional bugs can exhibit visual cues that serve as potential testing oracles, they often span a sequence of screenshots, and detecting them requires an understanding of the operational logic among GUI page transitions, which is challenging for traditional techniques. Considering the remarkable performance of Multimodal LLMs (MLLMs) in visual and language understanding, this paper proposes Trident, a novel vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs. It comprises three agents, Explorer, Monitor, and Detector, which guide the exploration, oversee the testing progress, and spot issues, respectively. We also address several challenges, namely aligning visual and textual information for MLLM input, achieving functionality-oriented exploration, and inferring test oracles for non-crash bugs, to enhance the performance of functional bug detection. We evaluate Trident on 590 non-crash bugs and compare it with 12 baselines; it improves average recall by 14%-112% and average precision by 108%-147% compared with the best baseline. The ablation study further demonstrates the contribution of each module. Moreover, Trident identifies 43 new bugs on Google Play, of which 31 have been fixed.
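The division of labor among the three agents can be pictured with a minimal sketch. This is not Trident's implementation: the abstract gives only the agents' roles, so the loop order, the `run_session` function, the `Step` and `TestSession` types, and the plain-callable agents are all hypothetical, and simple stubs stand in for the MLLM-driven agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    screenshot: str  # placeholder for a screenshot, e.g. a file name
    action: str      # action the Explorer chose on that screen


@dataclass
class TestSession:
    history: List[Step] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)


def run_session(
    capture: Callable[[], str],
    explorer: Callable[[str, List[Step]], str],
    monitor: Callable[[List[Step]], bool],
    detector: Callable[[List[Step]], Optional[str]],
    max_steps: int = 20,
) -> TestSession:
    """Hypothetical loop: explore, check for bugs, then decide whether to go on."""
    session = TestSession()
    for _ in range(max_steps):
        screenshot = capture()
        # Explorer: pick the next action from the current screen and the history.
        action = explorer(screenshot, session.history)
        session.history.append(Step(screenshot, action))
        # Detector: inspect the whole screenshot sequence, not just one page.
        finding = detector(session.history)
        if finding is not None:
            session.reports.append(finding)
        # Monitor: oversee progress and stop the session when it says so.
        if not monitor(session.history):
            break
    return session


# Stub agents (assumptions for illustration only; no MLLM is called).
screens = iter(["home.png", "cart.png", "cart.png"])


def stub_detector(history: List[Step]) -> Optional[str]:
    # Toy oracle: flag a step whose screen is identical to the previous one.
    if len(history) >= 2 and history[-1].screenshot == history[-2].screenshot:
        return "screen unchanged after action"
    return None


session = run_session(
    capture=lambda: next(screens),
    explorer=lambda shot, hist: f"tap on {shot}",
    monitor=lambda hist: len(hist) < 3,
    detector=stub_detector,
)
print(len(session.history), session.reports)
```

Passing the full history to the detector reflects the abstract's point that non-crash bugs often show up only across a sequence of screenshots rather than on a single page.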
Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang, Mengzhuo Chen