
LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities (2309.13574v1)

Published 24 Sep 2023 in cs.SE

Abstract: This paper investigates the application of large language models (LLMs) to mobile application test script generation. Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks. However, existing generation approaches often struggle to accurately capture and reproduce test scripts across diverse devices, platforms, and applications. These challenges arise from differences in screen sizes, input modalities, platform behaviors, API inconsistencies, and application architectures. Overcoming these limitations is crucial for achieving robust and comprehensive test automation. By leveraging the capabilities of LLMs, we aim to address these challenges and explore their potential as a versatile tool for test automation. We investigate how well LLMs can adapt to diverse devices and systems while accurately capturing and generating test scripts. Additionally, we evaluate their cross-platform generation capabilities by assessing their ability to handle operating system variations and platform-specific behaviors. Furthermore, we explore the application of LLMs to cross-app migration, where they generate test scripts for different applications and software environments based on existing scripts. Throughout the investigation, we analyze their adaptability to various user interfaces, app architectures, and interaction patterns, with the goal of accurate script generation and compatibility. The findings of this research contribute to the understanding of LLMs' capabilities in test automation. Ultimately, this research aims to enhance software testing practices, empowering app developers to achieve higher levels of software quality and development efficiency.

Authors (5)
  1. Shengcheng Yu
  2. Chunrong Fang
  3. Yuchen Ling
  4. Chentian Wu
  5. Zhenyu Chen
Citations (26)