
Guiding ChatGPT to Fix Web UI Tests via Explanation-Consistency Checking (2312.05778v2)

Published 10 Dec 2023 in cs.SE

Abstract: The rapid evolution of Web UIs makes maintaining UI tests costly in time and effort. Existing techniques in Web UI test repair focus on finding the target elements on the new web page that match the old ones so that the corresponding broken statements can be repaired. We present the first study that investigates the feasibility of using prior Web UI repair techniques for initial local matching and then using ChatGPT to perform global matching. Our key insight is that, given a list of elements matched by prior techniques, ChatGPT can leverage its language understanding to perform global view matching and use its code generation model to fix the broken statements. To mitigate hallucination in ChatGPT, we design an explanation validator that checks whether the provided explanation for the matching results is consistent, and provides hints to ChatGPT via a self-correction prompt to further improve its results. Our evaluation on a widely used dataset shows that the ChatGPT-enhanced techniques improve the effectiveness of existing Web test repair techniques. Our study also shares several important insights for improving future Web UI test repair techniques.
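
The validate-then-hint loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Answer` structure, the attribute-based consistency rule, the hint wording, and the `ask` callable that stands in for a ChatGPT call are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Answer:
    """Hypothetical model reply: the chosen element plus the attributes
    its explanation claims that element has."""
    chosen: str
    cited_attrs: Dict[str, str] = field(default_factory=dict)


def find_inconsistencies(answer: Answer,
                         candidates: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means consistent.

    Assumed rule: the chosen element must be a candidate, and every
    attribute cited in the explanation must match the real element.
    """
    if answer.chosen not in candidates:
        return [f"'{answer.chosen}' is not one of the candidate elements"]
    actual = candidates[answer.chosen]
    problems = []
    for name, claimed in answer.cited_attrs.items():
        if actual.get(name) != claimed:
            problems.append(
                f"explanation says {name}={claimed!r}, "
                f"but the element has {name}={actual.get(name)!r}"
            )
    return problems


def repair_with_self_correction(ask: Callable[[str], Answer],
                                base_prompt: str,
                                candidates: Dict[str, Dict[str, str]],
                                max_rounds: int = 3) -> Answer:
    """Query the model, then re-prompt with hints while the explanation
    is inconsistent. Returns the last answer, even if it is still
    inconsistent after max_rounds corrections."""
    answer = ask(base_prompt)
    for _ in range(max_rounds):
        problems = find_inconsistencies(answer, candidates)
        if not problems:
            return answer
        hint = "\nYour previous explanation was inconsistent:\n" + "\n".join(problems)
        answer = ask(base_prompt + hint)
    return answer
```

In this sketch `ask` can be any function that sends a prompt to a model and parses the reply into an `Answer`; passing a stub makes the loop testable without network access.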

Authors (3)
  1. Zhuolin Xu
  2. Qiushi Li
  3. Shin Hwei Tan