Enhancing Testing at Meta with Rich-State Simulated Populations (2403.15374v1)
Abstract: This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (also known as test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for the iOS and Android platforms. These apps consist of tens of millions of lines of code, communicate with hundreds of millions of lines of backend code, and are used by over 2 billion people every day. Our results reveal that rich state increases average code coverage by 38% and endpoint coverage by 61%. More importantly, it also yields an average increase of 115% in the faults found by automated testing. The rich-state test user populations are also deployed in a (continually evolving) Test Universe, a web-enabled simulation platform for privacy-safe manual testing, which has been used by over 21,000 Meta engineers since its deployment in November 2022.