VAMP: Visual Analytics for Microservices Performance (2404.14273v1)
Abstract: Analyzing the performance of microservices is challenging because of the multifaceted nature of these systems. Each request to a microservices system may trigger several Remote Procedure Calls (RPCs) to services deployed on different servers and/or containers. Existing distributed tracing tools rely on swimlane visualizations as the primary means of supporting performance analysis of microservices. These visualizations are particularly effective for investigating the performance behavior of individual end-to-end requests. However, they are substantially limited when more complex analyses are required, such as understanding system-wide performance trends. To overcome this limitation, we introduce VAMP, an innovative visual analytics tool that enables the performance analysis of multiple end-to-end requests of a microservices system at once. VAMP is built around the idea that a wide set of interactive visualizations facilitates the analysis of recurrent characteristics of requests and their relation to end-to-end performance behavior. Through an evaluation on 33 datasets from an established open-source microservices system, we demonstrate how VAMP aids in identifying RPC execution time deviations that have a significant impact on end-to-end performance. Additionally, we show that VAMP can help pinpoint meaningful structural patterns in end-to-end requests and their relationship with microservice performance behaviors.
- VAMP - Replication Package. https://github.com/lucatraini/VAMP.