Unveiling and Vanquishing Goroutine Leaks in Enterprise Microservices: A Dynamic Analysis Approach (2312.12002v1)
Abstract: Go is a modern programming language gaining popularity in enterprise microservice systems. Concurrency is a first-class citizen in Go, with lightweight "goroutines" as the building blocks of concurrent execution. Go advocates message passing to communicate and synchronize among goroutines. Improper use of message passing in Go can result in "partial deadlocks", a subtle concurrency bug where a blocked sender (receiver) never finds a corresponding receiver (sender), causing the blocked goroutine to leak memory via its call stack and the objects reachable from that stack. In this paper, we systematically study the prevalence of message passing and the resulting partial deadlocks in 75 million lines of Uber's Go monorepo, which hosts over 2500 microservices. We develop two lightweight dynamic analysis tools, Goleak and LeakProf, designed to identify partial deadlocks. Goleak detects partial deadlocks during unit testing and prevents the introduction of new bugs. LeakProf, in contrast, uses goroutine profiles obtained from services deployed in production to pinpoint intricate bugs arising from complex control flow, unexplored interleavings, or missing test coverage. We share our experience and insights from deploying these tools in developer workflows in a large industrial setting. Using Goleak, we unearthed 857 pre-existing goroutine leaks in legacy code and prevented the introduction of around 260 new leaks over a one-year period. Using LeakProf, we found 24 goroutine leaks and fixed 21 of them, which resulted in up to 34% speedup and 9.2x memory reduction in some of our production services.