GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production (2311.09394v2)

Published 15 Nov 2023 in cs.SE and cs.PL

Abstract: Despite the recent advances in pre-production bug detection, heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity for applications written in C or C++, across all major software ecosystems. Memory-safe languages solve this problem when they are used, but the existing code bases consisting of billions of lines of C and C++ continue to grow, and we need additional bug detection mechanisms. This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead. These tools combine page-granular guarded allocation and low-rate sampling. In other words, we added an "if" statement to a 36-year-old idea and made it work at scale. We describe the basic algorithm, several of its variants and implementations, and the results of multi-year deployments across mobile, desktop, and server applications.

Summary

The paper presents a novel sampling-based algorithm that uses guard pages to detect heap-use-after-free and heap-buffer-overflow bugs in C/C++ applications.
It combines page-granular guarded allocation with low-rate sampling to achieve near-zero runtime overhead in production environments.
Empirical evaluations demonstrate its effectiveness across platforms, identifying over 550 bugs in Google’s server applications and numerous issues in Chrome and Android.

Overview of GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production

"GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production" introduces a family of tools that detect two classes of memory-safety bugs, heap-use-after-free and heap-buffer-overflow, in C and C++ applications. The authors highlight that these bugs persist across all major software ecosystems despite significant advances in pre-production bug detection. The paper details an approach that combines page-granular guarded allocation with low-rate sampling, achieving near-zero overhead when the tools run in production.

Technical Contributions

The core technical contribution of this paper is the combination of guard pages with a sampling mechanism to detect memory-safety violations at run time while minimizing performance overhead. The authors revisit the decades-old Electric Fence approach to guarded allocation and add an "if" statement, so that only a small sample of allocations is guarded, which makes the idea usable in production at scale. The key elements of the algorithm are:

Guarded Allocation: Using memory management unit (MMU) capabilities, it places inaccessible guard pages around a sampled allocation so that illegal accesses trigger a fault.
Sampling Mechanism: A lightweight decision function selects, with low probability, which allocations receive guarded treatment; all other allocations take the normal allocator path.
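
The two elements above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names should_sample, guarded_alloc, guarded_free, and sampled_malloc, the fixed SAMPLE_INTERVAL of 1000, and the deterministic countdown are all hypothetical choices made for readability. A real deployment would likely randomize the sampling decision and track guarded allocations in a dedicated pool rather than through an out-parameter. The sketch assumes a POSIX system that provides mmap and mprotect.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical knob: guard one allocation out of every SAMPLE_INTERVAL. */
#define SAMPLE_INTERVAL 1000

static long countdown = SAMPLE_INTERVAL;

/* The "if" statement: a cheap countdown picks which allocations are guarded.
   Deterministic here for clarity; randomizing it is an obvious refinement. */
static int should_sample(void) {
    if (--countdown > 0) return 0;
    countdown = SAMPLE_INTERVAL;
    return 1;
}

/* Allocate `size` bytes on their own page, with an inaccessible guard page
   on each side. Layout: [guard page][data page][guard page]. */
static void *guarded_alloc(size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size == 0 || size > page) return NULL;  /* sketch: one data page only */

    uint8_t *base = mmap(NULL, 3 * page, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    if (mprotect(base + page, page, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, 3 * page);
        return NULL;
    }
    /* Right-align (rounded up to 16 bytes to keep alignment) so that an
       overflow past the rounded end lands on the trailing guard page. */
    size_t rounded = (size + 15) & ~(size_t)15;
    assert(rounded <= page);
    return base + 2 * page - rounded;
}

/* Free a guarded allocation by making its data page inaccessible, so any
   later use-after-free access faults. The mapping is intentionally kept. */
static void guarded_free(void *p) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t data_page = (uintptr_t)p & ~(uintptr_t)(page - 1);
    mprotect((void *)data_page, page, PROT_NONE);
}

/* Allocation entry point: rarely guarded, otherwise plain malloc.
   *guarded tells the caller which free routine to use. */
static void *sampled_malloc(size_t size, int *guarded) {
    if (should_sample()) {
        void *p = guarded_alloc(size);
        if (p != NULL) {
            *guarded = 1;
            return p;
        }
    }
    *guarded = 0;
    return malloc(size);
}
```

A caller would obtain memory with sampled_malloc(size, &guarded) and release it with guarded_free when guarded is set, or with free otherwise. A write past the 16-byte-rounded end of a guarded allocation, or any access after guarded_free, then touches a PROT_NONE page and faults at the offending instruction, while the unsampled majority of allocations pay only for the countdown check.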

The paper further elucidates the deployment of GWP-ASan across various platforms, including Google server applications, the Chrome browser, Android, Firefox, Apple platforms, and the Linux kernel. Each implementation demonstrates nuanced adjustments to the basic algorithm to cater to the specific context of the software environment and hardware constraints.

Results and Observations

Empirical evaluations show that GWP-ASan is effective across different systems. For instance, in Google's server-side software, the tool identified over 550 unique bugs in 2023 alone, predominantly heap-use-after-free issues. The deployments on Chrome and Android also identified a substantial number of security vulnerabilities, underlining the tool's potential impact in commercial applications. In Android, out of nearly 11.7 million collected crash reports, 1,972 unique stack traces were discovered.

The paper also reports how bug occurrence frequencies are distributed. The deployment data suggests that most bugs are captured infrequently, and GWP-ASan catches many of them only once, which shows why probabilistic detection depends on very large production deployments to be effective.

Practical Implications

The research has direct practical implications for software developers. GWP-ASan offers an efficient way to detect memory-safety violations without imposing the runtime penalties typically associated with dynamic analysis. Consequently, it complements the more pervasive dynamic analysis tools used in pre-production testing, such as AddressSanitizer (ASan) and HardwareAddressSanitizer (HWASan).

However, the paper also underscores the intrinsic limitations of sampling-based detection. Because any single buggy execution has only a small chance of being sampled, GWP-ASan mainly surfaces memory errors that occur often in production; rare but potentially severe bugs can go undiscovered.
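
The trade-off between sampling rate and detection can be made concrete with a small calculation. Assuming, as a simplification, that each execution of a bug is sampled independently with probability p, the chance of catching it at least once in n executions is 1 - (1 - p)^n. The helper below, with the hypothetical name detection_probability, evaluates that expression by repeated squaring; the sampling rate of 1 in 1000 used in the note that follows is illustrative, not a figure from the paper.

```c
#include <assert.h>

/* Probability of at least one detection in n independent executions,
   each sampled with probability p: 1 - (1 - p)^n.
   Independence is an assumption of this sketch. */
static double detection_probability(double p, unsigned long n) {
    double base = 1.0 - p;  /* probability that one execution is missed */
    double miss = 1.0;      /* accumulates (1 - p)^n */
    while (n > 0) {
        if (n & 1UL) miss *= base;
        base *= base;
        n >>= 1;
    }
    return 1.0 - miss;
}
```

With p = 0.001, a bug that executes 1000 times across the fleet is caught with probability of roughly 0.63, while a bug that executes only 10 times is caught with probability of roughly 0.01. This is why wide deployment matters: the total number of executions across all users, not the per-process rate, determines whether a given bug is found.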

Future Directions

The paper outlines avenues for future research and enhancements. These include extensions or new algorithms aimed at other classes of bugs, such as stack-use-after-return and data races, as well as approaches to account for and mitigate the infrequent bugs that sampling misses. The authors advocate for leveraging emerging hardware capabilities to further optimize sampling rates and improve detection fidelity.

Enhancements are suggested not only in the form of algorithmic innovations but also in feedback mechanisms that dynamically adjust sampling strategies based on historical data to improve discoverability for elusive bugs. Such strategies could lead to a nuanced, adaptable deployment model boosting overall software robustness.

Conclusion

In conclusion, by leveraging a well-founded yet previously overlooked approach and augmenting it with modern sampling techniques, GWP-ASan establishes itself as a pivotal tool in the ongoing endeavor to enhance the memory safety of legacy and new software offerings. This paper’s contribution is vital, particularly during a transitional phase wherein the software industry progressively adopts memory-safe languages while still managing extensive C and C++ codebases.
