Analyzing WebAssembly Performance: Insights from Large-Scale Benchmarks
The paper "Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code" by Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha provides a rigorous evaluation of WebAssembly's performance compared to native code. The paper specifically addresses the performance gap by leveraging a novel toolkit, Browsix-Wasm, which extends the functionalities of Browsix to support unmodified Unix applications compiled to WebAssembly. This innovative toolkit enables researchers to conduct the first comprehensive performance analysis across meaningful application benchmarks, such as the SPEC CPU suite.
Methodology and Toolkit Development
A key contribution of this work is Browsix-Wasm, which lets Unix programs compiled to WebAssembly run unmodified in web browsers. It is complemented by the Browsix-SPEC harness, which automates benchmark runs and the collection of performance metrics, including detailed timings and hardware counters. Together, these tools enable analysis of full applications rather than the small scientific kernels evaluated in earlier studies; a sketch of the kind of program Browsix-Wasm targets appears below.
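To make the scope concrete, here is a minimal sketch (not taken from the paper) of the kind of unmodified POSIX program Browsix-Wasm targets: it performs ordinary file I/O through Unix system calls, which Browsix-Wasm services from its in-browser kernel instead of requiring the program to be rewritten against browser APIs. The input file name is hypothetical.

```c
/* A minimal sketch of an unmodified Unix-style program (illustrative only).
 * Compiled to WebAssembly, its open/read/write system calls would be
 * serviced by Browsix-Wasm's in-browser kernel rather than a real OS. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    const char *path = (argc > 1) ? argv[1] : "input.txt";  /* hypothetical input file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        write(STDOUT_FILENO, buf, (size_t) n);  /* copy file contents to stdout */
    }

    close(fd);
    return 0;
}
```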
Performance Evaluation and Results
The authors benchmark the SPEC CPU2006 and SPEC CPU2017 suites, alongside PolyBenchC for comparison with prior work. The results reveal substantial slowdowns relative to native code: the mean slowdown is 1.55x in Google Chrome and 1.45x in Mozilla Firefox, with peak slowdowns of 2.5x in Chrome and 2.08x in Firefox. These findings challenge claims of near-native performance made in earlier studies.
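Browsix-SPEC gathers its measurements inside the browser through its own harness, whose mechanism is not reproduced here. As a rough native-side analogue, the sketch below uses Linux's perf_event_open interface to count retired branch instructions around a toy workload, one of the hardware counters the study compares between WebAssembly and native runs.

```c
/* Count retired branch instructions around a workload using Linux perf
 * counters (native-side illustration only; Browsix-SPEC uses its own harness). */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags) {
    /* Thin wrapper: glibc does not provide one for this system call. */
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;  /* retired branches */
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = (int) perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* Toy workload standing in for a benchmark under measurement. */
    volatile long sum = 0;
    for (long i = 0; i < 10000000; i++) sum += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count;
    if (read(fd, &count, sizeof(count)) != (ssize_t) sizeof(count)) {
        perror("read");
        return 1;
    }
    printf("retired branch instructions: %llu\n", (unsigned long long) count);
    close(fd);
    return 0;
}
```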
Root Cause Analysis
To identify the sources of this slowdown, the authors examine hardware performance counters and the machine code generated by each browser's JIT compiler:
- Increased Register Pressure: Code generated by both Chrome and Firefox shows elevated register pressure and more spills (extra loads and stores), attributed to suboptimal register allocation in the JIT compilers and to registers the runtimes reserve for their own use, which leaves fewer registers available to the compiled program.
- Additional Branch Instructions: WebAssembly code retires more branch instructions than native code, a consequence of safety checks (such as stack-overflow checks on function entry and per-call checks on indirect calls) and redundant jumps emitted by the just-in-time (JIT) compilers; see the sketch after this list.
- Larger Code Footprint: The JIT compilers generate more machine code than a native compiler does for the same program, leading to more L1 instruction cache misses and additional CPU cycles spent on execution.
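The toy program below (not from the paper) marks where a WebAssembly engine typically adds the extra work described in the list above; the comments point out checks that native compilers do not emit. The exact check sequences vary by engine, so this is an illustrative sketch rather than a description of Chrome's or Firefox's actual code generation.

```c
#include <stdio.h>

/* Two callees with the same signature, selected through a function pointer. */
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }

static int apply(int (*op)(int, int), int a, int b) {
    /* In WebAssembly, this indirect call goes through a table, and the engine
     * inserts a per-call check that the table entry exists and has the expected
     * type signature -- extra compare-and-branch instructions that native code
     * does not need. */
    return op(a, b);
}

static long depth(long n) {
    /* On function entry, WebAssembly engines typically emit a stack-overflow
     * check (another compare-and-branch); native code relies on guard pages
     * instead. Recursive code pays this cost on every call. */
    if (n == 0) return 0;
    return 1 + depth(n - 1);
}

int main(void) {
    printf("%d %d %ld\n", apply(add, 2, 3), apply(sub, 2, 3), depth(10000));
    return 0;
}
```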
Implications and Future Directions
The performance gap documented in this paper stems from a mix of avoidable implementation inefficiencies and inherent design trade-offs: WebAssembly's guarantees of safety and portability require runtime checks that carry an unavoidable cost.
Future WebAssembly implementations could narrow the gap by improving code generation, particularly through better register allocation and fuller use of x86 addressing modes (illustrated below). Reducing the inherent overheads, such as stack-overflow and indirect-call checks, may require changes to the engines or to WebAssembly's design itself that preserve its security guarantees without sacrificing performance.
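As a concrete illustration of the addressing-mode point, consider the inner loop below. The assembly fragments in the comments are typical, hypothetical patterns rather than output quoted from the paper or from any particular engine: a native compiler can fold the scaled index into a single x86 addressing mode, while WebAssembly JITs often compute the 32-bit linear-memory offset with separate instructions before adding a reserved heap-base register.

```c
#include <stddef.h>
#include <stdio.h>

/* Sum an array of floats; the loop body is where addressing modes matter. */
static float sum(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Native x86-64 compilers can fold the scaled index into one load,
         * e.g.:                    addss (%rdi,%rcx,4), %xmm0
         * WebAssembly JITs often compute the 32-bit linear-memory offset with
         * separate instructions and then add a reserved heap-base register,
         * e.g. (illustrative):     leal  (%rdx,%rcx,4), %eax
         *                          addss (%r15,%rax), %xmm0
         * costing extra instructions and tying up an extra register. */
        s += a[i];
    }
    return s;
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%f\n", sum(a, 8));
    return 0;
}
```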
Conclusion
This research establishes a baseline for future evaluations of WebAssembly and underscores the importance of comprehensive tooling and realistic benchmarks. Its findings should guide WebAssembly implementers in refining their optimizations, making WebAssembly a more attractive target for performance-sensitive web applications. The public release of Browsix-Wasm and Browsix-SPEC lets others reproduce and extend these results.