Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code (1901.09056v3)

Published 25 Jan 2019 in cs.PL

Abstract: All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work reports near parity, with many applications compiled to WebAssembly running on average 10% slower than native code. However, this evaluation was limited to a suite of scientific kernels, each consisting of roughly 100 lines of code. Running more substantial applications was not possible because compiling code to WebAssembly is only part of the puzzle: standard Unix APIs are not available in the web browser environment. To address this challenge, we build Browsix-Wasm, a significant extension to Browsix that, for the first time, makes it possible to run unmodified WebAssembly-compiled Unix applications directly inside the browser. We then use Browsix-Wasm to conduct the first large-scale evaluation of the performance of WebAssembly vs. native. Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 45% (Firefox) to 55% (Chrome), with peak slowdowns of 2.08x (Firefox) and 2.5x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.

Analyzing WebAssembly Performance: Insights from Large-Scale Benchmarks

The paper "Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code" by Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha provides a rigorous evaluation of WebAssembly's performance compared to native code. The paper specifically addresses the performance gap by leveraging a novel toolkit, Browsix-Wasm, which extends the functionalities of Browsix to support unmodified Unix applications compiled to WebAssembly. This innovative toolkit enables researchers to conduct the first comprehensive performance analysis across meaningful application benchmarks, such as the SPEC CPU suite.

Methodology and Toolkit Development

A key contribution of this work is the development of Browsix-Wasm, allowing Unix programs to be executed as WebAssembly without modification directly in web browsers. This is complemented by the Browsix-SPEC harness, which facilitates the automated collection of performance metrics, including detailed timing and hardware counters. Together, these tools enable a thorough analysis beyond small scientific computing benchmarks, addressing a gap left by earlier studies that evaluated smaller code kernels.
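
To make the challenge concrete, consider a small C program of the kind Browsix-Wasm targets (a generic illustration, not an example from the paper): it depends on standard POSIX file-descriptor calls such as open, read, and write, which have no counterpart in the browser environment and must be serviced by Browsix-Wasm's in-browser kernel.

```c
/* Hypothetical example: a tiny Unix program built on POSIX
 * file-descriptor APIs. Natively it compiles and runs as usual;
 * compiled to WebAssembly, its open/read/write calls need an
 * in-browser kernel such as Browsix-Wasm to be serviced. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("input.txt", O_RDONLY);      /* POSIX syscall wrapper */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);  /* copy file to stdout */
    close(fd);
    return 0;
}
```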

Performance Evaluation and Results

The authors perform extensive benchmarking with the SPEC CPU2006 and SPEC CPU2017 suites, alongside PolyBenchC for comparison with prior work. The results reveal significant slowdowns relative to native code: on average, WebAssembly runs 1.55x slower on Google Chrome and 1.45x slower on Mozilla Firefox, with peak slowdowns of 2.5x in Chrome and 2.08x in Firefox. These findings challenge the claims of near-parity with native code made in earlier studies.
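
The reported slowdowns are ratios of end-to-end execution times. The sketch below is a generic wall-clock timing loop of the kind such a comparison rests on; it is an illustration only, not the Browsix-SPEC harness, and the workload is a stand-in for the full SPEC applications.

```c
/* Generic timing sketch (not the paper's Browsix-SPEC harness):
 * measure wall-clock time around a workload, then compare the
 * WebAssembly time with the native time as a ratio. */
#include <stdio.h>
#include <time.h>

static volatile double sink;          /* keeps the loop from being optimized away */

static void workload(void) {          /* stand-in for a SPEC benchmark */
    double acc = 0.0;
    for (long i = 1; i <= 50000000L; i++)
        acc += 1.0 / (double)i;
    sink = acc;
}

int main(void) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    workload();
    clock_gettime(CLOCK_MONOTONIC, &end);
    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("elapsed: %.3f s\n", secs);
    /* slowdown = wasm_seconds / native_seconds, e.g. ~1.55x on Chrome */
    return 0;
}
```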

Root Cause Analysis

To understand the performance lag, the authors perform a detailed analysis of performance counters:

  • Increased Register Pressure: Both Chrome and Firefox exhibit heightened register pressure, attributed to suboptimal register allocation and to registers that the browsers' JIT compilers reserve for their own use, which reduces the registers available to generated code.
  • Additional Branch Instructions: More branch instructions are retired in WebAssembly than in native code, a consequence of the safety checks (such as stack-overflow and indirect-call checks) and redundant jumps emitted by the just-in-time (JIT) compilers; a schematic of these checks appears after this list.
  • Larger Code Footprint: The WebAssembly implementation results in larger binary sizes, leading to more L1 instruction cache misses and increased CPU cycles for execution.
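
As a rough illustration of the branch-instruction point referenced above, the following C schematic mimics the checks a WebAssembly engine conceptually performs around calls: a per-function stack-overflow check and, for indirect calls, a table-bounds check plus a type-signature check. The names and structure are hypothetical, written for illustration, not actual engine-generated code.

```c
/* Schematic of WebAssembly safety checks, written as plain C for
 * illustration. Each check is an extra compare-and-branch that
 * native code compiled directly from C/C++ does not execute. */
#include <stdint.h>
#include <stdlib.h>

typedef void (*func_ptr)(void);

extern func_ptr func_table[];         /* table consulted by call_indirect */
extern uint32_t func_table_size;
extern uint32_t func_sig[];           /* signature id of each table entry */
extern size_t   stack_depth, stack_limit;

static void function_prologue(void) {
    if (++stack_depth > stack_limit)  /* stack-overflow check */
        abort();                      /* trap instead of overflowing */
}

static void call_indirect(uint32_t index, uint32_t expected_sig) {
    if (index >= func_table_size)     /* table bounds check */
        abort();
    if (func_sig[index] != expected_sig)  /* signature check */
        abort();
    func_table[index]();              /* the actual indirect call */
}
```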

Implications and Future Directions

The performance discrepancies highlighted in this paper stem both from avoidable inefficiencies and from inherent design trade-offs in WebAssembly: its safety and portability guarantees require runtime checks that carry an unavoidable cost in efficiency.

Future developments in WebAssembly could strive to close this performance gap by optimizing code generation, particularly by improving register allocation and exploiting advanced x86 addressing modes (illustrated in the sketch below). Additionally, addressing inherent overheads, such as stack-overflow and indirect-call checks, might necessitate architectural changes or enhancements to the WebAssembly format itself that maintain security without compromising performance.
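
On the addressing-mode point, the contrast shows up in loops as simple as the one below; the comments sketch the typical lowering informally and are illustrative, not captured compiler output.

```c
/* The same array access, with comments contrasting how the address
 * arithmetic is typically lowered natively vs. from WebAssembly. */
#include <stddef.h>

long sum(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Native x86-64: base, index, and scale usually fold into a
         * single load, e.g.  mov eax, [rdi + rsi*4].
         * WebAssembly: the 32-bit index is scaled and added explicitly,
         * then added to the linear-memory base held in a register,
         * typically costing extra ALU instructions per access. */
        total += a[i];
    }
    return total;
}
```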

Conclusion

This research sets a benchmark for future evaluations of WebAssembly, emphasizing the importance of comprehensive tooling and suitable performance benchmarks. The insights offered here should guide WebAssembly implementers in refining optimization techniques, potentially making it a more attractive target for performance-sensitive web applications. The availability of Browsix-Wasm and Browsix-SPEC for public use further emphasizes the authors' commitment to advancing the WebAssembly ecosystem.

Authors (4)
  1. Abhinav Jangda (13 papers)
  2. Bobby Powers (3 papers)
  3. Emery Berger (5 papers)
  4. Arjun Guha (44 papers)
Citations (125)