Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks

Published 1 Sep 2024 in cs.PL | (2409.00708v1)

Abstract: WebAssembly (Wasm for short) brings a new, powerful capability to the web as well as Edge, IoT, and embedded systems. Wasm is a portable, compact binary code format with high performance and robust sandboxing properties. As Wasm applications grow in size and importance, the complex performance characteristics of diverse Wasm engines demand robust, representative benchmarks for proper tuning. Stopgap benchmark suites, such as PolyBenchC and libsodium, continue to be used in the literature, though they are known to be unrepresentative. Porting of more complex suites remains difficult because Wasm lacks many system APIs and extracting real-world Wasm benchmarks from the web is difficult due to complex host interactions. To address this challenge, we introduce Wasm-R3, the first record and replay technique for Wasm. Wasm-R3 transparently injects instrumentation into Wasm modules to record an execution trace from inside the module, then reduces the execution trace via several optimizations, and finally produces a replay module that is executable standalone without any host environment - on any engine. The benchmarks created by our approach are (i) realistic, because the approach records real-world web applications, (ii) faithful to the original execution, because the replay benchmark includes the unmodified original code, only adding emulation of host interactions, and (iii) standalone, because the replay benchmarks run on any engine. Applying Wasm-R3 to web-based Wasm applications in the wild demonstrates the correctness of our approach as well as the effectiveness of our optimizations, which reduce the recorded traces by 99.53 percent and the size of the replay benchmark by 9.98 percent. We release the resulting benchmark suite of 27 applications, called Wasm-R3-Bench, to the community, to inspire a new generation of realistic and standalone Wasm benchmarks.


Summary

The paper presents a record-reduce-replay approach that captures actual WebAssembly interactions to create accurate and representative benchmarks.
It employs shadow memory and call stack optimizations that reduce trace size by 99.53%, ensuring efficiency in benchmark generation.
The evaluation, which yields benchmarks for 27 applications, shows a median recording overhead of 3.79× and minimal time spent in replay code, supporting the method’s practical utility.


Wasm-R3 presents a novel method for producing realistic and standalone benchmarks from real-world WebAssembly (Wasm) applications. With the increasing significance of Wasm in various domains, from web browsers to IoT devices, the need for robust and representative benchmarks has become paramount for performance evaluation and tuning of Wasm engines. Wasm-R3 addresses this by introducing a record-reduce-replay (R3) technique that allows the creation of benchmarks from actual usage scenarios of Wasm web applications, ensuring representativeness and standalone execution.

Core Contributions

Record-Reduce-Replay Technique

The core of Wasm-R3 lies in its three-phase approach: record, reduce, and replay.

Record Phase

In the record phase, Wasm-R3 instruments Wasm modules to record interactions with the host environment. This phase captures function calls, memory loads, and stores to create an execution trace. By employing a proxy-based approach that intercepts Wasm and JavaScript code, Wasm-R3 can transparently insert instrumentation without requiring modifications to the browser or Wasm engine.
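The recording idea can be illustrated with a toy model. The sketch below is Python rather than Wasm instrumentation: it assumes host imports are plain callables and logs every call and its result at the module boundary. The names `Event` and `record_imports` are hypothetical and are not taken from Wasm-R3, which injects its instrumentation into the Wasm binary itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One trace entry (hypothetical format, for illustration only)."""
    kind: str               # "import_call" or "import_return"
    name: str               # name of the host function involved
    data: Tuple[int, ...]   # arguments of the call, or the single result


def record_imports(
    imports: Dict[str, Callable[..., int]],
    trace: List[Event],
) -> Dict[str, Callable[..., int]]:
    """Wrap each host import so calls and results are appended to `trace`."""

    def wrap(name: str, fn: Callable[..., int]) -> Callable[..., int]:
        def wrapper(*args: int) -> int:
            trace.append(Event("import_call", name, tuple(args)))
            result = fn(*args)
            trace.append(Event("import_return", name, (result,)))
            return result

        return wrapper

    return {name: wrap(name, fn) for name, fn in imports.items()}


# Usage: the "module" sees only the wrapped imports.
trace: List[Event] = []
wrapped = record_imports({"host_add": lambda a, b: a + b}, trace)
wrapped["host_add"](2, 3)
```

After the call, `trace` holds one call event and one return event for `host_add`, which is the kind of host-interaction record the later phases would consume.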

Reduce Phase

Given the potential size of execution traces, the reduce phase is crucial for filtering out unnecessary events. Wasm-R3 employs two key reduction techniques: shadow memory optimization and call stack optimization. These techniques significantly decrease trace size by discarding redundant memory operations and irrelevant function calls. The reduction phase sets the stage for creating practical and efficient replay benchmarks.
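One way to picture the shadow memory optimization is the following minimal sketch. It assumes a simplified trace of `(kind, address, value)` tuples and a memory that starts out zeroed. A load is kept only if the value it observed differs from what the module's own stores would have produced, which implies the host changed that location. The function name `reduce_loads` and the event layout are assumptions for illustration, not Wasm-R3's actual data structures.

```python
from typing import Dict, List, Tuple

MemEvent = Tuple[str, int, int]  # (kind, address, value); hypothetical layout


def reduce_loads(events: List[MemEvent]) -> List[MemEvent]:
    """Drop loads whose value the module could have produced by itself."""
    shadow: Dict[int, int] = {}   # mirrors only what the module has written
    kept: List[MemEvent] = []
    for kind, addr, value in events:
        if kind == "store":
            # Stores are replayed by the original code, so only track them.
            shadow[addr] = value
        elif kind == "load":
            # Untouched memory is assumed to read as zero.
            if shadow.get(addr, 0) != value:
                # Mismatch: the host must have written here, so keep the load.
                kept.append((kind, addr, value))
                shadow[addr] = value
    return kept


# Usage: only the load that observes a host-written value survives.
raw = [("store", 0, 7), ("load", 0, 7), ("load", 0, 9), ("load", 0, 9)]
reduced = reduce_loads(raw)
```

In this example the first load is redundant because the module stored that value itself, and the last load is redundant because the host's change was already captured by the load before it.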

Replay Phase

In the replay phase, the optimized trace is translated into a standalone executable benchmark. This involves generating replay functions that reproduce the recorded execution by emulating host interactions within the Wasm environment. The replay phase ensures that the benchmarks remain realistic by preserving the original Wasm code and only adding necessary replay logic.
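A replay function can be sketched as a stand-in for a host import that, on each call, reapplies the memory writes the host performed during recording and returns the recorded result. The sketch below is a Python toy under stated assumptions: the step format (`writes`, `result`) and the name `make_replay_import` are hypothetical, and the real replay module is generated as Wasm code rather than as Python closures.

```python
from typing import Callable, Dict, Iterator, List


def make_replay_import(
    recorded: List[Dict],
    memory: bytearray,
) -> Callable[..., int]:
    """Build a host-import replacement driven by recorded steps.

    Each step is assumed to look like
    {"writes": [(address, byte_value), ...], "result": int}.
    """
    steps: Iterator[Dict] = iter(recorded)

    def replay(*_args: int) -> int:
        step = next(steps)                 # one recorded step per call
        for addr, value in step["writes"]:
            memory[addr] = value           # emulate the host's memory effects
        return step["result"]              # emulate the host's return value

    return replay


# Usage: the original code calls `host_read` as before, with no host present.
memory = bytearray(8)
host_read = make_replay_import(
    [{"writes": [(0, 5)], "result": 1}, {"writes": [], "result": 2}],
    memory,
)
first = host_read()
second = host_read()
```

Because the replacement reproduces only the observable effects of the host, the original module code runs unchanged, which matches the design goal stated above.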

Evaluation and Results

Applicability

Wasm-R3 has been evaluated against a diverse set of real-world Wasm web applications. The study successfully produced accurate benchmarks for 27 out of 43 applications, highlighting the approach's wide applicability. Additionally, the generated benchmarks, referred to as Wasm-R3-Bench, can run across major Wasm engines, including web browsers and standalone Wasm runtimes, demonstrating the portability of the approach.

Performance

Recording overhead is a critical factor, particularly for interactive applications. Wasm-R3 introduces a median overhead of approximately 3.79×, which is deemed acceptable for capturing realistic user interactions without significant disruption. Moreover, the replay benchmarks spend nearly all of their execution time in the original Wasm code: replay functions account for a geometric mean of only 0.20% of execution time, so the benchmarks faithfully represent the original application's performance.

Effectiveness of Optimization

The trace reduction techniques of Wasm-R3 shrink recorded traces by 99.53% on average. This reduction is essential for managing the size and complexity of traces from real-world applications. Furthermore, replay optimizations reduce the size of the replay binary by an average of 9.98%, which shortens load and validation times while maintaining execution efficiency.

Implications and Future Directions

Wasm-R3 sets a new standard for creating benchmarks that are both representative of real-world applications and standalone. This has significant implications for the development and tuning of Wasm engines, as it allows for more accurate performance evaluations. The record-reduce-replay approach can be extended to support emerging Wasm features and proposals, ensuring its relevance in evolving Wasm ecosystems.

Future developments may focus on enhancing the support for complex Wasm features like SIMD and multi-threading. Moreover, the technique's adaptability to non-web Wasm environments opens opportunities for comprehensive performance benchmarking across diverse applications beyond the web.

Conclusion

Wasm-R3 introduces an effective method for creating realistic and standalone benchmarks from Wasm applications, addressing the need for representative performance evaluation tools. The systematic approach of recording, reducing, and replaying executions ensures that the generated benchmarks are accurate, efficient, and portable, making Wasm-R3 a valuable contribution to the field of Wasm performance analysis. The Wasm-R3-Bench suite stands as a testament to the approach's efficacy, offering a new resource for researchers and developers to evaluate and improve Wasm engines.
