WebAssembly, a new binary execution format for the Web, is starting to arrive in stable versions of browsers. A major goal of WebAssembly is to be fast. This post gives some technical details about how it achieves that.
Before we start, the usual caveats: Performance is tricky to measure, and has many aspects. Also, in a new technology there are always going to be not-yet-optimized cases. So not every single benchmark will be fast on WebAssembly today. This post describes why WebAssembly should be fast; where it isn’t yet, those are bugs we need to fix.
With that out of the way, here is why WebAssembly is fast:
1. Startup

WebAssembly is designed to be small to download and fast to parse, so that even large applications start up quickly.
Total startup time can include factors other than downloading and parsing, such as the VM fully optimizing the code, or downloading additional data files that are necessary before execution, etc. But downloading and parsing are unavoidable and therefore important to improve upon as much as possible. All the rest can be optimized or mitigated, either in the browser or in the app (for example, the browser can avoid fully optimizing the code up front by running it in a baseline compiler or interpreter for the first few frames).
2. CPU features
- 64-bit integers. Operations on them can be up to 4x faster. This can speed up hashing and encryption algorithms, for example.
- Load and store offsets. This helps very broadly, basically anything that uses memory objects with fields at fixed offsets (C structs, etc.).
- Unaligned loads and stores, avoiding asm.js’s need to mask (which asm.js did for Typed Array compatibility purposes). This helps with practically every load and store.
- Various CPU instructions like popcount, copysign, etc. Each of these can help in specific circumstances (e.g. popcount can help in cryptanalysis).
How much a specific benchmark benefits will depend on whether it uses the features mentioned above. On average we see around a 5% speedup compared to asm.js. Further speedups are expected in the future from CPU features like SIMD.
3. Toolchain Improvements
WebAssembly is primarily a compiler target, and therefore has two parts: Compilers that generate it (the toolchain side), and VMs that run it (the browser side). Good performance depends on both.
This was already the case with asm.js, and Emscripten did a bunch of toolchain optimizations, running LLVM’s optimizer and also Emscripten’s asm.js optimizer. For WebAssembly, we built on top of that, but have also added some significant improvements while doing so. Neither asm.js nor WebAssembly is a typical compiler target, and they are unusual in similar ways, so lessons learned during the asm.js days helped us do things better for WebAssembly. In particular:
- We replaced the Emscripten asm.js optimizer with the Binaryen WebAssembly optimizer, which is designed for speed. That speed lets us run more costly optimization passes. For example, we remove duplicate functions by default when optimizing, which often shrinks large compiled C++ codebases by around 5%.
- We improved the Relooper algorithm with better optimizations for irreducible and convoluted control flow, which helps a lot on compiled interpreter-style loops.
- The Binaryen optimizer was designed with experimentation in mind, and experiments with superoptimization have led to miscellaneous minor improvements — things which could have been done in asm.js too, had we thought of them.
Overall, these toolchain improvements help about as much as the move from asm.js to WebAssembly itself does (7% and 5% on Box2D, respectively).
4. Predictably Good Performance
About Alon Zakai