asm.js Speedups Everywhere

asm.js is an easy-to-optimize subset of JavaScript. It runs in all browsers without plugins, and is a good target for porting C/C++ codebases such as game engines – which have in fact been the biggest adopters of this approach, for example Unity 3D and Unreal Engine.
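To make "easy-to-optimize subset" a little more concrete, here is a minimal sketch of a hand-written asm.js module called from ordinary JavaScript. The names `SumModule` and `sum` are hypothetical and chosen only for illustration; the overall shape (a `"use asm"` directive, type coercions such as `x | 0`, and a typed-array view over a heap buffer) follows the asm.js spec. Real-world asm.js is normally generated by a compiler such as Emscripten rather than written by hand.

```javascript
// Hypothetical example module; not taken from any real codebase.
function SumModule(stdlib, foreign, heap) {
  "use asm";

  // View the heap buffer as an array of 32-bit signed integers.
  var HEAP32 = new stdlib.Int32Array(heap);

  // Sum `count` 32-bit integers starting at byte offset `ptr`.
  function sum(ptr, count) {
    ptr = ptr | 0;     // coercion doubles as a type annotation: int
    count = count | 0; // coercion doubles as a type annotation: int
    var total = 0;
    var end = 0;
    end = (ptr + (count << 2)) | 0; // each element is 4 bytes wide
    for (; (ptr | 0) < (end | 0); ptr = (ptr + 4) | 0) {
      total = (total + (HEAP32[ptr >> 2] | 0)) | 0;
    }
    return total | 0;  // return type: int
  }

  return { sum: sum };
}

// Ordinary JavaScript: allocate a heap, fill it, and call into the module.
var heap = new ArrayBuffer(0x10000); // 64 KiB, a power of two
var ints = new Int32Array(heap);
ints[0] = 1;
ints[1] = 2;
ints[2] = 3;

// The stdlib object only needs to expose the built-ins the module uses.
var sumModule = SumModule({ Int32Array: Int32Array }, {}, heap);
var result = sumModule.sum(0, 3); // 1 + 2 + 3 = 6
```

Because asm.js is a strict subset of JavaScript, the same code still runs as plain JavaScript in an engine that does not recognize `"use asm"`; the directive only allows an engine to validate the module up front and compile it ahead of time.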

Obviously, developers porting games using asm.js would like them to run well across all browsers. However, each browser has different performance characteristics, because each has a different JavaScript engine, different graphics implementation, and so forth. In this post, we’ll focus on JavaScript execution speed and see the significant progress towards fast asm.js execution that has been happening across the board. Let’s go over each of the four major browsers now.

Chrome

Already in 2013, Google released Octane 2.0, a new version of their primary JavaScript benchmark suite, which contained a new asm.js benchmark, zlib. Benchmarks define what browsers optimize: things that matter are included in benchmarks, and browsers then compete to achieve the best scores. Therefore, adding an asm.js benchmark to Octane clearly signaled Google’s belief that asm.js content is important to optimize for.

A further major development happened more recently, when Google landed TurboFan, a new work-in-progress optimizing compiler for Chrome’s JavaScript engine, V8. TurboFan has a “sea of nodes” architecture (which is new in the JavaScript space, and has been used very successfully elsewhere, for example in the Java server virtual machine), and aims to reach even higher speeds than Crankshaft, the first optimizing compiler for V8.

While TurboFan is not yet ready to be enabled on all JavaScript content, as of Chrome 41 it is enabled on asm.js. Bringing the benefits of TurboFan to asm.js first shows how important optimizing asm.js is to the Chrome team. And the benefits can be quite substantial: for example, TurboFan speeds up Emscripten’s zlib benchmark by 13%, and fasta by 24%.

Safari

During the last year, Safari’s JavaScript engine, JavaScriptCore, introduced a new JIT (just-in-time compiler) called FTL. FTL stands for “Fourth Tier LLVM,” as it adds a fourth level of optimization above the three previously existing ones, and it is based on LLVM, a powerful open source compiler framework. This is exciting because LLVM is a top-tier general-purpose compiler, with many years of optimizations put into it, and Safari gets to reuse all those efforts. As shown in the blogposts linked to earlier, the speedups that FTL provides can be very substantial.

Another interesting development from Apple this year was the introduction of a new JavaScript benchmark, JetStream. JetStream contains several asm.js benchmarks, an indication that Apple believes asm.js content is important to optimize for, just as when Google added an asm.js benchmark to Octane.

Internet Explorer

The JavaScript engine inside Internet Explorer is named Chakra. Last year, the Chakra team blogged about a suite of optimizations coming to IE in Windows 10 and pointed to significant improvements in the scores on asm.js workloads in Octane and JetStream. This is yet another example of how having asm.js workloads in common benchmarks drives measurement and optimization.

The big news, however, is the recent announcement by the Chakra team that they are working on adding specific asm.js optimizations, to arrive in Windows 10 together with the other optimizations mentioned earlier. These optimizations haven’t made it to the Preview channel yet, so we can’t measure and report on them here. However, we can speculate on the improvements based on the initial impact of landing asm.js optimizations in Firefox. As shown in this benchmark comparisons slide containing measurements from right after the landing, asm.js optimizations immediately brought Firefox to around 2x slower than native performance (down from 5-12x slower than native before). Why should these wins translate to Chakra? Because, as explained in our previous post, the asm.js spec provides a predictable way to validate asm.js code and generate high-quality code based on the results.

So, here’s looking forward to good asm.js performance in Windows 10!

Firefox

As we mentioned before, the initial landing of asm.js optimizations in Firefox generally put Firefox within 2x of native in terms of raw throughput. By the end of 2013, we were able to report that the gap had shrunk to around 1.5x native – which is close to the amount of variability that different native compilers have between each other anyhow, so comparisons to “native speed” start to be less meaningful.

At a high level, this progress comes from two kinds of improvements: compiler backend optimizations and new JavaScript features. In the area of compiler backend optimizations, there has been a stream of tiny wins (specific to particular code patterns or hardware), making it difficult to point to any one thing. Two significant improvements stand out, though:

Along with backend optimization work, two new JavaScript features have been incorporated into asm.js which unlock new performance capabilities in the hardware. The first feature, Math.fround, may look simple, but when used carefully in JS it enables the compiler backend to generate single-precision floating-point arithmetic. As described in this post, the switch can result in anywhere from a 5% to 60% speedup, depending on the workload. The second feature is much bigger: SIMD.js. This is still a stage 1 proposal for ES7, so the new SIMD operations and the associated asm.js extensions are only available in Firefox Nightly. Initial results are promising, though.
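As a rough illustration of what "used carefully" can mean for Math.fround, the sketch below wraps both the inputs and the result of an addition. The helper name `addFloat32` is hypothetical. The idea is that when every operand and every result is rounded to single precision, an engine is free to use float32 hardware instructions, because the answer is the same as doing the arithmetic in double precision and rounding afterwards.

```javascript
// Math.fround rounds a number to the nearest single-precision float.
// 0.1 has no exact binary representation, so its float32 rounding differs
// from the double-precision value.
var single = Math.fround(0.1);
var isDifferent = single !== 0.1;

// Values that fit exactly in float32 pass through unchanged.
var exact = Math.fround(1.5) === 1.5;

// Hypothetical helper: coerce the inputs and the result to float32 so the
// whole operation can be carried out in single precision.
function addFloat32(a, b) {
  a = Math.fround(a);
  b = Math.fround(b);
  return Math.fround(a + b);
}

// Cross-check against Float32Array, which stores float32 values natively.
var f32 = new Float32Array(3);
f32[0] = 0.1;
f32[1] = 0.2;
f32[2] = f32[0] + f32[1];
var matches = addFloat32(0.1, 0.2) === f32[2];
```

Whether this pattern actually produces a speedup depends on the engine and the workload; the snippet only demonstrates the rounding semantics that make the optimization possible.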

Separate from all these throughput optimizations, there has also been a set of load-time optimizations in Firefox: off-main-thread and parallel compilation of asm.js code, as well as caching of the compiled machine code. As described in this post, these optimizations significantly improve the experience of starting a Unity- or Epic-sized asm.js application. Existing asm.js workloads in the benchmarks mentioned above do not test this aspect of asm.js performance, so we put together a new benchmark suite named Massive that does. Looking at Firefox’s Massive score over time, we can see the load-time optimizations contributing to a more than 6x improvement (more details in the Hacks post introducing the Massive benchmark).

The Bottom Line

What is most important, in the end, are not the underlying implementation details, nor even specific performance numbers on this benchmark or that. What really matters is that applications run well. The best way to check that is to actually run real-world games! A nice example of an asm.js-using game is Dead Trigger 2, a Unity 3D game:

The video shows the game running on Firefox, but as it uses only standard web APIs, it should work in any browser. We tried it now, and it renders quite smoothly on Firefox, Chrome and Safari. We are looking forward to testing it on the next Preview version of Internet Explorer as well.

Another example is Cloud Raiders:

As with Unity, the developers of Cloud Raiders were able to compile their existing C++ codebase (using Emscripten) to run on the web without relying on plugins. The result runs well in all four of the major browsers.

In conclusion, asm.js performance has made great strides over the last year. There is still room for improvement – sometimes performance is not perfect, or a particular API is missing, in one browser or another – but all major browsers are working to make sure that asm.js runs quickly. We can see that by looking at the benchmarks they are optimizing on, which contain asm.js, and in the new improvements they are implementing in their JavaScript engines, which are often motivated by asm.js. As a result, games that not long ago would have required plugins are quickly getting to the point where they can run well without them, in modern browsers across the web.

About Alon Zakai

Alon is on the research team at Mozilla, where he works primarily on Emscripten, a compiler from C and C++ to JavaScript. Alon founded the Emscripten project in 2010.


About Luke Wagner

Luke Wagner is a Mozilla software engineer and hacks on JavaScript and WebAssembly in Firefox.



10 comments

  1. David Flanagan

    I’d really like to have an easy way to use asm.js in single standalone functions that I can call from regular JavaScript. I’d like the asm.js speed boost when doing image processing in a worker, for example.

    I haven’t been able to figure out how to do this either with hand-coded asm.js or with emscripten. Is this possible? If so, I’d love to see an example (or documentation) of how to make it work.

    March 3rd, 2015 at 11:56

    1. Alon Zakai

      The emscripten docs have a section on that,

      http://kripken.github.io/emscripten-site/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html

      For a ‘raw’ asm.js example, see the asm.js spec github repo

      https://github.com/dherman/asm.js/

      Note though that if you call into asm.js for a very small amount of work, it might not be worth it. The benefits are most pronounced when calling into asm.js to do significant amounts of calculation.

      March 3rd, 2015 at 12:04

    2. William Furr

      I have had good luck with the embind system in Alon’s link. Module.cwrap is a lot simpler if you have externed C functions.

      If you’re compiling library code to call from JS, then you want to set -s NO_EXIT_RUNTIME=1 which will keep the emscripten runtime around after the main function, such as it is, has completed.

      March 3rd, 2015 at 14:48

    3. Gerard Braad

      Not really advised, but it is possible to handcraft the code in asm.js. Sebastian did so for jor1k.

      March 3rd, 2015 at 17:36

  2. David Flanagan

    Thanks for the links Alon.

    I didn’t find the hand-written asm.js module example in Dave’s repo very helpful, but the one in the spec (near the end of http://asmjs.org/spec/latest/#introduction) is great.

    With that as a model, I was able to write a simple asm.js module that did some simulated image processing on a few million pixels. I was caught by surprise by the link-time requirement that the heap size be a power of two, which in general means that I can’t use an arbitrary ImageData.data.buffer as my heap and will have to copy the pixels I want to process into a separately allocated heap array.

    Once I got that figured out, though, I had a working asm.js module for my first time ever. Unfortunately the code was actually slower with asm.js. If I removed the “use asm” directive or if I broke linking by using a bad heap size, it was faster. I suspect there is overhead that I need to amortize over multiple calls to my asm’ed function.

    As part of this experimentation I also discovered that a good way to improve image processing speed is to convert from Uint8ClampedArray to Uint8Array. This change alone may make as much of a difference as asm.js will for my code, or more.

    I’ll add that I have not tried actually using emscripten to compile C functions to do my image processing because the docs keep talking about “the emscripten runtime” which seems ominous and large when all I want is 20 to 50 lines of really fast javascript.

    March 4th, 2015 at 23:37

    1. Luke Wagner

      In general, the heap size doesn’t have to be a power of 2; it may also be any multiple of 16 MB (http://asmjs.org/spec/latest/#linking-0).

      It seems likely the slowdown is related to spending a lot of time in the trampoline going into and out of asm.js; there isn’t an IC yet for this call path. You can see this overhead in the FF builtin profiler by enabling “Show Gecko Platform Data” in the devtools prefs and looking for self time in the asm.js entry/exit trampolines.

      March 5th, 2015 at 02:33

  3. Eric Morgen

    A SpiderMonkey powered node.js fork JXcore made it possible to use ASM.JS with node applications. ‘asm.js speedups everywhere’ could also cover that.

    March 5th, 2015 at 09:42

  4. Owen Densmore

    It’s nice that Moz has a LLVM > Emscripten > asm.js workflow for C/C++ and other compiled languages. And hopefully PNaCl is now dead.

    But why is JS the bastard son? Can’t there be a JS > asm.js approach? Or a TypeScript-like transpiler? Or SweetJS macros?

    It seems like Moz is pandering to the nonJS crowd, leaving JS devs behind.

    March 6th, 2015 at 09:49

    1. Alon Zakai

      First, you can’t really compile JS into asm.js – JS doesn’t have explicit types, and asm.js is only easily-optimizable because it does.

      Second, Mozilla and other browser vendors are putting massive efforts into improving and optimizing general JS. This article happens to be about asm.js, which is pretty small in comparison, in terms of the work put into it.

      For example, general JS will become much faster with things like Typed Objects and SIMD.js, both of which are being implemented and optimized as we speak. You can see some of the work on Typed Objects at http://arewefastyet.com/ (“unboxed objects”).

      March 6th, 2015 at 18:06

  5. Paul Topping

    You won’t be able to compile dynamic programming languages like full JavaScript into asm.js precisely because asm.js omits the support for all that dynamic stuff. That is how it achieves its speed. C++ is faster than JavaScript, both via asm.js and with native code, because it knows a lot more about data types at runtime and also omits all that dynamic stuff.

    March 11th, 2015 at 16:16
