
Gap between asm.js and native performance gets even narrower with float32 optimizations

asm.js is a simple subset of JavaScript that is very easy to optimize, suitable for use as a compiler target from languages like C and C++. Earlier this year Firefox could run asm.js code at about half of native speed – that is, C++ code compiled by emscripten could run at about half the speed that the same C++ code could run when compiled natively – and we thought that through improvements in both emscripten (which generates asm.js code from C++) and JS engines (that run that asm.js code), it would be possible to get much closer to native speed.

Since then many speedups have arrived, most of them small and specific, but there have been a few large features as well. For example, Firefox recently gained the ability to optimize some floating-point operations so that they are performed using 32-bit floats instead of 64-bit doubles, which provides substantial speedups in some cases. That optimization work was generic and applied to any JavaScript code that happens to be optimizable in that way. Following that work and the speedups it achieved, there was no reason not to add float32 to the asm.js type system, so that asm.js code can benefit from it specifically.

The work to implement that in both emscripten and SpiderMonkey has recently completed, and here are the performance numbers:

[Chart: run times for the emscripten benchmarks under clang, gcc, firefox and firefox-f32, normalized to clang]

Run times are normalized to clang, so lower is better. The red bars (firefox-f32) represent Firefox running emscripten-generated code using float32. As the graph shows, Firefox with float32 optimizations can run all those benchmarks at around 1.5x slower than native, or better. That’s a big improvement from earlier this year when, as mentioned before, things were closer to 2x slower than native. You can also see the specific improvement from the float32 optimizations by comparing to the orange bar (firefox) next to it – in floating-point-heavy benchmarks like skinning, linpack and box2d, the speedup is very noticeable.

Another thing to note about those numbers is that not just one native compiler is shown, but two: clang and gcc. In a few benchmarks, the difference between clang and gcc is significant, showing that while we often talk about being some number of “times slower than native speed”, “native speed” is a somewhat loose term, since there are differences between native compilers.

In fact, on some benchmarks, like box2d, fasta and copy, asm.js is as close to clang as clang is to gcc, or closer. There is even one case where asm.js beats clang by a slight amount, on box2d (gcc also beats clang on that benchmark, by a larger amount, so clang’s backend codegen probably just happens to be a little unlucky there).

Overall, what this shows is that “native speed” is not a single number, but a range. It looks like asm.js on Firefox is very close to that range – that is, while it’s on average slower than clang and gcc, the amount it is slower by is not far off from how much native compilers differ amongst themselves.

Note that float32 code generation is off by default in emscripten. This is intentional: while it can both improve performance and ensure the proper C++ float semantics, it also increases code size – due to the added Math.fround calls – which can be detrimental in some cases, especially in JavaScript engines that do not yet support Math.fround.
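To see why fround matters for proper C++ float semantics, consider a value like 0.1, which has no exact binary representation: rounding it to 32 bits produces a measurably different number than the 64-bit double JavaScript uses by default. A standalone sketch:

```javascript
// Math.fround rounds a double to the nearest 32-bit float, matching what a
// native C++ `float` variable would store.
var d = 0.1;              // 64-bit double
var f = Math.fround(0.1); // rounded to 32-bit float precision

console.log(d === f);              // false – precision was lost in rounding
console.log(Math.fround(f) === f); // true – f is already exactly a float32
```

Without these explicit rounding points, double-precision intermediate results would leak into computations that C++ performed in single precision, changing the program's observable behavior.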

There are some ways to work around that issue, such as the outlining option, which reduces maximum function size. We have some other ideas on ways to improve code generation in emscripten as well, so we’ll be experimenting with those for a while, as well as following when Math.fround gets supported in browsers (so far Firefox and Safari support it). Hopefully in the not so far future we can enable float32 optimizations by default in emscripten.

Summary

In summary, the graph above shows asm.js performance getting yet closer to native speed. While for the reasons just mentioned I don’t recommend that people build with float32 optimizations quite yet – hopefully soon though! – it’s an exciting increase in performance. And even the current performance numbers – 1.5x slower than native, or better – are not the limit of what can be achieved, as there are still big improvements either under way or in planning, both in emscripten and in JavaScript engines.

20 comments

Comments are now closed.

  1. Hervé Renault wrote on December 20th, 2013 at 03:20:

    That’s good news. I published a summary in French on http://mozillazine-fr.org/lecart-se-reduit-entre-c-et-javascript/
    Hope to see more good news like this one in 2014 from Mozilla.
    Cheers!

    1. Robert Nyman [Editor] wrote on December 20th, 2013 at 09:08:

      Thank you!

  2. Rimantas wrote on December 20th, 2013 at 09:01:

    Nobody cares about floating point operations. Make DOM operations as fast and smooth as native, then we can talk.

    1. Robert Nyman [Editor] wrote on December 20th, 2013 at 09:10:

      Personally, I believe people care a lot about getting better performance, and if this helps towards that goal I’d say it’s a good thing. And it’s not an either/or thing: there are many areas that work goes into to make things faster and better.

    2. Alon Zakai wrote on December 20th, 2013 at 11:14:

      Some people certainly care – for example, games and game engines. DOM performance is definitely crucial as well, but different people work on those things; improvements in one do not preclude improvements in the other.

    3. Boris wrote on December 21st, 2013 at 00:09:

      Which particular DOM operations are relevant to the things you’re doing? We’re always working on making the DOM faster, and specific real-life testcases that show slowness are much appreciated!

      1. Glamsci wrote on December 21st, 2013 at 23:38:

        Any tweening performed by setting CSS (not even through any libraries – through element.style.width etc.) appears very choppy in Firefox, whereas it appears smooth in Chrome. Usually visual effects using the DOM should be done smoothly rather than jump-cut and surprise the user, so this is a big deal.

        jQuery, Tween.JS (which BTW, is secretly not that good – you might get 7 steps even though you want 100), and even if I just create a featherweight setTimer recursive function with a callback for the current value and prime the function first so the JIT gets it, with big divs movement is still choppy.

        If I’m setting a CSS value more than a few times a second, Firefox has a problem rendering it smoothly even with native methods – its scheduling is either poor in regard to the DOM, or the DOM operations are too inefficient. It makes it impossible to smoothly change shape or move things around in Firefox in a consistent manner.

        1. anon wrote on December 23rd, 2013 at 09:58:

          Agreed. While in Chrome and Firefox for Android I find the opposite to be true. Strange.

        2. Hervé Renault wrote on December 23rd, 2013 at 11:09:

          While I don’t see such a gap between Firefox and Chrome… maybe Firefox will improve with project Electrolysis : https://wiki.mozilla.org/Electrolysis

        3. Robert O’Callahan wrote on December 27th, 2013 at 19:10:

          Care to file a bug with specific testcase(s)? Performance of setting CSS ‘width’ is going to depend a lot on the specific content. Thanks!

  3. kruger wrote on December 20th, 2013 at 09:04:

    It would be good to include the C++ compile flags and the machine specs. Were -O3 and -march=native used? Because Firefox uses all the available CPU instructions when executing JavaScript, I assume?
    Also, the compilers are pretty old. My Fedora already has clang 3.3 and gcc 4.8. It’s not like they ain’t making any progress.

    1. Alon Zakai wrote on December 20th, 2013 at 11:12:

      You can reproduce the results by running the emscripten benchmark suite, and see the compilation flags in there as well. The flags are -O2; please let me know if you get different results with other ones (they didn’t seem to matter on my machine). My machine is an i7-2600 @ 3.40GHz.

      The version of clang is 3.2, which is one version behind. 3.2 is used in emscripten, so using it in the native build as well makes the comparison more apples-to-apples. The gcc version is the one installed by my distro (Ubuntu 12.04). Yes, newer versions might make a difference, but overall the goal here was to compare to a few “reasonable” native compilers – for example, that gcc version is reasonable because many people build with the distro compiler – and to get a general picture of how much variance there is between native compilers, and between asm.js and those compilers. For that purpose, I think the compilers tested are good. But I agree that if you want to see the absolute limit of current performance, or to see exactly where gcc or clang trunk are, then other compilers would have been better; it’s just that the goals here were different.

  4. Axel Rauschmayer wrote on December 20th, 2013 at 11:13:

    I take it that the ECMAScript 6 Math.fround() is used for float32 type annotations(?)

    1. Alon Zakai wrote on December 20th, 2013 at 11:15:

      Yes, fround is used. Without it, it is very difficult to efficiently get float32 semantics in JS.

  5. Ron wrote on December 20th, 2013 at 12:44:

    The related subject not explored here is how much faster V8 and SpiderMonkey can run JavaScript without asm.js. I imagine the gap between normal JavaScript and asm.js is closing, just as the gap between asm.js and native is closing.

    1. Dany wrote on December 21st, 2013 at 04:29:

      You’re kind of out of scope – how would you compile C++ into JavaScript without using asm.js?

      Anyway, the optimization that can be done using a subset like asm.js improves performance A LOT.
      Using hand-tweaked vanilla JavaScript vs. a compiled asm.js version makes a HUGE difference, likely something around 10x, maybe even more.

  6. M. Edward Borasky (@znmeb) wrote on December 20th, 2013 at 17:21:

    This is great news! The majority of compute-intensive things most games do – graphics and audio – work fine in 32-bit floats. It’s only heavy-duty matrix / scientific computing that requires 64 bits.

  7. DDT wrote on December 23rd, 2013 at 01:59:

    That looks very promising, not only for games but also for cryptographic functions to protect our JS applications. Though in crypto you mainly use integer calculations rather than floating point.

  8. Xavi Conde wrote on December 24th, 2013 at 02:10:

    Regarding your chart,

    a) Why are you normalizing against clang? It seems gcc is generally faster than clang.

    b) The performance difference between firefox and firefox-f32 is not that big.

    c) I’d be curious how your tests perform against Chrome.

    d) Which use case are you targeting here with the float optimization? As some people pointed out above, it’s not the most common operation. Most JavaScript applications won’t deal with heavy floating-point operations.

    1. Alon Zakai wrote on December 24th, 2013 at 09:34:

      a) In previous comparisons we compared and normalized to clang, so the normalized numbers are on a familiar scale. But yes, perhaps we should consider changing that.

      b) On most benchmarks yes, but on a few it is very substantial, for example skinning.

      c) You can see numbers for chrome here: http://arewefastyet.com/#machine=12&view=breakdown&suite=asmjs-apps

      d) Games have been one of the main use cases for emscripten and asm.js, I guess because (1) they want to run on the web so they can reach more people and (2) they really really need maximum performance. And games very often use floating point operations, in fact several of the benchmarks here that benefit from float32 optimizations are real-world code from game engines: bullet, box2d and skinning.
