Mozilla

TraceMonkey Articles

Sort by:

View:

  1. Firefox 4 Performance

    Dave Mandelin from the JS team and Joe Drew from the Graphics team summarize the key performance improvements in Firefox 4.

    The web wants fast browsers. Cutting-edge HTML5 web pages play games, mash up and share maps, sound, and videos, show spreadsheets and presentations, and edit photos. Only a high-performance browser can do that. What the web wants, it’s our job to make, and we’ve been working hard to make Firefox 4 fast.

    Firefox 4 comes with performance improvements in almost every area. The most dramatic improvements are in JavaScript and graphics, which are critical for modern HTML5 apps and games. In the rest of this article, we’ll profile the key performance technologies and show how they make the web that much “more awesomer”.

    Fast JavaScript: Uncaging the JägerMonkey
    JavaScript is the programming language of the web, powering most of the dynamic content and behavior, so fast JavaScript is critical for rich apps and games. Firefox 4 gets fast JavaScript from a beast we call JägerMonkey. In techno-gobbledygook, JägerMonkey is a multi-architecture per-method JavaScript JIT compiler with 64-bit NaN-boxing, inline caching, and register allocation. Let’s break that down:

      Multi-architecture
      JägerMonkey has full support for x86, x64, and ARM processors, so we’re fast on both traditional computers and mobile devices. W00t!
      (Crunchy technical stuff ahead: if you don’t care how it works, skip the rest of the sections.)

      Per-method JavaScript JIT compilation

      The basic idea of JägerMonkey is to translate (compile) JavaScript to machine code, “just in time” (JIT). JIT-compiling JavaScript isn’t new: previous versions of Firefox feature the TraceMonkey JIT, which can generate very fast machine code. But some programs can’t be “jitted” by TraceMonkey. JägerMonkey has a simpler design that is able to compile everything in exchange for not doing quite as much optimization. But it’s still fast. And TraceMonkey is still there, to provide a turbo boost when it can.

      64-bit NaN-boxing
      That’s the technical name for the new 64-bit formats the JavaScript engine uses to represent program values. These formats are designed to help the JIT compilers and tuned for modern hardware. For example, think about floating-point numbers, which are 64 bits. With the old 32-bit value formats, floating-point calculations required the engine to allocate, read, write, and deallocate extra memory, all of which is slow, especially now that processors are much faster than memory. With the new 64-bit formats, no extra memory is required, and calculations are much faster. If you want to know more, see the technical article Mozilla’s new JavaScript value representation.
      Inline caching
      Property accesses, like o.p, are common in JavaScript. Without special help from the engine, they are complicated, and thus slow: first the engine has to search the object and its prototypes for the property, next find out where the value is stored, and only then read the value. The idea behind inline caching is: “What if we could skip all that other junk and just read the value?” Here’s how it works: The engine assigns every object a shape that describes its prototype and properties. At first, the JIT generates machine code for o.p that gets the property by laborious searching. But once that code runs, the JITs finds out what o's shape is and how to get the property. The JIT then generates specialized machine code that simply verifies that the shape is the same and gets the property. For the rest of the program, that o.p runs about as fast as possible. See the technical article PICing on JavaScript for fun and profit for much more about inline caching.

      Register allocation
      Code generated by basic JITs spends a lot of time reading and writing memory: for code like x+y, the machine code first reads x, then reads y, adds them, and then writes the result to temporary storage. With 64-bit values, that's up to 6 memory accesses. A more advanced JIT, such as JägerMonkey, generates code that tries to hold most values in registers. JägerMonkey also does some related optimizations, like trying to avoid storing values at all when they are constant or just a copy of some other value.

    Here's what JägerMonkey does to our benchmark scores:

    That's more than 3x improvement on SunSpider and Kraken and more than 6x on V8!

    Fast Graphics: GPU-powered browsing.
    For Firefox 4, we sped up how Firefox draws and composites web pages using the Graphics Processing Unit (GPU) in most modern computers.

    On Windows Vista and Windows 7, all web pages are hardware accelerated using Direct2D . This provides a great speedup for many complex web sites and demo pages.

    On Windows and Mac, Firefox uses 3D frameworks (Direct3D or OpenGL) to accelerate the composition of web page elements. This same technique is also used to accelerate the display of HTML5 video .

    Final take
    Fast, hardware-accelerated graphics combined plus fast JavaScript means cutting-edge HTML5 games, demos, and apps run great in Firefox 4. You see it on some of the sites we enjoyed making fast. There's plenty more to try in the Mozilla Labs Gaming entries and of course, be sure to check out the Web O' Wonder.

  2. ECMAScript 5 strict mode in Firefox 4

    Editor’s note: This article is posted by Chris Heilmann but authored by Jeff Walden – credit where credit is due.

    Developers in the Mozilla community have made major improvements to the JavaScript engine in Firefox 4. We have devoted much effort to improving performance, but we’ve also worked on new features. We have particularly focused on ECMAScript 5, the latest update to the standard underlying JavaScript.

    Strict mode is arguably the most interesting new feature in ECMAScript 5. It’s a way to opt in to a restricted variant of JavaScript. Strict mode isn’t just a subset: it intentionally has different semantics from normal code. Browsers not supporting strict mode will run strict mode code with different behavior from browsers that do, so don’t rely on strict mode without feature-testing for support for the relevant aspects of strict mode.

    Strict mode code and non-strict mode code can coexist, so scripts can opt into strict mode incrementally. Strict mode blazes a path to future ECMAScript editions where new code with a particular <script type="..."> will likely automatically be executed in strict mode.

    What does strict mode do? First, it eliminates some JavaScript pitfalls that didn’t cause errors by changing them to produce errors. Second, it fixes mistakes that make it difficult for JavaScript engines to perform optimizations: strict mode code can sometimes be made to run faster than identical code that’s not strict mode. Firefox 4 generally hasn’t optimized strict mode yet, but subsequent versions will. Third, it prohibits some syntax likely to be defined in future versions of ECMAScript.

    Invoking strict mode

    Strict mode applies to entire scripts or to individual functions. It doesn’t apply to block statements enclosed in {} braces; attempting to apply it to such contexts does nothing. eval code, event handler attributes, strings passed to setTimeout, and the like are entire scripts, and invoking strict mode in them works as expected.

    Strict mode for scripts

    To invoke strict mode for an entire script, put the exact statement "use strict"; (or 'use strict';) before any other statements.

    // Whole-script strict mode syntax
    "use strict";
    var v = "Hi!  I'm a strict mode script!";

    This syntax has a trap that has already bitten a major site: it isn’t possible to blindly concatenate non-conflicting scripts. Consider concatenating a strict mode script with a non-strict mode script: the entire concatenation looks strict! The inverse is also true: non-strict plus strict looks non-strict. Concatenation of strict mode scripts with each other is fine, and concatenation of non-strict mode scripts is fine. Only crossing the streams by concatenating strict and non-strict scripts is problematic.

    Strict mode for functions

    Likewise, to invoke strict mode for a function, put the exact statement "use strict"; (or 'use strict';) in the function’s body before any other statements.

    function strict()
    {
      // Function-level strict mode syntax
      'use strict';
      function nested() { return "And so am I!"; }
      return "Hi!  I'm a strict mode function!  " + nested();
    }
    function notStrict() { return "I'm not strict."; }

    Changes in strict mode

    Strict mode changes both syntax and runtime behavior. Changes generally fall into these categories:

    • Converting mistakes into errors (as syntax errors or at runtime)
    • Simplifying how the particular variable for a given use of a name is computed
    • Simplifying eval and arguments
    • Making it easier to write “secure” JavaScript
    • Anticipating future ECMAScript evolution

    Converting mistakes into errors

    Strict mode changes some previously-accepted mistakes into errors. JavaScript was designed to be easy for novice developers, and sometimes it gives operations which should be errors non-error semantics. Sometimes this fixes the immediate problem, but sometimes this creates worse problems in the future. Strict mode treats these mistakes as errors so that they’re discovered and promptly fixed.

    First, strict mode makes it impossible to accidentally create global variables. In normal JavaScript, mistyping a variable in an assignment creates a new property on the global object and continues to “work” (although future failure is possible: likely, in modern JavaScript). Assignments which would accidentally create global variables instead throw errors in strict mode:

    "use strict";
    mistypedVaraible = 17; // throws a ReferenceError

    Second, strict mode makes assignments which would otherwise silently fail throw an exception. For example, NaN is a non-writable global variable. In normal code assigning to NaN does nothing; the developer receives no failure feedback. In strict mode assigning to NaN throws an exception. Any assignment that silently fails in normal code will throw errors in strict mode:

    "use strict";
    NaN = 42; // throws a TypeError
    var obj = { get x() { return 17; } };
    obj.x = 5; // throws a TypeError
    var fixed = {};
    Object.preventExtensions(fixed);
    fixed.newProp = "ohai"; // throws a TypeError

    Third, if you attempt to delete undeletable properties, strict mode throws errors (where before the attempt would simply have no effect):

    "use strict";
    delete Object.prototype; // throws a TypeError

    Fourth, strict mode requires that all properties named in an object literal be unique. Normal code may duplicate property names, with the last one determining the property’s value. But since only the last one does anything, the duplication is simply a vector for bugs, if the code is modified to change the property value other than by changing the last instance. Duplicate property names are a syntax error in strict mode:

    "use strict";
    var o = { p: 1, p: 2 }; // !!! syntax error

    Fifth, strict mode requires that function argument names be unique. In normal code the last duplicated argument hides previous identically-named arguments. Those previous arguments remain available through arguments[i], so they’re not completely inaccessible. Still, this hiding makes little sense and is probably undesirable (it might hide a typo, for example), so in strict mode duplicate argument names are a syntax error:

    function sum(a, a, c) // !!! syntax error
    {
      "use strict";
      return a + b + c; // wrong if this code ran
    }

    Sixth, strict mode forbids octal syntax. Octal syntax isn’t part of ECMAScript, but it’s supported in all browsers by prefixing the octal number with a zero: 0644 === 420 and "\\045" === "%". Novice developers sometimes believe a leading zero prefix has no semantic meaning, so they use it as an alignment device — but this changes the number’s meaning! Octal syntax is rarely useful and can be mistakenly used, so strict mode makes octal a syntax error:

    "use strict";
    var sum = 015 + // !!! syntax error
              197 +
              142;

    Simplifying variable uses

    Strict mode simplifies how variable uses map to particular variable definitions in the code. Many compiler optimizations rely on the ability to say that this variable is stored in this location: this is critical to fully optimizing JavaScript code. JavaScript sometimes makes this basic mapping of name to variable definition in the code impossible to perform except at runtime. Strict mode removes most cases where this happens, so the compiler can better optimize strict mode code.

    First, strict mode prohibits with. The problem with with is that any name in it might map either to a property of the object passed to it, or to a variable in surrounding code, at runtime: it’s impossible to know which beforehand. Strict mode makes with a syntax error, so there’s no chance for a name in a with to refer to an unknown location at runtime:

    "use strict";
    var x = 17;
    with (obj) // !!! syntax error
    {
      // If this weren't strict mode, would this be var x, or
      // would it instead be obj.x?  It's impossible in general
      // to say without running the code, so the name can't be
      // optimized.
      x;
    }

    The simple alternative of assigning the object to a variable, then accessing the corresponding property on that variable, stands ready to replace with.

    Second, eval of strict mode code does not introduce new variables into the surrounding code. In normal code eval("var x;") introduces a variable x into the surrounding function or the global scope. This means that, in general, in a function containing a call to eval, every name not referring to an argument or local variable must be mapped to a particular definition at runtime (because that eval might have introduced a new variable that would hide the outer variable). In strict mode eval creates variables only for the code being evaluated, so eval can’t affect whether a name refers to an outer variable or some local variable:

    var x = 17;
    var evalX = eval("'use strict'; var x = 42; x");
    assert(x === 17);
    assert(evalX === 42);

    Relatedly, if the function eval is invoked by an expression of the form eval(...) in strict mode code, the code will be evaluated as strict mode code. The code may explicitly invoke strict mode, but it’s unnecessary to do so.

    function strict1(str)
    {
      "use strict";
      return eval(str); // str will be treated as strict mode code
    }
    function strict2(f, str)
    {
      "use strict";
      return f(str); // not eval(...): str is strict iff it invokes strict mode
    }
    function nonstrict(str)
    {
      return eval(str); // str is strict iff it invokes strict mode
    }
    strict1("'Strict mode code!'");
    strict1("'use strict'; 'Strict mode code!'");
    strict2(eval, "'Non-strict code.'");
    strict2(eval, "'use strict'; 'Strict mode code!'");
    nonstrict("'Non-strict code.'");
    nonstrict("'use strict'; 'Strict mode code!'");

    Third, strict mode forbids deleting plain names. Thus names in strict mode eval code behave identically to names in strict mode code not being evaluated as the result of eval. Using delete name in strict mode is a syntax error:

    "use strict";
    eval("var x; delete x;"); // !!! syntax error

    Making eval and arguments simpler

    Strict mode makes arguments and eval less bizarrely magical. Both involve a considerable amount of magical behavior in normal code: eval to add or remove bindings and to change binding values, and arguments by its indexed properties aliasing named arguments. Strict mode makes great strides toward treating eval and arguments as keywords, although full fixes will not come until a future edition of ECMAScript.

    First, the names eval and arguments can’t be bound or assigned in language syntax. All these attempts to do so are syntax errors:

    "use strict";
    eval = 17;
    arguments++;
    ++eval;
    var obj = { set p(arguments) { } };
    var eval;
    try { } catch (arguments) { }
    function x(eval) { }
    function arguments() { }
    var y = function eval() { };
    var f = new Function("arguments", "'use strict'; return 17;");

    Second, strict mode code doesn’t alias properties of arguments objects created within it. In normal code within a function whose first argument is arg, setting arg also sets arguments[0], and vice versa (unless no arguments were provided or arguments[0] is deleted). For strict mode functions, arguments objects store the original arguments when the function was invoked. The value of arguments[i] does not track the value of the corresponding named argument, nor does a named argument track the value in the corresponding arguments[i].

    function f(a)
    {
      "use strict";
      a = 42;
      return [a, arguments[0]];
    }
    var pair = f(17);
    assert(pair[0] === 42);
    assert(pair[1] === 17);

    Third, arguments.callee is no longer supported. In normal code arguments.callee refers to the enclosing function. This use case is weak: simply name the enclosing function! Moreover, arguments.callee substantially hinders optimizations like inlining functions, because it must be made possible to provide a reference to the un-inlined function if arguments.callee is accessed. For strict mode functions, arguments.callee is a non-deletable property which throws an error when set or retrieved:

    "use strict";
    var f = function() { return arguments.callee; };
    f(); // throws a TypeError

    “Securing” JavaScript

    Strict mode makes it easier to write “secure” JavaScript. Some websites now provide ways for users to write JavaScript which will be run by the website on behalf of other users. JavaScript in browsers can access the user’s private information, so such JavaScript must be partially transformed before it is run, to censor access to forbidden functionality. JavaScript’s flexibility makes it effectively impossible to do this without many runtime checks. Certain language functions are so pervasive that performing runtime checks has considerable performance cost. A few strict mode tweaks, plus requiring that user-submitted JavaScript be strict mode code and that it be invoked in a certain manner, substantially reduce the need for those runtime checks.

    First, the value passed as this to a function in strict mode isn’t boxed into an object. For a normal function, this is always an object: the provided object if called with an object-valued this; the value, boxed, if called with a Boolean, string, or number this; or the global object if called with an undefined or null this. (Use call, apply, or bind to specify a particular this.) Automatic boxing is a performance cost, but exposing the global object in browsers is a security hazard, because the global object provides access to functionality “secure” JavaScript environments must invariably. Thus for a strict mode function, the specified this is used unchanged:

    "use strict";
    function fun() { return this; }
    assert(fun() === undefined);
    assert(fun.call(2) === 2);
    assert(fun.apply(null) === null);
    assert(fun.call(undefined) === undefined);
    assert(fun.bind(true)() === true);

    (Tangentially, built-in methods also now won’t box this if it is null or undefined. [This change is independent of strict mode but is motivated by the same concern about exposing the global object.] Historically, passing null or undefined to a built-in method like Array.prototype.sort() would act as if the global object had been specified. Now passing either value as this to most built-in methods throws a TypeError. Booleans, numbers, and strings are still boxed by these methods: it’s only when these methods would otherwise act on the global object that they’ve been changed.)

    Second, in strict mode it’s no longer possible to “walk” the JavaScript stack via commonly-implemented extensions to ECMAScript. In normal code with these extensions, when a function fun is in the middle of being called, fun.caller is the function that most recently called fun, and fun.arguments is the arguments for that invocation of fun. Both extensions are problematic for “secure” JavaScript, because they allow “secured” code to access “privileged” functions and their (potentially unsecured) arguments. If fun is in strict mode, both fun.caller and fun.arguments are non-deletable properties which throw an error when set or retrieved:

    function restricted()
    {
      "use strict";
      restricted.caller;    // throws a TypeError
      restricted.arguments; // throws a TypeError
    }
    function privilegedInvoker()
    {
      return restricted();
    }
    privilegedInvoker();

    Third, arguments for strict mode functions no longer provide access to the corresponding function call’s variables. In some old ECMAScript implementations arguments.caller was an object whose properties aliased variables in that function. This is a security hazard because it breaks the ability to hide privileged values via function abstraction; it also precludes most optimizations. For these reasons no recent browsers implement it. Yet because of its historical functionality, arguments.caller for a strict mode function is also a non-deletable property which throws an error when set or retrieved:

    "use strict";
    function fun(a, b)
    {
      "use strict";
      var v = 12;
      return arguments.caller; // throws a TypeError
    }
    fun(1, 2); // doesn't expose v (or a or b)

    Paving the way for future ECMAScript versions

    Future ECMAScript versions will likely introduce new syntax, and strict mode in ECMAScript 5 applies some restrictions to ease the transition. It will be easier to make some changes if the foundations of those changes are prohibited in strict mode.

    First, in strict mode a short list of identifiers become reserved keywords. These words are implements, interface, let, package, private, protected, public, static, and yield. In strict mode, then, you can’t name or use variables or arguments with these names. A Mozilla-specific caveat: if your code is JavaScript 1.7 or greater (you’re chrome code, or you’ve used the right <script type="">) and is strict mode code, let and yield have the functionality they’ve had since those keywords were first introduced. But strict mode code on the web, loaded with <script src=""> or <script>...</script>, won’t be able to use let/yield as identifiers.

    Second, strict mode prohibits function statements not at the top level of a script or function. In normal code in browsers, function statements are permitted “everywhere”. This is not part of ES5! It’s an extension with incompatible semantics in different browsers. Future ECMAScript editions hope to specify new semantics for function statements not at the top level of a script or function. Prohibiting such function statements in strict mode “clears the deck” for specification in a future ECMAScript release:

    "use strict";
    if (true)
    {
      function f() { } // !!! syntax error
      f();
    }
    for (var i = 0; i &lt; 5; i++)
    {
      function f2() { } // !!! syntax error
      f2();
    }
    function baz() // kosher
    {
      function eit() { } // also kosher
    }

    This prohibition isn’t strict mode proper, because such function statements are an extension. But it is the recommendation of the ECMAScript committee, and browsers will implement it.

    Strict mode in browsers

    Firefox 4 is the first browser to fully implement strict mode. The Nitro engine found in many WebKit browsers isn’t far behind with nearly-complete strict mode support. Chrome has also started to implement strict mode. Internet Explorer and Opera haven’t started to implement strict mode; feel free to send those browser makers feedback requesting strict mode support.

    Browsers don’t reliably implement strict mode, so don’t blindly depend on it. Strict mode changes semantics. Relying on those changes will cause mistakes and errors in browsers which don’t implement strict mode. Exercise caution in using strict mode, and back up reliance on strict mode with feature tests that check whether relevant features of strict mode are implemented.

    To test out strict mode, download a Firefox nightly and start playing. Also consider its restrictions when writing new code and when updating existing code. (To be absolutely safe, however, it’s probably best to wait to use it in production until it’s shipped in browsers.)

  3. Fun With Fast JavaScript

    This post is by Vladimir Vukićević and is a re-post from his personal weblog.

    Fast JavaScript is a cornerstone of the modern web. In the past, application authors had to wait for browser developers to implement any complex functionality in the browser itself, so that they could access it from script code. Today, many of those functions can move straight into JavaScript itself. This has many advantages for application authors: there’s no need to wait for a new version of a browser before you can develop or ship your app, you can tailor the functionality to exactly what you need, and you can improve it directly (make it faster, higher quality, more precise, etc.).

    Here are two examples that show off what can be done with the improved JS engine and capabilities that will be present in Firefox 4. The first example shows a simple web-based Darkroom that allows you to perform color correction on an image. The HTML+JS is around 700 lines of code, not counting jQuery. This is based on a demo that’s included with Google’s Native Client (NaCl) SDK; in that demo, the color correction work is done inside native code going through NaCl. That demo (originally presented as “too slow to run in JavaScript”) is a few thousand lines of code, and involves downloading and installing platform-specific compilers, multiple steps to test/deploy code, and installing a plugin on the browser side.

    I get about 15-16 frames per second with the default zoomed out image (around 5 million pixels per second — that number won’t be affected by image size) on my MacBook Pro, which is definitely fast enough for live manipulation. The algorithm could be tightened up to make this faster still. Further optimizations to the JS engine could help here as well; for example, I noticed that we spend a lot of time doing floating point to integer conversions for writing the computed pixels back to the display canvas, due to how the canvas API specifies image data handling.

    The Web Darkroom tool also supports drag & drop, so you can take any image from your computer and drop it onto the canvas to load it. A long (long!) time ago, back in 2006, I wrote an addon called “Croppr!”. It was intended to be used with Flickr, allowing users to play around with custom crops of any image, and then leave crop suggestions in comments to be viewed using Croppr. It almost certainly doesn’t work any more, but it would be neat to update it: this time with both cropping and color correction. Someone with the addon (perhaps a Jetpack now!) could then visit a Flickr photo and experiment, and leave suggestions for the photographer.

    The second example is based on some work that Dave Humphrey and others have been doing to bring audio manipulation to the web platform. Originally, their spec included a pre-computed FFT with each audio frame delivered to the web app. I suggested that there’s no need for this — while a FFT is useful for some applications, for others it would be wasted work. Those apps that want a FFT could implement one in JS. Some benchmark numbers backed this up — using the typed arrays originally created for WebGL, computing an FFT in JS was approaching the speed of native code. Again, both could be sped up (perhaps using SSE2 or something like Mono.Simd on the JS side), but it’s fast enough to be useful already.

    The demo shows this in action. A numeric benchmark isn’t really all that interesting, so instead I take a video clip, and as it’s playing, I extract a portion of the green channel of each frame and compute its 2D FFT, which is then displayed. The original clip plays at 24 frames per second, so that’s the upper bound of this demo. Using Float32 typed arrays, the computation and playback proceeds at around 22-24fps for me.

    You can grab the video controls and scrub to a specific frame. (The frame rate calculation is only correct while the video is playing normally, not while you’re scrubbing.) The video source uses Theora, so you’ll need a browser that can play Theora content. (I didn’t have a similar clip that uses WebM, or I could have used that.)

    These examples are demonstrating the strength of the trace-based JIT technique that Firefox has used for accelerating JavaScript since Firefox 3.5. However, not all code can see such dramatic speedups from that type of acceleration. Because of that, we’ll be including a full method-based JIT for Firefox 4 (for more details, see David Anderson’s blog, as well as David Mandelin’s blog). This will provide significantly faster baseline JS performance, with the trace JIT becoming a turbocharger for code that it would naturally apply to.

    Combining fast JavaScript performance alongside new web platform technologies such as WebGL and Audio will make for some pretty exciting web apps, and I’m looking forward to seeing what developers do with them!

    Edit: Made some last-minute changes to the demos, which ended up pulling in a slightly broken version of jQuery UI that wasn’t all that happy with Safari. Should be fixed now!

  4. improving JavaScript performance with JägerMonkey

    In August 2008, Mozilla introduced TraceMonkey. The new engine, which we shipped in Firefox 3.5, heralded a new era of performance to build the next generation of web browsers and web applications. Just after the introduction of our new engine Google introduced V8 with Chrome. Apple also introduced their own engine to use in Safari, and even Opera has a new engine that they’ve introduced with their latest browser beta.

    As a direct result of these new engines we’ve started to see new types of applications start to emerge. People experimenting with bringing Processing to the web, people experimenting with real-time audio manipulation, games and many other things. (For some good examples have a look at our list of Canvas demos.)

    We’ve learned two things at Mozilla about how our JavaScript engine interacts with these new applications:

    1. That the approach that we’ve taken with tracing tends to interact poorly with certain styles of code. (That NES game example above, for example, tends to perform very badly in our engine – it’s essentially a giant switch statement.)
    2. That when we’re able to “stay on trace” (more on this later) TraceMonkey wins against every other engine.

    Mozilla’s engine is fundamentally different than every other engine: everyone else uses what’s called a “method-based JIT”. That is, they take all incoming JS code, compile it to machine code and then execute it. Firefox uses a “tracing JIT.” We interpret all incoming JS code and record as we’re interpreting it. When we detect a hot path, we turn that into machine code and then execute that inner part. (For more background on tracing, see this post on hacks from last year.)

    The downside of the tracing JIT is that we have to switch back and forth between the interpreter and the machine code whenever we reach certain conditions. When we have to jump back from machine code to the interpreter this is what we call being “knocked off trace.” The interpreter is, of course, much slower than running native machine code. And it turns out that happens a lot – more than anyone expected.

    So what we’re doing in our 2nd generation engine is to combine the best elements of both approaches:

    1. We’re using some chunks of the WebKit JS engine and building a full-method JIT to execute JavaScript code. This should get us fast baseline JS performance like the other engines. And most important, it will be consistent – no more jumping on and off trace and spending a huge amount of time in interpreted code.
    2. We’ll be bolting our tracing engine into the back of that machine code to generate super-fast code for inner loops. This means that we’ll be able to still have the advantages of a tracing engine with the consistency of the method-based JIT.

    This work is still in the super-early stages, to the point where it’s not even worth demoing, but we thought it would be worth posting about so people understand the basics of what’s going on.

    You can find more information about this on David Mandelin‘s and David Anderson‘s weblogs as well as the project page for the the new engine.

  5. JavaScript speedups in Firefox 3.6

    This post was written by David Mandelin who works on Mozilla’s JavaScript team.

    Firefox 3.5 introduced TraceMonkey, our new JavaScript engine that traces loops and JIT compiles them to native (x86/ARM) code. Many JavaScript programs ran 3-4x faster in TraceMonkey compared to Firefox 3. (See our previous article for technical details.)

    For JavaScript performance in Firefox 3.6, we focused on the areas that we thought needed further improvement the most:

    • Some JavaScript code was not trace-compiled in Firefox 3.5. Tracing was disabled by default for Firefox UI JavaScript and add-on JavaScript, so those programs did not benefit from tracing. Also, many advanced JavaScript features were not trace-compiled. For Firefox 3.6, we wanted to trace more programs and more JS features.
    • Animations coded with JavaScript were often choppy because of garbage collection pauses. We wanted to improve GC performance to make pauses shorter and animations smoother.

    In this article, I’ll explain the most important JS performance improvements that come with Firefox 3.6. I’ll focus on listing what kinds of JS code get faster, including sample programs that show the improvements Fx3.6 makes over Fx3.5.

    JIT for Browser UI JavaScript

    Firefox runs JavaScript code in one of two contexts:content and chrome (no relation to Google Chrome). JavaScript that is part of web content runs in a content context. JavaScript that is part of the browser UI or browser add-ons runs in a chrome context and has extra privileges. For example, chrome JS can alter the main browser UI, but content JS is not allowed to.

    The TraceMonkey JIT can be enabled or disabled separately for content and chrome JS using about:config. Because bugs affecting chrome JS are a greater risk for security and reliability, in Firefox 3.5 we chose to disable the JIT for chrome JS by default. After extensive testing, we’ve decide to enable the JIT for chrome JS by default, something we did not have time to fully investigate for Fx3.5. Turning on the JIT for chrome should make the JS behind the Firefox UI and add-ons run faster. This difference is probably not very noticeable for general browser usage, because the UI was designed and coded to perform well with the older JS engines. The difference should be more noticeable for add-ons that do heavy JS computation.

    Option Fx3.5 Default Fx3.6 Default
    javascript.options.jit.chrome false true
    javascript.options.jit.content true true
    about:config options for the JIT

    Garbage Collector Performance

    JavaScript is a garbage-collected language, so periodically the JavaScript engine must reclaim unused memory. Our garbage collector (GC) pauses all JavaScript programs while it works. This is fine as long as the pauses are “short”. But if the pauses are even a little too long, they can make animations jerky. Animations need to run at 30-60 frames per second to look smooth, which means it should take no longer than 17-33 ms to render one frame. Thus, GC pauses longer than 40 ms cause jerkiness, while pauses under 10 ms should be almost unnoticeable. In Firefox 3.5, pause times were noticeably long, and JavaScript animations are increasingly common on the web, so reducing pause times was a major goal for JavaScript in Firefox 3.6.

    Demo: GC Pauses and Animation

    Demo.
    The spinning dial animation shown here illustrates pause times. Besides animating the dial, this demo creates one million 100-character strings per second, so it requires frequent GC. The frame delay meter gives the average time between frames in milliseconds. The estimated GC delay meter gives the average estimated GC delay, based on the assumption that if a frame has a delay of 1.7 times the average delay or more, then exactly one GC ran during that frame. (This procedure may not be valid for other browsers, so it is not valid for comparing different browsers. Note also that the GC time also depends on other live JavaScript sessions, so for a direct comparison of two browsers, have the same tabs open in each.) On my machine, I get an estimated GC delay of about 80 ms in Fx3.5, but only 30 ms in Fx3.6.

    But it’s a lot easier to see the difference by opening the demo in Fx3.5, watching it a bit, and then trying it in Fx3.6.
    In Fx3.5, I see frequent pauses and the animation looks noticeably jerky. In Fx3.6, it looks pretty smooth, and it’s hard for me even to tell exactly when the GC is running.

    How Fx3.6 does it better. We’ve made many improvements to the garbage collector and memory allocator. I want to give a little more technical details on the big two changes that really cut our pause times.

    First, we noticed that a large fraction of the pause time was spent calling free to reclaim the unused memory. We can’t do much to make freeing memory faster, but we realized we could do it on a separate thread. In Fx3.6, the main JS thread simply adds unused memory chunks to a queue, and another thread frees them during idle time or on a separate processor. This means machines with 2 or more cores will benefit more from this change. But even when one core, freeing might be delayed to an idle time when it will not affect scripts.

    Second, we knew that in Fx3.5 running GC clears out all the native code compiled by the JIT as well as some other caches that speed up JS. The reason is that the tracing JIT and GC did not know about each other, so if the GC ran, it might reclaim objects being used by a compiled trace. The result was that immediately after a GC, JS ran a bit slower as the caches and compiled traces were built back up. This would be experienced as either an extended GC pause or a brief hiccup of slow animation right after the GC pause. In Fx3.6, we taught the GC and the JIT to work together, and now the GC does not clear caches or wipe out native code, so it resumes running normally right after GC.

    Tracing More JavaScript Constructs

    In my article on TraceMonkey for the Fx3.5 release, I noted that certain code constructs, such as the arguments object, were not traced and did not get performance improvements from the JIT. A major goal for JS in Fx3.6 was to trace more stuff, so more programs can run faster. We do trace more stuff now, in particular:

    • DOM Properties. DOM objects are special and harder for the trace compiler to work with. For Fx3.5, we implemented tracing of DOM methods, but not DOM properties. Now we trace DOM properties (and other “native” C++ getters and setters) as well. We still do not trace scripted getters and setters.
    • Closures. Fx3.5 traced only a few operations involving closures (by which I mean functions that refer to variables defined in lexically enclosing functions). Fx3.6 can trace more programs that use closures. The main operation that is still not traced yet is creating an anonymous function that modifies closure variables. But calling such a function and actually writing to the closure variables are traced.
    • arguments. We now trace most common uses of the arguments keyword. “Exotic” uses, such as setting elements of arguments, are not traced.
    • switch. We have improved performance when tracing switch statements that use densely packed numeric case labels. These are particularly important for emulators and VMs.

    These improvements are particularly important for jQuery and Dromaeo, which heavily use arguments, closures, and the DOM. I suspect many other complex JavaScript applications will also benefit. For example, we recently heard from the author that this R-tree library performs much better in Fx3.6.

    Here is a pair of demos of new things we trace. The first sets a DOM property in a loop. The second calls a sum function implemented with arguments I get a speedup of about 2x for both of them in Fx3.6 vs. Fx3.5.

    Demo: Fx3.6 Tracing DOM properties and arguments


    DOM Property Set:

    Sum using arguments:

    String and RegExp Improvements

    Fx3.6 includes several improvements to string and regular expression performance. For example, the regexp JIT compiler now supports a larger class of regular expressions, including the ever-popular \w+. We also made some of our basic operations faster, like indexOf, match, and search. Finally, we made concatenating sequences of several strings inside a function (a common operation in building up HTML or other kinds of textual output) much faster.

    Technical aside on how we made string concatenation faster: The C++ function that concatenates two strings S1 and S2 does this: Allocate a buffer big enough to hold the result, then copy the characters of S1 and S2 into the buffer. To concatenate more than two strings, as in JS s + "foo" + t, Fx3.5 simply concatenates two at a time from left to right.

    Using the Fx3.5 algorithm, to concatenate N strings each of length K, we need to do N-1 memory allocations, and all but one of them are for temporary strings. Worse, the first two input strings are copied N-1 times, the next one is copied N-2 times, and so on. The total number of characters copied is K(N-1)(N+2)/2, which is O(N^2).

    Clearly, we can do a lot better. The minimum work we can do is to copy each input string exactly once to the output string, for a total of KN characters copied. Fx3.6 achieves this by detecting sequences of concatenation in JS programs and combining the entire sequence into one operation that uses the optimal algorithm.

    Here are a few string benchmarks you can try that are faster in Fx3.6:

    Demo: Fx3.6 String Operations


    /\w+/:

    indexOf('foo'):

    match('foo'):

    Build HTML:

    Final Thoughts and Next Steps

    We also made a lot of little improvements that don't fit into the big categories above. Most importantly, Adobe, Mozilla, Intel, Sun, and other contributors continue to improve nanojit, the compiler back-end used by TraceMonkey. We have improved its use of memory, made trace recording and compiling faster, and also improved the speed of the generated native code. A better nanojit gives a boost to all JS that runs in the JIT.

    There are two big items that didn't make the cut for Fx3.6, but will be in the next version of Firefox and are already available in nightly builds:

    • JITting recursion. Recursive code, like explicit looping code, is likely to be hot code, so it should be JITted. Nightly builds JIT directly recursive functions. Mutual recursion (g calls f calls g) is not traced yet.
    • AMD x64 nanojit backend. Nanojit now has a backend that generates AMD x64 code, which gives the possibility of better performance on that plaform.

    And if you try a nightly build, you'll find that many of these demos are already even faster than in Fx3.6!

  6. Firefox 3.6 Alpha 1 – web developer changes

    As covered on the Mozilla Developer Center, Firefox 3.6 Alpha 1 is now available for download. And we’ve been busy since Firefox 3.5.

    Web developers will be interested in a number of features that are new in Firefox 3.6 Alpha 1:

    • The TraceMonkey JavaScript engine has continued to get faster.
    • We’ve made a huge number of improvements to overall DOM and element layout performance. In some cases we’re much, much faster. We’ll cover details on those in a later post.
    • The compositor landing has made it possible to fix a large number of interactions between web content, CSS and plugins. We’ll be talking about this in a later post as well.
    • We now support the -moz-background-size CSS property which lets you set the size of background images.
    • We now support CSS Gradients.
    • We now support multiple background images.
    • We now support the rem unit as a CSS unit.
    • image-rendering is supported for images, background images, videos and canvases.
    • We now send a reorder event to embedded frames and iframes when their document is loaded.
    • We’ve removed the getBoxObjectFor() method. It was non-standard and exposed all kinds of non-standard stuff to the web.
    • We now send a hashchange event to a page whenever the URI part after the # changes.
    • We now have Geolocation address support for user-readable position information.
    • We now support the complete attribute on document.readystate.

    You can keep track of this list and other features for XUL and add-ons developers on the Firefox 3.6 for developers page on developer.mozilla.org

    Unlike the year that passed between Firefox 3 and Firefox 3.5, we expect that this 3.6 release will be released in a small number of months. Our main focus for the 3.6 release will be end-user perceived performance, TraceMonkey and DOM performance and new web developer features.

    Enjoy and test away!

  7. an overview of TraceMonkey

    This post was written by David Mandelin who works on Mozilla’s JavaScript team.

    Firefox 3.5 has a new JavaScript engine, TraceMonkey, that runs many JavaScript programs 3-4x faster than Firefox 3, speeding up existing web apps and enabling new ones. This article gives a peek under the hood at the major parts of TraceMonkey and how they speed up JS. This will also explain what kinds of programs get the best speedup from TraceMonkey and what kinds of things you can do to get your program to run faster.

    Why it’s hard to run JS fast: dynamic typing

    High-level dynamic languages such as JavaScript and Python make programming more productive, but they have always been slow compared to statically typed languages like Java or C. A typical rule of thumb was that a JS program might be 10x slower than an equivalent Java program.

    There are two main reasons JS and other dynamic scripting languages usually run slower than Java or C. The first reason is that in dynamic languages it is generally not possible to determine the types of values ahead of time. Because of this, the language must store all values in a generic format and process values using generic operations.

    In Java, by contrast, the programmer declares types for variables and methods, so the compiler can determine the types of values ahead of time. The compiler can then generate code that uses specialized formats and operations that run much faster than generic operations. I will call these type-specialized operations.

    The second main reason that dynamic languages run slower is that scripting languages are usually implemented with interpreters, while statically typed languages are compiled to native code. Interpreters are easier to create, but they incur extra runtime overhead for tracking their internal state. Languages like Java compile to machine language, which requires almost no state tracking overhead.

    Let’s make this concrete with a picture. Here are the slowdowns in picture form for a simple numeric add operation: a + b, where a and b are integers. For now, ignore the rightmost bar and focus on the comparison of the Firefox 3 JavaScript interpreter vs. a Java JIT. Each column shows the steps that have to be done to complete the add operation in each programming language. Time goes downward, and the height of each box is proportional to the time it takes to finish the steps in the box.

    time diagram of add operation

    In the middle, Java simply runs one machine language add instruction, which runs in time T (one processor cycle). Because the Java compiler knows that the operands are standard machine integers, it can use a standard integer add machine language instruction. That’s it.

    On the left, SpiderMonkey (the JS interpreter in FF3) takes about 40 times as long. The brown boxes are interpreter overhead: the interpreter must read the add operation and jump to the interpreter’s code for a generic add. The orange boxes represent extra work that has to be done because the interpreter doesn’t know the operand types. The interpreter has to unpack the generic representations of a and i, figure out their types, select the specific addition operation, convert the values to the right types, and at the end, convert the result back to a generic format.

    The diagram shows that using an interpreter instead of a compiler is slowing things down a little bit, but not having type information is slowing things down a lot. If we want JS to run more than a little faster than in FF3, by Amdahl’s law, we need to do something about types.

    Getting types by tracing

    Our goal in TraceMonkey is to compile type-specialized code. To do that, TraceMonkey needs to know the types of variables. But JavaScript doesn’t have type declarations, and we also said that it’s practically impossible for a JS engine to figure out the types ahead of time. So if we want to just compile everything ahead of time, we’re stuck.

    So let’s turn the problem around. If we let the program run for a bit in an interpreter, the engine can directly observe the types of values. Then, the engine can use those types to compile fast type-specialized code. Finally, the engine can start running the type-specialized code, and it will run much faster.

    There are a few key details about this idea. First, when the program runs, even if there are many if statements and other branches, the program always goes only one way. So the engine doesn’t get to observe types for a whole method — the engine observes types through the paths, which we call traces, that the program actually takes. Thus, while standard compilers compile methods, TraceMonkey compiles traces. One side benefit of trace-at-a-time compilation is that function calls that happen on a trace are inlined, making traced function calls very fast.

    Second, compiling type-specialized code takes time. If a piece of code is going to run only one or a few times, which is common with web code, it can easily take more time to compile and run the code than it would take to simply run the code in an interpreter. So it only pays to compile hot code (code that is executed many times). In TraceMonkey, we arrange this by tracing only loops. TraceMonkey initially runs everything in the interpreter, and starts recording traces through a loop once it gets hot (runs more than a few times).

    Tracing only hot loops has an important consequence: code that runs only a few times won’t speed up in TraceMonkey. Note that this usually doesn’t matter in practice, because code that runs only a few times usually runs too fast to be noticeable. Another consequence is that paths through a loop that are not taken at all never need to be compiled, saving compile time.

    Finally, above we said that TraceMonkey figures out the types of values by observing execution, but as we all know, past performance does not guarantee future results: the types might be different the next time the code is run, or the 500th next time. And if we try to run code that was compiled for numbers when the values are actually strings, very bad things will happen. So TraceMonkey must insert type checks into the compiled code. If a check doesn’t pass, TraceMonkey must leave the current trace and compile a new trace for the new types. This means that code with many branches or type changes tends to run a little slower in TraceMonkey, because it takes time to compile the extra traces and jump from one to another.

    TraceMonkey in action

    Now, we’ll show tracing in action by example on this sample program, which adds the first N whole numbers to a starting value:

     function addTo(a, n) {
       for (var i = 0; i < n; ++i)
         a = a + i;
       return a;
     }
     
     var t0 = new Date();
     var n = addTo(0, 10000000);
     print(n);
     print(new Date() - t0);

    TraceMonkey always starts running the program in the interpreter. Every time the program starts a loop iteration, TraceMonkey briefly enters monitoring mode to increment a counter for that loop. In FF3.5, when the counter reaches 2, the loop is considered hot and it’s time to trace.

    Now TraceMonkey continues running in the interpreter but starts recording a trace as the code runs. The trace is simply the code that runs up to the end of the loop, along with the types used. The types are determined by looking at the actual values. In our example, the loop executes this sequence of JavaScript statements, which becomes our trace:

        a = a + i;    // a is an integer number (0 before, 1 after)
        ++i;          // i is an integer number (1 before, 2 after)
        if (!(i < n)) // n is an integer number (10000000)
          break;

    That’s what the trace looks like in a JavaScript-like notation. But TraceMonkey needs more information in order to compile the trace. The real trace looks more like this:

      trace_1_start:
        ++i;            // i is an integer number (0 before, 1 after)
        temp = a + i;   // a is an integer number (1 before, 2 after)
        if (lastOperationOverflowed())
          exit_trace(OVERFLOWED);
        a = temp;
        if (!(i < n))   // n is an integer number (10000000)
          exit_trace(BRANCHED);
        goto trace_1_start;

    This trace represents a loop, and it should be compiled as a loop, so we express that directly using a goto. Also, integer addition can overflow, which requires special handling (for example, redoing with floating-point addition), which in turn requires exiting the trace. So the trace must include an overflow check. Finally, the trace exits in the same way if the loop condition is false. The exit codes tell TraceMonkey why the trace was exited, so that TraceMonkey can decide what to do next (such as whether to redo the add or exit the loop). Note that traces are recorded in a special internal format that is never exposed to the programmer — the notation used above is just for expository purposes.

    After recording, the trace is ready to be compiled to type-specialized machine code. This compilation is performed by a tiny JIT compiler (named, appropriately enough, nanojit) and the results are stored in memory, ready to be executed by the CPU.

    The next time the interpreter passes the loop header, TraceMonkey will start executing the compiled trace. The program now runs very fast.

    On iteration 65537, the integer addition will overflow. (2147450880 + 65537 = 2147516417, which is greater than 2^31-1 = 2147483647, the largest signed 32-bit integer.) At this point, the trace exits with an OVERFLOWED code. Seeing this, TraceMonkey will return to interpreter mode and redo the addition. Because the interpreter does everything generically, the addition overflow is handled and everything works as normal. TraceMonkey will also start monitoring this exit point, and if the overflow exit point ever becomes hot, a new trace will be started from that point.

    But in this particular program, what happens instead is that the program passes the loop header again. TraceMonkey knows it has a trace for this point, but TraceMonkey also knows it can’t use that trace because that trace was for integer values, but a is now in a floating-point format. So TraceMonkey records a new trace:

      trace_2_start:
        ++i;            // i is an integer number
        a = a + i;      // a is a floating-point number
        if (!(i < n))   // n is an integer number (10000000)
          exit_trace(BRANCHED);
        goto trace_2_start;

    TraceMonkey then compiles the new trace, and on the next loop iteration, starts executing it. In this way, TraceMonkey keeps the JavaScript running as machine code, even when types change. Eventually the trace will exit with a BRANCHED code. At this point, TraceMonkey will return to the interpreter, which takes over and finishes running the program.

    Here are the run times for this program on my laptop (2.2GHz MacBook Pro):

    System Run Time (ms)
    SpiderMonkey (FF3) 990
    TraceMonkey (FF3.5) 45
    Java (using int) 25
    Java (using double) 74
    C (using int) 5
    C (using double) 15

    This program gets a huge 22x speedup from tracing and runs about as fast as Java! Functions that do simple arithmetic inside loops usually get big speedups from tracing. Many of the bit operation and math SunSpider benchmarks, such bitops-3bit-bits-in-byte, ctrypto-sha1, and math-spectral-norm get 6x-22x speedups.

    Functions that use more complex operations, such as object manipulation, get a smaller speedup, usually 2-5x. This follows mathematically from Amdahl’s law and the fact that complex operations take longer. Looking back at the time diagram, consider a more complex operation that takes time 30T for the green part. The orange and brown parts will still be about 30T together, so eliminating them gives a 2x speedup. The SunSpider benchmark string-fasta is an example of this kind of program: most of the time is taken by string operations that have a relatively long time for the green box.

    Here is a version of our example program you can try in the browser:

    Numerical result:

    Run time:

    Average run time:

    Understanding and fixing performance problems

    Our goal is to make TraceMonkey reliably fast enough that you can write your code in the way that best expresses your ideas, without worrying about performance. If TraceMonkey isn’t speeding up your program, we hope you’ll report it as a bug so we can improve the engine. That said, of course, you may need your program to run faster in today’s FF3.5. In this section, we’ll explain some tools and techniques for fixing performance of a program that doesn’t get a good (2x or more) speedup with the tracing JIT enabled. (You can disable the jit by going to about:config and setting the pref javascript.options.jit.content to false.)

    The first step is understanding the cause of the problem. The most common cause is a trace abort, which simply means that TraceMonkey was unable to finish recording a trace, and gave up. The usual result is that the loop containing the abort will run in the interpreter, so you won’t get a speedup on that loop. Sometimes, one path through the loop is traced, but there is an abort on another path, which will cause TraceMonkey to switch back and forth between interpreting and running native code. This can leave you with a reduced speedup, no speedup, or even a slowdown: switching modes takes time, so rapid switching can lead to poor performance.

    With a debug build of the browser or a JS shell (which I build myself – we don’t publish these builds) you can tell TraceMonkey to print information about aborts by setting the TMFLAGS environment variable. I usually do it like this:

    TMFLAGS=minimal,abort
    dist/MinefieldDebug.app/Contents/MacOS/firefox
    

    The minimal option prints out all the points where recording starts and where recording successfully finishes. This gives a basic idea of what the tracer is trying to do. The abort option prints out all the points where recording was aborted due to an unsupported construct. (Setting TMFLAGS=help will print the list of other TMFLAGS options and then exit.)

    (Note also that TMFLAGS is the new way to print the debug information. If you are using a debug build of the FF3.5 release, the environment variable setting is TRACEMONKEY=abort.)

    Here’s an example program that doesn’t get much of a speedup in TraceMonkey.

    function runExample2() {
      var t0 = new Date;
     
      var sum = 0;
      for (var i = 0; i < 100000; ++i) {
        sum += i;
      }
     
      var prod = 1;
      for (var i = 1; i < 100000; ++i) {
        eval("prod *= i");
      }
      var dt = new Date - t0;
      document.getElementById(example2_time').innerHTML = dt + ' ms';
    }

    Run time:

    If we set TMFLAGS=minimal,abort, we’ll get this:

    Recording starting from ab.js:5@23
    recording completed at  ab.js:5@23 via closeLoop
     
    Recording starting from ab.js:5@23
    recording completed at  ab.js:5@23 via closeLoop
     
    Recording starting from ab.js:10@63
    Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.
     
    Recording starting from ab.js:10@63
    Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.
     
    Recording starting from ab.js:10@63
    Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.

    The first two pairs of lines show that the first loop, starting at line 5, traced fine. The following lines showed that TraceMonkey started tracing the loop on line 10, but failed each time because of an eval.

    An important note about this debug output is that you will typically see some messages referring to inner trees growing, stabilizing, and so on. These really aren’t problems: they usually just indicate a delay in finishing tracing a loop because of the way TraceMonkey links inner and outer loops. And in fact, if you look further down the output after such aborts, you will usually see that the loops eventually do trace.

    Otherwise, aborts are mainly caused by JavaScript constructs that are not yet supported by tracing. The trace recording process is easier to implement for a basic operation like + than it is for an advanced feature like arguments. We didn’t have time to do robust, secure tracing of every last JavaScript feature in time for the FF3.5 release, so some of the more advanced ones, like arguments, aren’t traced in FF3.5.0. Other advanced features that are not traced include getters and setters, with, and eval. There is partial support for closures, depending on exactly how they are used. Refactoring to avoid these constructs can help performance.

    Two particularly important JavaScript features that are not traced are:

    • Recursion. TraceMonkey doesn’t see repetition that occurs through recursion as a loop, so it doesn’t try to trace it. Refactoring to use explicit for or while loops will generally give better performance.
    • Getting or setting a DOM property. (DOM method calls are fine.) Avoiding these constructs is generally impossible, but refactoring the code to move DOM property access out of hot loops and performance-critical segments should help.

    We are actively working on tracing all the features named above. For example, support for tracing arguments is already available in nightly builds.

    Here is the slow example program refactored to avoid eval. Of course, I could have simply done the multiplication inline. Instead, I used a function created by eval because that’s a more general way of refactoring an eval. Note that the eval still can’t be traced, but it only runs once so it doesn’t matter.

    function runExample3() {
      var t0 = new Date;
     
      var sum = 0;
      for (var i = 0; i < 100000; ++i) {
        sum += i;
      }
     
      var prod = 1;
      var mul = eval("(function(i) { return prod * i; })");
     
      for (var i = 1; i < 100000; ++i) {
        prod = mul(i);
      }
      var dt = new Date - t0;
      document.getElementById('example3_time').innerHTML = dt + ' ms';
    }

    Run time:

    There are a few more esoteric situations that can also hurt tracing performance. One of them is trace explosion, which happens when a loop has many paths through it. Consider a loop with 10 if statements in a row: the loop has 1024 paths, potentially causing 1024 traces to be recorded. That would use up too much memory, so TraceMonkey caps each loop at 32 traces. If the loop has fewer than 32 hot traces, it will perform well. But if each path occurs with equal frequency, then only 3% of the paths are traced, and performance will suffer.

    This kind of problem is best analyzed with TraceVis, which creates visualizations of TraceMonkey performance. Currently, the build system only supports enabling TraceVis for shell builds, but the basic system can also run in the browser, and there is ongoing work to enable TraceVis in a convenient form in the browser.

    The blog post on TraceVis is currently the most detailed explanation of what the diagrams mean and how to use them to diagnose performance problems. The post also contains a detailed analysis of a diagram that is helpful in understanding how TraceMonkey works in general.

    Comparative JITerature

    Here I will give a few comparisons to other JavaScript JIT designs. I’ll focus more on hypothetical designs than competing engines, because I don’t know details about them — I’ve read the release information and skimmed a few bits of code. Another big caveat is that real-world performance depends at least as much on engineering details as it does on engine architecture.

    One design option could be a called a per-method non-specializing JIT. By this, I mean a JIT compiler that compiles a method at a time and generates generic code, just like what the interpreter does. Thus, the brown boxes from our diagrams are cut out. This kind of JIT doesn’t need to take time to record and compile traces, but it also does not type-specialize, so the orange boxes remain. Such an engine can still be made pretty fast by carefully designing and optimizing the orange box code. But the orange box can’t be completely eliminated in this design, so the maximum performance on numeric programs won’t be as good as a type-specializing engine.

    As far as I can tell, as of this writing Nitro and V8 are both lightweight non-specializing JITs. (I’m told that V8 does try to guess a few types by looking at the source code (such as guessing that a is an integer in a >> 2) in order to do a bit of type specialization.) It seem that TraceMonkey is generally faster on numeric benchmarks, as predicted above. But TraceMonkey suffers a bit on benchmarks that use more objects, because our object operations and memory management haven’t been optimized as heavily.

    A further development of the basic JIT is the per-method type-specializing JIT. This kind of a JIT tries to type-specialize a method based on the argument types the method is called with. Like TraceMonkey, this requires some runtime observation: the basic design checks the argument types each time a method is called, and if those types have not been seen before, compiles a new version of the method. Also like TraceMonkey, this design can heavily specialize code and remove both the brown and orange boxes.

    I’m not aware that anyone has deployed a per-method type-specializing JIT for JavaScript, but I wouldn’t be surprised if people are working on it.

    The main disadvantage of a per-method type-specializing JIT compared to a tracing JIT is that the basic per-method JIT only directly observes the input types to a method. It must try to infer types for variables inside the method algorithmically, which is difficult for JavaScript, especially if the method reads object properties. Thus, I would expect that a per-method type-specializing JIT would have to use generic operations for some parts of the method. The main advantage of the per-method design is that the method needs to be compiled exactly once per set of input types, so it’s not vulnerable to trace explosion. In turn, I think a per-method JIT would tend to be faster on methods that have many paths, and a tracing JIT would tend to be faster on highly type-specializable methods, especially if the method also reads a lot of values from properties.

    Conclusion

    By now, hopefully you have a good idea of what makes JavaScript engines fast, how TraceMonkey works, and how to analyze and fix some performance issues that may occur running JavaScript under TraceMonkey. Please report bugs if you run into any significant performance problems. Bug reports are also a good place for us to give additional tuning advice. Finally, we’re trying to improve constantly, so check out nightly TraceMonkey builds if you’re into the bleeding edge.

  8. what does tracemonkey feel like?

    One of our goals with Firefox 3.5 is to help upgrade the web. Over the lifecycle of this release we’ve invested heavily in developer features. One of the features that we’ve invested in is TraceMonkey – a tracing interpreter that turns commonly-run JavaScript code into machine code so that it can run at near-native speeds. We consider it to be both an end user feature because it makes existing web applications faster as well as a developer feature because of the new kinds of applications it enables.

    We’re always challenged to try and come up with ways to describe what that means in a way that’s not a dry benchmark. How can we explain what it feels like?

    We’ve made a video to help describe both what it means by the numbers, but also shows what it feels like. If you want to try the demo we suggest you try it in both Firefox 3 and Firefox 3.5. It’s something you can really feel.

    Sadfaces. Your browser doesn’t support native video. Maybe you can try the .ogv version or the .mp4 version and hope for the best?