In this article, I’ll explain the most important JS performance improvements that come with Firefox 3.6. I’ll focus on listing what kinds of JS code get faster, including sample programs that show the improvements Fx3.6 makes over Fx3.5.
The TraceMonkey JIT can be enabled or disabled separately for content and chrome JS using
about:config. Because bugs affecting chrome JS are a greater risk for security and reliability, in Firefox 3.5 we chose to disable the JIT for chrome JS by default. After extensive testing, we’ve decide to enable the JIT for chrome JS by default, something we did not have time to fully investigate for Fx3.5. Turning on the JIT for chrome should make the JS behind the Firefox UI and add-ons run faster. This difference is probably not very noticeable for general browser usage, because the UI was designed and coded to perform well with the older JS engines. The difference should be more noticeable for add-ons that do heavy JS computation.
|Option||Fx3.5 Default||Fx3.6 Default|
Garbage Collector Performance
Demo: GC Pauses and Animation
But it’s a lot easier to see the difference by opening the demo in Fx3.5, watching it a bit, and then trying it in Fx3.6.
In Fx3.5, I see frequent pauses and the animation looks noticeably jerky. In Fx3.6, it looks pretty smooth, and it’s hard for me even to tell exactly when the GC is running.
How Fx3.6 does it better. We’ve made many improvements to the garbage collector and memory allocator. I want to give a little more technical details on the big two changes that really cut our pause times.
First, we noticed that a large fraction of the pause time was spent calling
free to reclaim the unused memory. We can’t do much to make freeing memory faster, but we realized we could do it on a separate thread. In Fx3.6, the main JS thread simply adds unused memory chunks to a queue, and another thread frees them during idle time or on a separate processor. This means machines with 2 or more cores will benefit more from this change. But even when one core, freeing might be delayed to an idle time when it will not affect scripts.
Second, we knew that in Fx3.5 running GC clears out all the native code compiled by the JIT as well as some other caches that speed up JS. The reason is that the tracing JIT and GC did not know about each other, so if the GC ran, it might reclaim objects being used by a compiled trace. The result was that immediately after a GC, JS ran a bit slower as the caches and compiled traces were built back up. This would be experienced as either an extended GC pause or a brief hiccup of slow animation right after the GC pause. In Fx3.6, we taught the GC and the JIT to work together, and now the GC does not clear caches or wipe out native code, so it resumes running normally right after GC.
In my article on TraceMonkey for the Fx3.5 release, I noted that certain code constructs, such as the
arguments object, were not traced and did not get performance improvements from the JIT. A major goal for JS in Fx3.6 was to trace more stuff, so more programs can run faster. We do trace more stuff now, in particular:
- DOM Properties. DOM objects are special and harder for the trace compiler to work with. For Fx3.5, we implemented tracing of DOM methods, but not DOM properties. Now we trace DOM properties (and other “native” C++ getters and setters) as well. We still do not trace scripted getters and setters.
- Closures. Fx3.5 traced only a few operations involving closures (by which I mean functions that refer to variables defined in lexically enclosing functions). Fx3.6 can trace more programs that use closures. The main operation that is still not traced yet is creating an anonymous function that modifies closure variables. But calling such a function and actually writing to the closure variables are traced.
arguments. We now trace most common uses of the
argumentskeyword. “Exotic” uses, such as setting elements of
arguments, are not traced.
switch. We have improved performance when tracing
switchstatements that use densely packed numeric case labels. These are particularly important for emulators and VMs.
These improvements are particularly important for jQuery and Dromaeo, which heavily use
Here is a pair of demos of new things we trace. The first sets a DOM property in a loop. The second calls a sum function implemented with
arguments I get a speedup of about 2x for both of them in Fx3.6 vs. Fx3.5.
Demo: Fx3.6 Tracing DOM properties and
DOM Property Set:
String and RegExp Improvements
Fx3.6 includes several improvements to string and regular expression performance. For example, the regexp JIT compiler now supports a larger class of regular expressions, including the ever-popular
w+. We also made some of our basic operations faster, like
search. Finally, we made concatenating sequences of several strings inside a function (a common operation in building up HTML or other kinds of textual output) much faster.
Technical aside on how we made string concatenation faster: The C++ function that concatenates two strings S1 and S2 does this: Allocate a buffer big enough to hold the result, then copy the characters of S1 and S2 into the buffer. To concatenate more than two strings, as in JS
s + "foo" + t, Fx3.5 simply concatenates two at a time from left to right.
Using the Fx3.5 algorithm, to concatenate N strings each of length K, we need to do N-1 memory allocations, and all but one of them are for temporary strings. Worse, the first two input strings are copied N-1 times, the next one is copied N-2 times, and so on. The total number of characters copied is K(N-1)(N+2)/2, which is O(N^2).
Clearly, we can do a lot better. The minimum work we can do is to copy each input string exactly once to the output string, for a total of KN characters copied. Fx3.6 achieves this by detecting sequences of concatenation in JS programs and combining the entire sequence into one operation that uses the optimal algorithm.
Here are a few string benchmarks you can try that are faster in Fx3.6:
Demo: Fx3.6 String Operations
Final Thoughts and Next Steps
We also made a lot of little improvements that don't fit into the big categories above. Most importantly, Adobe, Mozilla, Intel, Sun, and other contributors continue to improve nanojit, the compiler back-end used by TraceMonkey. We have improved its use of memory, made trace recording and compiling faster, and also improved the speed of the generated native code. A better nanojit gives a boost to all JS that runs in the JIT.
There are two big items that didn't make the cut for Fx3.6, but will be in the next version of Firefox and are already available in nightly builds:
- JITting recursion. Recursive code, like explicit looping code, is likely to be hot code, so it should be JITted. Nightly builds JIT directly recursive functions. Mutual recursion (g calls f calls g) is not traced yet.
- AMD x64 nanojit backend. Nanojit now has a backend that generates AMD x64 code, which gives the possibility of better performance on that plaform.
And if you try a nightly build, you'll find that many of these demos are already even faster than in Fx3.6!