Dave Mandelin from the JS team and Joe Drew from the Graphics team summarize the key performance improvements in Firefox 4.
The web wants fast browsers. Cutting-edge HTML5 web pages play games, mash up and share maps, sound, and videos, show spreadsheets and presentations, and edit photos. Only a high-performance browser can do that. What the web wants, it’s our job to make, and we’ve been working hard to make Firefox 4 fast.
Firefox 4 comes with performance improvements in almost every area. The most dramatic improvements are in JavaScript and graphics, which are critical for modern HTML5 apps and games. In the rest of this article, we’ll profile the key performance technologies and show how they make the web that much “more awesomer”.
Fast JavaScript: Uncaging the JägerMonkey
JavaScript is the programming language of the web, powering most of the dynamic content and behavior, so fast JavaScript is critical for rich apps and games. Firefox 4 gets fast JavaScript from a beast we call JägerMonkey. In techno-gobbledygook, JägerMonkey is a multi-architecture per-method JavaScript JIT compiler with 64-bit NaN-boxing, inline caching, and register allocation. Let’s break that down:
-
Multi-architecture
JägerMonkey has full support for x86, x64, and ARM processors, so we’re fast on both traditional computers and mobile devices. W00t!
(Crunchy technical stuff ahead: if you don’t care how it works, skip the rest of the sections.)
Per-method JavaScript JIT compilation
The basic idea of JägerMonkey is to translate (compile) JavaScript to machine code, “just in time” (JIT). JIT-compiling JavaScript isn’t new: previous versions of Firefox feature the TraceMonkey JIT, which can generate very fast machine code. But some programs can’t be “jitted” by TraceMonkey. JägerMonkey has a simpler design that is able to compile everything in exchange for not doing quite as much optimization. But it’s still fast. And TraceMonkey is still there, to provide a turbo boost when it can.
-
64-bit NaN-boxing
That’s the technical name for the new 64-bit formats the JavaScript engine uses to represent program values. These formats are designed to help the JIT compilers and tuned for modern hardware. For example, think about floating-point numbers, which are 64 bits. With the old 32-bit value formats, floating-point calculations required the engine to allocate, read, write, and deallocate extra memory, all of which is slow, especially now that processors are much faster than memory. With the new 64-bit formats, no extra memory is required, and calculations are much faster. If you want to know more, see the technical article Mozilla’s new JavaScript value representation.
-
Inline caching
Property accesses, like
o.p
, are common in JavaScript. Without special help from the engine, they are complicated, and thus slow: first the engine has to search the object and its prototypes for the property, next find out where the value is stored, and only then read the value. The idea behind inline caching is: “What if we could skip all that other junk and just read the value?” Here’s how it works: The engine assigns every object a shape that describes its prototype and properties. At first, the JIT generates machine code for o.p
that gets the property by laborious searching. But once that code runs, the JITs finds out what o‘s shape is and how to get the property. The JIT then generates specialized machine code that simply verifies that the shape is the same and gets the property. For the rest of the program, that o.p
runs about as fast as possible. See the technical article PICing on JavaScript for fun and profit for much more about inline caching.
-
Register allocation
Code generated by basic JITs spends a lot of time reading and writing memory: for code like
x+y
, the machine code first reads x
, then reads y
, adds them, and then writes the result to temporary storage. With 64-bit values, that’s up to 6 memory accesses. A more advanced JIT, such as JägerMonkey, generates code that tries to hold most values in registers. JägerMonkey also does some related optimizations, like trying to avoid storing values at all when they are constant or just a copy of some other value.
Here’s what JägerMonkey does to our benchmark scores:
That’s more than 3x improvement on SunSpider and Kraken and more than 6x on V8!
Fast Graphics: GPU-powered browsing.
For Firefox 4, we sped up how Firefox draws and composites web pages using the Graphics Processing Unit (GPU) in most modern computers.
On Windows Vista and Windows 7, all web pages are hardware accelerated using Direct2D . This provides a great speedup for many complex web sites and demo pages.
On Windows and Mac, Firefox uses 3D frameworks (Direct3D or OpenGL) to accelerate the composition of web page elements. This same technique is also used to accelerate the display of HTML5 video .
Final take
Fast, hardware-accelerated graphics combined plus fast JavaScript means cutting-edge HTML5 games, demos, and apps run great in Firefox 4. You see it on some of the sites we enjoyed making fast. There’s plenty more to try in the Mozilla Labs Gaming entries and of course, be sure to check out the Web O’ Wonder.
About Stormy Peters
Stormy Peters is Director of Websites and Developer Engagement at Mozilla. She is passionate about open source software and educates companies and communities on how open source software is changing the software industry. She is a compelling speaker who engages her audiences during and after her presentations and frequently speaks on business aspects of open source software.
33 comments