While Firefox 3.0 improved typographic rendering by introducing support for kerning, ligatures, and multiple weights along with support for rendering complex scripts, authors are still limited to using commonly available fonts in their designs. Firefox 3.5 removes this restriction by introducing support for the CSS @font-face rule, a way of linking to TrueType and OpenType fonts just as code and images are linked to today. Safari has supported this type of font linking since version 3.1, and Opera has announced that they plan to support it in Opera 10.
Using @font-face for font linking is relatively straightforward. Within a stylesheet, each @font-face rule defines a family name to be used, the font resource to be loaded, and the style characteristics of a given face such as whether it’s bold or italic. Firefox 3.5 only downloads the fonts as needed, so a stylesheet can list a whole set of fonts of which only a select few will actually be used.
/* Graublau Sans Web (www.fonts.info) */@font-face {font-family: Graublau Sans Web;
src:url(GraublauWeb.otf) format("opentype");}
body {font-family: Graublau Sans Web, Lucida Grande,sans-serif;}
Browsers that support @font-face will render text using Graublau Sans Web while older browsers will render text using either Lucida Grande or the default sans-serif face. Example here:
Digging A Little Deeper
Most font families today consist of only four faces: regular, bold, italic and bold italic. To define each of these faces the font-weight and font-style descriptors are used. These define the style of the face; there’s no concept of a cascade or inheritance that applies here. Without an explicit definition each of these defaults to a value of ‘normal’:
/* Gentium by SIL International *//* http://scripts.sil.org/gentium */@font-face {font-family: Gentium;
src:url(Gentium.ttf);/* font-weight, font-style ==> default to normal */}@font-face {font-family: Gentium;
src:url(GentiumItalic.ttf);font-style:italic;}
body {font-family: Gentium, Times New Roman,serif;}
The sample text below when rendered with this font family:
A feature that’s easy to overlook is that @font-face allows the creation of families with more than just regular and bold faces — up to nine weights can be defined for a single family. This is true even on Windows, where underlying platform limitations usually restrict font families to just regular and bold weights. Fonts such as those of the Japanese open source M+ Fonts project have as many as seven weights. A selection of these are used in the sample below:
In some situations, authors may prefer to use fonts available locally and only download fonts when those faces aren’t available. This is possible with the use of local() in the definition of the src descriptor of an @font-face rule. The browser will iterate over the list of fonts in the src descriptor until it successfully loads one.
/* MgOpen Moderna *//* http://www.zvr.gr/typo/mgopen/index */@font-face {font-family: MyHelvetica;
src: local("Helvetica Neue"),
local("HelveticaNeue"),url(MgOpenModernaRegular.ttf);}@font-face {font-family: MyHelvetica;
src: local("Helvetica Neue Bold"),
local("HelveticaNeue-Bold"),url(MgOpenModernaBold.ttf);font-weight:bold;}
body {font-family: MyHelvetica,sans-serif;}
The screenshot below shows from top to bottom the results on Mac OS X, Windows and Linux for a simple testcase that uses the font family defined above:
The Helvetica Neue font family is available on most Mac OS X systems but generally on neither Windows nor Linux machines. When the example here is rendered on Mac OS X, the local Helvetica Neue faces are used and no font is downloaded. Under Windows and Linux the attempt to load a local font fails and a substitute font — MgOpen Moderna — is downloaded and used instead. MgOpen Moderna is designed to be a substitute for Helvetica, so it renders similarly to Helvetica Neue. This way, an author can guarantee text appearance but avoid a font download when it’s unnecessary.
The name used to refer to a specific font face is the full name for an individual font. Generally it’s the family name plus the style name (e.g. “Helvetica Bold”). Under Mac OS X, the name is listed in the information panel for a given face. Select a single face and choose ‘Show Font Info’ from the Preview menu in FontBook:
Similar tools exist under Linux. On Windows, use the font properties extension, a free download from Microsoft to view these names. With the extension installed, the properties panel shows information about an individual font. The full name is referred to as the “Font Name” on the Name tab:
Safari only supports PostScript name lookup under Mac OS X, so under Mac OS X Postscript names are also supported. For OpenType PS fonts (often labeled with an .otf extension) the full name is the same as the PostScript name under Windows. So for these fonts, authors are advised to include both the full name and the PostScript name for cross-platform interoperability.
Supporting Many Languages
Many languages suffer from a lack of commonly available fonts. For minority languages and ancient scripts, the options often dwindle to just a handful. The use of @font-face allows authors using these languages to ameliorate this by including fonts with their pages.
@font-face {font-family: Scheherazade;
src:url(fonts/ScheherazadeRegAAT.ttf) format("truetype-aat"),url(fonts/ScheherazadeRegOT.ttf) format("opentype");}
body {font-family: Scheherazade,serif;}
Languages such as Arabic require text shaping, where the display of a given character is affected by the characters surrounding it. Different platforms support different rendering technologies to enable text shaping; under Mac OS X, AAT fonts are required while under Windows and Linux OpenType fonts are required. Without a font in a format required for a given platform, text shaping will not be rendered correctly.
Under Mac OS X, the AAT version of the font is downloaded. On Windows and Linux, where rendering with AAT fonts is not supported, the download of the AAT font is skipped and the OpenType font is used instead. Hence, the text is rendered correctly on all platforms.
Cross-Site Font Usage
By default, Firefox 3.5 only allows fonts to be loaded for pages served from the same site. This prevents sites from freely using fonts found on other sites. For sites that explicitly want to allow cross-site font sharing, an online font library for example, Firefox 3.5 supports the use of access control headers to control this behavior. By adding an additional header to the HTTP headers sent with font files, sites can enable cross-site usage.
# example Apache .htaccess file to add access control header
<FilesMatch "\.(ttf|otf)$">
<IfModule mod_headers.c>
Header set Access-Control-Allow-Origin "*"
</IfModule>
</FilesMatch>
With this HTTP header enabled, any page can access the fonts on this site, not just pages from the same site.
Font Licensing Issues
When using a font for a website, it’s important to always confirm that the font license permits use on a website. If the license agreement is filled with opaque legalese, err on the side of caution and check with the font vendor before using a font on a site. If the license permits its use, it’s a good idea to add a comment near the @font-face rules that points to the license, for future reference.
“I found a free font, can I use it on my site?”
Maybe, maybe not. Some free fonts are distributed as teaser products to encourage a purchase and don’t allow for redistribution or posting on a web server. Check the license, even for free fonts.
“I just want to use [insert favorite font name here] on my site. Is that possible?”
Right now, probably not. The use of font linking on the web is still in its infancy. Most fonts that ship with proprietary OS’s today have licenses that limit their use to standard desktop applications, so uploading these fonts to a web server is almost certainly not permitted. Piracy has plagued the font industry in the past, so most font vendors are wary of allowing their fonts to be used except in relatively limited contexts. Many font vendors are focused on the needs of publishing and printing industries, and the relative complexity of their license agreements often reflects this. In the future, some font designers may conclude that the sales of fonts as web fonts will outweigh any potential loss in sales due to piracy, others may not. Their license agreements will reflect this choice and should be respected.
The stock photography market is often described as a $2 billion market but the web font market is still close to a $0 market, it can only go up!
Font Linking In Internet Explorer
Font linking has been possible with Internet Explorer but only for linking to fonts in the proprietary EOT font format. The only way to create EOT fonts is to use the Microsoft WEFT tool, available only on Windows. Only TrueType and OpenType TT fonts can be converted to EOT format, OpenType PS (.otf) fonts cannot be used.
Internet Explorer only recognizes the font-family and src descriptors within @font-face rules, so each family can only contain a single face. It doesn’t understand format() hints and will ignore any @font-face rule containing these hints. This behavior can be used enable font linking cross platform:
/* Font definition for Internet Explorer *//* (*must* be first) */@font-face {font-family: Gentium;
src:url(Gentium.eot)/* can't use format() */;}/* Font definition for other browsers */@font-face {font-family: Gentium;
src:url(Gentium.ttf) format("opentype");}
Future Work
For Firefox 3.5, the font-stretch and unicode-range descriptors are not supported. Fonts defined in SVG documents are also not supported yet. These are under consideration for inclusion in future releases. As always, patches are welcome!
Most images on the web are untagged. If you don’t know the difference between tagged images and untagged images the odds are good are you won’t notice this change. However, we suggest that you read on to learn about what it might mean for you if you want to include them and how future Firefox releases might change the interactions between CSS colors and images.
What’s a color profile?
People who spend a lot of time taking photographs or any kind of high-resolution color printing will understand that many output devices – LCDs, CRTs, paper etc – all have very different interpretations of what various colors mean. For example, uncorrected red will look very different on an LCD vs. a CRT. You can see this if you set up two very different monitors next to each other and the operating system doesn’t do color correction – colors will look more washed out in one of them vs. the other.
JPG and PNG images both have support for color profiles. These color profiles allow Firefox to take colors in the image and translate them into colors that are independent of any particular device.
While images contain color profiles it’s also important to note that output devices like monitors also have color profiles. As an example an output device may be better at displaying reds than blues. When you’re getting ready to show something that’s “red” on that device it might need to be converted from the neutral #ff0000 value to #f40301 in order to show up as red on the screen.
What this means is that there are actually two conversions that take place with color profiles. The first is to take the original color information in the image and, using the color profile, convert it into a device-independent color space. Then once it’s in that independent space you convert it again using the output device’s color profile to get it ready to display on the output device.
So what about CSS colors?
It’s important to understand how color profiles work and how they are converted to properly understand how CSS interacts with these color spaces.
In Firefox 3.5 we consider CSS colors to already be in the device output’s color space. Another way of saying this is that CSS colors are not in the neutral color space and are not converted into the output device like tagged images are.
What this means is that if you have a tagged image where a color is expected to match the CSS that’s right next to it, it won’t. Or at least it’s likely that it won’t on some output device – maybe not the one that you happen to be using for development. Remember that different output devices have different color profiles. Here’s an example of what that looks like:
In Firefox 3, this will look like one contiguous block of purple. In Firefox 3.5 and Safari you will notice that there’s a purple box within a purple box (unless your system profile is sRGB.) This is because the image is color corrected while the surrounding CSS is not.
This is where the statement about the future comes in. In a future release of Firefox we are likely to make it possible for people to turn on color correction for tagged images and CSS. You can test this setting today by changing the pref listed on the page on color correction on the Mozilla Developer Center to “Full color management.” In that case untagged images should continue to work as we will be rendering both CSS and untagged images in the sRGB color space.
Image support and tools
PNG’s can be tagged in three different ways. First they can have an iCCP chunk that contains the associated ICC profile. Second they can be explicitly tagged as sRGB using a sRGB chunk. Finally, they can contain gAMA and cHRM chunks that describe the image’s gamma and chromaticies. Using any of thse methods will cause Firefox to color correct the image.
You can remove all of the color correction chunks resulting in an untagged image using pngcrush:
Over the next 35 days we’ll be talking about all of the new developer features in Firefox 3.5.
The upcoming release of Firefox 3.5 is a big upgrade for users. It includes new privacy features, improvements in interactive performance and a new JavaScript engine that will improve the experience for users using script-heavy web sites. These are all features that users will appreciate and talk about.
But what Firefox 3.5 really represents is a huge upgrade to the web itself. We’ve included support for audio, video, threads in JavaScript, new canvas features, tons of new CSS features, downloadable fonts and geolocation. While Firefox 3 was a signifigant upgrade for the web’s users, Firefox 3.5 does the same for developers.
Because of the sheer volume of new features for developers in Firefox 3.5 we’ll be spending the next 35 days featuring all of them, with two posts a day, every day. Each day’s first post will describe one new feature in Firefox in depth – how to use it, why it’s there and some examples of what it can do for you. The second post will be all fun – examples of demos that people have put together using all of the features that are in Firefox 3.5.
Like much of what we do at Mozilla this is a community-led project. We’ve reached out to members of the Mozilla and web developer community for posts and demos. Some posts will be written by core Mozilla hackers, but many of them will be written by people who are just excited about the chance to use this technology in the wild.
My personal belief is that these posts represent Mozilla’s dreams for the future of the web. We are not a company like every other browser vendor – our core mission is to improve and protect the web. Firefox 3.5 represents what we think the next stage of the web will be – interactive applications, video, web sites collaborating and sharing information – all via the browser. Our hope is that each of you will take what we post here, encourage other browser vendors to implement similar features where they haven’t already, and do your part to keep the web vibrant.
This post was written by David Mandelin who works on Mozilla’s JavaScript team.
Firefox 3.5 has a new JavaScript engine, TraceMonkey, that runs many JavaScript programs 3-4x faster than Firefox 3, speeding up existing web apps and enabling new ones. This article gives a peek under the hood at the major parts of TraceMonkey and how they speed up JS. This will also explain what kinds of programs get the best speedup from TraceMonkey and what kinds of things you can do to get your program to run faster.
Why it’s hard to run JS fast: dynamic typing
High-level dynamic languages such as JavaScript and Python make programming more productive, but they have always been slow compared to statically typed languages like Java or C. A typical rule of thumb was that a JS program might be 10x slower than an equivalent Java program.
There are two main reasons JS and other dynamic scripting languages usually run slower than Java or C. The first reason is that in dynamic languages it is generally not possible to determine the types of values ahead of time. Because of this, the language must store all values in a generic format and process values using generic operations.
In Java, by contrast, the programmer declares types for variables and methods, so the compiler can determine the types of values ahead of time. The compiler can then generate code that uses specialized formats and operations that run much faster than generic operations. I will call these type-specialized operations.
The second main reason that dynamic languages run slower is that scripting languages are usually implemented with interpreters, while statically typed languages are compiled to native code. Interpreters are easier to create, but they incur extra runtime overhead for tracking their internal state. Languages like Java compile to machine language, which requires almost no state tracking overhead.
Let’s make this concrete with a picture. Here are the slowdowns in picture form for a simple numeric add operation: a + b, where a and b are integers. For now, ignore the rightmost bar and focus on the comparison of the Firefox 3 JavaScript interpreter vs. a Java JIT. Each column shows the steps that have to be done to complete the add operation in each programming language. Time goes downward, and the height of each box is proportional to the time it takes to finish the steps in the box.
In the middle, Java simply runs one machine language add instruction, which runs in time T (one processor cycle). Because the Java compiler knows that the operands are standard machine integers, it can use a standard integer add machine language instruction. That’s it.
On the left, SpiderMonkey (the JS interpreter in FF3) takes about 40 times as long. The brown boxes are interpreter overhead: the interpreter must read the add operation and jump to the interpreter’s code for a generic add. The orange boxes represent extra work that has to be done because the interpreter doesn’t know the operand types. The interpreter has to unpack the generic representations of a and i, figure out their types, select the specific addition operation, convert the values to the right types, and at the end, convert the result back to a generic format.
The diagram shows that using an interpreter instead of a compiler is slowing things down a little bit, but not having type information is slowing things down a lot. If we want JS to run more than a little faster than in FF3, by Amdahl’s law, we need to do something about types.
Getting types by tracing
Our goal in TraceMonkey is to compile type-specialized code. To do that, TraceMonkey needs to know the types of variables. But JavaScript doesn’t have type declarations, and we also said that it’s practically impossible for a JS engine to figure out the types ahead of time. So if we want to just compile everything ahead of time, we’re stuck.
So let’s turn the problem around. If we let the program run for a bit in an interpreter, the engine can directly observe the types of values. Then, the engine can use those types to compile fast type-specialized code. Finally, the engine can start running the type-specialized code, and it will run much faster.
There are a few key details about this idea. First, when the program runs, even if there are many if statements and other branches, the program always goes only one way. So the engine doesn’t get to observe types for a whole method — the engine observes types through the paths, which we call traces, that the program actually takes. Thus, while standard compilers compile methods, TraceMonkey compiles traces. One side benefit of trace-at-a-time compilation is that function calls that happen on a trace are inlined, making traced function calls very fast.
Second, compiling type-specialized code takes time. If a piece of code is going to run only one or a few times, which is common with web code, it can easily take more time to compile and run the code than it would take to simply run the code in an interpreter. So it only pays to compile hot code (code that is executed many times). In TraceMonkey, we arrange this by tracing only loops. TraceMonkey initially runs everything in the interpreter, and starts recording traces through a loop once it gets hot (runs more than a few times).
Tracing only hot loops has an important consequence: code that runs only a few times won’t speed up in TraceMonkey. Note that this usually doesn’t matter in practice, because code that runs only a few times usually runs too fast to be noticeable. Another consequence is that paths through a loop that are not taken at all never need to be compiled, saving compile time.
Finally, above we said that TraceMonkey figures out the types of values by observing execution, but as we all know, past performance does not guarantee future results: the types might be different the next time the code is run, or the 500th next time. And if we try to run code that was compiled for numbers when the values are actually strings, very bad things will happen. So TraceMonkey must insert type checks into the compiled code. If a check doesn’t pass, TraceMonkey must leave the current trace and compile a new trace for the new types. This means that code with many branches or type changes tends to run a little slower in TraceMonkey, because it takes time to compile the extra traces and jump from one to another.
TraceMonkey in action
Now, we’ll show tracing in action by example on this sample program, which adds the first N whole numbers to a starting value:
function addTo(a, n){for(var i =0; i < n;++i)
a = a + i;return a;}var t0 =new Date();var n = addTo(0,10000000);print(n);print(new Date()- t0);
TraceMonkey always starts running the program in the interpreter. Every time the program starts a loop iteration, TraceMonkey briefly enters monitoring mode to increment a counter for that loop. In FF3.5, when the counter reaches 2, the loop is considered hot and it’s time to trace.
Now TraceMonkey continues running in the interpreter but starts recording a trace as the code runs. The trace is simply the code that runs up to the end of the loop, along with the types used. The types are determined by looking at the actual values. In our example, the loop executes this sequence of JavaScript statements, which becomes our trace:
a = a + i;// a is an integer number (0 before, 1 after)++i;// i is an integer number (1 before, 2 after)if(!(i < n))// n is an integer number (10000000)break;
That’s what the trace looks like in a JavaScript-like notation. But TraceMonkey needs more information in order to compile the trace. The real trace looks more like this:
trace_1_start:++i;// i is an integer number (0 before, 1 after)
temp = a + i;// a is an integer number (1 before, 2 after)if(lastOperationOverflowed())
exit_trace(OVERFLOWED);
a = temp;if(!(i < n))// n is an integer number (10000000)
exit_trace(BRANCHED);goto trace_1_start;
This trace represents a loop, and it should be compiled as a loop, so we express that directly using a goto. Also, integer addition can overflow, which requires special handling (for example, redoing with floating-point addition), which in turn requires exiting the trace. So the trace must include an overflow check. Finally, the trace exits in the same way if the loop condition is false. The exit codes tell TraceMonkey why the trace was exited, so that TraceMonkey can decide what to do next (such as whether to redo the add or exit the loop). Note that traces are recorded in a special internal format that is never exposed to the programmer — the notation used above is just for expository purposes.
After recording, the trace is ready to be compiled to type-specialized machine code. This compilation is performed by a tiny JIT compiler (named, appropriately enough, nanojit) and the results are stored in memory, ready to be executed by the CPU.
The next time the interpreter passes the loop header, TraceMonkey will start executing the compiled trace. The program now runs very fast.
On iteration 65537, the integer addition will overflow. (2147450880 + 65537 = 2147516417, which is greater than 2^31-1 = 2147483647, the largest signed 32-bit integer.) At this point, the trace exits with an OVERFLOWED code. Seeing this, TraceMonkey will return to interpreter mode and redo the addition. Because the interpreter does everything generically, the addition overflow is handled and everything works as normal. TraceMonkey will also start monitoring this exit point, and if the overflow exit point ever becomes hot, a new trace will be started from that point.
But in this particular program, what happens instead is that the program passes the loop header again. TraceMonkey knows it has a trace for this point, but TraceMonkey also knows it can’t use that trace because that trace was for integer values, but a is now in a floating-point format. So TraceMonkey records a new trace:
trace_2_start:++i;// i is an integer number
a = a + i;// a is a floating-point numberif(!(i < n))// n is an integer number (10000000)
exit_trace(BRANCHED);goto trace_2_start;
TraceMonkey then compiles the new trace, and on the next loop iteration, starts executing it. In this way, TraceMonkey keeps the JavaScript running as machine code, even when types change. Eventually the trace will exit with a BRANCHED code. At this point, TraceMonkey will return to the interpreter, which takes over and finishes running the program.
Here are the run times for this program on my laptop (2.2GHz MacBook Pro):
System
Run Time (ms)
SpiderMonkey (FF3)
990
TraceMonkey (FF3.5)
45
Java (using int)
25
Java (using double)
74
C (using int)
5
C (using double)
15
This program gets a huge 22x speedup from tracing and runs about as fast as Java! Functions that do simple arithmetic inside loops usually get big speedups from tracing. Many of the bit operation and math SunSpider benchmarks, such bitops-3bit-bits-in-byte, ctrypto-sha1, and math-spectral-norm get 6x-22x speedups.
Functions that use more complex operations, such as object manipulation, get a smaller speedup, usually 2-5x. This follows mathematically from Amdahl’s law and the fact that complex operations take longer. Looking back at the time diagram, consider a more complex operation that takes time 30T for the green part. The orange and brown parts will still be about 30T together, so eliminating them gives a 2x speedup. The SunSpider benchmark string-fasta is an example of this kind of program: most of the time is taken by string operations that have a relatively long time for the green box.
Here is a version of our example program you can try in the browser:
Numerical result:
Run time:
Average run time:
Understanding and fixing performance problems
Our goal is to make TraceMonkey reliably fast enough that you can write your code in the way that best expresses your ideas, without worrying about performance. If TraceMonkey isn’t speeding up your program, we hope you’ll report it as a bug so we can improve the engine. That said, of course, you may need your program to run faster in today’s FF3.5. In this section, we’ll explain some tools and techniques for fixing performance of a program that doesn’t get a good (2x or more) speedup with the tracing JIT enabled. (You can disable the jit by going to about:config and setting the pref javascript.options.jit.content to false.)
The first step is understanding the cause of the problem. The most common cause is a trace abort, which simply means that TraceMonkey was unable to finish recording a trace, and gave up. The usual result is that the loop containing the abort will run in the interpreter, so you won’t get a speedup on that loop. Sometimes, one path through the loop is traced, but there is an abort on another path, which will cause TraceMonkey to switch back and forth between interpreting and running native code. This can leave you with a reduced speedup, no speedup, or even a slowdown: switching modes takes time, so rapid switching can lead to poor performance.
With a debug build of the browser or a JS shell (which I build myself – we don’t publish these builds) you can tell TraceMonkey to print information about aborts by setting the TMFLAGS environment variable. I usually do it like this:
The minimal option prints out all the points where recording starts and where recording successfully finishes. This gives a basic idea of what the tracer is trying to do. The abort option prints out all the points where recording was aborted due to an unsupported construct. (Setting TMFLAGS=help will print the list of other TMFLAGS options and then exit.)
(Note also that TMFLAGS is the new way to print the debug information. If you are using a debug build of the FF3.5 release, the environment variable setting is TRACEMONKEY=abort.)
Here’s an example program that doesn’t get much of a speedup in TraceMonkey.
function runExample2(){var t0 =new Date;var sum =0;for(var i =0; i <100000;++i){
sum += i;}var prod =1;for(var i =1; i <100000;++i){eval("prod *= i");}var dt =new Date - t0;
document.getElementById(example2_time').innerHTML = dt + ' ms';
}
Run time:
If we set TMFLAGS=minimal,abort, we’ll get this:
Recording starting from ab.js:5@23
recording completed at ab.js:5@23 via closeLoop
Recording starting from ab.js:5@23
recording completed at ab.js:5@23 via closeLoop
Recording starting from ab.js:10@63
Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.
Recording starting from ab.js:10@63
Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.
Recording starting from ab.js:10@63
Abort recording of tree ab.js:10@63 at ab.js:11@70: eval.
The first two pairs of lines show that the first loop, starting at line 5, traced fine. The following lines showed that TraceMonkey started tracing the loop on line 10, but failed each time because of an eval.
An important note about this debug output is that you will typically see some messages referring to inner trees growing, stabilizing, and so on. These really aren’t problems: they usually just indicate a delay in finishing tracing a loop because of the way TraceMonkey links inner and outer loops. And in fact, if you look further down the output after such aborts, you will usually see that the loops eventually do trace.
Otherwise, aborts are mainly caused by JavaScript constructs that are not yet supported by tracing. The trace recording process is easier to implement for a basic operation like + than it is for an advanced feature like arguments. We didn’t have time to do robust, secure tracing of every last JavaScript feature in time for the FF3.5 release, so some of the more advanced ones, like arguments, aren’t traced in FF3.5.0. Other advanced features that are not traced include getters and setters, with, and eval. There is partial support for closures, depending on exactly how they are used. Refactoring to avoid these constructs can help performance.
Two particularly important JavaScript features that are not traced are:
Recursion. TraceMonkey doesn’t see repetition that occurs through recursion as a loop, so it doesn’t try to trace it. Refactoring to use explicit for or while loops will generally give better performance.
Getting or setting a DOM property. (DOM method calls are fine.) Avoiding these constructs is generally impossible, but refactoring the code to move DOM property access out of hot loops and performance-critical segments should help.
We are actively working on tracing all the features named above. For example, support for tracing arguments is already available in nightly builds.
Here is the slow example program refactored to avoid eval. Of course, I could have simply done the multiplication inline. Instead, I used a function created by eval because that’s a more general way of refactoring an eval. Note that the eval still can’t be traced, but it only runs once so it doesn’t matter.
function runExample3(){var t0 =new Date;var sum =0;for(var i =0; i <100000;++i){
sum += i;}var prod =1;var mul =eval("(function(i) { return prod * i; })");for(var i =1; i <100000;++i){
prod = mul(i);}var dt =new Date - t0;
document.getElementById('example3_time').innerHTML= dt +' ms';}
Run time:
There are a few more esoteric situations that can also hurt tracing performance. One of them is trace explosion, which happens when a loop has many paths through it. Consider a loop with 10 if statements in a row: the loop has 1024 paths, potentially causing 1024 traces to be recorded. That would use up too much memory, so TraceMonkey caps each loop at 32 traces. If the loop has fewer than 32 hot traces, it will perform well. But if each path occurs with equal frequency, then only 3% of the paths are traced, and performance will suffer.
This kind of problem is best analyzed with TraceVis, which creates visualizations of TraceMonkey performance. Currently, the build system only supports enabling TraceVis for shell builds, but the basic system can also run in the browser, and there is ongoing work to enable TraceVis in a convenient form in the browser.
The blog post on TraceVis is currently the most detailed explanation of what the diagrams mean and how to use them to diagnose performance problems. The post also contains a detailed analysis of a diagram that is helpful in understanding how TraceMonkey works in general.
Comparative JITerature
Here I will give a few comparisons to other JavaScript JIT designs. I’ll focus more on hypothetical designs than competing engines, because I don’t know details about them — I’ve read the release information and skimmed a few bits of code. Another big caveat is that real-world performance depends at least as much on engineering details as it does on engine architecture.
One design option could be a called a per-method non-specializing JIT. By this, I mean a JIT compiler that compiles a method at a time and generates generic code, just like what the interpreter does. Thus, the brown boxes from our diagrams are cut out. This kind of JIT doesn’t need to take time to record and compile traces, but it also does not type-specialize, so the orange boxes remain. Such an engine can still be made pretty fast by carefully designing and optimizing the orange box code. But the orange box can’t be completely eliminated in this design, so the maximum performance on numeric programs won’t be as good as a type-specializing engine.
As far as I can tell, as of this writing Nitro and V8 are both lightweight non-specializing JITs. (I’m told that V8 does try to guess a few types by looking at the source code (such as guessing that a is an integer in a >> 2) in order to do a bit of type specialization.) It seem that TraceMonkey is generally faster on numeric benchmarks, as predicted above. But TraceMonkey suffers a bit on benchmarks that use more objects, because our object operations and memory management haven’t been optimized as heavily.
A further development of the basic JIT is the per-method type-specializing JIT. This kind of a JIT tries to type-specialize a method based on the argument types the method is called with. Like TraceMonkey, this requires some runtime observation: the basic design checks the argument types each time a method is called, and if those types have not been seen before, compiles a new version of the method. Also like TraceMonkey, this design can heavily specialize code and remove both the brown and orange boxes.
I’m not aware that anyone has deployed a per-method type-specializing JIT for JavaScript, but I wouldn’t be surprised if people are working on it.
The main disadvantage of a per-method type-specializing JIT compared to a tracing JIT is that the basic per-method JIT only directly observes the input types to a method. It must try to infer types for variables inside the method algorithmically, which is difficult for JavaScript, especially if the method reads object properties. Thus, I would expect that a per-method type-specializing JIT would have to use generic operations for some parts of the method. The main advantage of the per-method design is that the method needs to be compiled exactly once per set of input types, so it’s not vulnerable to trace explosion. In turn, I think a per-method JIT would tend to be faster on methods that have many paths, and a tracing JIT would tend to be faster on highly type-specializable methods, especially if the method also reads a lot of values from properties.
Conclusion
By now, hopefully you have a good idea of what makes JavaScript engines fast, how TraceMonkey works, and how to analyze and fix some performance issues that may occur running JavaScript under TraceMonkey. Please report bugs if you run into any significant performance problems. Bug reports are also a good place for us to give additional tuning advice. Finally, we’re trying to improve constantly, so check out nightly TraceMonkey builds if you’re into the bleeding edge.
One of our goals with Firefox 3.5 is to help upgrade the web. Over the lifecycle of this release we’ve invested heavily in developer features. One of the features that we’ve invested in is TraceMonkey – a tracing interpreter that turns commonly-run JavaScript code into machine code so that it can run at near-native speeds. We consider it to be both an end user feature because it makes existing web applications faster as well as a developer feature because of the new kinds of applications it enables.
We’re always challenged to try and come up with ways to describe what that means in a way that’s not a dry benchmark. How can we explain what it feels like?
We’ve made a video to help describe both what it means by the numbers, but also shows what it feels like. If you want to try the demo we suggest you try it in both Firefox 3 and Firefox 3.5. It’s something you can really feel.
I’d like to show an example of a visual effect that can be accomplished using the new -moz-transform CSS transformation property that is available in the Firefox 3.5 browser. I was very excited to see this feature added to Firefox because I knew it would allow me to produce a faux-3D isometric effect, also sometimes called 2.5D. I created a demo where HTML content is mapped onto the faces of a “3D” isometric cube:
The -moz-transform property can take a list of CSS transform functions including rotate, scale, skew, and translate. Or, multiple transformations can be done simultaneously using a single 2D affine transformation matrix. Because all of the CSS transformation functions work in two dimensions, true 3D transformations with perspective projection cannot yet be accomplished. In this demo, I use the rotate and skew transformation functions in order to create the isometric effect.
First, I define a container div for the cube, and then a square div for the top, left, and right faces of the cube.
Without getting too wrapped up in isometric projection math, we have to rotate and skew the cube faces in order to turn each square div into a parallelogram where the angles of the vertices are 60° or 120°. Here are what those transformations look like using the new -moz-transform CSS property:
Now all that is left to do is to use absolute positioning to connect each transformed div in order to form the faces of the isometric cube. You can use matrix math to do this, or you could just move the faces around until they line up and it looks right.
To add to the 3D effect, I’ve shaded the isometric cube by assigning different colors to the faces. I also gave the cube a shadow. You can see that the shadow is basically just a copy of the top face of the cube moved down to the bottom left, and then I gave the shadow div a black background color and set opacity: 0.5; in the CSS to make it filter over the page background.
Any HTML, such as the text and form buttons in this example, can be put inside the div for any face of the cube. It will be translated into the correct perspective. Christopher Blizzard gave me the idea to throw video onto one side of the cube using the new HTML5 video tag now supported in Firefox 3.5. As you can see, it works great.
Finally, by making a copy of the cube markup and using two more of the CSS transformation functions, I was able to create a second cube. I made the cube 50% as big using a scale(0.5) transformation, and I moved it into place by using translate(600px, 400px).
I hope you found this demo to be interesting and that it opened up your mind to some exciting possibilities that Firefox 3.5 CSS transformations bring to the web.
CSS transformations are also supported by Safari 3.1+ and Chrome using the -webkit-transform CSS property.
In a previous post on this blog we talked about using JavaScript to create video elements on the fly. While that was a good use case for the Mozilla’s support site in this post we offer another set of methods that will likely find more use on the web. In fact you can see it in use on Mozilla’s What’s New in Firefox 3.5 video, this blog and we’re starting to see pickup on the web as a whole.
The examples here should work with Safari 4, Firefox 3.5 and IE using a flash fallback. If you’re looking for a complete example that includes support for the video element, quicktime, windows media, iPhone support, and finally flash support you might try checking out Video for Everybody. It’s been tested with a huge pile of browsers in different configurations and is a good start point if you’re looking for support for everything under the universe.
You might also check out Michael Verdi’s video tags with fallbacks article where he uses a much simpler, but somewhat easier to understand method.
A simple example
The video tag looks something like this in a web page:
<video src="zombie.ogg" controls></video>
What this does is insert a video element into your page. Firefox 3.5 will load the video, determine its size and re-size the element to the natural size of the video.
Firefox 3.5 supports the Ogg Theora video format, a free and open standard for video. Opera and Google Chrome will also include support for Theora in later versions. Safari 4 can also support the same format when using the Xiph Quicktime component but is not guaranteed to be present. Apple has licensed the mpeg4 format for use in Safari 4 and it is supported by default.
If you want to support both formats you need to be able to provide more than one type of format. You also need to change the markup to tell the browser about the types of the files, which order they are preferred and then let the browser choose which one it should use to display the video element. For our example the code would look something like this:
In this case the browser will first check to see if it supports the video/ogg video type. If it does, it will use that and load it. If it doesn’t it will move on to the next entry, the mp4 file and use it if it’s supported.
While most modern browsers have specific plans to support the video element, Microsoft has not made their intentions clear. So in order to support IE users into the near future you should provide fallbacks until Microsoft adopts this part of the new HTML5 standard. For our example we’ll use a simple flash fallback.
One nice thing about the video element is how easily it degrades to older browsers. If in the above example your browser doesn’t support the video tag and doesn’t support the source tag it will simple fall back to whatever happened to be included inside them. This makes it very easy to support fallbacks in a very easy way. Here’s what a flash fallback would look like that uses blip.tv:
And that’s it. This example supports all browsers, degrades nicely and helps to move the web forward all in one blow. HTML5 isn’t finished, but web developers can certainly start taking advantage of the features that it provides in modern browsers today.
This post counts as both a demo and commentary about the changing nature of typography on the web. Ian Lynam and Craig Mod have put together a page that is an excellent example of typography in action, but also offer some suggestions on what the next steps for typography on the web might look like. The page itself takes advantage of a number of typefaces that Craig and Ian got permission to use and uses a pleasing multi-column layout. Please click through to the complete article to get the full effect.
XMLHttpRequest is used within many Ajax libraries, but till the release of browsers such as Firefox 3.5 and Safari 4 has only been usable within the framework of the same-origin policy for JavaScript. This meant that a web application using XMLHttpRequest could only make HTTP requests to the domain it was loaded from, and not to other domains. Developers expressed the desire to safely evolve capabilities such as XMLHttpRequest to make cross-site requests, for better, safer mash-ups within web applications. The Cross-Origin Resource Sharing (CORS) specification consists of a simple header exchange between client-and-server, and is used by IE8′s proprietary XDomainRequest object as well as by XMLHttpRequest in browsers such as Firefox 3.5 and Safari 4 to make cross-site requests. These browsers make it possible to make asynchronous HTTP calls within script to other domains, provided the resources being retrieved are returned with the appropriate CORS headers.
A Quick Overview of CORS
Firefox 3.5 and Safari 4 implement the CORS specification, using XMLHttpRequest as an “API container” that sends and receives the appropriate headers on behalf of the web developer, thus allowing cross-site requests. IE8 implements part of the CORS specification, using XDomainRequest as a similar “API container” for CORS, enabling simple cross-site GET and POST requests. Notably, these browsers send the ORIGIN header, which provides the scheme (http:// or https://) and the domain of the page that is making the cross-site request. Server developers have to ensure that they send the right headers back, notably the Access-Control-Allow-Origin header for the ORIGIN in question (or ” * ” for all domains, if the resource is public) .
The CORS standard works by adding new HTTP headers that allow servers to serve resources to permitted origin domains. Browsers support these headers and enforce the restrictions they establish. Additionally, for HTTP request methods that can cause side-effects on user data (in particular, for HTTP methods other than GET, or for POST usage with certain MIME types), the specification mandates that browsers “preflight” the request, soliciting supported methods from the server with an HTTP OPTIONS request header, and then, upon “approval” from the server, sending the actual request with the actual HTTP request method. Servers can also notify clients whether “credentials” (including Cookies and HTTP Authentication data) should be sent with requests.
Capability Detection
XMLHttpRequest can make cross-site requests in Firefox 3.5 and in Safari 4; cross-site requests in previous versions of these browsers will fail. It is always possible to try to initiate the cross-site request first, and if it fails, to conclude that the browser in question cannot handle cross-site requests from XMLHttpRequest (based on handling failure conditions or exceptions, e.g. not getting a 200status code back). In Firefox 3.5 and Safari 4, a cross-site XMLHttpRequest will not successfully obtain the resource if the server doesn’t provide the appropriate CORS headers (notably the Access-Control-Allow-Origin header) back with the resource, although the request will go through. And in older browsers, an attempt to make a cross-site XMLHttpRequest will simply fail (a request won’t be sent at all).
Both Safari 4 and Firefox 3.5 provide the withCredentials property on XMLHttpRequest in keeping with the emerging XMLHttpRequest Level 2 specification, and this can be used to detect an XMLHttpRequest object that implements CORS (and thus allows cross-site requests). This allows for a convenient “object detection” mechanism:
if(XMLHttpRequest){var request =new XMLHttpRequest();if(request.withCredentials!== undefined){// make cross-site requests}}
Alternatively, you can also use the “in” operator:
if("withCredentials"in request){// make cross-site requests}
Thus, the withCredentials property can be used in the context of capability detection. We’ll discuss the use of “withCredentials” as a means to send Cookies and HTTP-Auth data to sites later on in this article.
“Simple” Requests using GET or POST
IE8, Safari 4, and Firefox 3.5 allow simple GET and POST cross-site requests. “Simple” requests don’t set custom headers, and the request body only uses plain text (namely, the text/plain Content-Type).
Let us assume the following code snippet is served from a page on site http://foo.example and is making a call to http://bar.other:
var url ="http://bar.other/publicNotaries/"if(XMLHttpRequest){var request =new XMLHttpRequest();if("withCredentials"in request){// Firefox 3.5 and Safari 4
request.open('GET', url,true);
request.onreadystatechange= handler;
request.send();}elseif(XDomainRequest){// IE8var xdr =new XDomainRequest();
xdr.open("get", url);
xdr.send();// handle XDR responses -- not shown here :-)}// This version of XHR does not support CORS // Handle accordingly}
Firefox 3.5, IE8, and Safari 4 take care of sending and receiving the right headers. Here is the Simple Request example. It is also instructive to look at the headers sent back by the server. Notably, amongst the other request headers, the browser would send the following in order to enable the simple request above:
GET /publicNotaries/ HTTP/1.1
Referer: http://foo.example/notary-mashup/
Origin: http://foo.example
Note the use of the “Origin” HTTP header that is part of the CORS specification.
And, amongst the other response headers, the server at http://bar.other would include:
The CORS specification mandates that requests that use methods other than POST or GET, or that use custom headers, or request bodies other than text/plain, are preflighted. A preflighted request first sends the OPTIONS header to the resource on the other domain, to check and see if the actual request is safe to send. This capability is currently not supported by IE8′s XDomainRequest object, but is supported by Firefox 3.5 and Safari 4 with XMLHttpRequest. The web developer does not need to worry about the mechanics of preflighting, since the implementation handles that.
The code snippet below shows code from a web page on http://foo.example calling a resource on http://bar.other. For simplicity, we leave out the section on object and capability detection, since we’ve covered that already:
var invocation =new XMLHttpRequest();var url ='http://bar.other/resources/post-here/';var body ='
Arun';function callOtherDomain(){if(invocation){
invocation.open('POST', url,true);
invocation.setRequestHeader('X-PINGOTHER','pingpong');
invocation.setRequestHeader('Content-Type','application/xml');
invocation.onreadystatechange= handler;
invocation.send(body);}
In this case, before Firefox 3.5 sends the request, it first uses the OPTIONS header:
OPTIONS /resources/post-here/ HTTP/1.1
Origin: http://foo.example
Access-Control-Request-Method: POST
Access-Control-Request-Headers: X-PINGOTHER
Then, amongst the other response headers, the server responds with:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://arunranga.com
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: X-PINGOTHER
Access-Control-Max-Age: 1728000
At which point, the actual response is sent:
POST /resources/post-here/ HTTP/1.1
...
Content-Type: application/xml; charset=UTF-8
X-PINGOTHER: pingpong
...
Credentialed Requests
By default, “credentials” such as Cookies and HTTP Auth information are not sent in cross-site requests using XMLHttpRequest. In order to send them, you have to set the withCredentials property of the XMLHttpRequest object. This is a new property introduced in Firefox 3.5 and Safari 4. IE8′s XDomainRequest object does not have this capability.
Again, let us assume some JavaScript on a page on http://foo.example wishes to call a resource on http://bar.other and send Cookies with the request, such that the response is cognizant of Cookies the user may have acquired.
In general, data requested from a remote site should be treated as untrusted. Executing JavaScript code retrieved from a third-party site without first determining its validity is NOT recommended. Server administrators should be careful about leaking private data, and should judiciously determine that resources can be called in a cross-site manner.
The Mozilla Support Project and support.mozilla.com (SUMO for short) is an open and volunteer powered community that helps over 3 million Firefox users a week get support and help with their favorite browser. The Firefox support community maintains a knowledge base with articles in over 30 languages and works directly with users through our support forums or live chat service. They’ve put together the following demonstration of how to use open video and Flash-based video at the same time to provide embedded videos to users regardless of their browser. This demo article was written by Laura Thomson, Cheng Wang and Eric Cooper
Note: This article shows how to add video objects to a page using JavaScript. For most pages we suggest that you read the article that contains information on how to use the video tag to provide simple markup fallbacks. Markup-based fallbacks are much more elegant than JavaScript solutions and are generally recommended for use on the web.
One of the challenges to open video adoption on the web is making sure that online video still performs well on browsers that don’t currently support open video. Rather than asking users with these browsers to download the video file and use a separate viewer, the new <video> tag degrades gracefully to allow web developers to provide a good experience for everyone. As Firefox 3.5 upgrades the web, users in this transitional period can be shown open video or video using an existing plugin in an entirely seamless way depending on what their browser supports.
If you visit the page using Firefox 3.5, or another browser with open video support, and click on the “Watch a video of these instructions” link, you get a screencast using an Ogg-formatted file. If you visit with a browser that does not support <video> you get the exact same user experience using an open source .flv player and the same video encoded in the .flv format, or in some cases using the SWF Flash format. These alternate viewing methods use the virtually ubiquitous Adobe Flash plugin which is one of the most common ways to show video on the web.
The code works as follows.
In the page which contains the screencasts, we include some JavaScript. Excerpts from this code follow, but you can see or check out the complete listing from Mozilla SVN.
The code begins by setting up an object to represent the player:
We allocate a priority to each of the possible video formats. You’ll notice we also have the 'canUseVideo' attribute, which defaults to false.
Later on in the code, we test the user’s browser to see if it is video-capable:
var detect = document.createElement('video');if(typeof detect.canPlayType==='function'&&
detect.canPlayType('video/ogg;codecs="theora,vorbis"')=='probably'){
Screencasts.Player.canUseVideo=true;
Screencasts.Player.priority.ogg=
Screencasts.Player.priority.flv+2}
If we can create a video element and it indicates that it can play the Ogg Theora format we set canUseVideo to true and increase the priority of the Ogg file to be greater than the priority of the .flv file. (Note that you could also detect if the browser could play .mp4 files to support Safari out of the box.)
Finally, we use the priority to select which file is actually played, by iterating through the list of files to find the one that has the highest priority:
for(var x=0; x < file.length; x++){if(!best ){
best = file[x];}else{if(this.priority[best]<this.priority[file[x]])
best = file[x];}}
With these parts in place, the browser displays only the highest priority video and in a format that it can handle.
If you want to learn more about the Mozilla Support Project or get involved in helping Firefox users, check out their guide to getting started.
* Note: To view this demo using Safari 4 on Mac OSX, you will need to add Ogg support to Quicktime using the Xiph Quicktime plugin available from http://www.xiph.org/quicktime/download.html