Mozilla

Performance Articles

  1. No Single Benchmark for the Web

    Google released a new JavaScript benchmark a few days ago called Octane. New benchmarks are always welcome, as they push browsers to new levels of performance in new areas. I was particularly pleased to see the inclusion of pdf.js, which unlike most benchmarks is real-world code, as well as the GB Emulator which is a very interesting type of performance-intensive code. However, every benchmark suite has limitations, and it is worth keeping that in mind, especially given the new benchmark’s title in the announcement and in the project page as “The JavaScript Benchmark Suite for the Modern Web” – which is a high goal to set for a single benchmark.

    Now, every benchmark must pick some code to run out of all the possible code out there, and picking representative code is very hard. So it is always understandable that benchmarks are never 100% representative of the code that exists and is important. However, even taking that into account, I have concerns with some of the code selected to appear in Octane: there are better versions of two of the five new benchmarks, and performance on those better versions is very different from performance on the versions that do appear in Octane.

    Benchmarking black boxes

    One of the new benchmarks in Octane is “Mandreel”, which is the Bullet physics engine compiled by Mandreel, a C++ to JS compiler. Bullet is definitely interesting code to include in a benchmark. However, the choice of Mandreel’s port is problematic. One issue is that Mandreel is a closed-source compiler, a black box, making it hard to learn from it what kind of code is efficient and what should be optimized. We have only a dump of generated code; because Mandreel is a commercial product, reproducing those results with modifications to the original C++, or with a different codebase, would cost money. We also do not have the source code compiled for this particular benchmark: Bullet itself is open source, but we don’t know the specific version compiled here, nor do we have the benchmark driver code that uses Bullet, both of which would be necessary to reproduce these results using another compiler.

    An alternative could have been to use Bullet compiled by Emscripten, an open source compiler that similarly compiles C++ to JS (disclaimer: I am an Emscripten dev). Aside from being open, Emscripten also has a port of Bullet (a demo can be seen here) that can interact in a natural way with regular JS, making it usable in normal web games and not just compiled ones, unlike Mandreel’s port. This is another reason for preferring the Emscripten port of Bullet.

    Is Mandreel representative of the web?

    The motivation Google gives for including Mandreel in Octane is that Mandreel is “used in countless web-based games.” It seems that Mandreel is primarily used in the Chrome Web Store (CWS) and less so on the normal web. The quoted description above is technically accurate: Mandreel games in the CWS are indeed “web-based” (written in JS+HTML+WebGL) even if they are not actually “on the web”, where by “on the web” I mean outside of the walled garden of the CWS and on the normal web that all browsers can access. And it makes perfect sense that Google cares about the performance of code that runs in the CWS, since Google runs and profits from that store. But it does call into question the title of the Octane benchmark as “The JavaScript Benchmark Suite for the Modern Web.”

    Performance of generated code is highly variable

    With that said, it is still fair to say that compiler-generated code is increasing in importance on the web, so some benchmark must be chosen to represent it. The question is how much the specific benchmark chosen represents compiled code in general. On the one hand the compiled output of Mandreel and Emscripten is quite similar: both use large typed arrays, the same Relooper algorithm, etc., so we could expect performance to be similar. That doesn’t seem to always be the case, though. When we compare Bullet compiled by Mandreel with Bullet compiled by Emscripten – I made a benchmark of that a while back, it’s available here – then on my MacBook Pro, Chrome is 1.5x slower than Firefox on the Emscripten version (that is, Chrome takes 1.5 times as long to execute in this case), but 1.5x faster on the Mandreel version that Google chose to include in Octane (that is, Chrome receives a score 1.5 times larger in this case). (I tested with Chrome Dev, which is the latest version available on Linux, and Firefox Aurora which is the best parallel to it. If you run the tests yourself, note that in the Emscripten version smaller numbers are better while the opposite is true in the Octane version.)

    (An aside: not only does Chrome have trouble running the Emscripten version quickly, but that benchmark also exposes a bug in Chrome where the tab consistently crashes when the benchmark is reloaded – possibly a dupe of this open issue. A serious problem of that nature, which does not happen on the Mandreel-compiled version, could indicate that the two were optimized differently as a result of having received different amounts of focus from developers.)

    Another issue with the Mandreel benchmark is the name. Calling it Mandreel implies it represents all Mandreel-generated code, but there can be huge differences in performance depending on what C/C++ code is compiled, even with a single compiler. For example, Chrome can be 10-15x slower than Firefox on some Emscripten-compiled benchmarks (example 1, example 2) while on others it is quite speedy (example). So calling the benchmark “Mandreel-Bullet” would have been better, to indicate it is just one Mandreel-compiled codebase, which cannot represent all compiled code.

    Box2DWeb is not the best port of Box2D

    “Box2DWeb” is another new benchmark in Octane, in which a specific port of Box2D to JavaScript is run, namely Box2DWeb. However, as seen here (see also this), Box2DWeb is significantly slower than other ports of Box2D to the web, specifically Mandreel and Emscripten’s ports from the original C++ that Box2D is written in. Now, you can justify excluding the Mandreel version because it cannot be used as a library from normal JS (just as with Bullet before), but the Emscripten-compiled one does not have that limitation and can be found here. (Demos can be seen here and here.)

    Another reason for preferring the Emscripten version is that it uses Box2D 2.2, whereas Box2DWeb uses the older Box2D 2.1. Compiling the C++ code directly lets the Emscripten port stay up to date with the latest upstream features and improvements far more easily.

    It is possible that Google surveyed websites and found that the slower Box2DWeb was more popular (I have no idea whether that was the case); if so, that would partially justify preferring the slower version. However, even if that were true, I would argue that it would be better to use the Emscripten version because, as mentioned earlier, it is faster and more up to date. Another factor to consider is that the version included in Octane will get attention and likely an increase in adoption, which makes it all the more important to select the one that is best for the web.

    I put up a benchmark of Emscripten-compiled Box2D here, and on my machine Chrome is 3x slower than Firefox on that benchmark, but 1.6x faster on the version Google chose to include in Octane. This is a similar situation to what we saw earlier with the Mandreel/Bullet benchmark and it raises the same questions about how representative a single benchmark can be.

    Summary

    As mentioned at the beginning, all benchmarks are imperfect. And the fact that the specific code samples in Octane are ones that Chrome runs well does not mean the code was chosen for that reason: The opposite causation is far more likely, that Google chose to focus on optimizing those and in time made Chrome fast on them. And that is how things properly work – you pick something to optimize for, and then optimize for it.

    However, as we saw above, for 2 of the 5 new benchmarks in Octane there are good reasons for preferring alternative, better versions. Now, it is possible that when Google started to optimize for Octane, the better options were not yet available – I don’t know when Google started that effort – but the fact that better alternatives exist in the present makes substantial parts of Octane appear less relevant today. Of course, if performance on the better versions were not much different from the Octane versions then this would not matter, but as we saw there were in fact significant differences when comparing browsers on those versions: one browser could be significantly faster on one version of the same benchmark but significantly slower on another.

    What all of this shows is that there cannot be a single benchmark for the modern web. There are simply too many kinds of code, and even when we focus on one of them, different benchmarks of that particular task can behave very differently.

    With that said, we shouldn’t be overly skeptical: Benchmarks are useful. We need benchmarks to drive us forward, and Octane is an interesting new benchmark that, even with the problems mentioned above, does contain good ideas and is worth focusing on. But we should always be aware of the limitations of any single benchmark, especially when a single benchmark claims to represent the entire modern web.


  2. Getting snappy – performance optimizations in Firefox 13

    Back in the fall of 2011, we took a targeted look at Firefox responsiveness issues. We identified a number of short term projects that together could achieve significant responsiveness improvements in day-to-day Firefox usage. Project Snappy kicked off at the end of the year with the goal of improving Firefox responsiveness.

    Although Snappy first contributed fixes to Firefox 11, Snappy’s most noticeable contributions to date are landing with Firefox 13. Currently in beta, this release includes a number of responsiveness related fixes, most notably tabs-on-demand, cycle collector improvements, and start-up optimization.

    Tabs-on-Demand

    Tabs-on-demand is a feature that reduces start-up time for Firefox windows with many tabs. In Firefox 12, all tabs are loaded on start-up. For windows with many tabs this may cause a delay before you can interact with Firefox as each tab must load its content. In Firefox 13, only the active tab will load. Loading of background tabs is deferred until a tab is selected. This results in Firefox starting faster as tabs-on-demand reduces processing requirements, network usage, and memory consumption.

    Cycle Collector

    As you interact with the browser and Web content, memory is allocated as needed. The Firefox cycle collector works to automatically free some of this memory when it is no longer needed. This reduces Firefox’s memory usage. In Firefox 13, the cycle collector is more efficient, spending less time examining memory that is still in use, which results in fewer pauses as you use Firefox.

    Start-up

    Firefox start-up time is visible to all users. Our investigation into start-up has identified a number of unoptimized routines in the code that executes before what we call “first paint”. “First paint” signifies when the Firefox user interface is first visible on your screen. In Firefox 13 we have optimized file calls, audio sessions, drag and drop, and overall IO, just to name a few. We are continuing to profile the Firefox start-up sequence to identify further optimizations that can be made in future releases.

    There are numerous other Snappy fixes in Firefox 13 including significant improvements to IO contention, font enumeration, and livemark overhead. All of these fixes contribute to a more responsive experience. We are already working on further responsiveness fixes for future Firefox releases. You can expect to see Snappy improvements in upcoming releases in areas such as memory usage, shutdown time, network cache and connections, menus, and graphics.

  3. There is no simple solution for local storage

    TL;DR: we have to stop advocating localStorage as a great opportunity for storing data, as it performs badly. Sadly enough, the alternatives are not nearly as well supported or as simple to implement.

    When it comes to web development, you will always encounter things that sound too good to be true. Sometimes they are good, and all that stops us from using them is our instinct to be skeptical about *everything* as developers. In a lot of cases, however, they really are not as good as they seem, and we only find out after using them for a while that we are actually “doing it wrong”.

    One such case is local storage. There is a storage specification (falsely attributed to HTML5 in a lot of examples) with an incredibly simple API that was heralded as the cookie killer when it came out. All you have to do to store content on the user’s machine is to access window.localStorage (or sessionStorage if you don’t need the data to be stored longer than the current browser session):

    localStorage.setItem( 'outofsight', 'my data' );
    console.log( localStorage.getItem( 'outofsight' ) ); // -> 'my data'

    This local storage solution has a few very tempting features for web developers:

    • It is dead simple
    • It uses strings for storage instead of complex databases (and you can store more complex data using JSON encoding; see the sketch after this list)
    • It is well supported by browsers
    • It is endorsed by a lot of companies (and was heralded as amazing when iPhones came out)
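
    Since localStorage stores only strings, structured data needs a round-trip through JSON, as mentioned in the list above. A quick sketch of that point:

    localStorage.setItem( 'settings', JSON.stringify( { theme: 'dark', fontSize: 14 } ) );
    var restored = JSON.parse( localStorage.getItem( 'settings' ) );
    console.log( restored.fontSize ); // -> 14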

    A few known issues with it are that there is no clean way to detect when you reach the limit of local storage and there is no cross-browser way to ask for more space. There are also more obscure issues around sessions and HTTPS, but that is just the tip of the iceberg.
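
    The usual workaround for the quota issue is to wrap writes in a try/catch, though even that is not reliable across browsers; a rough sketch:

    function trySetItem( key, value ) {
      try {
        localStorage.setItem( key, value );
        return true;
      } catch ( e ) {
        // older WebKit reports e.code === 22 (QUOTA_EXCEEDED_ERR),
        // other browsers use different names and codes
        return false;
      }
    }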

    The main issue: terrible performance

    LocalStorage also has a lot of drawbacks that aren’t well documented and certainly aren’t covered as much in “HTML5 tutorials”. Performance-oriented developers, especially, are very much against its use.

    When we covered using localStorage to store images and files a few weeks ago, it kicked off a massive thread of comments and an even longer internal mailing list thread about the evils of localStorage. The main issues are:

    • localStorage is synchronous in nature, meaning that when it loads, it can block the main document from rendering
    • localStorage does file I/O, meaning it writes to your hard drive, which can take a long time depending on what your system is doing (indexing, virus scanning…)
    • On a developer machine these issues can look deceptively minor, as the operating system caches these requests – for an end user on the web they could mean a few seconds of waiting, during which the web site stalls
    • In order to appear snappy, web browsers load the data into memory on the first request – which could mean a lot of memory use if lots of tabs do it
    • localStorage is persistent. If you stop using a service or never visit a web site again, the data is still loaded when you start the browser

    This is covered in detail in a follow-up blog post by Taras Glek of the Mozilla performance team and also by Andrea Giammarchi of Nokia.

    In essence this means that a lot of articles saying you can use localStorage for better performance are just wrong.

    Alternatives

    Of course, browsers have always offered ways to store local data, some of which you have probably never heard of, as shown by evercookie (I think my fave, when it comes to the “evil genius with no real-world use” factor, is the force-cached PNG image that gets read back out via canvas). In the internal discussions there was a massive thrust towards advocating IndexedDB for your solutions instead of localStorage. We then published an article on how to store images and files in IndexedDB and found a few issues – most actually related to ease-of-use and user interaction:

    • IndexedDB is a full-fledged DB that requires all the steps a SQL DB needs to read and write data – there is no simple key/value layer like localStorage available (see the sketch after this list)
    • IndexedDB asks the user for permission to store data, which can spook them
    • The browser support is not at all the same as localStorage’s: right now IndexedDB is supported in IE10, Firefox and Chrome, and there are differences in their implementations
    • Safari, Opera, iOS, Opera Mobile and the Android Browser favour WebSQL instead (which is yet another standard, and one that has been officially deprecated by the W3C)
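
    To illustrate the first point in the list above, here is a rough sketch of what a simple key/value write looks like with IndexedDB (using the unprefixed onupgradeneeded flavour of the API; the database and store names are made up for the example). Compare it with the localStorage one-liner from earlier:

    var indexedDB = window.indexedDB || window.mozIndexedDB || window.webkitIndexedDB;
    var request = indexedDB.open( 'mystore', 1 );

    // the object store has to be created before we can read or write
    request.onupgradeneeded = function() {
      request.result.createObjectStore( 'keyvalue' );
    };

    request.onsuccess = function() {
      var db = request.result;
      // every read and write happens inside a transaction
      var tx = db.transaction( [ 'keyvalue' ], 'readwrite' );
      tx.objectStore( 'keyvalue' ).put( 'my data', 'outofsight' );
      tx.oncomplete = function() { console.log( 'stored' ); };
    };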

    As always when there are differences in implementation someone will come up with an abstraction layer to work around that. Parashuram Narasimhan does a great job with that – even providing a jQuery plugin. It feels wrong though that we as implementers have to use these. It is the HTML5 video debate of WebM vs. H264 all over again.

    Now what?

    There is no doubt that the real database solutions, with their asynchronous nature, are the better option in terms of performance. They are also more mature and don’t have the “shortcut hack” feeling of localStorage. On the other hand, they are harder to use in comparison, we already have a lot of solutions out there using localStorage, and asking the user to give us access to storing local files is unacceptable for some implementations in terms of UX.

    The answer is that there is no simple solution for storing data on the end users’ machines and we should stop advocating localStorage as a performance boost. What we have to find is a solution that makes everybody happy and doesn’t break the current implementations. This might prove hard to work around. Here are some ideas:

    • Build a polyfill library that overrides the localStorage API and stores the content in IndexedDB/WebSQL instead (a rough sketch of the idea follows this list)? This is dirty and doesn’t work around the issue of the user being asked for permission
    • Implement localStorage in an asynchronous fashion in browsers – actively disregarding the spec? (this could set a dangerous precedent though)
    • Change the localStorage spec to store asynchronously instead of synchronously? We could also extend it to have a proper getStorageSpace interface and allow for native JSON support
    • Define a new standard that allows browser vendors to map the new API to the existing supported API that matches the best for the use case?
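
    As a rough sketch of the first idea in the list above: a polyfill could keep an in-memory copy so reads stay synchronous, while writing through to an asynchronous backend in the background (asyncBackend here is a hypothetical IndexedDB/WebSQL wrapper, not a real library):

    var fakeStorage = {
      _cache: {}, // in-memory copy, loaded asynchronously at startup
      setItem: function( key, value ) {
        this._cache[ key ] = String( value );
        asyncBackend.put( key, String( value ) ); // hypothetical async write-through
      },
      getItem: function( key ) {
        return this._cache.hasOwnProperty( key ) ? this._cache[ key ] : null;
      },
      removeItem: function( key ) {
        delete this._cache[ key ];
        asyncBackend.remove( key ); // hypothetical
      }
    };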

    We need to fix this as it doesn’t make sense to store things locally and sacrifice performance at the same time. This is a great example of how new web standards give us much more power but also make us face issues we didn’t have to deal with before. With more access to the OS, we also have to tread more carefully.

  4. Faster Canvas Pixel Manipulation with Typed Arrays

    Edit: See the section about Endianness.

    Typed Arrays can significantly increase the pixel manipulation performance of your HTML5 2D canvas Web apps. This is of particular importance to developers looking to use HTML5 for making browser-based games.

    This is a guest post by Andrew J. Baker. Andrew is a professional software engineer currently working for Ibuildings UK where his time is divided equally between front- and back-end enterprise Web development. He is a principal member of the browser-based games channel #bbg on Freenode, spoke at the first HTML5 games conference in September 2011, and is a scout for Mozilla’s WebFWD innovation accelerator.


    Eschewing the higher-level methods available for drawing images and primitives to a canvas, we’re going to get down and dirty, manipulating pixels using ImageData.

    Conventional 8-bit Pixel Manipulation

    The following example demonstrates pixel manipulation using image data to generate a greyscale moire pattern on the canvas.

    JSFiddle demo.

    Let’s break it down.

    First, we obtain a reference to the canvas element that has an id attribute of canvas from the DOM.

    var canvas = document.getElementById('canvas');

    The next two lines might appear to be a micro-optimisation, and in truth they are. But given the number of times the canvas width and height are accessed within the main loop, copying the values of canvas.width and canvas.height to the variables canvasWidth and canvasHeight respectively can have a noticeable effect on performance.

    var canvasWidth  = canvas.width;
    var canvasHeight = canvas.height;

    We now need to get a reference to the 2D context of the canvas.

    var ctx = canvas.getContext('2d');

    Armed with a reference to the 2D context of the canvas, we can now obtain a reference to the canvas’ image data. Note that here we get the image data for the entire canvas, though this isn’t always necessary.

    var imageData = ctx.getImageData(0, 0, canvasWidth, canvasHeight);

    Getting a reference to the raw pixel data is another seemingly innocuous micro-optimisation that can also have a noticeable effect on performance.

    var data = imageData.data;

    Now comes the main body of code. There are two loops, one nested inside the other. The outer loop iterates over the y axis and the inner loop iterates over the x axis.

    for (var y = 0; y < canvasHeight; ++y) {
        for (var x = 0; x < canvasWidth; ++x) {

    We draw pixels to image data in a top-to-bottom, left-to-right sequence. Remember, the y axis is inverted, so the origin (0,0) refers to the top, left-hand corner of the canvas.

    The ImageData.data property referenced by the variable data is a one-dimensional array of integers, where each element is in the range 0..255. ImageData.data is arranged in a repeating sequence so that each element refers to an individual channel. That repeating sequence is as follows:

    data[0]  = red channel of first pixel on first row
    data[1]  = green channel of first pixel on first row
    data[2]  = blue channel of first pixel on first row
    data[3]  = alpha channel of first pixel on first row
     
    data[4]  = red channel of second pixel on first row
    data[5]  = green channel of second pixel on first row
    data[6]  = blue channel of second pixel on first row
    data[7]  = alpha channel of second pixel on first row
     
    data[8]  = red channel of third pixel on first row
    data[9]  = green channel of third pixel on first row
    data[10] = blue channel of third pixel on first row
    data[11] = alpha channel of third pixel on first row
     
     
    ...

    Before we can plot a pixel, we must translate the x and y coordinates into an index representing the offset of the first channel within the one-dimensional array.

            var index = (y * canvasWidth + x) * 4;

    We multiply the y coordinate by the width of the canvas, add the x coordinate, then multiply by four. We must multiply by four because there are four elements per pixel, one for each channel.

    Now we calculate the colour of the pixel.

    To generate the moire pattern, we multiply the x coordinate by the y coordinate then bitwise AND the result with hexadecimal 0xff (decimal 255) to ensure that the value is in the range 0..255.

            var value = x * y & 0xff;

    Greyscale colours have red, green and blue channels with identical values. So we assign the same value to each of the red, green and blue channels. The sequence of the one-dimensional array requires us to assign a value for the red channel at index, the green channel at index + 1, and the blue channel at index + 2.

            data[index]   = value;	// red
            data[++index] = value;	// green
            data[++index] = value;	// blue

    Here we increment index directly rather than writing index + 1 and index + 2, which is safe because index is recalculated at the start of each iteration of the inner loop.

    The last channel we need to take into account is the alpha channel at index + 3. To ensure that the plotted pixel is 100% opaque, we set the alpha channel to a value of 255 and terminate both loops.

            data[++index] = 255;	// alpha
        }
    }

    For the altered image data to appear in the canvas, we must put the image data at the origin (0,0).

    ctx.putImageData(imageData, 0, 0);

    Note that because data is a reference to imageData.data, we don’t need to explicitly reassign it.

    The ImageData Object

    At the time of writing this article, the HTML5 specification is still in a state of flux.

    Earlier revisions of the HTML5 specification declared the ImageData object like this:

    interface ImageData {
        readonly attribute unsigned long width;
        readonly attribute unsigned long height;
        readonly attribute CanvasPixelArray data;
    };

    With the introduction of typed arrays, the type of the data attribute has changed from CanvasPixelArray to Uint8ClampedArray, and the interface now looks like this:

    interface ImageData {
        readonly attribute unsigned long width;
        readonly attribute unsigned long height;
        readonly attribute Uint8ClampedArray data;
    };

    At first glance, this doesn’t appear to offer us any great improvement, aside from using a type that is also used elsewhere within the HTML5 specification.

    But, we’re now going to show you how you can leverage the increased flexibility introduced by deprecating CanvasPixelArray in favour of Uint8ClampedArray.

    Previously, we were forced to write colour values to the image data one-dimensional array a single channel at a time.

    Taking advantage of typed arrays and the ArrayBuffer and ArrayBufferView objects, we can write colour values to the image data array an entire pixel at a time!

    Faster 32-bit Pixel Manipulation

    Here’s an example that replicates the functionality of the previous example, but uses unsigned 32-bit writes instead.

    NOTE: If your browser doesn’t use Uint8ClampedArray as the type of the data property of the ImageData object, this example won’t work!

    JSFiddle demo.

    The first deviation from the original example begins with the instantiation of an ArrayBuffer called buf.

    var buf = new ArrayBuffer(imageData.data.length);

    This ArrayBuffer will be used to temporarily hold the contents of the image data.

    Next we create two ArrayBuffer views: one that allows us to view buf as a one-dimensional array of unsigned 8-bit values, and another that allows us to view buf as a one-dimensional array of unsigned 32-bit values.

    var buf8 = new Uint8ClampedArray(buf);
    var data = new Uint32Array(buf);

    Don’t be misled by the term ‘view’. Both buf8 and data can be read from and written to. More information about ArrayBufferView is available on MDN.

    The next alteration is to the body of the inner loop. We no longer need to calculate the index in a local variable, so we jump straight into calculating the value used to populate the red, green, and blue channels, as we did before.

    Once calculated, we can proceed to plot the pixel using only one assignment. The values of the red, green, and blue channels, along with the alpha channel are packed into a single integer using bitwise left-shifts and bitwise ORs.

            data[y * canvasWidth + x] =
                (255   << 24) |	// alpha
                (value << 16) |	// blue
                (value <<  8) |	// green
                 value;		// red
        }
    }

    Because we’re dealing with unsigned 32-bit values now, there’s no need to multiply the offset by four.

    Having terminated both loops, we must now assign the contents of the ArrayBuffer buf to imageData.data. We use the Uint8ClampedArray.set() method to set the data property to the Uint8ClampedArray view of our ArrayBuffer by specifying buf8 as the parameter.

    imageData.data.set(buf8);

    Finally, we use putImageData() to copy the image data back to the canvas.

    Testing Performance

    We’ve told you that using typed arrays for pixel manipulation is faster. We really should test it though, and that’s what this jsperf test does.

    At the time of writing, 32-bit pixel manipulation is indeed faster.

    Wrapping Up

    There won’t always be occasions where you need to resort to manipulating canvas at the pixel level, but when you do, be sure to check out typed arrays for a potential performance increase.

    EDIT: Endianness

    As has quite rightly been highlighted in the comments, the code originally presented does not correctly account for the endianness of the processor on which the JavaScript is being executed.

    The code below, however, rectifies this oversight by testing the endianness of the target processor and then executing a different version of the main loop dependent on whether the processor is big- or little-endian.
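
    The gist of the fix is a small runtime check: write a known 32-bit value through one view, then read it back through a byte view to see which byte comes first. A minimal sketch of such a test:

    var testBuf = new ArrayBuffer(4);
    var test8   = new Uint8Array(testBuf);
    var test32  = new Uint32Array(testBuf);

    test32[0] = 0x0a0b0c0d;
    // on little-endian processors the lowest byte is stored first
    var isLittleEndian = (test8[0] === 0x0d);

    // pack pixels as ABGR on little-endian and as RGBA on big-endian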

    JSFiddle demo.

    A corresponding jsperf test for this amended code has also been written and shows near-identical results to the original jsperf test. Therefore, our final conclusion remains the same.

    Many thanks to all commenters and testers.

  5. Firefox 7 is lean and fast

    Based on a blog post originally posted here by Nicholas Nethercote, Firefox Developer.

    tl;dr
    Firefox 7 now uses much less memory than previous versions: often 20% to 30% less, and sometimes as much as 50% less. This means that Firefox and the websites you use will be snappier, more responsive, and suffer fewer pauses. It also means that Firefox is less likely to crash or abort due to running out of memory.

    These benefits are most noticeable if you do any of the following:
    – keep Firefox open for a long time;
    – have many tabs open at once, particularly tabs with many images;
    – view web pages with large amounts of text;
    – use Firefox on Windows;
    – use Firefox at the same time as other programs that use lots of memory.

    Background

    Mozilla engineers started an effort called MemShrink, the aim of which is to improve Firefox’s speed and stability by reducing its memory usage. A great deal of progress has been made, and thanks to Firefox’s faster development cycle, each improvement made will make its way into a final release in only 12–18 weeks. The newest update to Firefox is the first general release to benefit from MemShrink’s successes, and the benefits are significant.

    Quantifying the improvements
    Measuring memory usage is difficult: there are no standard benchmarks, there are several different metrics you can use, and memory usage varies enormously depending on what the browser is doing. Someone who usually has only a handful of tabs open will have an entirely different experience from someone who usually has hundreds of tabs open. (This latter case is not uncommon, by the way, even though the idea of anyone having that many tabs open triggers astonishment and disbelief in many people. E.g. see the comment threads here and here.)

    Endurance tests
    Dave Hunt and others have been using the MozMill add-on to perform “endurance tests”, where they open and close large numbers of websites and track memory usage in great detail. Dave recently performed an endurance test comparison of development versions of Firefox, repeatedly opening and closing pages from 100 widely used websites in 30 tabs.

    [The following numbers were run while the most current version of Firefox was in Beta and capture the average and peak “resident” memory usage for each browser version over five runs of the tests. “Resident” memory usage is the amount of physical RAM that is being used by Firefox, and is thus arguably the best measure of real machine resources being used.]

    [Tables: average and peak resident memory usage for each Firefox version over five runs]

    The measurements varied significantly between runs. If we do a pair-wise comparison of runs, we see the following relative reductions in memory usage:

    Minimum resident: 1.1% to 23.5% (median 6.6%)
    Maximum resident: -3.5% to 17.9% (median 9.6%)
    Average resident: 4.4% to 27.3% (median 20.0%)

    The following two graphs show how memory usage varied over time during Run 1 for each version. Firefox 6’s graph is first, with the latest version second. (Note: compare only the purple “resident” lines; the meaning of the green “explicit” line changed between the versions, and so the two green lines cannot be sensibly compared.)
    Firefox 7 is clearly much better; its graph is both lower and has less variation.

    [Graph: Firefox 6 memory usage over time during Run 1]
    [Graph: Firefox 7 memory usage over time during Run 1]


    MemBench

    Gregor Wagner has a memory stress test called MemBench. It opens 150 websites in succession, one per tab, with a 1.5 second gap between each site. The sites are mostly drawn from Alexa’s Top sites list. I ran this test on 64-bit builds of Firefox 6 and 7 on my Ubuntu Linux machine, which has 16GB of RAM. Each time, I let the stress test complete and then opened about:memory to get measurements for the peak resident usage. Then I hit the “Minimize memory usage” button in about:memory several times until the numbers stabilized again, and then re-measured the resident usage. (Hitting this button is not something normal users do, but it’s useful for testing purposes because it causes Firefox to immediately free up memory that would otherwise only be freed eventually when garbage collection runs.)

    For Firefox 6, the peak resident usage was 2,028 MB and the final resident usage was 669 MB. For Firefox 7, the peak usage was 1,851 MB (an 8.7% reduction) and the final usage was 321 MB (a 52.0% reduction). This latter number clearly shows that fragmentation is a much smaller problem in Firefox 7.
    (On a related note, Gregor recently measured cutting-edge development versions of Firefox and Google Chrome on MemBench.)


    Conclusion

    Obviously, these tests are synthetic and do not match exactly how users actually use Firefox. (Improved benchmarking is one thing we’re working on as part of MemShrink, but we’ve got a long way to go.) Nonetheless, the basic operations (opening and closing web pages in tabs) are the same, and we expect the improvements in real usage will mirror improvements in the tests.

    This means that users should see Firefox 7 using less memory than earlier versions — often 20% to 30% less, and sometimes as much as 50% less — though the improvements will depend on the exact workload. Indeed, we have had lots of feedback from early users that the latest Firefox update feels faster, is more responsive, has fewer pauses, and is generally more pleasant to use than previous versions.

    Mozilla’s MemShrink efforts are continuing. The endurance test results above show that the Beta version of Firefox already has even better memory usage, and I expect we’ll continue to make further improvements as time goes on.

  6. Detecting and generating CSS animations in JavaScript

    When writing the hypnotic spiral demo, I ran into the issue that I wanted to use CSS animation where possible, with a fallback that rotates the element in JavaScript. As I couldn’t rely on CSS animation being supported, writing it by hand in the stylesheet seemed pointless; instead I create it with JavaScript when the browser supports it. Here’s how that is done.

    Testing for the support of animations means testing whether the animationName property is available on an element’s style object:

    var elm = document.createElement( 'div' ), // any element works for the feature test
        animation = false,
        animationstring = 'animation',
        keyframeprefix = '',
        domPrefixes = 'Webkit Moz O ms Khtml'.split(' '),
        pfx  = '';
     
    if( elm.style.animationName ) { animation = true; }
     
    if( animation === false ) {
      for( var i = 0; i < domPrefixes.length; i++ ) {
        if( elm.style[ domPrefixes[i] + 'AnimationName' ] !== undefined ) {
          pfx = domPrefixes[ i ];
          animationstring = pfx + 'Animation';
          keyframeprefix = '-' + pfx.toLowerCase() + '-';
          animation = true;
          break;
        }
      }
    }

    [Update – the earlier code did not check if the browser supports animation without a prefix – this one does]

    This checks if the browser supports animation without any prefixes. If it does, the animation string will be ‘animation’ and there is no need for any keyframe prefixes. If it doesn’t, we go through all the browser prefixes (to date :)) and check if there is a property on the style collection called browser prefix + AnimationName. If there is, the loop exits and we define the right animation string and keyframe prefix and set animation to true. On Firefox this results in MozAnimation and -moz-, on Chrome in WebkitAnimation and -webkit-, and so on. We can then use this to create a new CSS animation in JavaScript. If none of the prefix checks return a supported style property, we animate in an alternative fashion.

    if( animation === false ) {
     
      // animate in JavaScript fallback
     
    } else {
      elm.style[ animationstring ] = 'rotate 1s linear infinite';
     
      var keyframes = '@' + keyframeprefix + 'keyframes rotate { '+
                        'from {' + keyframeprefix + 'transform:rotate( 0deg ) }'+
                        'to {' + keyframeprefix + 'transform:rotate( 360deg ) }'+
                      '}';
     
      if( document.styleSheets && document.styleSheets.length ) {
     
          document.styleSheets[0].insertRule( keyframes, 0 );
     
      } else {
     
        var s = document.createElement( 'style' );
        s.innerHTML = keyframes;
        document.getElementsByTagName( 'head' )[ 0 ].appendChild( s );
     
      }
     
    }

    With the animation string defined, we can set a (shorthand notation) animation on our element. Now, adding the keyframes is trickier. As they are not part of the animation property itself but disconnected from it in the CSS syntax (to give them more flexibility and allow re-use), we can’t set them via the style object in JavaScript. Instead we need to write them out as a CSS string.

    If there is already a style sheet applied to the document we add this keyframe definition string to that one, if there isn’t a style sheet available we create a new style block with our keyframe and add it to the document.

    You can see the detection in action and a fallback JavaScript solution on JSFiddle:

    JSFiddle demo.

    Quite simple, but also a bit more complex than I originally thought. You can also dynamically detect and change current animations, as this post by Wayne Pan and this demo by Joe Lambert explain, but this also seems quite verbose.

    I’d love to have a CSSAnimations collection, for example, where you could store different animations in JSON or as a string and have their name as the key. Right now, creating a new rule dynamically and adding it either to the document or appending it to an existing ruleset seems to be the only cross-browser way. Thoughts?

  7. Aurora 7 is here

    Download Aurora

    Keeping up the pace with our new development cycle, today we release Aurora 7. Enjoy its new features and performance improvements: CSS “text-overflow: ellipsis”, the Navigation Timing API, reduced memory usage, a faster JavaScript parser, and the first steps of Azure, our new graphics API.

    text-overflow: ellipsis;

    It is now possible to get Firefox to display “…” to give a visual clue that text is longer than the element containing it.

    At last, with text-overflow implemented in Aurora 7 it’s now possible to create a cross-browser ellipsis!

    Navigation Timing

    Performance is a key parameter of the user experience on the Web. To help Web developers efficiently monitor the performance of their Web pages, Aurora 7 implements the Navigation Timing specification: using the window.performance.timing object, developers can find out the time at which different navigation steps (such as navigationStart, connectStart/End, responseStart/End, domLoading/Complete) happened and deduce how long one step or a sequence of steps took to complete.
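
    For example, a page can compute rough durations for individual phases once it has finished loading; a minimal sketch:

    window.addEventListener('load', function() {
      var t = window.performance.timing;
      // all values are timestamps in milliseconds, so differences give durations
      console.log('connect:  ' + (t.connectEnd - t.connectStart) + ' ms');
      console.log('response: ' + (t.responseEnd - t.responseStart) + ' ms');
      console.log('DOM:      ' + (t.domComplete - t.domLoading) + ' ms');
    }, false);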

    Reduced Memory Usage

    Our continuous efforts to monitor and reduce memory consumption in Firefox will substantially pay off with Aurora 7:

    • The memory “zone” where JavaScript objects reside gets fragmented as objects are created and deleted. To reduce the negative impact of this fragmentation, long-lived objects created by the browser’s own UI have been separated from the objects created by Web pages. The browser can now free memory more efficiently when a tab is closed or after a garbage collection.
    • Speaking of garbage collection, as we successfully reduced the cost of this operation, we are able to execute it more often. Not only is memory freed more rapidly, but this also leads to shorter GC pauses (the periods where JavaScript execution stops to let the garbage collector do its job, which is sometimes noticeable during heavy animations).
    • All those improvements are reflected in the about:memory page, which is now able to tell how much memory a particular Web page, or the browser’s own UI, is using.

    More frequent updates and detailed explanations of the memshrink effort are posted on Nicholas Nethercote’s blog.

    Faster JavaScript Parsing

    A JavaScript parser is the part of the browser that reads the JavaScript before it gets executed by the JavaScript engine. With modern Web applications such as Gmail or Facebook sending close to 1 MB of JavaScript, being able to read all of that code near-instantly matters in the quest for a responsive user experience.
    Thanks to Nicholas’s work, our parser is now almost twice as fast as it used to be. This adds up well with our constant work to improve the execution speed of our JavaScript engine.

    First Steps of Azure

    After the layout engine (Gecko) has computed the visual appearance (position, dimension, colors, …) of all elements in the window, the browser asks the Operating System to actually draw them on the screen. The browser needs an abstraction layer to be able to talk to the different graphics libraries of the different OSes, but this layer has to be as thin and as adaptable as possible to deliver the promises of hardware acceleration.
    Azure is the name of the new and better graphics API/abstraction layer that is going to gradually replace Cairo in hardware accelerated environments. In Aurora 7, it is already able to interact with Windows 7’s Direct2D API to render the content of a <canvas> element (in a 2D context). You can read a detailed explanation of the Azure project and its next steps on Joe Drew’s blog.

    Other Improvements

    HTML

    Canvas

    • Specifying invalid values when calling setTransform(), bezierCurveTo(), or arcTo() no longer throws an exception; these calls are now correctly silently ignored.
    • Calling strokeRect() with a zero width and height now correctly does nothing. (see bug 663190)
    • Calling drawImage() with a zero width or height <canvas> now throws INVALID_STATE_ERR. (see bug 663194)
    • The toDataURL() method now accepts a second argument to control JPEG quality (see bug 564388; a short example follows this list)
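
    A quick sketch of the new toDataURL() quality argument, assuming a <canvas id="canvas"> element exists in the page:

    var canvas = document.getElementById('canvas'); // assumed to exist
    // the second argument sets the JPEG quality, from 0.0 to 1.0
    var url = canvas.toDataURL('image/jpeg', 0.7);
    console.log(url.indexOf('data:image/jpeg') === 0); // -> true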

    CSS

    MathML

    • XLink href has been restored and the MathML3 href attribute is now supported. Developers are encouraged to move to the latter syntax.
    • Support for the voffset attribute on <mpadded> elements has been added, and the behavior of the lspace attribute has been fixed.
    • The top-level <math> element accepts any attributes of the <mstyle> element.
    • The medium line thickness of fraction bars in <mfrac> elements has been corrected to match the default thickness.
    • Names for negative spaces are now supported.

    DOM

    • The File interface’s non-standard methods getAsBinary(), getAsDataURL(), and getAsText() have been removed, as have the non-standard properties fileName and fileSize.
    • The FileReader readAsArrayBuffer() method is now implemented. (see bug 632255)
    • document.createEntityReference has been removed. It was never properly implemented and is not implemented in most other browsers. (see bug 611983)
    • document.normalizeDocument has been removed. Use Node.normalize instead. (see bug 641190)
    • DOMTokenList.item now returns undefined if the index is out of bounds; previously it returned null. (see bug 529328)
    • Node.getFeature has been removed. (see bug 659053)

    JavaScript

    Net

    • WebSockets are now available on Firefox Mobile. (see bug 537787)

    console API

    • The console.dir(), console.time(), console.timeEnd(), console.group(), and console.groupEnd() methods are now implemented (a short sketch follows below).
    • Messages logged with console.log() before the Web Console is opened are now stored and displayed once the Web Console is opened.

    (see the Web Console page in the Wiki)
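
    A short sketch of the newly implemented methods in action:

    console.group('render');     // indents the messages that follow
    console.time('paint');       // starts a named timer
    // ... do some work here ...
    console.timeEnd('paint');    // logs the elapsed time for 'paint'
    console.dir(document.body);  // logs a property listing of the object
    console.groupEnd();          // closes the 'render' group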

    Web Timing

  8. Firefox 5 is here

    Today, three months after the release of Firefox 4, we release Firefox 5, thanks to our new development cycle. Developers will be able to create richer animations using CSS3 Animations. This release comes with various improvements, performance optimizations, and bug fixes.

    CSS3 Animations

    CSS Animations (check out the documentation) are a new way to create animations using CSS. Like CSS Transitions, they are efficient and run smoothly (see David Baron’s article), and developers have better control over the intermediate steps (keyframes), so they can now create much more complex animations.

    Notable changes

    Other Bug Fixes and Performance Improvements:

    HTML

    Canvas improvements

    • The <canvas> 2D drawing context now supports specifying an ImageData object as the input to the createImageData() method; this creates a new ImageData object initialized with the same dimensions as the specified object, but still with all pixels preset to transparent black.
    • Specifying non-finite values when adding color stops through a call to the CanvasGradient method addColorStop() now correctly throws INDEX_SIZE_ERR instead of SYNTAX_ERR.
    • The HTMLCanvasElement method toDataURL() now correctly lower-cases the specified MIME type before matching.
    • getImageData() now correctly accepts rectangles that extend beyond the bounds of the canvas; pixels outside the canvas are returned as transparent black.
    • drawImage() and createImageData() now handle negative arguments in accordance with the specification, by flipping the rectangle around the appropriate axis.
    • Specifying non-finite values when calling createImageData() now properly throws a NOT_SUPPORTED_ERR exception.
    • createImageData() and getImageData() now correctly return at least one pixel’s worth of image data if a rectangle smaller than one pixel is specified.
    • Specifying a negative radius when calling createRadialGradient() now correctly throws INDEX_SIZE_ERR.
    • Specifying a null or undefined image when calling createPattern() or drawImage() now correctly throws a TYPE_MISMATCH_ERR exception.
    • Specifying invalid values for globalAlpha no longer throws a SYNTAX_ERR exception; these are now correctly silently ignored.
    • Specifying invalid values when calling translate(), transform(), rect(), clearRect(), fillRect(), strokeRect(), lineTo(), moveTo(), quadraticCurveTo(), or arc() no longer throws an exception; these calls are now correctly silently ignored.
    • Setting the value of shadowOffsetX, shadowOffsetY, or shadowBlur to an invalid value is now silently ignored.
    • Setting the value of rotate or scale to an invalid value is now silently ignored.

    CSS

    • Support for CSS animations has been added, using the -moz- prefix for now.

    DOM

    • The selection object’s modify() method has been changed so that the “word” selection granularity no longer includes trailing spaces; this makes it more consistent across platforms and matches the behavior of WebKit’s implementation.
    • The window.setTimeout() method now clamps to send no more than one timeout per second in inactive tabs. In addition, it now clamps nested timeouts to the smallest value allowed by the HTML5 specification: 4 ms (instead of the 10 ms it used to clamp to).
    • Similarly, the window.setInterval() method now clamps to no more than one interval per second in inactive tabs.
    • XMLHttpRequest now supports the loadend event for progress listeners. This is sent after any transfer is finished (that is, after the abort, error, or load event). You can use this to handle any tasks that need to be performed regardless of the success or failure of a transfer (see the sketch after this list).
    • The Blob and, by extension, the File objects’ slice() method has been removed and replaced with a new, proposed syntax that makes it more consistent with Array.slice() and String.slice() methods in JavaScript. This method is named mozSlice() for now.
    • The value of window.navigator.language is now determined by looking at the value of the Accept-Language HTTP header.
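
    To illustrate the loadend bullet above: the handler runs once the transfer finishes for any reason, which makes it a natural place for cleanup. A minimal sketch (the URL and helper are hypothetical):

    function hideSpinner() { /* hide a progress indicator (hypothetical) */ }

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/api/data'); // hypothetical URL
    xhr.addEventListener('loadend', function() {
      hideSpinner(); // runs after load, error, or abort alike
    }, false);
    xhr.send();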

    JavaScript

    • Regular expressions are no longer callable as if they were functions; this change has been made in concert with the WebKit team to ensure compatibility (see WebKit bug 28285).
    • The Function.prototype.isGenerator() method is now supported; this lets you determine if a function is a generator.

    SVG

    • The SVG class attribute can now be animated.
    • The following SVG-related DOM interfaces representing lists of objects are now indexable and can be accessed like arrays; in addition, they have a length property indicating the number of items in the lists: SVGLengthList, SVGNumberList, SVGPathSegList, and SVGPointList.

    HTTP

    • Firefox no longer sends the “Keep-Alive” HTTP header; we weren’t formatting it correctly, and it was redundant since we were also sending the Connection: or Proxy-Connection: header with the value “keep-alive” anyway.
    • The HTTP transaction model has been updated to be more intelligent about reusing connections in the persistent connection pool; instead of treating the pool as a FIFO queue, Necko now attempts to sort the pool with connections with the largest congestion window (CWND) first. This can reduce the round-trip time (RTT) of HTTP transactions by avoiding the need to grow connections’ windows in many cases.
    • Firefox now handles the Content-Disposition HTTP response header more effectively if both the filename and filename* parameters are provided; it looks through all provided names, using the filename* parameter if one is available, even if a filename parameter is included first. Previously, the first matching parameter would be used, thereby preventing a more appropriate name from being used. See bug 588781.

    MathML

    Developer tools

    • The Web Console’s Console object now has a debug() method, which is an alias for its log() method; this improves compatibility with certain existing sites.