
Performance Articles

  1. Upgrade your graphics drivers for best results with Firefox 4

    Benoit Jacob from the platform engineering team has a blog post on how to best take advantage of hardware acceleration and WebGL in Firefox 4, namely: Upgrade your graphics drivers!

    Firefox 4 automatically disables the hardware acceleration and WebGL features if the graphics driver on your system has bugs that cause Firefox to crash. You still get all the other benefits of Firefox 4, of course, just not the newest graphics features. But for best results, you need an up-to-date graphics driver that fixes those bugs.

    If you’re planning to develop using WebGL, you should also spread this message to your users, so they can experience the awesome results of your hard work.

  2. Generational Garbage Collection in Firefox

    Generational garbage collection (GGC) has now been enabled in the SpiderMonkey JavaScript engine in Firefox 32. GGC is a performance optimization only, and should have no observable effects on script behavior.

    So what is it? What does it do?

    GGC is a way for the JavaScript engine to collect short-lived objects faster. Say you have code similar to:

    function add(point1, point2) {
        return [ point1[0] + point2[0], point1[1] + point2[1] ];
    }

    Without GGC, you will have high overhead for garbage collection (from here on, just “GC”). Each call to add() creates a new Array, and it is likely that the old arrays that you passed in are now garbage. Before too long, enough garbage will pile up that the GC will need to kick in. That means the entire JavaScript heap (the set of all objects ever created) needs to be scanned to find the stuff that is still needed (“live”) so that everything else can be thrown away and the space reused for new objects.

    If your script does not keep very many total objects live, this is totally fine. Sure, you’ll be creating tons of garbage and collecting it constantly, but the scan of the live objects will be fast (since not much is live). However, if your script does create a large number of objects and keep them alive, then the full GC scans will be slow, and the performance of your script will be largely determined by the rate at which it produces temporary objects — even when the older objects aren’t changing, and you’re just re-scanning them over and over again to discover what you already knew. (“Are you dead?” “No.” “Are you dead?” “No.” “Are you dead?”…)

    Generational collector, Nursery & Tenured

    With a generational collector, the penalty for temporary objects is much lower. Most objects will be allocated into a separate memory region called the Nursery. When the Nursery fills up, only the Nursery will be scanned for live objects. The majority of the short-lived temporary objects will be dead, so this scan will be fast. The survivors will be promoted to the Tenured region.

    The Tenured heap will also accumulate garbage, but usually at a far lower rate than the Nursery. It will take much longer to fill up. Eventually, we will still need to do a full GC, but under typical allocation patterns these should be much less common than Nursery GCs. To distinguish the two cases, we refer to Nursery collections as minor GCs and full heap scans as major GCs. Thus, with a generational collector, we split our GCs into two types: mostly fast minor GCs, and fewer slower major GCs.

    GGC Overhead

    While it might seem like we should have always been doing this, it turns out to require quite a bit of infrastructure that we previously did not have, and it also incurs some overhead during normal operation. Consider the question of how to figure out whether some Nursery object is live. It might be pointed to by a live Tenured object — for example, if you create an object and store it into a property of a live Tenured object.

    How do you know which Nursery objects are being kept alive by Tenured objects? One alternative would be to scan the entire Tenured heap to find pointers into the Nursery, but this would defeat the whole point of GGC. So we need a way of answering the question more cheaply.

    Note that these Tenured ⇒ Nursery edges in the heap graph won’t last very long, because the next minor GC will promote all survivors in the Nursery to the Tenured heap. So we only care about the Tenured objects that have been modified since the last minor (or major) GC. That won’t be a huge number of objects, so we make the code that writes into Tenured objects check whether it is writing any Nursery pointers, and if so, record the cross-generational edges in a store buffer.

    In technical terms, this is known as a write barrier. Then, at minor GC time, we walk through the store buffer and mark every target Nursery object as being live. (We actually use the source of the edge at the same time, since we relocate the Nursery object into the Tenured area while marking it live, and thus the Tenured pointer into the Nursery needs to be updated.)
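
    To make the mechanism concrete, here is a toy model in plain JavaScript. This is purely illustrative: SpiderMonkey implements all of this in C++ on raw memory regions, and every name below is invented for the sketch.

    var nursery = new Set();
    var tenured = new Set();
    var storeBuffer = [];

    function allocate(obj) {           // new objects start life in the Nursery
        nursery.add(obj);
        return obj;
    }

    function writeProperty(target, key, value) {
        // The write barrier: if a Tenured object is about to point at a Nursery
        // object, record that edge so a minor GC never scans the whole Tenured heap.
        if (tenured.has(target) && nursery.has(value)) {
            storeBuffer.push({ source: target, key: key });
        }
        target[key] = value;
    }

    function minorGC(roots) {
        // Survivors are whatever is reachable from the roots plus everything the
        // store buffer points at; promote them and discard the rest of the Nursery.
        var survivors = roots.filter(function (obj) { return nursery.has(obj); });
        storeBuffer.forEach(function (edge) {
            var value = edge.source[edge.key];
            if (nursery.has(value)) {
                survivors.push(value);
            }
            // (A real collector would also relocate `value` and update edge.source[edge.key].)
        });
        survivors.forEach(function (obj) {
            nursery.delete(obj);
            tenured.add(obj);
        });
        nursery.clear();               // everything left in the Nursery was garbage
        storeBuffer.length = 0;
    }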

    With a store buffer, the time for a minor GC is dependent on the number of newly-created edges from the Tenured area to the Nursery, not just the number of live objects in the Nursery. Also, keeping track of the store buffer records (or even just the checks to see whether a store buffer record needs to be created) does slow down normal heap access a little, so some code patterns may actually run slower with GGC.

    Allocation Performance

    On the flip side, GGC can speed up object allocation. The pre-GGC heap needs to be fully general. It must track in-use and free areas and avoid fragmentation. The GC needs to be able to iterate over everything in the heap to find live objects. Allocating an object in a general heap like this is surprisingly complex. (GGC’s Tenured heap has pretty much the same set of constraints, and in fact reuses the pre-GGC heap implementation.)

    The Nursery, on the other hand, just grows until it is full. You never need to delete anything, at least until you free up the whole Nursery during a minor GC, so there is no need to track free regions. Consequently, the Nursery is perfect for bump allocation: to allocate N bytes you just check whether there is space available, then increment the current end-of-heap pointer by N bytes and return the previous pointer.

    There are even tricks to optimize away the “space available” check in many cases. As a result, objects with a short lifespan never go through the slower Tenured heap allocation code at all.
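
    As a sketch of the idea (the real Nursery is a raw memory buffer managed in C++; the size and names here are illustrative):

    // Bump allocation: the Nursery is one contiguous region, so allocation is
    // just a bounds check plus a pointer increment.
    var NURSERY_SIZE = 16 * 1024 * 1024;   // illustrative 16 MB nursery
    var nurseryPosition = 0;               // current end-of-heap offset

    function bumpAllocate(nBytes) {
        if (nurseryPosition + nBytes > NURSERY_SIZE) {
            return -1;                     // Nursery full: time for a minor GC
        }
        var allocatedAt = nurseryPosition; // hand back the previous pointer...
        nurseryPosition += nBytes;         // ...and bump it by N bytes
        return allocatedAt;
    }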

    Timings

    I wrote a simple benchmark to demonstrate the various possible gains of GGC. The benchmark is sort of a “vector Fibonacci” calculation, where it computes a Fibonacci sequence for both the x and y components of a two-dimensional vector. The script allocates a temporary object on every iteration. It first times the loop with the (Tenured) heap nearly empty, then it constructs a large object graph, intended to be placed into the Tenured portion of the heap, and times the loop again.

    On my laptop, the benchmark shows huge wins from GGC. The average time for an iteration through the loop drops from 15 nanoseconds (ns) to 6ns with an empty heap, demonstrating the faster Nursery allocation. It also shows the independence from the Tenured heap size: without GGC, populating the long-lived heap slows down the mean time from 15ns to 27ns. With GGC, the speed stays flat at 6ns per iteration; the Tenured heap simply doesn’t matter.

    Note that this benchmark is intended to highlight the improvements possible with GGC. The actual benefit depends heavily on the details of a given script. In some scripts, the time to initialize an object is significant and may exceed the time required to allocate the memory. A higher percentage of Nursery objects may get tenured. When running inside the browser, we force enough major GCs (e.g., after a redraw) that the benefits of GGC are less noticeable.

    Also, the description above implies that we will pause long enough to collect the entire heap, which is not the case — our incremental garbage collector dramatically reduces pause times on many Web workloads already. (The incremental and generational collectors complement each other — each attacks a different part of the problem.)


  3. Doom on the Web

    Update: We had doubts about whether this port of the open source Doom respected its terms of use. We decided to remove it from our website until we can make an informed and definitive decision.

    This is a guest post written by Alon Zakai. Alon is one of the Firefox Mobile developers, and in his spare time does experiments with JavaScript and new web technologies. One of those experiments is Emscripten, an LLVM-to-JavaScript compiler, and below Alon explains how it uses typed arrays to run the classic first-person shooter Doom on the web.

    As a longtime fan of first-person shooters, I’ve wanted to bring them to the web. Writing one from scratch is very hard, though, so instead I took the original Doom, which is open source, and compiled it from C to JavaScript using Emscripten. The result is a version of Doom that can be played on the web, using standard web technologies.

    Doom renders by writing out pixel data to memory, then copying that pixel data to the screen, after converting colors and so forth. For this demo, the compiled code has memory that is simulated using a large JavaScript array (so element N in that array represents the contents of memory address N in normal native code). That means that rendering, color conversion, and copying to the screen are all operations done on that large JavaScript array. Basically the code has large loops that copy or modify elements of that array. For that to be as fast as possible, the demo optionally uses JavaScript typed arrays, which look like normal JavaScript arrays but are guaranteed to be flat arrays of a particular data type.

    // Create an array which contains only 32-bit Integers
    var buffer = new Int32Array(1000);
    for ( var i = 0 ; i < 1000 ; i++ ) {
        buffer[i] = i;
    }

    When using a typed array, the main difference from a normal JavaScript array is that the elements of the array all have the type that you set. That means that working on that array can be much faster than a normal array, because it corresponds very closely to a normal low-level C or C++ array. In comparison, a normal JavaScript array can also be sparse, which means that it isn't a single contiguous section of memory. In that case, each access of the array has a cost, that of calculating the proper memory address. Finding the memory address is much faster with a typed array because it is simple and direct. As a consequence, in the Doom demo the frame rate is almost twice as high with typed arrays as without them.

    Typed arrays are very important in WebGL and in the Audio Data API, as well as in Canvas elements (the pixel data received from getImageData() is, in fact, a typed array). However, typed arrays can also be used independently if you are working on large amounts of array-like data, which is exactly the case with the Doom demo. Just be careful that your code also works if the user's browser does not support typed arrays. This is fairly easy to do because typed arrays look and behave, for the most part, like normal ones — you access their elements using square brackets, and so forth. The main potential pitfalls are:

    • Typed arrays do not have a slice() method. Instead they have subarray(), which does not create a copy of the array; it is a view into the same data.
    • Don't forget that the type of the typed array is silently enforced. If you write 5.25 to an element of an integer-typed array and then read back that exact same element, you get 5 and not 5.25. (Both pitfalls are illustrated in the short snippet below.)
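
    Here is a quick illustration of both pitfalls (a standalone snippet, not code from the Doom demo):

    var ints = new Int32Array(4);

    // The element type is silently enforced: fractional values are truncated.
    ints[0] = 5.25;
    console.log(ints[0]);            // 5, not 5.25

    // subarray() returns a view onto the same data, not a copy like slice().
    ints.set([10, 20, 30, 40]);
    var view = ints.subarray(1, 3);
    view[0] = 99;
    console.log(ints[1]);            // 99: writing through the view changes the original
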
  4. Mozilla developer preview (Gecko 1.9.3a2) now available

    We’ve posted a new release of our Mozilla developer preview series as a way to test new features that we’re putting into the Mozilla platform. These features may or may not make it into a future Firefox release, either for desktops or for mobile phones. But that’s why we do these releases – to get testing and feedback early so we know how to treat them.

    Note that this release does not contain two things that have gotten press recently: D2D or the new JavaScript VM work we’ve been doing.

    Since this is a weblog focused on web developers, I think that it’s important to talk about what’s new for all of you. So we’ll jump right into that:

    Out of Process Plugins

    We did an a1 release about three weeks ago in order to get testing on some of the new web developer features (which we’ll list here again.) The biggest change between that release and this one is the inclusion of out of process plugins for Windows and Linux. (Mac is a little bit more work and we’re working on it as fast as our little fingers will type.)

    There are a lot of plugins out there on the web, and they exist to varying degrees of quality. So we’re pushing plugins out of process so that when one of them crashes it doesn’t take the entire browser with it. (It also has lots of other nice side effects – we can better control memory usage, CPU usage and it also helps with general UI lag.)

    If you want to know more about it, have a look at this post by Ben Smedberg, who goes over how it works, what prefs you can set and how you can help test it. It would help us a lot if you did.

    (If you really want to get on the testing train we strongly suggest you start running our nightly builds which are the ultimate in bleeding edge but are generally stable enough for daily use.)

    Anyway, on to the feature list and performance improvements taken from the release announcement:

    Web Developer Features

    • Support for Content Security Policy. This is largely complete, minus the ability to disable eval().
    • The placeholder attribute for <input/> and <textarea> is now supported.
    • Support for SMIL Animation in SVG. Support for animating some SVG attributes is still under development and the animateMotion element isn’t supported yet.
    • Support for CSS Transitions. This support is not quite complete: support for animation of transforms and gradients has not yet been implemented.
    • Support for WebGL, which is disabled by default but can be enabled by changing a preference. See this blog post and this blog post for more details.
    • Support for the getClientRects and getBoundingClientRect methods on Range objects. See bug 396392 for details.
    • Support for the setCapture and releaseCapture methods on DOM elements. See bug 503943 for details.
    • Support for the HTML5 History.pushState() and History.replaceState() methods and the popstate event (a short example follows this list). See bug 500328 for details.
    • Support for the -moz-image-rect() value for background-image. See bug 113577 for more details.
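
    As a minimal illustration of the new History API entries mentioned above (the state object and URL here are arbitrary examples):

    // Push a new history entry without reloading the page.
    history.pushState({ page: 2 }, "", "/page/2");

    // popstate fires when the user navigates back or forward to a pushed entry.
    window.addEventListener("popstate", function (event) {
        console.log("Restored state:", event.state);
    }, false);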

    For the full list of new web developer features please visit our page on Upcoming Features for Web Developers.

    Performance Improvements

    • We’ve removed link history lookup from the main thread and made it asynchronous. This results in less I/O during page loads and improves overall browser responsiveness.
    • Loading the HTML5 spec no longer causes very long browser pauses.
    • A large number of layout performance improvements have been made, including work around DOM access times, color management performance, text area improvements and many other hot spots in the layout engine.
    • The JavaScript engine has many improvements: String handling is improved, faster closures, some support for recursion in TraceMonkey to name a few.
    • Improvements to the performance of repainting HTML in <foreignObject>.
    • Strings are no longer copied between the main DOM code and web workers, improving performance for threaded JavaScript that moves large pieces of data between threads.
  5. Firefox OS, Animations & the Dark Cubic-Bezier of the Soul

    I’ve been using Firefox OS daily for a couple of years now (wow, time flies!). While performance has steadily improved with efforts like Project Silk, I’ve often noticed delays in the user interface. I assumed the delays were because the hardware was well below the “flagship” hardware I’ve become accustomed to with Android and iOS devices.

    Last year, I built Firefox OS for a Nexus 4 and started using that as my daily phone. Quickly I realized that even with better hardware, I sometimes had to wait on Firefox OS for basic interactions, even when the task wasn’t computationally intensive. I moved on to a Nexus 5 and then a Sony Z3 Compact, both with better specs than the Nexus 4, and experienced the same thing.

    Time passed. Frustration grew. Whispers of a nameless fear…

    Running the numbers

    While reading Ralph Thomas’s post about creating animations based on physical models, I wondered about the implementation of animations in Firefox OS, and how that might be involved in this problem. I performed an audit of the number of instances of different animations, grouped by their duration. I removed progress indicators and things like the boot shutdown animation. Here are the animation and transition durations in Firefox OS, grouped by duration, for transitional interactions like scaling, opening, closing and sliding:

    • 0.1s: 15
    • 0.2s: 57
    • 0.3s: 79
    • 0.4s: 40
    • 0.5s: 78
    • 0.6s: 8

    A couple of things stand out. First, we have a pretty wide distribution of animation durations. Second, the vast majority of the animations are 300ms or longer!

    In fact, in more than 80 animations we are making the user wait half a second or longer. These slow animations are dragging us down, resulting in a poorer overall experience of Firefox OS.

    How did we get here?

    The Firefox OS UX and interaction designers didn’t huddle in a room and design each interaction to be intentionally slow. The engineers who implemented these animations didn’t ever think to themselves “this feels really responsive… let’s make it slower!”

    My theory is that interactions like these don’t feel slow while you’re designing and implementing them, because you’re working with a single interaction at a time. When designing and developing an animation, I look for fluidity of motion, the aesthetics of that single action and how the visual impact enhances the task at hand, and then I iterate on duration and effects until it feels right.

    We do have guidelines for responsiveness and user-perceived performance in Firefox OS, written up by Gordon Brander, which you can see in the screenshot below. (Click the image for a larger, more readable version.) However, those guidelines don’t cover the sub-second period between the initial perception of cause and effect and the next actionable state of the user interface.

    Screenshot of the Firefox OS responsiveness guidelines

    Users have an entirely different experience than we do as developers and designers. Users make their way through our animations while hurriedly sending a text message, trying to capture that perfect moment on camera, entering their username and password, or arduously uploading a bunch of images one at a time. People are trying to get from point A to point B. They want to complete a task… well, actually not just one: Smartphone users are trying to complete 221 tasks every day, according to a study in the UK last October by Tecmark. All those animations add up! I assert that the aggregate of those 203 animations in Gaia that are 300ms and longer contributes to the frustrating feeling of slowness I was experiencing before digging into this.

    Making it feel fast

    So I tested this theory, by changing all animation durations in Gaia to 200ms, as a starting point. The result? Firefox OS feels far more responsive. Moving through tasks and navigating around the OS felt quick but not abrupt. The camera snaps to readiness. Texting feels so much more fluid and snappy. Apps pop up, instead of slowly hauling their creaky bones out of bed. The Rocketbar gets closer to living up to its name (though I still think the keyboard should animate up while the bar becomes active).

    Here’s a demo of some of our animations side by side, before and after this patch:

    There are a couple of things we can do about this in Gaia:

    1. I filed a bug to get this change landed in Gaia. The 200ms duration is a first stab at this until we can do further testing. Better to err on the snappy side instead of the sluggish side. We’ve got the thumbs-up from most of the 16 developers who had to review the changes, and are now working with the UX team to sign off before it can land. Kevin Grandon helped by adding a CSS variable that we can use across all of Gaia, which will make it easier to implement these types of changes OS-wide in the future as we learn more.
    2. I’m working with the Firefox OS UX team to define global and consistent best-practices for animations. These guidelines will not be correct 100% of the time, but can be a starting point when implementing new animations, ensuring that the defaults are based on research and experience.
    If you are a Firefox OS user, report bugs if you experience anything that feels slow. By reporting a bug, you can make change happen and help improve the user experience for everyone on Firefox OS.

    If you are a developer or designer, what are your animation best-practices? What user feedback have you received on the animations in your Web projects? Let us know in the comments below!

  6. Optimizing your JavaScript game for Firefox OS

    When developing on a quad-core processor with 16 gigabytes of RAM, you can easily forget to consider how your game will perform on a mobile device. This article will detail some best practices and things to consider for moving a game to Firefox OS or any similar hardware target.

    Making the best of 256 MB RAM / 800 MHz CPU

    There are many areas of focus to keep in mind while developing a game. When your goal is to draw 60 times a second, garbage collection and inefficient drawing calls start to get in your way. Let’s start with the basics…

    Don’t over-optimize

    This might sound counter-intuitive in an article about game optimization, but optimization is the last step, performed on complete, working code. While it’s never a bad idea to keep these tips and tricks in mind, you don’t know whether you’ll need them until you’ve finished the game and played it on a device.

    Optimize Drawing

    Drawing on an HTML5 2D canvas is the biggest bottleneck in most JavaScript games, as all other updates are usually just algebra without touching the DOM. Canvas operations are hardware accelerated, which can give you some extra room to breathe.

    Use whole-pixel rendering

    Sub-pixel rendering occurs when you render objects on a canvas without whole values.

    ctx.drawImage(myImage, 0.3, 0.5)

    This causes the browser to do extra calculations to create the anti-aliasing effect. To avoid this, make sure to round all coordinates used in calls to drawImage using Math.floor or, as you’ll read further in the article, bitwise operators.

    jsPerf – drawImage whole pixels.
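
    For example, continuing the fragment above (x and y stand for whatever positions your game computes):

    // Round to whole pixels before drawing to avoid anti-aliasing work.
    ctx.drawImage(myImage, Math.floor(x), Math.floor(y));
    // ...or, for positive values, the faster bitwise version:
    ctx.drawImage(myImage, x | 0, y | 0);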

    Cache drawing in an offscreen canvas

    If you find yourself performing complex drawing operations on each frame, consider creating an offscreen canvas, drawing to it once (or whenever it changes), and then on each frame drawing the offscreen canvas to your main canvas.

    myEntity.offscreenCanvas = document.createElement("canvas");
    myEntity.offscreenCanvas.width = myEntity.width;
    myEntity.offscreenCanvas.height = myEntity.height;
    myEntity.offscreenContext = myEntity.offscreenCanvas.getContext("2d");

    myEntity.render(myEntity.offscreenContext);

    Use moz-opaque on the canvas tag (Firefox Only)

    If your game uses canvas and doesn’t need to be transparent, set the moz-opaque attribute on the canvas tag. This information can be used internally to optimize rendering.

    <canvas id="mycanvas" moz-opaque></canvas>

    Described more in Bug 430906 – Add moz-opaque attribute on canvas.

    Scaling canvas using CSS3 transform

    CSS3 transforms are faster because they use the GPU. The best case is not to scale the canvas at all; failing that, a smaller canvas scaled up performs better than a larger canvas scaled down. For Firefox OS, target 480 x 320 px.

    var scaleX = canvas.width / window.innerWidth;
    var scaleY = canvas.height / window.innerHeight;
     
    var scaleToFit = Math.min(scaleX, scaleY);
    var scaleToCover = Math.max(scaleX, scaleY);
     
    stage.style.transformOrigin = "0 0"; //scale from top left
    stage.style.transform = "scale(" + scaleToFit + ")";

    See it working in this jsFiddle.

    Nearest-neighbour rendering for scaling pixel-art

    Leading on from the last point, if your game is themed with pixel-art, you should use one of the following techniques when scaling the canvas. The default resizing algorithm creates a blurry effect and ruins the beautiful pixels.

    canvas {
      image-rendering: crisp-edges;
      image-rendering: -moz-crisp-edges;
      image-rendering: -webkit-optimize-contrast;
      -ms-interpolation-mode: nearest-neighbor;
    }

    or

    var context = canvas.getContext("2d");
    context.webkitImageSmoothingEnabled = false;
    context.mozImageSmoothingEnabled = false;
    context.imageSmoothingEnabled = false;

    More documentation is available on MDN for image-rendering.

    CSS for large background images

    If, like most games, you have a static background image, use a plain DIV element with a CSS background property and position it under the canvas. This will avoid drawing a large image to the canvas on every tick.

    Multiple canvases for layers

    Similar to the last point, you may find you have some elements that are frequently changing and moving around whereas other things (like UI) never change. An optimization in this situation is to create layers using multiple canvas elements.

    For example, you could create a UI layer that sits on top of everything and is only drawn during user input, a game layer where the frequently updating entities exist, and a background layer for entities that rarely update.
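
    A rough sketch of the idea, assuming three stacked canvas elements with the (made-up) IDs used below:

    var bgCanvas = document.getElementById("background-layer");
    var gameCanvas = document.getElementById("game-layer");
    var uiCanvas = document.getElementById("ui-layer");

    var bgCtx = bgCanvas.getContext("2d");
    var gameCtx = gameCanvas.getContext("2d");
    var uiCtx = uiCanvas.getContext("2d");

    // The background layer is drawn once, outside the game loop.
    bgCtx.fillStyle = "#223";
    bgCtx.fillRect(0, 0, bgCanvas.width, bgCanvas.height);

    // Only the game layer is cleared and redrawn on every frame.
    function tick(time) {
        gameCtx.clearRect(0, 0, gameCanvas.width, gameCanvas.height);
        gameCtx.fillRect(100 + Math.sin(time / 500) * 50, 100, 16, 16); // a moving entity
        requestAnimationFrame(tick);
    }
    requestAnimationFrame(tick);

    // The UI layer is only touched in response to user input.
    uiCanvas.addEventListener("click", function () {
        uiCtx.clearRect(0, 0, uiCanvas.width, uiCanvas.height);
        uiCtx.fillText("Paused", 10, 20);
    }, false);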


    Don’t scale images in drawImage

    Cache various sizes of your images on an offscreen canvas when loading as opposed to constantly scaling them in drawImage.

    jsPerf – Canvas drawImage Scaling Performance.
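
    A sketch of that caching step (the image, sizes and variable names are placeholders):

    // Scale the image once at load time onto an offscreen canvas...
    function preScale(image, width, height) {
        var off = document.createElement("canvas");
        off.width = width;
        off.height = height;
        off.getContext("2d").drawImage(image, 0, 0, width, height);
        return off;   // a canvas can be passed to drawImage just like an image
    }

    var scaledEnemy = preScale(enemyImage, 32, 32);

    // ...then the game loop does a straight 1:1 copy, with no scaling.
    ctx.drawImage(scaledEnemy, enemy.x, enemy.y);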

    Be careful with heavy physics libraries

    If possible, roll your own physics, as libraries like Box2D don’t perform well on low-end Firefox OS devices.

    When asm.js support lands in Firefox OS, Emscripten-compiled libraries can take advantage of near-native performance. More reading in Box2d Revisited.

    Use WebGL instead of Context 2D

    Easier said than done, but handing all the heavy graphics lifting to the GPU will free up the CPU for the greater good. Even though WebGL is 3D, you can use it to draw 2D surfaces. There are some libraries out there that aim to abstract the drawing contexts.

    Minimize Garbage Collection

    JavaScript can spoil us when it comes to memory management. We generally don’t need to worry about memory leaks or conservatively allocating memory. But if we’ve allocated too much and garbage collection occurs in the middle of a frame, that can take up valuable time and result in a visible drop in FPS.

    Pool common objects and classes

    To minimise the amount of objects being cleaned during garbage collection, use a pre-initialised pool of objects and reuse them rather than creating new objects all the time.

    Code example of generic object pool:
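
    (A minimal sketch of the pattern; the names are illustrative rather than taken from any particular library.)

    function ObjectPool(createFn, size) {
        // Pre-allocate everything up front, outside the game loop.
        this.createFn = createFn;
        this.free = [];
        for (var i = 0; i < size; i++) {
            this.free.push(createFn());
        }
    }

    ObjectPool.prototype.acquire = function () {
        // Reuse a pooled object; only allocate if the pool runs dry.
        return this.free.length > 0 ? this.free.pop() : this.createFn();
    };

    ObjectPool.prototype.release = function (obj) {
        this.free.push(obj);   // hand the object back instead of letting it become garbage
    };

    // Usage: recycle bullet objects instead of creating one per shot.
    var bulletPool = new ObjectPool(function () {
        return { x: 0, y: 0, active: false };
    }, 100);

    var bullet = bulletPool.acquire();
    // ...when the bullet leaves the screen:
    bulletPool.release(bullet);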


    Avoid internal methods creating garbage

    There are various JavaScript methods that create new objects rather than modifying the existing one. This includes: Array.slice, Array.splice, Function.bind.

    Read more about JavaScript garbage collection

    Avoid frequent calls to localStorage

    LocalStorage uses file I/O and blocks the main thread to retrieve and save data. Use an in-memory object to cache the values of localStorage, and defer writes until the player is not mid-game.

    Code example of an abstract storage object:
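
    (Again, a minimal sketch; the object shape and the flush strategy are illustrative.)

    var gameStorage = {
        cache: {},

        get: function (key) {
            if (!(key in this.cache)) {
                this.cache[key] = localStorage.getItem(key);   // touch disk at most once per key
            }
            return this.cache[key];
        },

        set: function (key, value) {
            this.cache[key] = value;   // cheap in-memory write during gameplay
        },

        flush: function () {
            // Persist everything at a quiet moment (level end, pause), never mid-frame.
            for (var key in this.cache) {
                localStorage.setItem(key, this.cache[key]);
            }
        }
    };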


    Async localStorage API with IndexedDB

    IndexedDB is a non-blocking API for storing data on the client, but it may be overkill for small and simple data. Gaia’s library for making the localStorage API asynchronous using IndexedDB is available on GitHub: async_storage.js.

    Miscellaneous micro-optimization

    Sometimes when you’ve exhausted all your options and it just won’t go any faster, you can try some of the micro-optimizations below. However, do note that these only start to make a difference in heavy usage, when every millisecond counts. Look for them in your hot game loops.

    • Use x | 0 instead of Math.floor.
    • Clear arrays with .length = 0 to avoid creating a new Array.
    • Sacrifice some CPU time to avoid creating garbage.
    • Use if .. else over switch (jsPerf – switch vs if-else).
    • Use Date.now() over (+ new Date) (jsPerf – Date.now vs new Date().getTime() vs +new Date), or performance.now() for a sub-millisecond solution.
    • Use TypedArrays for floats or integers, e.g. vectors and matrices (gl-matrix – JavaScript Matrix and Vector library for High Performance WebGL apps).

    Conclusion

    Building for mobile devices and not-so-strong hardware is a good and creative exercise, and we hope you will want to make sure your games work well on all platforms!

  7. An easier way of using polyfills

    Polyfills are a fantastic way to enable the use of modern code even while supporting legacy browsers, but currently using polyfills is too hard, so at the FT we’ve built a new service to make it easier. We’d like to invite you to use it, and help us improve it.

    Image from https://www.flickr.com/photos/hamur0w0/6984884135

    More pictures, they said. So here’s a unicorn, which is basically a horse with a polyfill.

    The challenge

    Here are some of the issues we are trying to solve:

    • Developers do not necessarily know which features need to be polyfilled. You load your site in some old version of IE beloved by a frustratingly large number of your users, see that the site doesn’t work, and have to debug it to figure out which feature is causing the problem. Sometimes the culprit is obvious, but often not, especially when legacy browsers also lack good developer tools.
    • There are often multiple polyfills available for each feature. It can be hard to know which one most faithfully emulates the missing feature.
    • Some polyfills come as a big bundle with lots of other polyfills that you don’t need, to provide comprehensive coverage of a large feature set, such as ES6. It should not be necessary to ship all of this code to the browser to fix something very simple.
    • Newer browsers don’t need the polyfill, but typically the polyfill is served to all browsers. This reduces performance in modern browsers in order to improve compatibility with legacy ones. We don’t want to make that compromise. We’d rather serve polyfills only to browsers that lack a native implementation of the feature.

    Our solution: polyfills as a service

    To solve these problems, we created the polyfill service. It’s a similar idea to going to an optometrist, having your eyes tested, and getting a pair of glasses perfectly designed to correct your particular vision problem. We are doing the same for browsers. Here’s how it works:

    1. Developers insert a script tag into their page, which loads the polyfill service endpoint.
    2. The service analyses the browser’s user-agent header and a list of requested features (or uses a default list of everything polyfillable) and builds a list of polyfills that are required for this browser.
    3. The polyfills are ordered using a graph sort to place them in the right dependency order.
    4. The bundle is minified and served through a CDN (for which we’re very grateful to Fastly for their support).

    Do we really need this solution? Well, consider this: Modernizr is a big grab bag of feature detects, and all sensible use cases benefit from a custom build, but a large proportion of Modernizr users just use the default build, often from cdnjs.com or as part of html5boilerplate. Why include Modernizr if you aren’t using its feature detects? Maybe you misunderstand the purpose of the library and just think that Modernizr “fixes stuff”? I have to admit, I did, when I first heard the name, and I was mildly disappointed to find that rather than doing any actual modernising, Modernizr actually just defines modernness.

    The polyfill service, on the other hand, does fix stuff. There’s really nothing wrong with not wanting to spend time gaining intimate knowledge of all the foibles of legacy browsers. Let someone figure it out once, and then we can all benefit from it without needing or wanting to understand the details.

    How to use it

    The simplest use case is:

    <script src="//cdn.polyfill.io/v1/polyfill.min.js" async defer></script>

    This includes our default polyfill set. The default set is a manually curated list of features that we think are most essential to modern web development, and where the polyfills are reasonably small and highly accurate. If you want to specify which features you want to polyfill though, go right ahead:

    <!-- Just the Array.from polyfill -->
    <script src="//cdn.polyfill.io/v1/polyfill.min.js?features=Array.from" async defer></script>
     
    <!-- The default set, plus the geolocation polyfill -->
    <script src="//cdn.polyfill.io/v1/polyfill.min.js?features=default,Navigator.prototype.geolocation" async defer></script>

    If it’s important that you have loaded the polyfills before parsing your own code, you can remove the async and defer attributes, or use a script loader (one that doesn’t require any polyfills!).

    Testing and documenting feature support

    This table shows the polyfill service’s effect for a number of key web technologies and a range of popular browsers:

    Polyfill service support grid

    The full list of features we support is shown on our feature matrix. To build this grid we use Sauce Labs’ test automation platform, which runs each polyfill through a barrage of tests in each browser, and documents the results.

    So, er, user-agent sniffing? Really?

    Yes. There are several reasons why UA analysis wins out over feature detection for us:

    • In some cases, we have multiple polyfills for the same feature, because some browsers offer a non-compliant implementation that just needs to be bashed into shape, while others lack any implementation at all. With UA detection you can choose to serve the right variant of the polyfill.
    • With UA detection, the first HTTP request can respond directly with polyfill code. If we used feature detection, the first request would serve feature-detect code, and then a second one would be needed to fetch specific polyfills.

    Almost all websites with significant scale do UA detection. This isn’t to say the stigma attached to it is necessarily bad. It’s easy to write bad UA detect rules, and hard to write good ones. And we’re not ruling out making a way of using the service via feature-detects (in fact there’s an issue in our tracker for it).

    A service for everyone

    The service part of the app is maintained by the FT, and we are working on expanding and improving the tools, documentation, testing and service features all the time. The source is freely available on GitHub so you can easily host it yourself, but we also host an instance of the service on cdn.polyfill.io which you can use for free, and our friends at Fastly are providing free CDN distribution and SSL.

    We’ve made a platform. We need the community’s help to populate it. We already serve some of the best polyfills from Jonathan Neal, Mathias Bynens and others, but we’d love to be more comprehensive. Bring your polyfills, improve our tests, and make this a resource that can help move the web forward!

  8. No Single Benchmark for the Web

    Google released a new JavaScript benchmark a few days ago called Octane. New benchmarks are always welcome, as they push browsers to new levels of performance in new areas. I was particularly pleased to see the inclusion of pdf.js, which unlike most benchmarks is real-world code, as well as the GB Emulator which is a very interesting type of performance-intensive code. However, every benchmark suite has limitations, and it is worth keeping that in mind, especially given the new benchmark’s title in the announcement and in the project page as “The JavaScript Benchmark Suite for the Modern Web” – which is a high goal to set for a single benchmark.

    Now, every benchmark must pick some code to run out of all the possible code out there, and picking representative code is very hard. So it is always understandable that benchmarks are never 100% representative of the code that exists and is important. However, even taking that into account, I have concerns with some of the code selected to appear in Octane: There are better versions of two of the five new benchmarks, and performance on those better versions is very different than the versions that do appear in Octane.

    Benchmarking black boxes

    One of the new benchmarks in Octane is “Mandreel”, which is the Bullet physics engine compiled by Mandreel, a C++ to JS compiler. Bullet is definitely interesting code to include in a benchmark. However, the choice of Mandreel’s port is problematic. One issue is that Mandreel is a closed-source compiler, a black box, making it hard to learn from it what kind of code is efficient and what should be optimized. We just have a generated code dump, and because Mandreel is a commercial product, it would cost money for anyone to reproduce those results with modifications to the original C++ being run, or with a different codebase. We also do not have the source code compiled for this particular benchmark: Bullet itself is open source, but we don’t know the specific version compiled here, nor do we have the benchmark driver code that uses Bullet, both of which would be necessary to reproduce these results using another compiler.

    An alternative could have been to use Bullet compiled by Emscripten, an open source compiler that similarly compiles C++ to JS (disclaimer: I am an Emscripten dev). Aside from being open, Emscripten also has a port of Bullet (a demo can be seen here) that can interact in a natural way with regular JS, making it usable in normal web games and not just compiled ones, unlike Mandreel’s port. This is another reason for preferring the Emscripten port of Bullet instead.

    Is Mandreel representative of the web?

    The motivation Google gives for including Mandreel in Octane is that Mandreel is “used in countless web-based games.” It seems that Mandreel is primarily used in the Chrome Web Store (CWS) and less outside in the normal web. The quoted description above is technically accurate: Mandreel games in the CWS are indeed “web-based” (written in JS+HTML+WebGL) even if they are not actually “on the web”, where by “on the web” I mean outside of the walled garden of the CWS and in the normal web that all browsers can access. And it makes perfect sense that Google cares about the performance of code that runs in the CWS, since Google runs and profits from that store. But it does call into question the title of the Octane benchmark as “The JavaScript Benchmark Suite for the Modern Web.”

    Performance of generated code is highly variable

    With that said, it is still fair to say that compiler-generated code is increasing in importance on the web, so some benchmark must be chosen to represent it. The question is how much the specific benchmark chosen represents compiled code in general. On the one hand the compiled output of Mandreel and Emscripten is quite similar: both use large typed arrays, the same Relooper algorithm, etc., so we could expect performance to be similar. That doesn’t seem to always be the case, though. When we compare Bullet compiled by Mandreel with Bullet compiled by Emscripten – I made a benchmark of that a while back, it’s available here – then on my MacBook pro, Chrome is 1.5x slower than Firefox on the Emscripten version (that is, Chrome takes 1.5 times as long to execute in this case), but 1.5x faster on the Mandreel version that Google chose to include in Octane (that is, Chrome receives a score 1.5 times larger in this case). (I tested with Chrome Dev, which is the latest version available on Linux, and Firefox Aurora which is the best parallel to it. If you run the tests yourself, note that in the Emscripten version smaller numbers are better while the opposite is true in the Octane version.)

    (An aside, not only does Chrome have trouble running the Emscripten version quickly, but that benchmark also exposes a bug in Chrome where the tab consistently crashes when the benchmark is reloaded – possibly a dupe of this open issue. A serious problem of that nature, that does not happen on the Mandreel-compiled version, could indicate that the two were optimized differently as a result of having received different amounts of focus by developers.)

    Another issue with the Mandreel benchmark is the name. Calling it Mandreel implies it represents all Mandreel-generated code, but there can be huge differences in performance depending on what C/C++ code is compiled, even with a single compiler. For example, Chrome can be 10-15x slower than Firefox on some Emscripten-compiled benchmarks (example 1, example 2) while on others it is quite speedy (example). So calling the benchmark “Mandreel-Bullet” would have been better, to indicate it is just one Mandreel-compiled codebase, which cannot represent all compiled code.

    Box2DWeb is not the best port of Box2D

    “Box2DWeb” is another new benchmark in Octane, in which a specific port of Box2D to JavaScript is run, namely Box2DWeb. However, as seen here (see also this), Box2DWeb is significantly slower than other ports of Box2D to the web, specifically Mandreel and Emscripten’s ports from the original C++ that Box2D is written in. Now, you can justify excluding the Mandreel version because it cannot be used as a library from normal JS (just as with Bullet before), but the Emscripten-compiled one does not have that limitation and can be found here. (Demos can be seen here and here.)

    Another reason for preferring the Emscripten version is that it uses Box2D 2.2, whereas Box2DWeb uses the older Box2D 2.1. Compiling the C++ code directly lets the Emscripten port stay up to date with the latest upstream features and improvements far more easily.

    It is possible that Google surveyed websites and found that the slower Box2DWeb was more popular, although I have no idea whether that was the case, but if so that would partially justify preferring the slower version. However, even if that were true, I would argue that it would be better to use the Emscripten version because as mentioned earlier it is faster and more up to date. Another factor to consider is that the version included in Octane will get attention and likely an increase in adoption, which makes it all the more important to select the one that is best for the web.

    I put up a benchmark of Emscripten-compiled Box2D here, and on my machine Chrome is 3x slower than Firefox on that benchmark, but 1.6x faster on the version Google chose to include in Octane. This is a similar situation to what we saw earlier with the Mandreel/Bullet benchmark and it raises the same questions about how representative a single benchmark can be.

    Summary

    As mentioned at the beginning, all benchmarks are imperfect. And the fact that the specific code samples in Octane are ones that Chrome runs well does not mean the code was chosen for that reason: The opposite causation is far more likely, that Google chose to focus on optimizing those and in time made Chrome fast on them. And that is how things properly work – you pick something to optimize for, and then optimize for it.

    However, in 2 of the 5 new benchmarks in Octane there are good reasons for preferring alternative, better versions of those two benchmarks as we saw before. Now, it is possible that when Google started to optimize for Octane, the better options were not yet available – I don’t know when Google started that effort – but the fact that better alternatives exist in the present makes substantial parts of Octane appear less relevant today. Of course, if performance on the better versions was not much different than the Octane versions then this would not matter, but as we saw there were in fact significant differences when comparing browsers on those versions: One browser could be significantly better on one version of the same benchmark but significantly slower on another.

    What all of this shows is that there cannot be a single benchmark for the modern web. There are simply too many kinds of code, and even when we focus on one of them, different benchmarks of that particular task can behave very differently.

    With that said, we shouldn’t be overly skeptical: Benchmarks are useful. We need benchmarks to drive us forward, and Octane is an interesting new benchmark that, even with the problems mentioned above, does contain good ideas and is worth focusing on. But we should always be aware of the limitations of any single benchmark, especially when a single benchmark claims to represent the entire modern web.

     

  9. Mozilla developer preview (Gecko 1.9.3a1) available for download

    Editor’s note: Today, Mozilla released a preview of the Gecko 1.9.3 platform for developers and testers. Check out the Mozilla Developer News announcement reposted below.

    A Mozilla Developer Preview of improvements in the Gecko layout engine is now available for download. This is a pre-release version of the Gecko 1.9.3 platform, which forms the core of rich Internet applications such as Firefox. Please note that this release is intended for developers and testers only. As always, we appreciate any feedback you may have and encourage users to help us by filing bugs.

    This developer preview introduces several new features, including:

    and several other significant changes, including:

    • On Mac OS X, we render text using Core Text rather than ATSUI.
    • We rewrote major parts of the code for handling scrolling. See bug 526394 for details.
    • We rewrote the way a snapshot of a document is taken in order to print or print preview. See bug 487667 for details.
    • We made significant changes to table border handling. See bug 452319 and bug 43178 for details.
    • We made various architectural changes to improve Web page performance.

    More information on these changes is in the release notes, as well as the Upcoming Firefox features for developers article on the Mozilla Developer Center.

    Please use the following links when downloading this Mozilla Developer Preview:

  10. Mozilla developer preview 4 ready for testing

    Note: this is a re-post of the entry in the Mozilla Project Development Weblog. There’s some juicy stuff in here for web developers that needs testing. In particular, this is the first build with the CSS history changes.

    As part of our ongoing platform development work, we’re happy to announce the fourth pre-release of the Gecko 1.9.3 platform. Gecko 1.9.3 will form the core of Firefox and other Mozilla project releases.

    It’s available for download on Mac, Windows or Linux.

    Mozilla expects to release a Developer Preview every 2-3 weeks. If you’ve been running a previous release, you will be automatically updated to the latest version when it is released.

    This preview release contains a lot of interesting stuff that’s worth pointing out, and contains many things that were also in previous releases. Here are the things of note in this release:

    User Interface Changes

    • Open tabs that match searches in the Awesomebar now show up as “Switch to Tab.”
    • This is the first preview release to contain resizable text areas by default.

    Web Developer Changes

    • This is the first preview release to contain changes to CSS :visited that prevent a large class of history sniffing attacks. You can find more information about the details of why this change is important over on the hacks post on the topic and on the Mozilla Security Weblog. Note that this change is likely to break some web sites and requires early testing – please test if you can.
    • SVG Attributes which are mapped to CSS properties can now be animated with SMIL. See the bug or a demo.

    Plugins

    • Out of process plugins support for Windows and Linux continues to improve. This release contains many bug fixes vs. our previous developer preview releases. (In fact, it’s good enough that we’ve ported this code back to the 3.6 branch and have pushed that to beta for a later 3.6.x release.)
    • This is the first release that contains support for out of process plugins for the Mac. If you are running OS X 10.6 and you’re running the latest Flash beta, Flash should run out of process.

    Performance

    • One area where people complained about performance was restart performance when applying an update. It turns out that a lot of what made that experience poor wasn’t startup time, it was browser shutdown time. We’ve made a fix since the last preview release that made a whopping 97% improvement in shutdown time. (That’s not a typo, it’s basically free now.)
    • Our work to reduce the amount of I/O on the main thread continues unabated. This preview release will feel much snappier than previous snapshots, and feel much faster than Firefox 3.6.
    • We continue to add hardware acceleration support. If you’re on Windows and you’ve got decent OpenGL 2 drivers, open video will use hardware to scale the video when you’re in full screen mode. For large HD videos this can make a huge difference in the smoothness of the experience and how much power + CPU are used. We’ll be adding OSX and Linux support at some point in the future as well, but we’re starting with Windows.
    • We continue to make improvements and bug fixes to our support for Direct2D. (Not enabled by default. If you want to turn it on see Bas’ post.) If you’re running Alpha 4 on Windows Vista or Windows 7, and you’ve turned on D2D, try running this stress test example in Alpha 4 vs. Firefox 3.6. The difference is pretty amazing. You can also see what this looks like compared to other browsers in this video. (Thanks to Hans Schmucker for the video and demo.)

    Platform

    • JS-ctypes, our new easy-to-use system for extension authors who want to call into native code, now has support for complex types: structures, pointers, and arrays. For more information on this, and how easy it can make calling into native code from JavaScript, see Dan Witte’s post.
    • Mozilla is now sporting an infallible allocator. What is this odd-sounding thing, you ask? It’s basically an allocator that aborts when memory can’t be allocated, instead of returning NULL. This reduces the surface area for an entire class of security bugs related to checking NULL pointers, and also allows us to vastly simplify a huge amount of Gecko’s source code.