Performance Articles


  1. Optimising SVG images

    SVG is a vector image format based on XML. It has great advantages, most notably it is lightweight. Since SVG is a text format, it can be viewed and modified using a simple text editor, and applying GZIP compression produces excellent results.

    It’s critical for a website to provide assets that are as lightweight as possible, especially on mobile where bandwidth can be very limited. You want to optimise your SVG files to have your app load and display as quickly as possible.

    This article will show how to use dedicated tools to optimise SVG images. You will also learn how the markup works so you can go the extra mile to produce the lightest possible images.

    Introducing svgo

    Optimising SVG is very similar to minifying CSS or other text-based formats such as JavaScript or HTML. It is mainly about removing useless whitespace and redundant characters.

    The tool I recommend for reducing the size of SVG images is svgo. It is written for Node.js. To install it, just run:

    $ npm install -g svgo

    In its basic form, you’ll use a command line like this:

    $ svgo --input img/graph.svg --output img/optimised-graph.svg

    Please make sure to specify an --output parameter if you want to keep the original image. Otherwise svgo will replace it with the optimised version.

    svgo will apply several changes to the original file—stripping out useless comments, tags, and attributes, reducing the precision of numbers in path definitions, or sorting attributes for better GZIP compression.
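    To give a feel for what this looks like, here is an illustrative before/after (hand-written, not svgo's exact output):

    <!-- before: editor output with a comment, a redundant group and over-precise numbers -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <g>
        <rect x="10.000000" y="10.000000" width="80.000000" height="80.000000" fill="#ff0000"></rect>
      </g>
    </svg>

    <!-- after: comment, <g> wrapper, trailing zeros and long colour notation removed -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100"><rect x="10" y="10" width="80" height="80" fill="#f00"/></svg>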

    This works with no surprises for simple images. However, in more complex cases, the image manipulation can result in a garbled file.

    svgo plugins

    svgo is very modular thanks to a plugin-based architecture.

    When optimising complex images, I’ve noticed that the main issues are caused by two svgo plugins:

    • convertPathData
    • mergePaths

    Deactivating these will ensure you get a correct result in most cases:

    $ svgo --disable=convertPathData --disable=mergePaths -i img/a.svg

    convertPathData will convert the path data using relative and shorthand notations. Unfortunately, some environments won’t fully recognise this syntax and you’ll get something like:

    Screenshot of Gnome Image Viewer displaying an original SVG image (left) and a version optimised via svgo (right)



    Please note that the optimised image will display correctly in all browsers. So you may still want to use this plugin.

    The other plugin that can cause you trouble—mergePaths—will merge together shapes of the same style to reduce the number of <path> tags in the source. However, this might create issues if two paths overlap.

    Merge paths issue

    In the image on the right, note the rendering differences around the character's neck and hand, as well as in the Twitter logo. The outline view shows the 3 overlapping paths that make up the character's head.

    My suggestion is to first try svgo with all plugins activated, then if anything is wrong, deactivate the two mentioned above.

    If the result is still very different from your original image, then you’ll have to deactivate the plugins one by one to detect the one which causes the issue. Here is a list of svgo plugins.

    Optimising even further

    svgo is a great tool, but in some specific cases, you’ll want to compress your SVG images even further. To do so, you have to dig into the file format and do some manual optimisations.

    In these cases, my favourite tool is Inkscape: it is free, open source and available on most platforms.

    If you want to use the mergePaths plugin of svgo, you must combine overlapping paths yourself. Here’s how to do it:

    Open your image in Inkscape and identify the paths with the same style (same fill and stroke). Select them all (hold Shift for multiple selection), then open the Path menu and choose Union. You're done: the three paths have been merged into a single one.

    Merge paths technique

    The 3 different paths that create the character’s head are merged, as shown by the outline view on the right.



    Repeat this operation for all paths of the same style that are overlapping and then you’re ready to use svgo again, keeping the mergePaths plugin.

    There are all sorts of different optimisations you can apply manually:

    • Convert strokes to paths so they can be merged with paths of similar style.
    • Cut paths manually to avoid using clip-path.
    • Exclude an underneath path from an overlapping path, then merge it with a similar path to avoid layering issues. (In the image above, see the character's hair: the side-hair path is under his head, but the top hair is above it, so you can't merge the 3 hair paths as is.)

    Final considerations

    These manual optimisations can take a lot of time for meagre results, so think twice before starting!

    A good rule of thumb when optimising SVG images is to make sure the final file has only one path per style (same fill and stroke style) and uses no <g> tags to group paths into objects.

    In Firefox OS, we use an icon font, gaia-icons, generated from SVG glyphs. I noticed that optimising them resulted in a significantly lighter font file, with no visual differences.

    Whether you use SVG for embedding images on an app or to create a font file, always remember to optimise. It will make your users happier!

  2. Project Silk

    Editor’s Note: An earlier version of this post appeared on Mason Chang’s personal blog.

    For the past few months, I’ve been working on Project Silk which improves smoothness across the browser. Very much like Project Butter for Android, part of it is finally live on Firefox OS. Silk does three things:

    1. Align Painting with hardware vsync
    2. Resample touch input events based on hardware vsync
    3. Align composites with hardware vsync

    What is vsync, why vsync, and why does it matter at all?

    Vertical synchronization (vsync) occurs when the hardware display shows a new frame on the screen. The rate is set by the specific hardware, but most displays refresh 60 times a second, or every 16.6 ms (milliseconds). This is where the figure of 60 frames per second comes from: one frame each time the hardware display refreshes. In practice, this means that no matter how many frames are produced in software, the hardware display will still only show at most 60 unique frames per second.

    Currently in Firefox, we mimic 60 frames per second and therefore vsync with a software timer that schedules rendering every 16.6 ms. However, the software scheduler has two problems: (a) it’s noisy and (b) it can be scheduled at bad times relative to vsync.

    In regards to noise, software timers are much noisier than hardware timers. This creates micro-jank for a number of reasons. First, many animations are keyed off timestamps that are generated by the software scheduler to update the position of the animation. If you’ve ever used requestAnimationFrame, you get a timestamp from a software timer. If you want smooth animations, the timestamp provided to requestAnimationFrame should be uniform. Non-uniform timestamps will create non-uniform and janky animations. Here is a graph showing software versus hardware vsync timer uniformity:

    [Graph: software vs. hardware vsync timer uniformity]

    Wow! Big improvement with a hardware timer. We get a much more uniform, and therefore smoother, timestamp to key animations off of. So that addresses problem (a), noisy timers in software versus hardware.
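    For reference, main-thread animations keyed off requestAnimationFrame timestamps typically look like the sketch below; the timestamp parameter is exactly the value that becomes more uniform with a hardware vsync source (a generic example, not Gecko code; box is an assumed element on the page):

    function step(timestamp) {
      // The position is keyed off the timestamp, so a jittery timestamp
      // produces jittery motion even if every frame is produced in time.
      box.style.transform = "translateX(" + ((timestamp / 10) % 300) + "px)";
      requestAnimationFrame(step);
    }
    requestAnimationFrame(step);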

    With part (b), software timers can be scheduled at bad times relative to vsync. Regardless of what the software does, the hardware display will refresh on its own clock. If our rendering pipeline finishes producing a frame before the next vsync, the display is updated with new content. If we fail to finish producing a frame before the next vsync, the previous frame will be displayed, causing jankiness. Some rendering functions can occur close to vsync and overflow until the next interval. Thus, we actually introduce more potential latency since the frame won’t be displayed on the screen anyway until the next vsync. Let’s look at this in graphic form:

    [Diagram: frame production relative to vsync intervals]

    At time 0, we start producing frames. For example, let’s say all frames take a constant time of 10 ms. Our frame budget is 16.6 ms because we only have to finish producing a frame before the next hardware vsync occurs. Since frame 1 is finished 6 ms before the next vsync (time t=16 ms), everything is successful and life is good. The frame is produced in time and the hardware display will be refreshed with the updated content.

    Now let’s look at Frame 2. Since software timers are noisy, we start producing a frame 9 ms from the next vsync (time t=32). Since our frame takes 10 ms to produce, we actually finish producing this frame at 1 ms AFTER the next vsync. That means at vsync number 2 (t=32), there is no new frame to display, so the display still shows the previous frame. In addition, the frame just produced won’t be shown until vsync 3 (t=48), because that’s when the hardware updates itself. This creates jank since now the display will have skipped one frame and will try to catch up in the upcoming frames. This also produces one extra frame of latency, which is terrible for games.

    Vsync addresses both of these problems since we get a much more uniform timer and the maximum amount of frame budget time to produce a new frame. Now that we know what vsync is, we can finally go on to what Project Silk is and how it helps create smooth experiences in Firefox.

    The Rendering Pipeline

    In super simplified terms, Gecko’s rendering pipeline does three things:

    1. Paint / draw the new frame on the main thread.
    2. Send the updated content to the Compositor via a LayerTransaction.
    3. Composite the new content.

    In an ideal world, we'd be able to do all three steps within 16.6 ms, but that's not the case most of the time. Both steps (1) and (3) occur on independent software timers, so there is no real synchronizing clock between the three steps; they are all ad hoc. They also have no relation to vsync, so the timing of the pipeline isn't related to when the display actually updates the screen with content. With Silk, we replace both independent software timers with the hardware vsync timer. For our purposes, (2) doesn't really affect the outcome, but is presented here for completeness.

    Align Painting with Hardware Vsync

    Aligning the timer used to tick the refresh driver with vsync creates smoothness in a couple of ways. First, many animations are still done on the main thread, which means any animation using timestamps to set its position should be smoother. This includes requestAnimationFrame animations! The other nice thing is that we now have a very strict ordering of when rendering is kicked off. Instead of (1) and (3) occurring at separate, unsynchronized offsets, we start rendering at a specific time.

    Resample Touch Input Events Based on Vsync

    With Silk, we can enable touch resampling, which improves smoothness while tracking your finger. Since I’ve already blogged about touch resampling quite a bit, I’ll keep this short. With Silk, we can finally enable it!

    Align Composites with Hardware Vsync

    Finally, the last part of Silk is about aligning composites with hardware vsync. Compositing takes all the painted content and merges it together to create the single image you see on the display. With Silk, all composites start right after a hardware vsync occurs. This has actually produced a rather nice side benefit — the reduced composite times seen here:

    [Graph: composite times before and after Silk]

    Within the device driver on a Flame device, there’s a global lock that’s grabbed when close to vsync intervals. This lock can take 5-6 ms to get, greatly increasing the composite times. However, when we start a composite right after a vsync, there is little contention to grab the lock. Thus we can shave off the wait, therefore reducing composite times quite a bit. Not only do we get smoother animations, but also reduced composite times, and therefore better battery life. What a nice win!

    With all three pieces, we now have a nice strict ordering of the rendering pipeline. We paint and send the updated content to the Compositor within 16.6 ms. At the next vsync, we composite the updated content. At the vsync after that, the frame should have gone through the rendering pipeline and will be displayed on the screen. Keeping this order reduces jank because we reduce the chance that the timers will schedule each step at a bad time. In a best-case scenario in the current implementation without Silk, a frame could be painted and composited within a single 16.6 ms frame. This is great. However, if the next frame takes 2 frames instead, we’ve just created extra jank, even though no stage in the pipeline was really slow. Aligning the whole pipeline to create a strict sequence of events reduces the chance that we mis-schedule a frame.

    [Profile: the rendering pipeline without Silk]

    Here’s a picture of the rendering pipeline without Silk. We have Composites (3) at the bottom of this profile. We have painting (1) in the middle, where you see Styles, Reflow, Displaylist, and Rasterize. We have Vsync, represented by those small orange boxes at the top. Finally we have Layer Transactions (2) at the bottom. At first, when we start, compositing and painting are not aligned, so animations are at different positions depending on whether they are on the main thread or the compositor thread. Second, we see long composites because the compositor is waiting on a global lock in the device driver. Lastly, it’s difficult to read any ordering or see if there is a problem without deep knowledge of why / when things should be happening.

    [Profile: the same rendering pipeline with Silk]

    Here is a picture of the same pipeline with Silk. Composites are a little shorter, and the whole pipeline only starts at vsync intervals. Composite times are reduced because we start composites exactly at vsync intervals. There is a clear ordering of when things should happen. Both composites and painting are keyed off the same timestamp, ensuring smoother animations. Finally, there is a clear indicator that as long as everything finishes before the next Vsync, things will be smooth.

    Ultimately, Silk aims to create a smoother experience across Firefox and the Web. Numerous people contributed to the project. Thanks to Jerry Shih, Boris Chou, Jeff Hwang, Mike Lee, Kartikaya Gupta, Benoit Girard, Michael Wu, Ben Turner, and Milan Sreckovic for their help in making Silk happen.

  3. An easier way of using polyfills

    Polyfills are a fantastic way to enable the use of modern code even while supporting legacy browsers, but currently using polyfills is too hard, so at the FT we’ve built a new service to make it easier. We’d like to invite you to use it, and help us improve it.

    Image from https://www.flickr.com/photos/hamur0w0/6984884135

    More pictures, they said. So here’s a unicorn, which is basically a horse with a polyfill.

    The challenge

    Here are some of the issues we are trying to solve:

    • Developers do not necessarily know which features need to be polyfilled. You load your site in some old version of IE beloved by a frustratingly large number of your users, see that the site doesn’t work, and have to debug it to figure out which feature is causing the problem. Sometimes the culprit is obvious, but often not, especially when legacy browsers also lack good developer tools.
    • There are often multiple polyfills available for each feature. It can be hard to know which one most faithfully emulates the missing feature.
    • Some polyfills come as a big bundle with lots of other polyfills that you don’t need, to provide comprehensive coverage of a large feature set, such as ES6. It should not be necessary to ship all of this code to the browser to fix something very simple.
    • Newer browsers don’t need the polyfill, but typically the polyfill is served to all browsers. This reduces performance in modern browsers in order to improve compatibility with legacy ones. We don’t want to make that compromise. We’d rather serve polyfills only to browsers that lack a native implementation of the feature.

    Our solution: polyfills as a service

    To solve these problems, we created the polyfill service. It’s a similar idea to going to an optometrist, having your eyes tested, and getting a pair of glasses perfectly designed to correct your particular vision problem. We are doing the same for browsers. Here’s how it works:

    1. Developers insert a script tag into their page, which loads the polyfill service endpoint.
    2. The service analyses the browser's user-agent header and a list of requested features (or uses a default list of everything polyfillable), and builds a list of polyfills that are required for this browser.
    3. The polyfills are ordered using a graph sort to place them in the right dependency order.
    4. The bundle is minified and served through a CDN (for which we're very grateful to Fastly for their support).

    Do we really need this solution? Well, consider this: Modernizr is a big grab bag of feature detects, and all sensible use cases benefit from a custom build, but a large proportion of Modernizr users just use the default build, often from cdnjs.com or as part of html5boilerplate. Why include Modernizr if you aren’t using its feature detects? Maybe you misunderstand the purpose of the library and just think that Modernizr “fixes stuff”? I have to admit, I did, when I first heard the name, and I was mildly disappointed to find that rather than doing any actual modernising, Modernizr actually just defines modernness.

    The polyfill service, on the other hand, does fix stuff. There’s really nothing wrong with not wanting to spend time gaining intimate knowledge of all the foibles of legacy browsers. Let someone figure it out once, and then we can all benefit from it without needing or wanting to understand the details.

    How to use it

    The simplest use case is:

    <script src="//cdn.polyfill.io/v1/polyfill.min.js" async defer></script>

    This includes our default polyfill set. The default set is a manually curated list of features that we think are most essential to modern web development, and where the polyfills are reasonably small and highly accurate. If you want to specify which features you want to polyfill though, go right ahead:

    <!-- Just the Array.from polyfill -->
    <script src="//cdn.polyfill.io/v1/polyfill.min.js?features=Array.from" async defer></script>
     
    <!-- The default set, plus the geolocation polyfill -->
    <script src="//cdn.polyfill.io/v1/polyfill.min.js?features=default,Navigator.prototype.geolocation" async defer></script>

    If it’s important that you have loaded the polyfills before parsing your own code, you can remove the async and defer attributes, or use a script loader (one that doesn’t require any polyfills!).
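    For example, to guarantee the polyfills are in place before your application code runs, you could load them synchronously ahead of your own script (a sketch; /js/app.js is a placeholder path):

    <script src="//cdn.polyfill.io/v1/polyfill.min.js?features=default"></script>
    <script src="/js/app.js"></script>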

    Testing and documenting feature support

    This table shows the polyfill service’s effect for a number of key web technologies and a range of popular browsers:

    Polyfill service support grid

    The full list of features we support is shown on our feature matrix. To build this grid we use Sauce Labs’ test automation platform, which runs each polyfill through a barrage of tests in each browser, and documents the results.

    So, er, user-agent sniffing? Really?

    Yes. There are several reasons why UA analysis wins out over feature detection for us:

    • In some cases, we have multiple polyfills for the same feature, because some browsers offer a non-compliant implementation that just needs to be bashed into shape, while others lack any implementation at all. With UA detection you can choose to serve the right variant of the polyfill.
    • With UA detection, the first HTTP request can respond directly with polyfill code. If we used feature detection, the first request would serve feature-detect code, and then a second one would be needed to fetch specific polyfills.

    Almost all websites with significant scale do UA detection. This isn't to say the stigma attached to it is entirely undeserved: it's easy to write bad UA-detection rules, and hard to write good ones. And we're not ruling out adding a way of using the service via feature detects (in fact, there's an issue in our tracker for it).

    A service for everyone

    The service part of the app is maintained by the FT, and we are working on expanding and improving the tools, documentation, testing and service features all the time. The source is freely available on GitHub so you can easily host it yourself, but we also host an instance of the service on cdn.polyfill.io which you can use for free, and our friends at Fastly are providing free CDN distribution and SSL.

    We’ve made a platform. We need the community’s help to populate it. We already serve some of the best polyfills from Jonathan Neal, Mathias Bynens and others, but we’d love to be more comprehensive. Bring your polyfills, improve our tests, and make this a resource that can help move the web forward!

  4. Generational Garbage Collection in Firefox

    Generational garbage collection (GGC) has now been enabled in the SpiderMonkey JavaScript engine in Firefox 32. GGC is a performance optimization only, and should have no observable effects on script behavior.

    So what is it? What does it do?

    GGC is a way for the JavaScript engine to collect short-lived objects faster. Say you have code similar to:

    function add(point1, point2) {
        return [ point1[0] + point2[0], point1[1] + point2[1] ];
    }

    Without GGC, you will have high overhead for garbage collection (from here on, just “GC”). Each call to add() creates a new Array, and it is likely that the old arrays that you passed in are now garbage. Before too long, enough garbage will pile up that the GC will need to kick in. That means the entire JavaScript heap (the set of all objects ever created) needs to be scanned to find the stuff that is still needed (“live”) so that everything else can be thrown away and the space reused for new objects.

    If your script does not keep very many total objects live, this is totally fine. Sure, you’ll be creating tons of garbage and collecting it constantly, but the scan of the live objects will be fast (since not much is live). However, if your script does create a large number of objects and keep them alive, then the full GC scans will be slow, and the performance of your script will be largely determined by the rate at which it produces temporary objects — even when the older objects aren’t changing, and you’re just re-scanning them over and over again to discover what you already knew. (“Are you dead?” “No.” “Are you dead?” “No.” “Are you dead?”…)

    Generational collector, Nursery & Tenured

    With a generational collector, the penalty for temporary objects is much lower. Most objects will be allocated into a separate memory region called the Nursery. When the Nursery fills up, only the Nursery will be scanned for live objects. The majority of the short-lived temporary objects will be dead, so this scan will be fast. The survivors will be promoted to the Tenured region.

    The Tenured heap will also accumulate garbage, but usually at a far lower rate than the Nursery. It will take much longer to fill up. Eventually, we will still need to do a full GC, but under typical allocation patterns these should be much less common than Nursery GCs. To distinguish the two cases, we refer to Nursery collections as minor GCs and full heap scans as major GCs. Thus, with a generational collector, we split our GCs into two types: mostly fast minor GCs, and fewer slower major GCs.

    GGC Overhead

    While it might seem like we should have always been doing this, it turns out to require quite a bit of infrastructure that we previously did not have, and it also incurs some overhead during normal operation. Consider the question of how to figure out whether some Nursery object is live. It might be pointed to by a live Tenured object — for example, if you create an object and store it into a property of a live Tenured object.

    How do you know which Nursery objects are being kept alive by Tenured objects? One alternative would be to scan the entire Tenured heap to find pointers into the Nursery, but this would defeat the whole point of GGC. So we need a way of answering the question more cheaply.

    Note that these Tenured ⇒ Nursery edges in the heap graph won’t last very long, because the next minor GC will promote all survivors in the Nursery to the Tenured heap. So we only care about the Tenured objects that have been modified since the last minor (or major) GC. That won’t be a huge number of objects, so we make the code that writes into Tenured objects check whether it is writing any Nursery pointers, and if so, record the cross-generational edges in a store buffer.

    In technical terms, this is known as a write barrier. Then, at minor GC time, we walk through the store buffer and mark every target Nursery object as being live. (We actually use the source of the edge at the same time, since we relocate the Nursery object into the Tenured area while marking it live, and thus the Tenured pointer into the Nursery needs to be updated.)
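    Conceptually, the barrier does something like the following (a JavaScript-flavoured sketch of engine-internal logic; isTenured, isInNursery and storeBuffer are illustrative names, not real SpiderMonkey APIs):

    var storeBuffer = [];

    function writeProperty(obj, key, value) {
      obj[key] = value;                          // the actual store
      if (isTenured(obj) && isInNursery(value)) {
        // Record the cross-generational edge so the next minor GC can find
        // (and later update) this pointer without scanning the whole Tenured heap.
        storeBuffer.push({ source: obj, key: key });
      }
    }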

    With a store buffer, the time for a minor GC is dependent on the number of newly-created edges from the Tenured area to the Nursery, not just the number of live objects in the Nursery. Also, keeping track of the store buffer records (or even just the checks to see whether a store buffer record needs to be created) does slow down normal heap access a little, so some code patterns may actually run slower with GGC.

    Allocation Performance

    On the flip side, GGC can speed up object allocation. The pre-GGC heap needs to be fully general. It must track in-use and free areas and avoid fragmentation. The GC needs to be able to iterate over everything in the heap to find live objects. Allocating an object in a general heap like this is surprisingly complex. (GGC’s Tenured heap has pretty much the same set of constraints, and in fact reuses the pre-GGC heap implementation.)

    The Nursery, on the other hand, just grows until it is full. You never need to delete anything, at least until you free up the whole Nursery during a minor GC, so there is no need to track free regions. Consequently, the Nursery is perfect for bump allocation: to allocate N bytes you just check whether there is space available, then increment the current end-of-heap pointer by N bytes and return the previous pointer.
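    A minimal sketch of the idea, with made-up numbers and names (the real allocator lives inside the engine, not in JavaScript):

    var NURSERY_SIZE = 1 << 20;     // e.g. a 1 MiB nursery
    var cursor = 0;                 // current end-of-heap pointer

    function bumpAlloc(nBytes) {
      if (cursor + nBytes > NURSERY_SIZE) {
        return -1;                  // nursery full: this is where a minor GC would run
      }
      var addr = cursor;            // the previous pointer is the new object's address
      cursor += nBytes;             // bump the pointer
      return addr;
    }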

    There are even tricks to optimize away the “space available” check in many cases. As a result, objects with a short lifespan never go through the slower Tenured heap allocation code at all.

    Timings

    I wrote a simple benchmark to demonstrate the various possible gains of GGC. The benchmark is sort of a “vector Fibonacci” calculation, where it computes a Fibonacci sequence for both the x and y components of a two dimensional vector. The script allocates a temporary object on every iteration. It first times the loop with the (Tenured) heap nearly empty, then it constructs a large object graph, intended to be placed into the Tenured portion of the heap, and times the loop again.
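    The benchmark itself isn't reproduced here, but the allocation pattern it exercises looks roughly like this (a sketch, not the author's actual code):

    function vectorFibStep(prev, curr) {
      // Allocates a fresh temporary array on every call.
      return [prev[0] + curr[0], prev[1] + curr[1]];
    }

    var prev = [0, 0], curr = [1, 1];
    var t0 = performance.now();
    for (var i = 0; i < 10000000; i++) {
      var next = vectorFibStep(prev, curr);
      prev = curr;
      curr = next;
    }
    console.log(((performance.now() - t0) / 10000000) * 1e6 + " ns per iteration");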

    On my laptop, the benchmark shows huge wins from GGC. The average time for an iteration through the loop drops from 15 nanoseconds (ns) to 6ns with an empty heap, demonstrating the faster Nursery allocation. It also shows the independence from the Tenured heap size: without GGC, populating the long-lived heap slows down the mean time from 15ns to 27ns. With GGC, the speed stays flat at 6ns per iteration; the Tenured heap simply doesn’t matter.

    Note that this benchmark is intended to highlight the improvements possible with GGC. The actual benefit depends heavily on the details of a given script. In some scripts, the time to initialize an object is significant and may exceed the time required to allocate the memory. A higher percentage of Nursery objects may get tenured. When running inside the browser, we force enough major GCs (e.g., after a redraw) that the benefits of GGC are less noticeable.

    Also, the description above implies that we will pause long enough to collect the entire heap, which is not the case — our incremental garbage collector dramatically reduces pause times on many Web workloads already. (The incremental and generational collectors complement each other — each attacks a different part of the problem.)

    Continued…

  5. How fast is PDF.js?

    Hi, my name is Thorben and I work at Opera Software in Oslo, not at Mozilla. So, how did I end up writing for Mozilla Hacks? Maybe you know that there is no default PDF viewer in the Opera Browser, something we would like to change. But how to include one? Buy it from Adobe or Foxit? Start our own?

    Introducing PDF.js

    While investigating our options we quickly stumbled upon PDF.js. The project aims to create a full-featured PDF viewer in the browser using JavaScript and Canvas. Yeah, it sounds a bit crazy, but it makes sense: browsers need to be good at processing text, images, fonts, and vector graphics — exactly the things a PDF viewer has to be good at. The draw commands in PDFs are a subset of Postscript, and they are not so different from what Canvas offers. Also security is virtually no issue: using PDF.js is as secure as opening any other website.

    Working on PDF.js

    So Christian Krebs, Mathieu Henri and myself began looking at PDF.js in more detail and were impressed: it’s well designed, seems fast and big parts of the code are just wow!

    But we also discovered some problems, mainly with performance on very large or graphics-heavy PDFs. We decided that the best way to get to know PDF.js better and to push the project further, was to help the project and address the major issues we found. This gave us a pretty good understanding of the project and its high potential. We were also very impressed by how much the performance of PDF.js improved while we worked on it. This is an active and well managed project.

    Benchmarking PDF.js

    Of course, our own tests had given us a skewed impression of performance: we had deliberately looked for super large, awkward and hard-to-render PDFs, but that is not what most people want to view. Most PDFs you actually want to view in PDF.js are fine. But how do you test that?

    Well, you could check the most popular PDFs on the Internet – as these are the ones you probably want to view – and benchmark them. A snapshot of 5 to 10k PDFs should be enough … but how do you get them?

    I figured that search engines would be my friend. If you tell them to search for PDFs only, they give you the most relevant PDFs for that keyword, which in turn are probably the most popular ones. And if you use the most searched keywords you end up with a good approximation.

    Benchmarking that many PDFs is a big task. So I got myself a small cluster of old computers and built a nice server application that supplied them with tasks. The current repository has almost 7000 PDFs and benchmarking one version of PDF.js takes around eight hours.

    The results

    Let’s skip to the interesting part with the pretty pictures. This graph

    [Histogram: per-page processing time relative to the Tracemonkey paper]

    gives us almost all the interesting results at a glance. It is a histogram of the time it took to process each page in the PDFs, relative to the average time it takes to process a page of the Tracemonkey paper (the default PDF you see when opening PDF.js). The user experience when viewing the Tracemonkey paper is good, and from my tests even 3 to 4 times slower is still okay. That means over 96% of all benchmarked pages (excluding PDFs that crashed) translate to a good user experience. That is really good news! Or to use a very simple pie chart (in % of pages):

    [Pie chart: overview of results in % of pages]

    You probably already noticed the small catch: around 0.8% of the PDFs crashed PDF.js when we tested them. We had a closer look at most of them and at least a third are actually so heavily damaged that probably no PDF viewer could ever display them.

    And this leads us to another good point: we have to keep in mind that these results stand on their own, without comparison. There are some PDFs on the Internet that are so complex that there is no hope even native PDF viewers could display them nicely and quickly. The slowest tested PDF is an incredibly detailed vector map of the public transport system of Lisbon. Try opening it in Adobe Reader; it's not fun!

    Conclusion

    From these results we concluded that PDF.js is a very valid candidate to be used as the default PDF viewer in the Opera Browser. There is still a lot of work to do to integrate PDF.js nicely into it, but we are working right now on integrating it behind an experimental flag (BTW: There is an extension that adds PDF.js with the default Mozilla viewer. The “nice” integration I am talking about would be deeper and include a brand new viewer). Thanks Mozilla! We are looking forward to working on PDF.js together with you guys!

    PS: Both the code of the computational system and the results are publicly available. Have a look and tell us if you find them useful!

    PPS: If anybody works at a big search engine company and could give me a list with the actual 10k most used PDFs, that would be awesome :)

    Appendix: What’s next?

    The corpus and the computational framework I described could be used to do all kinds of interesting things. As a next step, we hope to classify PDFs by the font formats, image formats and the like that they use, so you can quickly find PDFs to test a new feature against. We also want to look at which drawing instructions are used with which frequency in the PostScript, so we can better optimise for the very common ones, like we did with HTML in browsers. Let's see what we can actually do ;)

  6. jsDelivr – The advanced open source public CDN

    This is a guest post by Dmitriy Akulov and his project jsDelivr. – Editor’s note.

    As a developer you are probably aware of Google Hosted Libraries. Google offers an easy and fast way to include 12 of the most popular js libraries in your websites.

    But what if you are a webmaster and you want to take advantage of a fast CDN with other, less popular projects too? Or you are a developer and you want to make your project easier for other users to access and use?

    This is where jsDelivr comes into play. jsDelivr is a free and open source CDN created to help developers and webmasters. There are no popularity restrictions and all kinds of files are allowed, including JavaScript libraries, jQuery plugins, CSS frameworks, fonts and more.

    Adding a library

    To add a new library or update an existing one all the developer has to do is to clone our Github repository and apply the modifications they see fit. Once a moderator reviews the Pull Request and merges it, the files become instantly available from the official website.

    If a moderator is online, the approval should not take more than 20 minutes; otherwise it can take up to 10 hours until someone comes online. But once our auto-update utility goes live, review times will drop.

    Reliability

    But what actually makes it so advanced? The idea of jsDelivr was not to create another public CDN but to offer a super fast and reliable infrastructure that developers and website owners could trust and use. Any big or small website can use it without worrying about it. There are no bandwidth limits and our service is rock solid.

    Slow responses, timeouts and downtime are not tolerated, so we designed a unique system to overcome these problems and offer a product that even Enterprise CDNs would be jealous of. Uptime and performance are top priority, we monitor everything at all times and we are always looking into new technologies and providers that may further improve our CDN.

    Infrastructure

    [Map: jsDelivr's global network of POP locations]

    Unlike the competition, jsDelivr uses a unique Multi-CDN infrastructure to offer the best possible uptime and performance. The main backbone is built on top of CDN networks provided by MaxCDN and CloudFlare.

    We also use custom servers in locations where CDNs have little or no presence. In total, this currently results in 42 global POP locations. In the future we plan to add even more locations to offer top performance even in less popular countries.

    Of course lots of locations means nothing if you can’t load balance across them correctly. For the load balancing system we use services provided by Cedexis. One of their main features is the real time performance data they gather on all major CDN providers. 1.3 billion RUM (Real User Metrics) performance tests per day are processed and available to all Cedexis users.

    Measuring performance

    To gather these RUM tests, they have deployed a special piece of JavaScript on thousands of websites. Every visitor to one of these websites executes the code and starts testing different CDN providers in the background as they browse. The testing does not impact the browsing experience in any way and is completely transparent to the user. You can actually see how it works by visiting our website and opening the "Network" tab in the developer tools.

    The beauty of these tests is that they are not synthetic. They reflect the real performance real users will get if they download a file from one of those CDNs.

    The following information is then stored:

    • Performance metrics to each of our providers.
    • Availability metrics to each of our providers.
    • Browser’s User-Agent
    • First three octets of the user’s IP address

    Now that we have all this information we can use it in our smart load balancing algorithm.

    Every user gets a unique response based on their location and ISP. Each time a user requests to download a file from jsDelivr, our algorithm extracts the performance and availability data it has for the last few minutes and then figures out the most optimal provider for that particular user at that particular time. All of that in a few milliseconds.

    First it makes sure that all available providers are online. For this it uses the RUM availability data and a synthetic test that checks each provider every minute for uptime. Then it proceeds to sort the providers by performance for the ISP of the user and his/her location.

    Once it has the fastest provider, it returns that hostname to the user. So, for example, two different users in London with different ISPs could get two different responses, because their ISPs have different routing and performance to different CDN providers. This smart system guarantees maximum uptime and fast loading times for all users. If a provider goes down, jsDelivr won't experience any issues at all and will immediately start serving a different provider.

    This algorithm also responds immediately to performance degradation. For example, if a CDN provider gets DDoSed in Europe and its response times increase, jsDelivr will pick up the change and simply stop using this provider in Europe, while still considering it for users in the USA and other locations that were not affected by the attack.

    Don't rely on a single CDN for uptime and speed. Everything can go down, but the chances of two CDNs and multiple servers going down at the same time are very slim. This is why jsDelivr is the optimal solution for every website out there, no matter how big it is.

    I should also point out that MaxCDN, CloudFlare, Cedexis and the rest of the companies sponsor jsDelivr for free. It's nice to see that there are companies out there willing to help open source projects and build a fast and free internet.

    Advanced Features

    jsDelivr also supports some interesting and very helpful features such as:

    Version Aliasing

    Instead of using a unique URL for each version, you can load a project with jsDelivr using aliasing. Let's take, for example, the project Abaaso. At this moment the latest version is 3.10.50, and you can load it by specifying the exact version in your URL as always. But since this project gets updated very often, you would end up using an old version pretty soon. To overcome this problem you can now simply use the following URL:

    //cdn.jsdelivr.net/abaaso/3.10/abaaso.min.js

    By using 3.10 you tell jsDelivr to load the latest version it has in the 3.10 branch, which in this case is 3.10.50. This is the optimal solution for most authors because they can load the latest minor version without worrying about major changes that could break their website.

    It is of course possible to load the latest version in the v3 branch by using the following URL:

    //cdn.jsdelivr.net/abaaso/3/abaaso.min.js

    And if for any reason you need to always load the latest available version in any major branch you can use:

    //cdn.jsdelivr.net/abaaso/latest/abaaso.min.js

    By using latest you tell the server to load the absolute latest version it has. This, of course, is dangerous and, given enough time, may and will break your website. So use this feature with caution.

    Load multiple files with a single HTTP request

    jsDelivr is the first CDN to support this kind of functionality. You can load multiple files using a single HTTP request. Similar to combining and minifying js files in your own server, but cached by the huge and smart network of jsDelivr.

    All you have to do is to build your own URL with the projects and files you want to combine and their versions if needed. For example, to load the latest version for projects abaaso, ace and alloyui you would use the following syntax:

    //cdn.jsdelivr.net/g/abaaso,ace,alloyui

    Keep in mind that loading the latest version is not recommended and, given enough time, will break your website. This is why you should specify exact versions or use version aliases:

    //cdn.jsdelivr.net/g/jquery@2.1,angularjs@1.2

    So jquery@2.1 will load 2.1.0 and angularjs@1.2 will load 1.2.14. But the above URL will load the main files of each project and nothing else.

    If you want to load multiple files from a single project then you can do the following:

    //cdn.jsdelivr.net/g/jquery@2.1,angularjs@1.2.14(angular.min.js+angular-resource.min.js+angular-animate.min.js+angular-cookies.min.js+angular-route.min.js+angular-sanitize.min.js)

    If you want to load CSS, then select CSS files using the above format. If all files in the group URL have a .css extension, the server will automatically respond with a Content-Type: text/css HTTP header. In all other cases (for /g/ URLs), Content-Type: application/javascript is used.

    Next you simply include the URL in your website and you are done. Less DNS resolving, fewer TCP connections, fewer HTTP requests = a faster website.
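    For example, to include the combined jQuery + AngularJS bundle from above:

    <script src="//cdn.jsdelivr.net/g/jquery@2.1,angularjs@1.2"></script>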

    You can even use this feature to offer your users a builder to allow them to generate a URL with the modules they need and then load them all using a fast CDN.

    A real API

    jsDelivr has a fully featured API that developers can use in their websites, to create custom modules, and for anything else you might think of: https://github.com/jsdelivr/api

    You can request exactly what you need using our API without downloading a huge package json. And it also supports cdnjs and Google. This way developers have everything they need to build their applications.

    Auto-Updates

    jsDelivr libgrabber is a utility that will run on our servers and can auto-update all hosted projects if configured. The best part is that authors don't have to change anything in their repos; all changes are made on the jsDelivr side.

    All you need to do is create an update.json file, with some basic info, inside the project you want to keep auto-updated in the jsDelivr repo. This file also supports multiple sources for new versions, such as npm, Bower and GitHub repos directly. It is still under development but is planned to be released soon.

    Try it out, help out!

    jsDelivr is a very interesting project that I enjoy developing and making better. It also relies heavily on the help of the community. Consider using it in your websites and hosting your projects there.

    And if you are interested in helping out, we can always use some help, just join the conversation on Github.

    Feel free to leave your comments and ask me any questions you might have.

    Thank you

  7. Optimizing your JavaScript game for Firefox OS

    When developing on a quad core processor with 16 gigabytes of RAM you can easily forget to consider how it will perform on a mobile device. This article will detail some best practices and things to consider for moving a game to Firefox OS or any similar hardware target.

    Making the best of 256 MB RAM / 800 MHz CPU

    There are many areas of focus to keep in mind while developing a game. When your goal is to draw 60 times a second, garbage collection and inefficient drawing calls start to get in your way. Let’s start with the basics…

    Don’t over-optimize

    This might sound counter-intuitive in an article about game optimization but optimization is the last step; performed on complete, working code. While it’s never a bad idea to keep these tips and tricks in mind, you don’t know whether you’ll need them until you’ve finished the game and played it on a device.

    Optimize Drawing

    Drawing on an HTML5 2D canvas is the biggest bottleneck in most JavaScript games, as all other updates are usually just algebra without touching the DOM. Canvas operations are hardware accelerated, which can give you some extra room to breathe.

    Use whole-pixel rendering

    Sub-pixel rendering occurs when you render objects on a canvas at non-integer coordinates:

    ctx.drawImage(myImage, 0.3, 0.5)

    This causes the browser to do extra calculations to create an anti-aliasing effect. To avoid this, make sure to round all coordinates used in calls to drawImage using Math.floor or, as you'll read further on in the article, bitwise operators.
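    For example (a sketch; ctx, myImage, x and y are assumed to be defined elsewhere):

    // Round down to whole pixels before drawing.
    ctx.drawImage(myImage, Math.floor(x), Math.floor(y));

    // Or truncate with a bitwise OR, which is equivalent for positive coordinates.
    ctx.drawImage(myImage, x | 0, y | 0);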

    jsPerf – drawImage whole pixels.

    Cache drawing in an offscreen canvas

    If you find yourself with complex drawing operations on each frame, consider creating an offscreen canvas, draw to it once (or whenever it changes) on the offscreen canvas, then on each frame draw the offscreen canvas.

    myEntity.offscreenCanvas = document.createElement("canvas");
    myEntity.offscreenCanvas.width = myEntity.width;
    myEntity.offscreenCanvas.height = myEntity.height;
    myEntity.offscreenContext = myEntity.offscreenCanvas.getContext("2d");

    myEntity.render(myEntity.offscreenContext);
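    Then, in your main render loop, you only pay for a cheap copy (a sketch; ctx is the on-screen context and myEntity.x/y are assumed):

    ctx.drawImage(myEntity.offscreenCanvas, myEntity.x, myEntity.y);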

    Use moz-opaque on the canvas tag (Firefox Only)

    If your game uses canvas and doesn’t need to be transparent, set the moz-opaque attribute on the canvas tag. This information can be used internally to optimize rendering.

    <canvas id="mycanvas" moz-opaque></canvas>

    Described more in Bug 430906 – Add moz-opaque attribute on canvas.

    Scaling canvas using CSS3 transform

    CSS3 transforms are faster because they use the GPU. The best case is not to scale the canvas at all, or to have a smaller canvas and scale it up, rather than a bigger canvas and scale it down. For Firefox OS, target 480 x 320 px.

    var scaleX = canvas.width / window.innerWidth;
    var scaleY = canvas.height / window.innerHeight;
     
    var scaleToFit = Math.min(scaleX, scaleY);
    var scaleToCover = Math.max(scaleX, scaleY);
     
    stage.style.transformOrigin = "0 0"; //scale from top left
    stage.style.transform = "scale(" + scaleToFit + ")";

    See it working in this jsFiddle.

    Nearest-neighbour rendering for scaling pixel-art

    Leading on from the last point, if your game is themed with pixel-art, you should use one of the following techniques when scaling the canvas. The default resizing algorithm creates a blurry effect and ruins the beautiful pixels.

    canvas {
      image-rendering: crisp-edges;
      image-rendering: -moz-crisp-edges;
      image-rendering: -webkit-optimize-contrast;
      -ms-interpolation-mode: nearest-neighbor;
    }

    or

    var context = canvas.getContext("2d");
    context.webkitImageSmoothingEnabled = false;
    context.mozImageSmoothingEnabled = false;
    context.imageSmoothingEnabled = false;

    More documentation is available on MDN for image-rendering.

    CSS for large background images

    If like most games you have a static background image, use a plain DIV element with a CSS background property and position it under the canvas. This will avoid drawing a large image to the canvas on every tick.
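    A possible setup (illustrative ids and asset path):

    #background {
      position: absolute;
      top: 0;
      left: 0;
      width: 480px;
      height: 320px;
      background: url("background.png") no-repeat;
    }

    #mycanvas {
      position: absolute;
      top: 0;
      left: 0;
      /* keep the canvas transparent so the background layer shows through */
    }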

    Multiple canvases for layers

    Similar to the last point, you may find you have some elements that are frequently changing and moving around whereas other things (like UI) never change. An optimization in this situation is to create layers using multiple canvas elements.

    For example you could create a UI layer that sits on top of everything and is only drawn during user input. You could create game layer where the frequently updating entities exist and a background layer for entities that rarely update.
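    For example (illustrative markup; the ids are placeholders):

    <div id="stage">
      <canvas id="background-layer"></canvas> <!-- redrawn rarely -->
      <canvas id="game-layer"></canvas>       <!-- redrawn every frame -->
      <canvas id="ui-layer"></canvas>         <!-- redrawn only on user input -->
    </div>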


    Don’t scale images in drawImage

    Cache various sizes of your images on an offscreen canvas when loading as opposed to constantly scaling them in drawImage.

    jsPerf – Canvas drawImage Scaling Performance.
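    One way to do this (a sketch; enemyImage is assumed to be an already-loaded Image):

    function makeScaledCopy(image, width, height) {
      var canvas = document.createElement("canvas");
      canvas.width = width;
      canvas.height = height;
      canvas.getContext("2d").drawImage(image, 0, 0, width, height);
      return canvas;
    }

    // Scale once at load time, then draw the cached copy every frame.
    var enemySmall = makeScaledCopy(enemyImage, 32, 32);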

    Be careful with heavy physics libraries

    If possible, roll your own physics, as libraries like Box2D don't perform well on low-end Firefox OS devices.

    When asm.js support lands in Firefox OS, Emscripten-compiled libraries can take advantage of near-native performance. More reading in Box2d Revisited.

    Use WebGL instead of Context 2D

    Easier said than done but giving all the heavy graphics lifting to the GPU will free up the CPU for greater good. Even though WebGL is 3D, you can use it to draw 2D surfaces. There are some libraries out there that aim to abstract the drawing contexts.

    Minimize Garbage Collection

    JavaScript can spoil us when it comes to memory management. We generally don’t need to worry about memory leaks or conservatively allocating memory. But if we’ve allocated too much and garbage collection occurs in the middle of a frame, that can take up valuable time and result in a visible drop in FPS.

    Pool common objects and classes

    To minimise the number of objects being cleaned up during garbage collection, use a pre-initialised pool of objects and reuse them, rather than creating new objects all the time.

    Code example of generic object pool:
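    A minimal sketch of one possible pool (illustrative names like Pool and createFn; not the original embedded example):

    function Pool(createFn, size) {
      this.createFn = createFn;
      this.free = [];
      for (var i = 0; i < size; i++) {
        this.free.push(createFn());
      }
    }

    Pool.prototype.acquire = function () {
      // Reuse a pre-allocated object if one is available, otherwise create one.
      return this.free.length > 0 ? this.free.pop() : this.createFn();
    };

    Pool.prototype.release = function (obj) {
      // Return the object to the pool instead of letting it become garbage.
      this.free.push(obj);
    };

    // Usage: a pool of 100 bullet objects.
    var bulletPool = new Pool(function () { return { x: 0, y: 0, active: false }; }, 100);
    var bullet = bulletPool.acquire();
    // ...when the bullet leaves the screen:
    bulletPool.release(bullet);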


    Avoid internal methods creating garbage

    There are various JavaScript methods that create new objects rather than modifying existing ones. These include Array.slice, Array.splice and Function.bind.

    Read more about JavaScript garbage collection

    Avoid frequent calls to localStorage

    localStorage uses file IO and blocks the main thread to retrieve and save data. Use an in-memory object to cache the values of localStorage, and defer writes to moments when the player is not mid-game.

    Code example of an abstract storage object:
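    A minimal sketch of such a wrapper (illustrative, not Gaia code; the keys and structure are assumptions):

    var gameStorage = {
      cache: {},

      load: function (key) {
        // Hit localStorage only once per key, then serve from memory.
        if (!(key in this.cache)) {
          var raw = localStorage.getItem(key);
          this.cache[key] = raw === null ? null : JSON.parse(raw);
        }
        return this.cache[key];
      },

      set: function (key, value) {
        this.cache[key] = value;   // cheap in-memory write during gameplay
      },

      flush: function () {
        // Call this between levels or on pause, not mid-game.
        for (var key in this.cache) {
          localStorage.setItem(key, JSON.stringify(this.cache[key]));
        }
      }
    };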


    Async localStorage API with IndexedDB

    IndexedDB is a non-blocking API for storing data on the client, but it may be overkill for small and simple data. Gaia's library for making the localStorage API asynchronous using IndexedDB is available on GitHub: async_storage.js.

    Miscellaneous micro-optimization

    Sometimes when you’ve exhausted all your options and it just won’t go any faster, you can try some micro-optimizations below. However do note that these only start to make a difference in heavy usage when every millisecond counts. Look for them in your hot game loops.

    • Use x | 0 instead of Math.floor.
    • Clear arrays with .length = 0 to avoid creating a new Array.
    • Sacrifice some CPU time to avoid creating garbage.
    • Use if .. else over switch (jsPerf – switch vs if-else).
    • Use Date.now() over (+ new Date) (jsPerf – Date.now vs new Date().getTime() vs +new Date), or performance.now() for a sub-millisecond solution.
    • Use TypedArrays for floats or integers (e.g. vectors and matrices); see gl-matrix, a JavaScript matrix and vector library for high-performance WebGL apps.

    Conclusion

    Building for mobile devices and not-so-strong hardware is a good and creative exercise, and we hope you will want to make sure your games work well on all platforms!

  8. Performance with JavaScript String Objects

    This article aims to take a look at the performance of JavaScript engines towards primitive value Strings and Object Strings. It is a showcase of benchmarks related to the excellent article by Kiro Risk, The Wrapper Object. Before proceeding, I would suggest visiting Kiro’s page first as an introduction to this topic.

    The ECMAScript 5.1 Language Specification (PDF link) states at paragraph 4.3.18 about the String object:

    String object: member of the Object type that is an instance of the standard built-in String constructor.

    NOTE A String object is created by using the String constructor in a new expression, supplying a String value as an argument. The resulting object has an internal property whose value is the String value. A String object can be coerced to a String value by calling the String constructor as a function (15.5.1).

    and David Flanagan's great book "JavaScript: The Definitive Guide" very meticulously describes the wrapper objects in section 3.6:

    Strings are not objects, though, so why do they have properties? Whenever you try to refer to a property of a string s, JavaScript converts the string value to an object as if by calling new String(s). […] Once the property has been resolved, the newly created object is discarded. (Implementations are not required to actually create and discard this transient object: they must behave as if they do, however.)

    It is important to note the text in bold above. Basically, the different ways a new String object is created are implementation specific. As such, an obvious question one could ask is: "since a primitive value String must be coerced to a String Object when trying to access a property, for example str.length, would it be faster if instead we had declared the variable as a String Object?" In other words, could declaring a variable as a String Object, i.e. var str = new String("hello"), rather than as a primitive value String, i.e. var str = "hello", potentially save the JS engine from having to create a new String Object on the fly so as to access its properties?

    Those who deal with the implementation of ECMAScript standards to JS engines already know the answer, but it’s worth having a deeper look at the common suggestion “Do not create numbers or strings using the ‘new’ operator”.

    Our showcase and objective

    For our showcase, we will use mainly Firefox and Chrome; the results, though, would be similar if we chose any other web browser, as we are focusing not on a speed comparison between two different browser engines, but on a speed comparison between two different versions of the source code in each browser (one version having a primitive value string, and the other a String Object). In addition, we are interested in how the same cases compare in speed across subsequent versions of the same browser. The first sample of benchmarks was collected on the same machine, and then other machines with different OS/hardware specs were added in order to validate the speed numbers.

    The scenario

    For the benchmarks, the case is rather simple; we declare two string variables, one as a primitive value string and the other as an Object String, both of which have the same value:

      var strprimitive = "Hello";
      var strobject    = new String("Hello");

    and then we perform the same kind of tasks on them. (Notice that in the jsPerf pages, strprimitive = str1 and strobject = str2.)

    1. length property

      var i = strprimitive.length;
      var k = strobject.length;

    If we assume that during runtime the wrapper object created from the primitive value string strprimitive, is treated equally with the object string strobject by the JavaScript engine in terms of performance, then we should expect to see the same latency while trying to access each variable’s length property. Yet, as we can see in the following bar chart, accessing the length property is a lot faster on the primitive value string strprimitive, than in the object string strobject.


    (Primitive value string vs Wrapper Object String – length, on jsPerf)

    Actually, on Chrome 24.0.1285 calling strprimitive.length is 2.5x faster than calling strobject.length, and on Firefox 17 it is about 2x faster (while also achieving more operations per second). Consequently, we can tell that the corresponding browser JavaScript engines apply some “short paths” to access the length property when dealing with primitive string values, with special code blocks for each case.
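
    If you just want a rough feel for this difference without jsPerf, a minimal console sketch along the following lines can be used; it is only indicative, as jsPerf handles warm-up and statistical significance far more carefully (performance.now() is assumed to be available, otherwise Date.now() works too):

      var strprimitive = "Hello";
      var strobject    = new String("Hello");
      var ITERATIONS   = 1e7;

      function time(label, fn) {
        var start = performance.now();
        fn();
        console.log(label + ": " + (performance.now() - start).toFixed(1) + " ms");
      }

      time("strprimitive.length", function () {
        var total = 0;
        for (var i = 0; i < ITERATIONS; i++) { total += strprimitive.length; }
        return total;   // return the result so the loop is not trivially dead code
      });

      time("strobject.length", function () {
        var total = 0;
        for (var i = 0; i < ITERATIONS; i++) { total += strobject.length; }
        return total;
      });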

    In the SpiderMonkey JS engine, for example, the pseudo-code that deals with the “get property” operation looks something like the following:

      // direct check for the "length" property
      if (typeof(value) == "string" && property == "length") {
        return StringLength(value);
      }
      // generalized code form for properties
      object = ToObject(value);
      return InternalGetProperty(object, property);

    Thus, when you request a property on a string primitive and the property name is “length”, the engine immediately returns its length, avoiding the full property lookup as well as the creation of a temporary wrapper object. Unless we add a property or method to String.prototype that requires |this|, like so:

      String.prototype.getThis = function () { return this; }
      console.log("hello".getThis());

    no wrapper object will be created when accessing the String.prototype methods, for example String.prototype.valueOf(). Each JS engine embeds similar optimizations in order to produce faster results.
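
    A simple way to observe when the wrapper object actually appears is to inspect |this| inside a custom String.prototype method: in a non-strict function |this| gets boxed to an object, whereas an ES5 strict-mode function receives the primitive itself (the method names below are only illustrative):

      String.prototype.describeThis = function () {
        return typeof this;            // non-strict: |this| is boxed to a String object
      };

      String.prototype.describeThisStrict = function () {
        "use strict";
        return typeof this;            // strict mode: |this| remains a primitive string
      };

      console.log("hello".describeThis());       // "object"
      console.log("hello".describeThisStrict()); // "string"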

    2. charAt() method

      var i = strprimitive.charAt(0);
      var k = strobject["0"];


    (Primitive value string vs Wrapper Object String – charAt(), on jsPerf)

    This benchmark clearly verifies the previous statement, as we can see that getting the value of the first string character in Firefox 20 is substantially faster with strprimitive than with strobject (roughly a 70x performance increase). Similar results apply to other browsers as well, though at different speeds. Also, notice the differences between incremental Firefox versions; this is just another indicator of how small code variations can affect the JS engine’s speed for certain runtime calls.
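
    As an aside, the two test bodies use different access styles (charAt() versus bracket indexing), but in any ES5-era browser they return the same character for both variables:

      // charAt() and indexed access are interchangeable for reading a character,
      // whether called on the primitive or on the wrapper object.
      strprimitive.charAt(0);   // "H"
      strprimitive["0"];        // "H"
      strobject.charAt(0);      // "H"
      strobject["0"];           // "H"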

    3. indexOf() method

      var i = strprimitive.indexOf("e");
      var k = strobject.indexOf("e");


    (Primitive value string vs Wrapper Object String – IndexOf(), on jsPerf)

    Similarly, in this case we can see that the primitive value string strprimitive achieves more operations per second than strobject. In addition, the JS engine differences across sequential browser versions produce a variety of measurements.

    4. match() method

    Since there are similar results here too, to save some space, you can click the source link to view the benchmark.

    (Primitive value string vs Wrapper Object String – match(), on jsPerf)

    5. replace() method

    (Primitive value string vs Wrapper Object String – replace(), on jsPerf)

    6. toUpperCase() method

    (Primitive value string vs Wrapper Object String – toUpperCase(), on jsPerf)
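
    For completeness, the bodies of benchmarks 4-6 presumably look something like the following; this is only a sketch, and the exact arguments used on the jsPerf pages may differ:

      // Presumed shape of the match()/replace()/toUpperCase() cases.
      var m1 = strprimitive.match(/e/);
      var m2 = strobject.match(/e/);

      var r1 = strprimitive.replace("e", "a");
      var r2 = strobject.replace("e", "a");

      var u1 = strprimitive.toUpperCase();
      var u2 = strobject.toUpperCase();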

    7. valueOf() method

      var i = strprimitive.valueOf();
      var k = strobject.valueOf();

    At this point it starts to get more interesting. So, what happens when we try to call the most common method of a string, its valueOf()? It seems that most browsers have a mechanism to determine whether they are dealing with a primitive value string or an Object String, and thus use a much faster way to get its value; surprisingly enough, Firefox versions up to v20 seem to favour the Object String method call on strobject, with a 7x speed increase.


    (Primitive value string vs Wrapper Object String – valueOf(), on jsPerf)

    It’s also worth mentioning that Chrome 22.0.1229 also seems to have favoured the Object String, while in version 23.0.1271 a new way of getting the content of primitive value strings has been implemented.

    A simpler way to run this benchmark in your browser’s console is described in the comment of the jsperf page.

    8. Adding two strings

      var i = strprimitive + " there";
      var k = strobject + " there";


    (Primitive string vs Wrapper Object String – get str value, on jsPerf)

    Let’s now try adding a primitive string value to each of our two strings. As the chart shows, Firefox and Chrome show a 2.8x and a 2x speed increase respectively in favour of strprimitive, compared with adding another string value to the Object String strobject.
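
    Part of the extra cost presumably comes from the fact that when an Object String takes part in concatenation, the engine must first convert it back to a primitive via the standard ToPrimitive/valueOf() path (a simplified illustration of language semantics, not engine internals):

      var strobject = new String("Hello");

      // The + operator converts strobject to a primitive (calling valueOf())
      // before performing the concatenation.
      var result = strobject + " there";

      console.log(result);         // "Hello there"
      console.log(typeof result);  // "string" - the result is always a primitive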

    9. Adding two strings with valueOf()

      var i = strprimitive.valueOf() + " there";
      var k = strobject.valueOf() + " there";


    (Primitive string vs Wrapper Object String – str valueOf, on jsPerf)

    Here we can see again that Firefox favours strobject.valueOf(), since for strprimitive.valueOf() the engine has to move up the inheritance tree and consequently create a new wrapper object for strprimitive. The effect this chain of events has on performance can also be seen in the next case.

    10. for-in wrapper object

      var i = "";
      for (var temp in strprimitive) { i += strprimitive[temp]; }
     
      var k = "";
      for (var temp in strobject) { k += strobject[temp]; }

    This benchmark incrementally rebuilds the string’s value into another variable by looping over it with for-in. In a for-in loop the expression to be evaluated is normally an object, but if the expression is a primitive value, that value gets coerced to its equivalent wrapper object. Of course, this is not a recommended way to get the value of a string, but it is one of the many ways a wrapper object can be created, and thus it is worth mentioning.
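
    Concretely, for-in over a primitive string enumerates the index properties of the wrapper object the string is coerced to, so the loop visits each character position (a small illustration, independent of the benchmark itself):

      var s = "Hi!";
      for (var idx in s) {
        // idx is the property name ("0", "1", "2"); s[idx] is the character
        console.log(idx, s[idx]);   // logs "0" "H", then "1" "i", then "2" "!"
      }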


    (Primitive string vs Wrapper Object String – Properties, on jsPerf)

    As expected, Chrome seems to favour the primitive value string strprimitive, while Firefox and Safari seem to favour the object string strobject. In case this all seems fairly typical by now, let’s move on to the last benchmark.

    11. Adding two strings with an Object String

      var str3 = new String(" there");
     
      var i = strprimitive + str3;
      var k = strobject + str3;


    (Primitive string vs Wrapper Object String – 2 str values, on jsPerf)

    In the previous examples we have seen that Firefox versions offer better performance if our initial string is an Object String, like strobject, and thus it would seem natural to expect the same when adding strobject to another Object String, which is basically the same thing. It is worth noticing, though, that when adding a string to an Object String, it is actually quite a bit faster in Firefox if we use strprimitive instead of strobject. This proves once more how source code variations, such as a patch for a bug, can lead to different benchmark numbers.

    Conclusion

    Based on the benchmarks described above, we have seen how subtle differences in our string declarations can produce a series of different performance results. It is recommended that you continue to declare your string variables as you normally do, unless there is a very specific reason for you to create instances of the String Object. Also, note that a browser’s overall performance, particularly when dealing with the DOM, does not depend only on the page’s JS performance; there is a lot more in a browser than its JS engine.

    Feedback comments are much appreciated. Thanks :-)

  9. No Single Benchmark for the Web

    Google released a new JavaScript benchmark a few days ago called Octane. New benchmarks are always welcome, as they push browsers to new levels of performance in new areas. I was particularly pleased to see the inclusion of pdf.js, which unlike most benchmarks is real-world code, as well as the GB Emulator which is a very interesting type of performance-intensive code. However, every benchmark suite has limitations, and it is worth keeping that in mind, especially given the new benchmark’s title in the announcement and in the project page as “The JavaScript Benchmark Suite for the Modern Web” – which is a high goal to set for a single benchmark.

    Now, every benchmark must pick some code to run out of all the possible code out there, and picking representative code is very hard. So it is always understandable that benchmarks are never 100% representative of the code that exists and is important. However, even taking that into account, I have concerns with some of the code selected to appear in Octane: there are better versions of two of the five new benchmarks, and performance on those better versions is very different from performance on the versions that do appear in Octane.

    Benchmarking black boxes

    One of the new benchmarks in Octane is “Mandreel”, which is the Bullet physics engine compiled by Mandreel, a C++ to JS compiler. Bullet is definitely interesting code to include in a benchmark. However, the choice of Mandreel’s port is problematic. One issue is that Mandreel is a closed-source compiler, a black box, making it hard to learn from it what kind of code is efficient and what should be optimized. We just have a generated code dump; because Mandreel is a commercial product, it would cost money for anyone to reproduce those results with modifications to the original C++ being run, or with a different codebase. We also do not have the source code compiled for this particular benchmark: Bullet itself is open source, but we don’t know the specific version compiled here, nor do we have the benchmark driver code that uses Bullet, both of which would be necessary to reproduce these results using another compiler.

    An alternative could have been to use Bullet compiled by Emscripten, an open source compiler that similarly compiles C++ to JS (disclaimer: I am an Emscripten dev). Aside from being open, Emscripten also has a port of Bullet (a demo can be seen here) that can interact in a natural way with regular JS, making it usable in normal web games and not just compiled ones, unlike Mandreel’s port. This is another reason for preferring the Emscripten port of Bullet instead.

    Is Mandreel representative of the web?

    The motivation Google gives for including Mandreel in Octane is that Mandreel is “used in countless web-based games.” It seems that Mandreel is primarily used in the Chrome Web Store (CWS) and less outside in the normal web. The quoted description above is technically accurate: Mandreel games in the CWS are indeed “web-based” (written in JS+HTML+WebGL) even if they are not actually “on the web”, where by “on the web” I mean outside of the walled garden of the CWS and in the normal web that all browsers can access. And it makes perfect sense that Google cares about the performance of code that runs in the CWS, since Google runs and profits from that store. But it does call into question the title of the Octane benchmark as “The JavaScript Benchmark Suite for the Modern Web.”

    Performance of generated code is highly variable

    With that said, it is still fair to say that compiler-generated code is increasing in importance on the web, so some benchmark must be chosen to represent it. The question is how much the specific benchmark chosen represents compiled code in general. On the one hand the compiled output of Mandreel and Emscripten is quite similar: both use large typed arrays, the same Relooper algorithm, etc., so we could expect performance to be similar. That doesn’t seem to always be the case, though. When we compare Bullet compiled by Mandreel with Bullet compiled by Emscripten – I made a benchmark of that a while back, it’s available here – then on my MacBook Pro, Chrome is 1.5x slower than Firefox on the Emscripten version (that is, Chrome takes 1.5 times as long to execute in this case), but 1.5x faster on the Mandreel version that Google chose to include in Octane (that is, Chrome receives a score 1.5 times larger in this case). (I tested with Chrome Dev, which is the latest version available on Linux, and Firefox Aurora, which is the best parallel to it. If you run the tests yourself, note that in the Emscripten version smaller numbers are better, while the opposite is true in the Octane version.)

    (An aside, not only does Chrome have trouble running the Emscripten version quickly, but that benchmark also exposes a bug in Chrome where the tab consistently crashes when the benchmark is reloaded – possibly a dupe of this open issue. A serious problem of that nature, that does not happen on the Mandreel-compiled version, could indicate that the two were optimized differently as a result of having received different amounts of focus by developers.)

    Another issue with the Mandreel benchmark is the name. Calling it Mandreel implies it represents all Mandreel-generated code, but there can be huge differences in performance depending on what C/C++ code is compiled, even with a single compiler. For example, Chrome can be 10-15x slower than Firefox on some Emscripten-compiled benchmarks (example 1, example 2) while on others it is quite speedy (example). So calling the benchmark “Mandreel-Bullet” would have been better, to indicate it is just one Mandreel-compiled codebase, which cannot represent all compiled code.

    Box2DWeb is not the best port of Box2D

    “Box2DWeb” is another new benchmark in Octane, in which a specific port of Box2D to JavaScript is run, namely Box2DWeb. However, as seen here (see also this), Box2DWeb is significantly slower than other ports of Box2D to the web, specifically Mandreel and Emscripten’s ports from the original C++ that Box2D is written in. Now, you can justify excluding the Mandreel version because it cannot be used as a library from normal JS (just as with Bullet before), but the Emscripten-compiled one does not have that limitation and can be found here. (Demos can be seen here and here.)

    Another reason for preferring the Emscripten version is that it uses Box2D 2.2, whereas Box2DWeb uses the older Box2D 2.1. Compiling the C++ code directly lets the Emscripten port stay up to date with the latest upstream features and improvements far more easily.

    It is possible that Google surveyed websites and found that the slower Box2DWeb was more popular (I have no idea whether that was the case); if so, that would partially justify preferring the slower version. However, even if that were true, I would argue that it would be better to use the Emscripten version because, as mentioned earlier, it is faster and more up to date. Another factor to consider is that the version included in Octane will get attention and likely an increase in adoption, which makes it all the more important to select the one that is best for the web.

    I put up a benchmark of Emscripten-compiled Box2D here, and on my machine Chrome is 3x slower than Firefox on that benchmark, but 1.6x faster on the version Google chose to include in Octane. This is a similar situation to what we saw earlier with the Mandreel/Bullet benchmark and it raises the same questions about how representative a single benchmark can be.

    Summary

    As mentioned at the beginning, all benchmarks are imperfect. And the fact that the specific code samples in Octane are ones that Chrome runs well does not mean the code was chosen for that reason: The opposite causation is far more likely, that Google chose to focus on optimizing those and in time made Chrome fast on them. And that is how things properly work – you pick something to optimize for, and then optimize for it.

    However, in 2 of the 5 new benchmarks in Octane there are good reasons for preferring alternative, better versions of those two benchmarks as we saw before. Now, it is possible that when Google started to optimize for Octane, the better options were not yet available – I don’t know when Google started that effort – but the fact that better alternatives exist in the present makes substantial parts of Octane appear less relevant today. Of course, if performance on the better versions was not much different than the Octane versions then this would not matter, but as we saw there were in fact significant differences when comparing browsers on those versions: One browser could be significantly better on one version of the same benchmark but significantly slower on another.

    What all of this shows is that there cannot be a single benchmark for the modern web. There are simply too many kinds of code, and even when we focus on one of them, different benchmarks of that particular task can behave very differently.

    With that said, we shouldn’t be overly skeptical: Benchmarks are useful. We need benchmarks to drive us forward, and Octane is an interesting new benchmark that, even with the problems mentioned above, does contain good ideas and is worth focusing on. But we should always be aware of the limitations of any single benchmark, especially when a single benchmark claims to represent the entire modern web.

     

  10. Getting snappy – performance optimizations in Firefox 13

    Back in the fall of 2011, we took a targeted look at Firefox responsiveness issues. We identified a number of short term projects that together could achieve significant responsiveness improvements in day-to-day Firefox usage. Project Snappy kicked off at the end of the year with the goal of improving Firefox responsiveness.

    Although Snappy first contributed fixes to Firefox 11, Snappy’s most noticeable contributions to date are landing with Firefox 13. Currently in beta, this release includes a number of responsiveness related fixes, most notably tabs-on-demand, cycle collector improvements, and start-up optimization.

    Tabs-on-Demand

    Tabs-on-demand is a feature that reduces start-up time for Firefox windows with many tabs. In Firefox 12, all tabs are loaded on start-up. For windows with many tabs this may cause a delay before you can interact with Firefox as each tab must load its content. In Firefox 13, only the active tab will load. Loading of background tabs is deferred until a tab is selected. This results in Firefox starting faster as tabs-on-demand reduces processing requirements, network usage, and memory consumption.

    Cycle Collector

    As you interact with the browser and Web content, memory is allocated as needed. The Firefox cycle collector works to automatically free some of this memory when it is no longer needed. This reduces Firefox’s memory usage. In Firefox 13, the cycle collector is more efficient, spending less time examining memory that is still in use, which results in fewer pauses as you use Firefox.

    Start-up

    Firefox start-up time is visible to all users. Our investigation into start-up has identified a number of unoptimized routines in the code that executes before what we call “first paint”. “First paint” signifies when the Firefox user interface is first visible on your screen. In Firefox 13 we have optimized file calls, audio sessions, drag and drop, and overall IO, just to name a few. We are continuing to profile the Firefox start-up sequence to identify further optimizations that can be made in future releases.

    There are numerous other Snappy fixes in Firefox 13 including significant improvements to IO contention, font enumeration, and livemark overhead. All of these fixes contribute to a more responsive experience. We are already working on further responsiveness fixes for future Firefox releases. You can expect to see Snappy improvements in upcoming releases in areas such as memory usage, shutdown time, network cache and connections, menus, and graphics.