Performance Articles

  1. Mozilla developer preview (Gecko 1.9.3a2) now available

    We’ve posted a new release of our Mozilla developer preview series as a way to test new features that we’re putting into the Mozilla platform. These features may or may not make it into a future Firefox release, either for desktops or for mobile phones. But that’s why we do these releases – to get testing and feedback early so we know how to treat them.

    Note that this release does not contain two things that have gotten press recently: D2D or the new JavaScript VM work we’ve been doing.

    Since this is a weblog focused on web developers, I think that it’s important to talk about what’s new for all of you. So we’ll jump right into that:

    Out of Process Plugins

    We did an a1 release about three weeks ago in order to get testing on some of the new web developer features (which we’ll list here again.) The biggest change between that release and this one is the inclusion of out of process plugins for Windows and Linux. (Mac is a little bit more work and we’re working on it as fast as our little fingers will type.)

    There are a lot of plugins out there on the web, and they exist to varying degrees of quality. So we’re pushing plugins out of process so that when one of them crashes it doesn’t take the entire browser with it. (It also has lots of other nice side effects – we can better control memory usage, CPU usage and it also helps with general UI lag.)

    If you want to know more about it, have a look at this post by Ben Smedberg, who goes over how it works, what prefs you can set and how you can help test it. It would help us a lot if you did.

    (If you really want to get on the testing train we strongly suggest you start running our nightly builds which are the ultimate in bleeding edge but are generally stable enough for daily use.)

    Anyway, on to the feature list and performance improvements taken from the release announcement:

    Web Developer Features

    • Support for Content Security Policy. This is largely complete, minus the ability to disable eval().
    • The placeholder attribute for <input/> and <textarea> is now supported.
    • Support for SMIL Animation in SVG. Support for animating some SVG attributes is still under development and the animateMotion element isn’t supported yet.
    • Support for CSS Transitions. This support is not quite complete: support for animation of transforms and gradients has not yet been implemented.
    • Support for WebGL, which is disabled by default but can be enabled by changing a preference. See this blog post and this blog post for more details.
    • Support for the getClientRects and getBoundingClientRect methods on Range objects. See bug 396392 for details.
    • Support for the setCapture and releaseCapture methods on DOM elements. See bug 503943 for details.
    • Support for the HTML5 History.pushState() and History.replaceState() methods and the popstate event. See bug 500328 for details.
    • Support for the -moz-image-rect() value for background-image. See bug 113577 for more details.

    For the full list of new web developer features please visit our page on Upcoming Features for Web Developers.

    Performance Improvements

    • We’ve removed link history lookup from the main thread and made it asynchronous. This results in less I/O during page loads and improves overall browser responsiveness.
    • Loading the HTML5 spec no longer causes very long browser pauses.
    • A large number of layout performance improvements have been made, including work around DOM access times, color management performance, text area improvements and many other hot spots in the layout engine.
    • The JavaScript engine has many improvements: String handling is improved, faster closures, some support for recursion in TraceMonkey to name a few.
    • Improvements to the performance of repainting HTML in <foreignObject>.
    • Strings are not copied between the main DOM code and web workers, improving performance for threaded JavaScript which moves large pieces of data between threads.
  2. Optimizing your JavaScript game for Firefox OS

    When developing on a quad-core processor with 16 gigabytes of RAM, you can easily forget to consider how your game will perform on a mobile device. This article will detail some best practices and things to consider for moving a game to Firefox OS or any similar hardware target.

    Making the best of 256 MB RAM / 800 MHz CPU

    There are many areas of focus to keep in mind while developing a game. When your goal is to draw 60 times a second, garbage collection and inefficient drawing calls start to get in your way. Let’s start with the basics…

    Don’t over-optimize

    This might sound counter-intuitive in an article about game optimization but optimization is the last step; performed on complete, working code. While it’s never a bad idea to keep these tips and tricks in mind, you don’t know whether you’ll need them until you’ve finished the game and played it on a device.

    Optimize Drawing

    Drawing on an HTML5 2D canvas is the biggest bottleneck in most JavaScript games, as all other updates are usually just algebra without touching the DOM. Canvas operations are hardware accelerated, which can give you some extra room to breathe.

    Use whole-pixel rendering

    Sub-pixel rendering occurs when you render objects on a canvas without whole values.

    ctx.drawImage(myImage, 0.3, 0.5);

    This causes the browser to do extra calculations to create the anti-aliasing effect. To avoid this, make sure to round all coordinates used in calls to drawImage using Math.floor or, as you’ll read further on in this article, bitwise operators.
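
    For example, a rounded drawing call might look like this (x and y stand for whatever position values your game computes):

    ctx.drawImage(myImage, Math.floor(x), Math.floor(y));
    // or, for non-negative coordinates, truncate with a bitwise OR:
    ctx.drawImage(myImage, x | 0, y | 0);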

    jsPerf – drawImage whole pixels.

    Cache drawing in an offscreen canvas

    If you find yourself performing complex drawing operations on each frame, consider creating an offscreen canvas, drawing to it once (or whenever it changes), and then on each frame drawing that offscreen canvas to the visible one.

    myEntity.offscreenCanvas = document.createElement("canvas");
    myEntity.offscreenCanvas.width = myEntity.width;
    myEntity.offscreenCanvas.height = myEntity.height;
    myEntity.offscreenContext = myEntity.offscreenCanvas.getContext("2d");
     
    myEntity.render(myEntity.offscreenContext);
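
    Then, in the main loop, the cached canvas can be blitted like any image (myEntity.x and myEntity.y are illustrative position properties):

    // Each frame: draw the pre-rendered canvas instead of repeating the complex drawing
    ctx.drawImage(myEntity.offscreenCanvas, myEntity.x, myEntity.y);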

    Use moz-opaque on the canvas tag (Firefox Only)

    If your game uses canvas and doesn’t need to be transparent, set the moz-opaque attribute on the canvas tag. This information can be used internally to optimize rendering.

    <canvas id="mycanvas" moz-opaque></canvas>

    Described more in Bug 430906 – Add moz-opaque attribute on canvas.

    Scaling canvas using CSS3 transform

    CSS3 transforms are faster because they use the GPU. The best case is not to scale the canvas at all, or to have a smaller canvas and scale up rather than a bigger canvas and scale down. For Firefox OS, target 480 x 320 px.

    var scaleX = window.innerWidth / canvas.width;
    var scaleY = window.innerHeight / canvas.height;
     
    var scaleToFit = Math.min(scaleX, scaleY);
    var scaleToCover = Math.max(scaleX, scaleY);
     
    // stage is the element being scaled (the canvas itself or a wrapper around it)
    stage.style.transformOrigin = "0 0"; //scale from top left
    stage.style.transform = "scale(" + scaleToFit + ")";

    See it working in this jsFiddle.

    Nearest-neighbour rendering for scaling pixel-art

    Leading on from the last point, if your game is themed with pixel-art, you should use one of the following techniques when scaling the canvas. The default resizing algorithm creates a blurry effect and ruins the beautiful pixels.

    canvas {
      image-rendering: crisp-edges;
      image-rendering: -moz-crisp-edges;
      image-rendering: -webkit-optimize-contrast;
      -ms-interpolation-mode: nearest-neighbor;
    }

    or

    var context = canvas.getContext('2d');
    context.webkitImageSmoothingEnabled = false;
    context.mozImageSmoothingEnabled = false;
    context.imageSmoothingEnabled = false;

    More documentation is available on MDN for image-rendering.

    CSS for large background images

    If, like most games, you have a static background image, use a plain DIV element with a CSS background property and position it under the canvas. This will avoid drawing a large image to the canvas on every tick.
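
    A minimal sketch of the idea, assuming an element with the id background sitting beneath a transparent canvas (the image URL is illustrative):

    #background {
      position: absolute;
      top: 0;
      left: 0;
      z-index: 0;  /* the canvas sits above this with a higher z-index */
      background: url("background.png") no-repeat;
    }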

    Multiple canvases for layers

    Similar to the last point, you may find you have some elements that are frequently changing and moving around whereas other things (like UI) never change. An optimization in this situation is to create layers using multiple canvas elements.

    For example you could create a UI layer that sits on top of everything and is only drawn during user input. You could create game layer where the frequently updating entities exist and a background layer for entities that rarely update.
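
    A minimal sketch of such a setup; the element ids and the drawBackground/drawEntities helpers are illustrative, not from the article:

    // Three stacked canvases, absolutely positioned on top of each other via CSS
    var backgroundCtx = document.getElementById("background-layer").getContext("2d");
    var gameCtx = document.getElementById("game-layer").getContext("2d");
    var uiCtx = document.getElementById("ui-layer").getContext("2d");
     
    drawBackground(backgroundCtx);   // drawn once at load time, never again
     
    function frame() {
      gameCtx.clearRect(0, 0, gameCtx.canvas.width, gameCtx.canvas.height);
      drawEntities(gameCtx);         // frequently updating entities, redrawn every frame
      // the UI layer is only redrawn from input handlers, not here
      requestAnimationFrame(frame);
    }
    requestAnimationFrame(frame);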

    Don’t scale images in drawImage

    Cache various sizes of your images on an offscreen canvas when loading as opposed to constantly scaling them in drawImage.
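
    One way to do this, sketched below, is to render each needed size to its own canvas once at load time (cacheScaled, enemyImage and enemy are illustrative names):

    // Pre-scale an image once; the returned canvas can be passed straight to drawImage
    function cacheScaled(image, width, height) {
      var scaled = document.createElement("canvas");
      scaled.width = width;
      scaled.height = height;
      scaled.getContext("2d").drawImage(image, 0, 0, width, height);
      return scaled;
    }
     
    var smallEnemy = cacheScaled(enemyImage, 32, 32);
    // In the game loop there are no scaling arguments, so no per-frame resampling
    ctx.drawImage(smallEnemy, enemy.x, enemy.y);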

    jsPerf – Canvas drawImage Scaling Performance.

    Be careful with heavy physics libraries

    If possible, roll your own physics as libraries like Box2D don’t perform well on low-end Firefox OS devices.

    When asm.js support lands in Firefox OS, Emscripten-compiled libraries can take advantage of near-native performance. More reading in Box2d Revisited.

    Use WebGL instead of Context 2D

    Easier said than done, but giving all the heavy graphics lifting to the GPU will free up the CPU for the greater good. Even though WebGL is 3D, you can use it to draw 2D surfaces. There are some libraries out there that aim to abstract the drawing contexts.

    Minimize Garbage Collection

    JavaScript can spoil us when it comes to memory management. We generally don’t need to worry about memory leaks or conservatively allocating memory. But if we’ve allocated too much and garbage collection occurs in the middle of a frame, that can take up valuable time and result in a visible drop in FPS.

    Pool common objects and classes

    To minimise the number of objects being cleaned during garbage collection, use a pre-initialised pool of objects and reuse them rather than creating new objects all the time.

    Code example of generic object pool:
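
    The sketch below illustrates the idea; the pool, player and bullet names are illustrative rather than taken from a specific library:

    var pool = {
      free: [],
      get: function (init) {
        // Reuse a previously released object if one exists, otherwise allocate
        var obj = this.free.length ? this.free.pop() : {};
        return init ? init(obj) : obj;
      },
      release: function (obj) {
        // Hand the object back for reuse instead of letting it become garbage
        this.free.push(obj);
      }
    };
     
    // Usage: reuse bullet objects instead of allocating a new one for every shot
    var bullet = pool.get(function (b) {
      b.x = player.x;
      b.y = player.y;
      b.alive = true;
      return b;
    });
    // ...later, when the bullet leaves the screen:
    pool.release(bullet);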

    Avoid internal methods creating garbage

    There are various JavaScript methods that create new objects rather than modifying the existing one. This includes: Array.slice, Array.splice, Function.bind.
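
    For instance, where allocation matters you can remove an array element in place instead of calling splice, which allocates and returns a new array of the removed elements (a sketch, not code from the article):

    // Remove the element at index i without allocating; element order is not preserved
    function removeAt(list, i) {
      list[i] = list[list.length - 1];
      list.length -= 1;
    }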

    Read more about JavaScript garbage collection

    Avoid frequent calls to localStorage

    LocalStorage uses file I/O and blocks the main thread to retrieve and save data. Use an in-memory object to cache the values of localStorage, and defer writes to moments when the player is not mid-game.

    Code example of an abstract storage object:
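
    A minimal sketch of the idea, with illustrative names: reads come from an in-memory cache, and writes to localStorage are deferred until it is safe to block:

    var gameStorage = {
      cache: JSON.parse(localStorage.getItem("game-state") || "{}"),
      get: function (key) { return this.cache[key]; },
      set: function (key, value) { this.cache[key] = value; },  // no I/O here
      flush: function () {
        // Call between levels or on pause, never in the middle of a frame
        localStorage.setItem("game-state", JSON.stringify(this.cache));
      }
    };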

    Async localStorage API with IndexedDB

    IndexedDB is a non-blocking API for storing data on the client, but it may be overkill for small and simple data. Gaia’s library for making the localStorage API asynchronous using IndexedDB is available on GitHub: async_storage.js.

    Miscellaneous micro-optimization

    Sometimes, when you’ve exhausted all your options and it just won’t go any faster, you can try some of the micro-optimizations below. However, do note that these only start to make a difference under heavy usage, when every millisecond counts. Look for them in your hot game loops.

    • Use x | 0 instead of Math.floor (see the sketch after this list).
    • Clear arrays with .length = 0 to avoid creating a new Array.
    • Sacrifice some CPU time to avoid creating garbage.
    • Use if .. else over switch (jsPerf – switch vs if-else).
    • Use Date.now() over (+ new Date) (jsPerf – Date.now vs new Date().getTime() vs +new Date), or performance.now() for a sub-millisecond solution.
    • Use TypedArrays for floats or integers (e.g. vectors and matrices); see gl-matrix – a JavaScript Matrix and Vector library for high-performance WebGL apps.
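
    A minimal sketch of a few of these in a hypothetical game loop (TILE_SIZE, player and bullets are illustrative):

    var TILE_SIZE = 32;
     
    function updateFrame(player, bullets) {
      // x | 0 truncates to an integer: cheaper than Math.floor in a hot loop
      player.tileX = (player.x / TILE_SIZE) | 0;
     
      // Clear the array in place rather than allocating a new one with bullets = []
      bullets.length = 0;
     
      // Date.now() avoids allocating a Date object; use performance.now() for sub-millisecond timing
      player.lastUpdate = Date.now();
    }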

    Conclusion

    Building for mobile devices and not-so-strong hardware is a good and creative exercise, and we hope you will want to make sure your games work well on all platforms!

  3. No Single Benchmark for the Web

    Google released a new JavaScript benchmark a few days ago called Octane. New benchmarks are always welcome, as they push browsers to new levels of performance in new areas. I was particularly pleased to see the inclusion of pdf.js, which unlike most benchmarks is real-world code, as well as the GB Emulator which is a very interesting type of performance-intensive code. However, every benchmark suite has limitations, and it is worth keeping that in mind, especially given the new benchmark’s title in the announcement and in the project page as “The JavaScript Benchmark Suite for the Modern Web” – which is a high goal to set for a single benchmark.

    Now, every benchmark must pick some code to run out of all the possible code out there, and picking representative code is very hard. So it is always understandable that benchmarks are never 100% representative of the code that exists and is important. However, even taking that into account, I have concerns with some of the code selected to appear in Octane: There are better versions of two of the five new benchmarks, and performance on those better versions is very different than the versions that do appear in Octane.

    Benchmarking black boxes

    One of the new benchmarks in Octane is “Mandreel”, which is the Bullet physics engine compiled by Mandreel, a C++ to JS compiler. Bullet is definitely interesting code to include in a benchmark. However the choice of Mandreel’s port is problematic. One issue is that Mandreel is a closed-source compiler, a black box, making it hard to learn from it what kind of code is efficient and what should be optimized. We just have a generated code dump, which, as a commercial product, would cost money for anyone to reproduce those results with modifications to the original C++ being run or a different codebase. We also do not have the source code compiled for this particular benchmark: Bullet itself is open source, but we don’t know the specific version compiled here, nor do we have the benchmark driver code that uses Bullet, both of which would be necessary to reproduce these results using another compiler.

    An alternative could have been to use Bullet compiled by Emscripten, an open source compiler that similarly compiles C++ to JS (disclaimer: I am an Emscripten dev). Aside from being open, Emscripten also has a port of Bullet (a demo can be seen here) that can interact in a natural way with regular JS, making it usable in normal web games and not just compiled ones, unlike Mandreel’s port. This is another reason for preferring the Emscripten port of Bullet instead.

    Is Mandreel representative of the web?

    The motivation Google gives for including Mandreel in Octane is that Mandreel is “used in countless web-based games.” It seems that Mandreel is primarily used in the Chrome Web Store (CWS) and less outside in the normal web. The quoted description above is technically accurate: Mandreel games in the CWS are indeed “web-based” (written in JS+HTML+WebGL) even if they are not actually “on the web”, where by “on the web” I mean outside of the walled garden of the CWS and in the normal web that all browsers can access. And it makes perfect sense that Google cares about the performance of code that runs in the CWS, since Google runs and profits from that store. But it does call into question the title of the Octane benchmark as “The JavaScript Benchmark Suite for the Modern Web.”

    Performance of generated code is highly variable

    With that said, it is still fair to say that compiler-generated code is increasing in importance on the web, so some benchmark must be chosen to represent it. The question is how much the specific benchmark chosen represents compiled code in general. On the one hand the compiled output of Mandreel and Emscripten is quite similar: both use large typed arrays, the same Relooper algorithm, etc., so we could expect performance to be similar. That doesn’t seem to always be the case, though. When we compare Bullet compiled by Mandreel with Bullet compiled by Emscripten – I made a benchmark of that a while back, it’s available here – then on my MacBook pro, Chrome is 1.5x slower than Firefox on the Emscripten version (that is, Chrome takes 1.5 times as long to execute in this case), but 1.5x faster on the Mandreel version that Google chose to include in Octane (that is, Chrome receives a score 1.5 times larger in this case). (I tested with Chrome Dev, which is the latest version available on Linux, and Firefox Aurora which is the best parallel to it. If you run the tests yourself, note that in the Emscripten version smaller numbers are better while the opposite is true in the Octane version.)

    (An aside, not only does Chrome have trouble running the Emscripten version quickly, but that benchmark also exposes a bug in Chrome where the tab consistently crashes when the benchmark is reloaded – possibly a dupe of this open issue. A serious problem of that nature, that does not happen on the Mandreel-compiled version, could indicate that the two were optimized differently as a result of having received different amounts of focus by developers.)

    Another issue with the Mandreel benchmark is the name. Calling it Mandreel implies it represents all Mandreel-generated code, but there can be huge differences in performance depending on what C/C++ code is compiled, even with a single compiler. For example, Chrome can be 10-15x slower than Firefox on some Emscripten-compiled benchmarks (example 1, example 2) while on others it is quite speedy (example). So calling the benchmark “Mandreel-Bullet” would have been better, to indicate it is just one Mandreel-compiled codebase, which cannot represent all compiled code.

    Box2DWeb is not the best port of Box2D

    “Box2DWeb” is another new benchmark in Octane, in which a specific port of Box2D to JavaScript is run, namely Box2DWeb. However, as seen here (see also this), Box2DWeb is significantly slower than other ports of Box2D to the web, specifically Mandreel and Emscripten’s ports from the original C++ that Box2D is written in. Now, you can justify excluding the Mandreel version because it cannot be used as a library from normal JS (just as with Bullet before), but the Emscripten-compiled one does not have that limitation and can be found here. (Demos can be seen here and here.)

    Another reason for preferring the Emscripten version is that it uses Box2D 2.2, whereas Box2DWeb uses the older Box2D 2.1. Compiling the C++ code directly lets the Emscripten port stay up to date with the latest upstream features and improvements far more easily.

    It is possible that Google surveyed websites and found that the slower Box2DWeb was more popular, although I have no idea whether that was the case, but if so that would partially justify preferring the slower version. However, even if that were true, I would argue that it would be better to use the Emscripten version because as mentioned earlier it is faster and more up to date. Another factor to consider is that the version included in Octane will get attention and likely an increase in adoption, which makes it all the more important to select the one that is best for the web.

    I put up a benchmark of Emscripten-compiled Box2D here, and on my machine Chrome is 3x slower than Firefox on that benchmark, but 1.6x faster on the version Google chose to include in Octane. This is a similar situation to what we saw earlier with the Mandreel/Bullet benchmark and it raises the same questions about how representative a single benchmark can be.

    Summary

    As mentioned at the beginning, all benchmarks are imperfect. And the fact that the specific code samples in Octane are ones that Chrome runs well does not mean the code was chosen for that reason: The opposite causation is far more likely, that Google chose to focus on optimizing those and in time made Chrome fast on them. And that is how things properly work – you pick something to optimize for, and then optimize for it.

    However, in 2 of the 5 new benchmarks in Octane there are good reasons for preferring alternative, better versions of those two benchmarks as we saw before. Now, it is possible that when Google started to optimize for Octane, the better options were not yet available – I don’t know when Google started that effort – but the fact that better alternatives exist in the present makes substantial parts of Octane appear less relevant today. Of course, if performance on the better versions was not much different than the Octane versions then this would not matter, but as we saw there were in fact significant differences when comparing browsers on those versions: One browser could be significantly better on one version of the same benchmark but significantly slower on another.

    What all of this shows is that there cannot be a single benchmark for the modern web. There are simply too many kinds of code, and even when we focus on one of them, different benchmarks of that particular task can behave very differently.

    With that said, we shouldn’t be overly skeptical: Benchmarks are useful. We need benchmarks to drive us forward, and Octane is an interesting new benchmark that, even with the problems mentioned above, does contain good ideas and is worth focusing on. But we should always be aware of the limitations of any single benchmark, especially when a single benchmark claims to represent the entire modern web.

     

  4. Mozilla developer preview (Gecko 1.9.3a1) available for download

    Editor’s note: Today, Mozilla released a preview of the Gecko 1.9.3 platform for developers and testers. Check out the Mozilla Developer News announcement reposted below.

    A Mozilla Developer Preview of improvements in the Gecko layout engine is now available for download. This is a pre-release version of the Gecko 1.9.3 platform, which forms the core of rich Internet applications such as Firefox. Please note that this release is intended for developers and testers only. As always, we appreciate any feedback you may have and encourage users to help us by filing bugs.

    This developer preview introduces several new features, including:

    and several other significant changes, including:

    • On Mac OS X, we render text using Core Text rather than ATSUI.
    • We rewrote major parts of the code for handling scrolling. See bug 526394 for details.
    • We rewrote the way a snapshot of a document is taken in order to print or print preview. See bug 487667 for details.
    • We made significant changes to table border handling. See bug 452319 and bug 43178 for details.
    • We made various architectural changes to improve Web page performance.

    More information on these changes is in the release notes, as well as the Upcoming Firefox features for developers article on the Mozilla Developer Center.

    Please use the following links when downloading this Mozilla Developer Preview:

  5. mozilla developer preview 4 ready for testing

    Note: this is a re-post of the entry in the Mozilla Project Development Weblog. There’s some juicy stuff in here for web developers that needs testing. In particular, this is the first build with the CSS history changes.

    As part of our ongoing platform development work, we’re happy to announce the fourth pre-release of the Gecko 1.9.3 platform. Gecko 1.9.3 will form the core of Firefox and other Mozilla project releases.

    It’s available for download on Mac, Windows or Linux.

    Mozilla expects to release a Developer Preview every 2-3 weeks. If you’ve been running a previous release, you will be automatically updated to the latest version when it is released.

    This preview release contains a lot of interesting stuff that’s worth pointing out, and contains many things that were also in previous releases. Here are the things of note in this release:

    User Interface Changes

    • Open tabs that match searches in the Awesomebar now show up as “Switch to Tab.”
    • This is the first preview release to contain resizable text areas by default.

    Web Developer Changes

    • This is the first preview release to contain changes to CSS :visited that prevent a large class of history sniffing attacks. You can find more information about the details of why this change is important over on the hacks post on the topic and on the Mozilla Security Weblog. Note that this change is likely to break some web sites and requires early testing – please test if you can.
    • SVG Attributes which are mapped to CSS properties can now be animated with SMIL. See the bug or a demo.

    Plugins

    • Out of process plugins support for Windows and Linux continues to improve. This release contains many bug fixes vs. our previous developer preview releases. (In fact, it’s good enough that we’ve ported this code back to the 3.6 branch and have pushed that to beta for a later 3.6.x release.)
    • This is the first release that contains support for out of process plugins for the Mac. If you are running OS X 10.6 and you’re running the latest Flash beta, Flash should run out of process.

    Performance

    • One area where people complained about performance was restart performance when applying an update. It turns out that a lot of what made that experience poor wasn’t startup time, it was browser shutdown time. We’ve made a fix since the last preview release that made a whopping 97% improvement in shutdown time. (That’s not a typo, it’s basically free now.)
    • Our work to reduce the amount of I/O on the main thread continues unabated. This preview release will feel much snappier than previous snapshots, and feel much faster than Firefox 3.6.
    • We continue to add hardware acceleration support. If you’re on Windows and you’ve got decent OpenGL 2 drivers, open video will use hardware to scale the video when you’re in full screen mode. For large HD videos this can make a huge difference in the smoothness of the experience and how much power + CPU are used. We’ll be adding OSX and Linux support at some point in the future as well, but we’re starting with Windows.
    • We continue to make improvements and bug fixes to our support for Direct2D. (Not enabled by default. If you want to turn it on see Bas’ post.) If you’re running Alpha 4 on Windows Vista or Windows 7, and you’ve turned on D2D, try running this stress test example in Alpha 4 vs. Firefox 3.6. The difference is pretty amazing. You can also see what this looks like compared to other browsers in this video. (Thanks to Hans Schmucker for the video and demo.)

    Platform

    • JS-ctypes, our new easy-to-use system for extension authors who want to call into native code, now has support for complex types: structures, pointers, and arrays. For more information on this, and how easy it can make calling into native code from JavaScript, see Dan Witte’s post.
    • Mozilla is now sporting an infallible allocator. What is this odd-sounding thing, you ask? It’s basically an allocator that aborts when memory can’t be allocated, instead of returning NULL. This reduces the surface area for an entire class of security bugs related to checking NULL pointers, and also allows us to vastly simplify a huge amount of Gecko’s source code.
  6. Detecting and generating CSS animations in JavaScript

    When writing the hypnotic spiral demo I ran into the issue that I wanted to use CSS animation when possible but needed a fallback to rotate an element. As I didn’t want to rely on CSS animation, I also considered it pointless to write it by hand; instead I create it with JavaScript when the browser supports it. Here’s how that is done.

    Testing for the support of animations means testing if the animation style property is supported:

    var animation = false,
        animationstring = 'animation',
        keyframeprefix = '',
        domPrefixes = 'Webkit Moz O ms Khtml'.split(' '),
        pfx  = '';
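    // elm is assumed to be the element we want to animate, fetched earlier (e.g. with document.querySelector)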
     
    if( elm.style.animationName ) { animation = true; }    
     
    if( animation === false ) {
      for( var i = 0; i < domPrefixes.length; i++ ) {
        if( elm.style[ domPrefixes[i] + 'AnimationName' ] !== undefined ) {
          pfx = domPrefixes[ i ];
          animationstring = pfx + 'Animation';
          keyframeprefix = '-' + pfx.toLowerCase() + '-';
          animation = true;
          break;
        }
      }
    }

    [Update - the earlier code did not check if the browser supports animation without a prefix - this one does]

    This checks if the browser supports animation without any prefixes. If it does, the animation string will be ‘animation’ and there is no need for any keyframe prefixes. If it doesn’t, then we go through all the browser prefixes (to date :)) and check if there is a property on the style collection called browser prefix + AnimationName. If there is, the loop exits and we define the right animation string and keyframe prefix and set animation to true. On Firefox this will result in MozAnimation and -moz-, on Chrome in WebkitAnimation and -webkit-, and so on. This we can then use to create a new CSS animation in JavaScript. If none of the prefix checks return a supported style property we animate in an alternative fashion.

    if( animation === false ) {
     
      // animate in JavaScript fallback
     
    } else {
      elm.style[ animationstring ] = 'rotate 1s linear infinite';
     
      var keyframes = '@' + keyframeprefix + 'keyframes rotate { '+
                        'from {' + keyframeprefix + 'transform:rotate( 0deg ) }'+
                        'to {' + keyframeprefix + 'transform:rotate( 360deg ) }'+
                      '}';
     
      if( document.styleSheets && document.styleSheets.length ) {
     
          document.styleSheets[0].insertRule( keyframes, 0 );
     
      } else {
     
        var s = document.createElement( 'style' );
        s.innerHTML = keyframes;
        document.getElementsByTagName( 'head' )[ 0 ].appendChild( s );
     
      }
     
    }

    With the animation string defined we can set a (shortcut notation) animation on our element. Now, adding the keyframes is trickier. As they are not part of the original Animation but disconnected from it in the CSS syntax (to give them more flexibility and allow re-use) we can’t set them in JavaScript. Instead we need to write them out as a CSS string.

    If there is already a style sheet applied to the document we add this keyframe definition string to that one, if there isn’t a style sheet available we create a new style block with our keyframe and add it to the document.

    You can see the detection in action and a fallback JavaScript solution on JSFiddle:

    JSFiddle demo.

    Quite simple, but also a bit more complex than I originally thought. You can also dynamically detect and change current animations as this post by Wayne Pan and this demo by Joe Lambert explains but this also seems quite verbose.

    I’d love to have a CSSAnimations collection for example where you could store different animations in JSON or as a string and have their name as the key. Right now, creating a new rule dynamically and adding it to the document or appending it to the ruleset seems to be the only cross-browser way. Thoughts?

  7. Theora on N900

    This is a re-post from Matthew Gregan’s personal weblog on the work that he’s been doing to bring HTML5 open video to mobile devices. Google recently announced funding for some work to bring Theora to ARM devices via a CPU-driven code path. Mozilla has been funding similar work over the last year or so to do video decoding on DSPs found in mobile devices, leaving the CPU largely idle.

    We realize this post isn’t strictly web developer-facing, but it’s interesting enough for those who want to know how this stuff works under the covers.

    Theora on N900: Or, how to play full-screen Theora video on the N900 with 80% idle CPU.

    The C64x+ DSP is often found in systems built upon TI’s OMAP3 SoC, such as the Palm Pre, Motorola Droid, and Nokia N900. Last year, Mozilla funded a port, named Leonora, of Xiph’s Theora video codec to the TI C64x+ DSP. David Schleef conducted the port impressively quickly and published his results. The intention of this project was to provide a high quality set of royalty free media codecs for a common mobile computing platform. The initial focus is Firefox Mobile on the N900, so I am working on integrating David’s work into Firefox. To experiment with other facilities Firefox could use to accelerate video playback and test integration, I’ve been hacking on a branch of a stand-alone Ogg Theora and Vorbis player originally written by Chris Double called plogg.

    Decoding and playing video can be a CPU intensive process, especially when all of the steps are fighting for time on a single CPU. The expensive parts of the playback process can be broken down into a few coarse pieces, in approximate descending order of cost:

    1. Video frame decode
    2. Video colour-space conversion (Y’CbCr to RGB)
    3. Video frame display
    4. Audio block decode
    5. Audio block playback

    David’s DSP work enables item 1 to be off-loaded from the CPU completely, effectively providing “hardware accelerated” video decoding. Most devices have some way to off-load items 2 and 3 to the graphics hardware, but it can be difficult to make use of this while integrating with an existing graphics rendering pipeline.

    The N900 has a 800×480 pixel display, so my hope was to play a 800×480 video full-screen at 30 frames per second with low CPU use and good battery life.

    The ARM CPU in the N900 is quite fast. Doing a pure video-decode-only test, the original Theora library, which currently does not have ARM specific optimizations, is able to decode a 640×360 video at 76 frames per second, and it can even decode an 800×480 video at 32 frames per second. With the ARM optimized port by Robin Watts, those numbers become 110 FPS and 47 FPS. David’s DSP port produces 78 FPS and 39 FPS, and it leaves the CPU completely idle because the entire decode is off-loaded to the DSP. With these numbers, it’s clear that the N900 is up to the task of playing back video smoothly if we can get the bits on the screen fast enough.

    I am using plogg as a basis for experimentation using techniques applicable in the Firefox rendering engine. This requirement immediately excludes some techniques. For example, using hardware Y’CbCr overlays to display the video frames is excluded because it is not possible for Firefox to render arbitrary HTML content over the top of the overlay.

    Chris’s original version of plogg used SDL’s Y’CbCr overlay API, which uses a fast direct overlay path on most systems. This provided a baseline for playback performance. Decoding my 800×480 test video with the DSP, it was possible to play back at 33 FPS with around 20% CPU idle. Unlike the decode-only benchmarks mentioned above, the plogg benchmarks are playing both audio and video with correct A/V synchronisation. With 44.1kHz stereo audio, I observed that 10-15% of the device’s CPU is used by PulseAudio. This indicates that audio playback may constitute a significant amount of processor time with some configurations.

    Because there was already work underway to provide OpenGL accelerated compositing in Firefox with the newly conceived Layers API, it seemed logical to try using a GLSL fragment shader to off-load colour-space conversion to the GPU. This turned out to be too slow to play back a full-screen video.

    Looking at the list of vendor-specific OpenGL extensions available on the N900, I discovered the texture streaming API. This allows a program to directly map texture memory and copy Y’CbCr data into that memory without having to perform an expensive texture upload or colour-space conversion. The colour-space conversion is off-loaded to dedicated graphics hardware inaccessible via the standard OpenGL APIs. Using this and the modified bc-cat kernel module from the gst-bc-cat project, it’s possible to play back at 26 FPS with 81% CPU idle.

    One drawback of the current bc-cat kernel driver is that there is a very limited set of texture formats supported (NV12, UYVY, RGB565, and YUYV), and none of them are the same as what Theora produces. To work around this, a format conversion is required to convert planar Y’CbCr to packed 4:2:2 UYVY. Fortunately, this conversion is much simpler than a full colour-space conversion. Timothy Terriberry sent me a couple of patches to off-load this conversion work to the DSP. If it’s possible to extend the bc-cat driver to support texture formats compatible with Theora’s output, performance can be further improved.

    The test files used for benchmarking were: Big Buck Bunny (video: 640×360 @ 500 kbps 24 FPS, audio: 64 kbps 48kHz stereo, 9m 56s, 40MB) and the movie trailer for 9 (video: 800×480 @ 2 Mbps 23.98 FPS, audio: 44.1kHz stereo, 2m 30s, 30MB). Benchmarks were run with the CPU frequency fixed at 600MHz.

    In summary, it’s possible to play full-screen Ogg Theora videos on the N900 at full frame rates with low CPU use by off-loading video decoding to the DSP and colour-space conversion and painting to the GPU. There are opportunities for optimization left, tuning for battery life needs to be investigated, and the integration into Firefox still needs to be done.

    Video decode-only:

    Decoder          FPS (800×480)   Idle CPU
    libtheora 1.1    32              0.7%
    TheorARM 0.04    47              0.4%
    leonora (DSP)    39              99.0%

    Playback (video decode + paint, audio decode + not played). DSP decoding video:

    Method           FPS (800×480)   Idle CPU
    SDL/overlay      33              20.0%
    OpenGL/bc-cat    26              81.4%
  8. People of HTML5 – Remy Sharp

    HTML5 needs spokespeople to work. There are a lot of people out there who took on this role, and here at Mozilla we thought it is a good idea to introduce some of them to you with a series of interviews and short videos. The format is simple – we send the experts 10 questions to answer and then do a quick video interview to let them introduce themselves and ask for more detail on some of their answers.

    Leggi la traduzione in italiano

    Today we are featuring Remy Sharp, co-author of Introducing HTML5 and organiser of the Full Frontal conference in Brighton, England.

    Remy is one of those ubiquitous people of HTML5. Whenever something needed fixing, there is probably something on GitHub that Remy wrote that helps you. He is also very English and doesn’t mince his words much.

    You can find Remy on Twitter as @rem.

    The video interview

    Watch the video on YouTube or Download it from Archive.org as MP4 (98 MB), OGG (70 MB) or WebM (68MB)

    Ten questions about HTML5 for Remy Sharp

    1) Reading “Introducing HTML5” it seems to me that you were more of the API-focused person and Bruce the markup guy. Is that a fair assumption? What is your background and passion?

    That’s spot on. Bruce asked me to join the project as the “JavaScript guy” – which is the slogan I wear under my clothes and frequently reveal in a superman ‘spinning around’ fashion (often to the surprise of clients).

    My background has always been coding – even from a young age, my dad had me copying out listings from old Spectrum magazines, only to result in hours of typing and some random error that I could never debug.

    As I got older I graduated to coding in C, but those were the days when SDKs were 10Mb downloads over a 14kb modem and compiled in some really odd environment. Suffice to say I didn’t get very far.

    Then along came JavaScript. A programming language that didn’t require any special development environment. I could write the code in Notepad on my dodgy Windows 95 box, and every machine came with the runtime: the browser. Score!

    From that point on the idea of instant gratification from the browser meant that I was converted – JavaScript was the way for me.

    Since then I’ve worked on backend environments too (yep, I’m a Perl guy, sorry!), but always worked and played in the front end in some way or another. However, since I started on my own in 2006, I’ve been able to move my focus almost entirely to the front end and specialise in JavaScript. Basically, work-wise: I’m a pig in shit [Ed: for our non-native English readers, he means “happy”].

    2) From a programmer’s point of view, what are the most exciting bits about the HTML5 standard? What would you say is something every aspiring developer should get their head around first?

    For me, the most exciting aspect of HTML5 is the depth of the JavaScript APIs. It’s pretty tricky to explain to Joe Bloggs that actually this newly spec’ed version of HTML isn’t mostly HTML; it’s mostly JavaScript.

    I couldn’t put my finger on one single part of the spec, only because it’s like saying which is your favourite part of CSS (the :target selector – okay, so I can, but that’s not the point!). What’s most exciting to me is that HTML5 is saying that the browser is the platform on which we can deliver real applications – take this technology seriously.

    If an aspiring developer wanted something quick and impressive, I’d say play around with the video API – by no means is this the best API, just an easy one.

    If they really wanted to blow people away with something amazing using HTML5, I’d say learn JavaScript (I’m assuming they’re already happy with HTML and CSS). Get a book like JavaScript: The Good Parts and then get JavaScript Patterns and master the language. Maybe, just maybe, then go buy Introducing HTML5, it’s written by two /really/ good looking (naked) guys: http://flic.kr/p/8iyQTE and http://flic.kr/p/8iy6Z1 [Ed: maybe NSFW, definitely disturbing].

    3) In your book you wrote a nice step-by-step video player for HTML5 video. What do you think works well with the Video APIs and what are still problems that need solving?

    The media API is dirt simple, so it means working with video and audio is a doddle. For me, most of it works really well (so long as you understand the loading process and the events).

    Otherwise what’s really quite neat, is the fact I can capture the video frames and mess with them in a canvas element – there’s lots of fun that can be had there (see some of Paul Rouget’s demos for that!).

    What sucks, and sucks hard, is the spec asks vendors, ie. browser makers, *not* to implement full screen mode. It uses security concerns as the reason (which I can understand), but Flash solved this long ago – so why not follow their lead on this particular problem? If native video won’t go full screen, it will never be a competitive alternative to Flash for video.

    That all said, I do like that the folks behind WebKit went and ignored the spec, and implemented full screen. The specs are just guidelines, and personally, I think browsers should be adding this feature.

    4) Let’s talk a bit about non-HTML5 standards, like Geolocation. I understand you did some work with that and found that some parts of the spec work well whilst others less so. Can you give us some insight?

    On top of the HTML5 specification there’s a bunch more specs that make the browser really, really exciting. If we focus on the browsers being released today (IE9 included) there’s a massive amount that can be done that we couldn’t do 10 years ago.

    There’s the “non-HTML5” specs that actually were part of HTML5, but split out for good reason (so they can be better managed), like web storage, 2D canvas API and Web Sockets, but there’s also the /really/ “nothing-to-do-with-HTML5” APIs (NTDWH5API!) like querySelector, XHR2 and the Device APIs. I’m super keen to try all of these out even if they’re not fully there in all the browsers.

    Geolocation is a great example of cherry picking technology. Playing against the idea that the technology isn’t fully implemented. Something I find myself ranting on and on about when it comes to the question of whether a developer should use HTML5. Only 50% of Geolocation is implemented in the browsers supporting it, in that they don’t have altitude, heading or speed – all of which are part of the spec. Does that stop mainstream apps like Google Maps from using the API? (clue: no).

    The guys writing the specs have done a pretty amazing job, and in particular there are a few cases where the specs have been retrospectively written. XHR is one of these and now we’ve got a stable API being added in new browsers (i.e. IE6 sucks, yes, we all know that). Which leads us to drag and drop. The black sheep of the APIs. In theory a really powerful API that could make our applications rip, but the technical implementation is a shambles. PPK (Peter-Paul Koch) tore the spec a bit of a ‘new one’. It’s usable, but it’s confusing, and lacking.

    Generally, I’ve found the “non-HTML5″ specs to be a bit of mixed bag. Some are well supported in new browsers, some not at all. SVG is an oldie and now really viable with the help of JavaScript libraries such as Raphaël.js or SVGWeb (a Flash based solution). All in all, there’s lots of options available in JavaScript API nowadays compared to back in the dark ages.

    5) Let’s talk Canvas vs. SVG for a bit. Isn’t Canvas just having another pixel-based rectangle in the page much like Java Applets used to be? SVG, on the other hand is Vector based and thus would be a more natural tool to do something with open standards that we do in Flash now. When would you pick SVG instead of Canvas and vice versa?

    Canvas, in a lot of ways is just like the Flash drawing APIs. It’s not accessible and a total black box. The thing is, in the West, there’s a lot of businesses, rightly or wrongly, that want their fancy animation thingy to work on iOS. Since Flash doesn’t work there, canvas is a superb solution.

    However, you must, MUST, decide which technology to use. Don’t just use canvas because you saw a Mario Kart demo using it. Look at the pros and cons of each. SVG and the canvas API are not competitive technologies, they’re specially shaped hammers for specific jobs.

    Brad Neuberg did a superb job of summarising the pros and cons of each, and I’m constantly referring people to it (here’s the video).

    So it really boils down to:

    • Canvas: pixel manipulation, non-interactive and high animation
    • SVG: interactive, vector based

    Choose wisely young padawan!

    6) What about performance? Aren’t large Canvas solutions very slow, especially on mobile devices? Isn’t that a problem for gaming? What can be done to work around that?

    Well…yes and no. I’m finishing a project that has a large canvas animation going on, and it’s not slow on mobile devices (not that it was designed for those). The reason it’s not slow is because of the way the canvas animates. It doesn’t need to be constantly updating at 60fps.

    Performance depends on your application. Evaluate the environment, the technologies and make a good decision. I personally don’t think using a canvas for a really high performance game on a mobile is quite ready. I don’t think the devices have the oomph to get the job done – but there’s a hidden problem – the browser in the device isn’t quite up to it. Hardware acceleration is going to help, a lot, but today, right now, I don’t think we’ll see games like Angry Birds written in JavaScript.

    That said… I’ve seriously considered how you could replicate a game like Canabalt using a mix of canvas, DIVs and CSS. I think it might be doable ::throws gauntlet::

    I think our community could actually learn a lot from the Flash community. They’ve been through all of this already. Trying to make old versions of Flash from years back do things that were pushing the boundaries. People like Seb Lee-Delisle (@seb_ly / http://seb.ly) are doing an amazing job of teaching both the Flash and JavaScript community.

    7) A feature that used to be HTML5 and is now an own spec is LocalStorage and its derivatives Session Storage or the full-fledged WebSQL and IndexedDB. Another thing is offline storage. There seems to be a huge discussion in developer circles about what to use when and if NoSQL solutions client side are the future or not. What are your thoughts? When can you use what and what are the drawbacks?

    Personally I love seeing server-less applications. Looking at the storage solutions I often find it difficult to see why you wouldn’t use WebStorage every time.

    In a way it acts like (in my limited experience of) NoSQL, in that you lookup a key and get a result.

    Equally, I think SQL in the browser is over the top. Like you’re trying to use the storage method *you* understand and forcing it into the browser. Seems like too much work for too little win.

    Offline Apps, API-wise, ie. the application cache is /really/ sexy. Like sexy with chocolate on top sexy. The idea that our applications can run without the web, or switch when it detects it’s offline is really powerful. The only problem is that the events are screwed. The event to say your app is now offline requires the user to intervene via the browser menu, telling the browser to “work in offline mode”. A total failure of experience. What’s worse is that, as far as I know, there’s no plan to make offline event fire properly :-(

    That all said, cookies are definitely dead for me. I’ve yet to find a real solution for cookies since I found the Web Storage API – and there’s a decent number of polyfills for Web Storage – so there’s really no fear of using the API.

    8) Changing the track a bit, you’ve built the HTML5shiv to make HTML5 elements styleable in IE. This idea sparked quite a lot of other solutions to make IE6 work with the new technologies (or actually simulate them). Where does this end? Do you think it is worthwhile to write much more code just to have full IE6 support?

    There’s two things here:

    1. Supporting IE6 (clue: don’t)
    2. Polyfills

    IE6, seriously, and for the umpteenth time, look at your users. Seriously. I know the project manager is going to say they don’t know what the figures are, in that case: find out! Then, once you’ve got the usage stats in hand, you know your audience and you know what technology they support.

    If they’re mostly IE6 users, then adding native video with spinning and dancing canvas effects isn’t going to work – not even with shims and polyfills. IE6 is an old dog that just isn’t up to doing the mileage he used to be able to do back in his prime. But enough on this subject – the old ‘do I, or don’t I develop for IE6’ debate is long in the tooth.

    Polyfills – that’s a different matter. They’re not there to support IE6, they’re there to bring browsers up to your expectations as a developer. However, I’d ask that you carefully consider them before pulling them in. The point of these scripts is they plug missing APIs in those older browsers. “Old browsers” doesn’t particularly mean IE6. For example, the Web Sockets API has a polyfill by way of Flash. If native Web Sockets aren’t there, Flash fills the gap, but the API is exposed in exactly the same manner, meaning that you don’t have to fork your code.

    I don’t think people should be pulling in scripts just for the hell of it. You should consider what you’re trying to achieve and decide whether X technology is the right fit. If it is, and you know (or expect) your users have browsers that don’t support X technology – should you plug it with JavaScript or perhaps should you consider a different technology?

    This exact same argument rings true for when someone adds jQuery just to add or remove a class from an element. It’s simply not worth it – but clearly that particular developer didn’t really understand what they needed to do. So is education the solution? I should hope so.

    9) Where would you send people if they want to learn about HTML5? What are tutorials that taught you a lot? Where should interested people hang out?

    HTML5 Doctor – fo sho’. :)

    I tend to also direct people to my http://html5demos.com simply to encourage viewing source, and hacking away.

    Regarding what tutorials taught me – if I’m totally honest, the place I’ve learnt the most from is actually HTML5Doctor.com. There’s some pretty good JavaScript / API tutorials coming from the chaps at http://bocoup.com. Otherwise, I actually spend a lot of time just snooping through the specifications, looking for bits that I’ve not seen before and generally poking them with a stick.

    10) You have announced that you are concentrating on building a framework to make Websockets easy to work with. How is that getting on and what do you see Websockets being used for in the future? In other words, why the fascination?

    Concentrating is a strong word ;-) but it is true, I’ve started working on a product that abstracts Web Sockets to a service. Not the API alone, since it’s so damn simple, but the server setup: creating sessions, user control flow, waiting for users and more.

    The service is called Förbind. Swedish for “connect”, ie. to connect your users. It’s still early days, but I hope to release alpha access to forbind.net this month.

    I used to work on finance web sites, and real-time was the golden egg: getting that data as soon as it was published. So now that it’s available in a native form in the browser, I’m all over it!

    What’s more, I love the idea of anonymous users. I created a bunch of demos where the user can contribute to something without ever really revealing themselves, and when the users come, you start to see how creative people are without really trying. Sure, you get a lot of cocks being drawn, but you also see some impressive ideas – my business 404 page for example allows people to leave a drawing, one of the most impressive is a Super Mario in all his glory. Anonymous users really interest me because as grey as things can seem sometimes, a stranger can easily inspire you.

    Do you know anyone I should interview for “People of HTML5″? Tell me on Twitter: @codepo8

  9. Fun With Fast JavaScript

    This post is by Vladimir Vukićević and is a re-post from his personal weblog.

    Fast JavaScript is a cornerstone of the modern web. In the past, application authors had to wait for browser developers to implement any complex functionality in the browser itself, so that they could access it from script code. Today, many of those functions can move straight into JavaScript itself. This has many advantages for application authors: there’s no need to wait for a new version of a browser before you can develop or ship your app, you can tailor the functionality to exactly what you need, and you can improve it directly (make it faster, higher quality, more precise, etc.).

    Here are two examples that show off what can be done with the improved JS engine and capabilities that will be present in Firefox 4. The first example shows a simple web-based Darkroom that allows you to perform color correction on an image. The HTML+JS is around 700 lines of code, not counting jQuery. This is based on a demo that’s included with Google’s Native Client (NaCl) SDK; in that demo, the color correction work is done inside native code going through NaCl. That demo (originally presented as “too slow to run in JavaScript”) is a few thousand lines of code, and involves downloading and installing platform-specific compilers, multiple steps to test/deploy code, and installing a plugin on the browser side.

    I get about 15-16 frames per second with the default zoomed out image (around 5 million pixels per second — that number won’t be affected by image size) on my MacBook Pro, which is definitely fast enough for live manipulation. The algorithm could be tightened up to make this faster still. Further optimizations to the JS engine could help here as well; for example, I noticed that we spend a lot of time doing floating point to integer conversions for writing the computed pixels back to the display canvas, due to how the canvas API specifies image data handling.

    The Web Darkroom tool also supports drag & drop, so you can take any image from your computer and drop it onto the canvas to load it. A long (long!) time ago, back in 2006, I wrote an addon called “Croppr!”. It was intended to be used with Flickr, allowing users to play around with custom crops of any image, and then leave crop suggestions in comments to be viewed using Croppr. It almost certainly doesn’t work any more, but it would be neat to update it: this time with both cropping and color correction. Someone with the addon (perhaps a Jetpack now!) could then visit a Flickr photo and experiment, and leave suggestions for the photographer.

    The second example is based on some work that Dave Humphrey and others have been doing to bring audio manipulation to the web platform. Originally, their spec included a pre-computed FFT with each audio frame delivered to the web app. I suggested that there’s no need for this — while a FFT is useful for some applications, for others it would be wasted work. Those apps that want a FFT could implement one in JS. Some benchmark numbers backed this up — using the typed arrays originally created for WebGL, computing an FFT in JS was approaching the speed of native code. Again, both could be sped up (perhaps using SSE2 or something like Mono.Simd on the JS side), but it’s fast enough to be useful already.

    The demo shows this in action. A numeric benchmark isn’t really all that interesting, so instead I take a video clip, and as it’s playing, I extract a portion of the green channel of each frame and compute its 2D FFT, which is then displayed. The original clip plays at 24 frames per second, so that’s the upper bound of this demo. Using Float32 typed arrays, the computation and playback proceeds at around 22-24fps for me.

    You can grab the video controls and scrub to a specific frame. (The frame rate calculation is only correct while the video is playing normally, not while you’re scrubbing.) The video source uses Theora, so you’ll need a browser that can play Theora content. (I didn’t have a similar clip that uses WebM, or I could have used that.)

    These examples are demonstrating the strength of the trace-based JIT technique that Firefox has used for accelerating JavaScript since Firefox 3.5. However, not all code can see such dramatic speedups from that type of acceleration. Because of that, we’ll be including a full method-based JIT for Firefox 4 (for more details, see David Anderson’s blog, as well as David Mandelin’s blog). This will provide significantly faster baseline JS performance, with the trace JIT becoming a turbocharger for code that it would naturally apply to.

    Combining fast JavaScript performance alongside new web platform technologies such as WebGL and Audio will make for some pretty exciting web apps, and I’m looking forward to seeing what developers do with them!

    Edit: Made some last-minute changes to the demos, which ended up pulling in a slightly broken version of jQuery UI that wasn’t all that happy with Safari. Should be fixed now!

  10. Delivering the good message of local storage

    As you might know, there are an incredible number of advent calendar blogs out at the moment, each delivering one cool article for each day of December until Christmas.

    Today (6/12/11) two calendar blogs delivered an article of mine talking about the benefits of using local storage in browsers and how to implement it.

    You can find the English version on the 24ways.org calendar and a German translation at the Webkrauts calendar.

    In essence, the article explains why it is a good idea to use local storage, how to use it, how to work around the issue that local storage can only store strings, and gives example code on how to speed up web service use by caching information client-side, much like you would on a server.
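
    For instance, the string-only limitation is commonly worked around by serialising with JSON (a quick sketch, not code from the article; the key and values are illustrative):

    // localStorage can only store strings, so serialise objects with JSON
    localStorage.setItem('settings', JSON.stringify({ volume: 0.8, muted: false }));
    var settings = JSON.parse(localStorage.getItem('settings') || '{}');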

    Have a read and go speed up your own solutions by using what browsers provide you these days.