JavaScript Articles

  1. Compiling to JavaScript, and Debugging with Source Maps

    Update 2013/05/29: I have updated the article to reflect recent changes in the source map specification where the //@ syntax for linking a source map to a script has been deprecated in favor of //# due to problems with Internet Explorer.

    This is a tutorial on how to write a compiler which generates JavaScript as its target language, and maintains line and column meta-data in source maps for debugging. Storing line and column coordinates in a source map allows the end-user of the compiler to debug the source code that they wrote, rather than the ugly, generated JavaScript they are not familiar with.

    In this tutorial, we will be compiling a small Reverse Polish Notation, or RPN, language to JavaScript. The language is super simple, and is nothing more than simple arithmetic with variable storage and output capabilities. We are keeping the language simple so that we can focus on integrating source maps with the compiler, rather than language implementation details.

    Availability

    Initial support for source maps in the debugger is available in Firefox 23 (Aurora at time of writing) with more improvements coming in Firefox 24 (Nightly at time of writing). Chrome DevTools also have support for source maps.

    Overview of the Source Language

    RPN uses postfix notation, meaning that the operator follows its two operands. One of the benefits of RPN is that as long as we limit ourselves to binary operators, we do not need any parentheses, and do not need to worry about operator precedence.

    Here is an example program in our source language:

    a 5 =;
    b 3 =;
    c a b + 4 * =;

    This is an equivalent program written in a language which uses infix notation for its arithmetic operators:

    a = 5;
    b = 3;
    c = (a + b) * 4;

    Our language will support addition, subtraction, multiplication, division, assignment, and printing. The print operator’s first operand is the value to print, the second operand is how many times to print the value and must be greater than or equal to one:

    5 1 print;
    # Output:
    # 5
     
    3 4 print;
    # Output:
    # 3
    # 3
    # 3
    # 3
     
    4 print;
    # Syntax error
     
    n -1 =;
    4 n print;
    # Runtime error

    Lastly, division by zero should throw an error:

    5 0 /;
    # Runtime error

    Getting Setup

    We will be writing our compiler on Node.js, using Jison to generate the parser for our language from a grammar, and using the source-map library to help generate source maps.

    The first step is to download and install Node.js if you don’t already have it on your system.

    After you have installed Node.js, use its package manager npm to create a new project for the compiler:

    $ mkdir rpn
    $ cd rpn/
    $ npm init .

    After the last command, npm will prompt you with a bunch of questions. Enter your name and email, answer ./lib/rpn.js for the main module/entry point, and just let npm use the defaults that it supplies for the rest of the questions.

    Once you have finished answering the prompts, create the directory layout for the project:

    $ mkdir lib
    $ touch lib/rpn.js
    $ mkdir -p lib/rpn

    The public API for the compiler will reside within lib/rpn.js, while the submodules we use to implement various things such as the lexer and abstract syntax tree will live in lib/rpn/*.js.

    Next, open up the package.json file and add jison and source-map to the project’s dependencies:

    ...
    "dependencies": {
      "jison": ">=0.4.4",
      "source-map": ">=0.1.22"
    },
    ...

    Now we will install a link to our package in Node.js’s globally installed packages directory. This allows us to import our package from the Node.js shell:

    $ npm link .

    Make sure that everything works by opening the Node.js shell and importing our package:

    $ node
    > require("rpn")
    {}

    Writing the Lexer

    A lexer (also known as a scanner or tokenizer) breaks the inputted raw source code into a stream of semantic tokens. For example in our case, we would want to break the raw input string "5 3 +;" into something like ["5", "3", "+", ";"].

    Because we are using Jison, rather than writing the lexer and parser by hand, our job is much easier. All that is required is to supply a list of rules that describe the types of tokens we are expecting. The left hand side of each rule is a regular expression that matches an individual token; the right hand side is the snippet of code to execute when an instance of the corresponding token type is found. These tokens will be passed on to the parser in the next phase of the compiler.

    Create the rules for lexical analysis in lib/rpn/lex.js:

    exports.lex = {
      rules: [
        ["\s+",                   "/* Skip whitespace! */"],
        ["#.*\n",                 "/* Skip comments! */"],
        [";",                      "return 'SEMICOLON'"],
        ["\-?[0-9]+(\.[0-9]+)?", "return 'NUMBER';"],
        ["print",                  "return 'PRINT';"],
        ["[a-zA-Z][a-zA-Z0-9_]*",  "return 'VARIABLE';"],
        ["=",                      "return '=';"],
        ["\+",                    "return '+';"],
        ["\-",                    "return '-';"],
        ["\*",                    "return '*';"],
        ["\/",                    "return '/';"],
        ["$",                      "return 'EOF';"]
      ]
    };

    Writing the Parser

    The parser takes the tokens from the lexer one at a time and confirms that the input is a valid program in our source language.

    Once again, the task of writing the parser is much easier than it would otherwise be thanks to Jison. Rather than writing the parser ourselves, Jison will programmatically create one for us if we provide a grammar for the language.

    If all we cared about was whether the input was a valid program, we would stop here. However, we are also going to compile the input to JavaScript, and to do that we need to create an abstract syntax tree. We build the AST in the code snippets next to each rule.

    A typical grammar contains productions with the form:

    LeftHandSide → RightHandSide1
                 | RightHandSide2
                 ...

    However, in Jison we are a) writing in JavaScript, and b) also providing code to execute for each rule so that we can create the AST. Therefore, we use the following format:

    LeftHandSide: [
      [RightHandSide1, CodeToExecute1],
      [RightHandSide2, CodeToExecute2],
      ...
    ]

    Inside the code snippets, there are a handful of magic variables we have access to:

    • $$: The value of the left hand side of the production.
    • $1/$2/$3/etc: The value of the nth form in the right hand side of the production.
    • @1/@2/@3/etc: An object containing the line and column coordinates where the nth form in the right hand side of the production was parsed.
    • yytext: The full text of the currently matched rule.

    Using this information, we can create the grammar in lib/rpn/bnf.js:

    exports.bnf = {
      start: [
        ["input EOF", "return $$;"]
      ],
      input: [
        ["",           "$$ = [];"],
        ["line input", "$$ = [$1].concat($2);"]
      ],
      line: [
        ["exp SEMICOLON", "$$ = $1;"]
      ],
      exp: [
        ["NUMBER",           "$$ = new yy.Number(@1.first_line, @1.first_column, yytext);"],
        ["VARIABLE",         "$$ = new yy.Variable(@1.first_line, @1.first_column, yytext);"],
        ["exp exp operator", "$$ = new yy.Expression(@3.first_line, @3.first_column, $1, $2, $3);"]
      ],
      operator: [
        ["PRINT", "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"],
        ["=",     "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"],
        ["+",     "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"],
        ["-",     "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"],
        ["*",     "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"],
        ["/",     "$$ = new yy.Operator(@1.first_line, @1.first_column, yytext);"]
      ]
    };

    Implementing the Abstract Syntax Tree

    Create the definitions for the abstract syntax tree nodes in lib/rpn/ast.js.

    Since we will be maintaining line and column information in all of the AST nodes, we can reuse some code by making a base prototype:

    var AstNode = function (line, column) {
      this._line = line;
      this._column = column;
    };

    The definitions for the rest of the AST nodes are pretty straightforward. Link up the prototype chain, assign relevant attributes, and don’t forget to call AstNode’s constructor:

    exports.Number = function (line, column, numberText) {
      AstNode.call(this, line, column);
      this._value = Number(numberText);
    };
    exports.Number.prototype = Object.create(AstNode.prototype);
     
    exports.Variable = function (line, column, variableText) {
      AstNode.call(this, line, column);
      this._name = variableText;
    };
    exports.Variable.prototype = Object.create(AstNode.prototype);
     
    exports.Expression = function (line, column, operand1, operand2, operator) {
      AstNode.call(this, line, column);
      this._left = operand1;
      this._right = operand2;
      this._operator = operator;
    };
    exports.Expression.prototype = Object.create(AstNode.prototype);
     
    exports.Operator = function (line, column, operatorText) {
      AstNode.call(this, line, column);
      this.symbol = operatorText;
    };
    exports.Operator.prototype = Object.create(AstNode.prototype);

    Compilation

    Generated JavaScript

    Before we generate JavaScript, we need a plan. There are a couple ways we can structure the outputted JavaScript.

    One strategy is to translate the RPN expressions to the equivalent human readable JavaScript expression we would create if we had been writing JavaScript all along. For example, if we were to port this RPN example:

    a 8 =;
    b 2 =;
    c a b 1 - / =;

    We might write the following JavaScript:

    var a = 8;
    var b = 2;
    var c = a / (b - 1);

    However, this means that we are completely adopting the nuances of JavaScript’s arithmetic. In an earlier example, we saw that a helpful runtime error was thrown when any number was divided by zero. Most languages throw an error when this occurs; JavaScript, however, does not, and the result is simply Infinity. Therefore, we can’t completely embrace JavaScript’s arithmetic system, and we must generate some code to check for divide-by-zero errors ourselves. Adding this code gets a little tricky if we want to maintain the strategy of generating human readable code.
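
    To see the problem in isolation, here is what plain JavaScript does with division by zero:

    // JavaScript arithmetic never throws on division by zero:
    console.log(8 / 0);              // Infinity
    console.log(8 / 0 === Infinity); // true
    console.log(0 / 0);              // NaN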

    Another option is treating the JavaScript interpreter as a stack machine of sorts and generating code that pushes and pops values to and from a stack. Furthermore, stack machines are a natural fit for evaluating RPN. In fact, it is such a good fit that RPN “was independently reinvented by F. L. Bauer and E. W. Dijkstra in the early 1960s to reduce computer memory access and utilize the stack to evaluate expressions.”

    Generating JavaScript code for the same example above, but utilizing the JavaScript interpreter as a stack machine, might look something like this:

    push(8);
    push('a');
    env[pop()] = pop();
    push(2);
    push('b');
    env[pop()] = pop();
    push('a');
    push('b');
    push(1);
    temp = pop();
    push(pop() - temp);
    temp = pop();
    if (temp === 0) throw new Error("Divide by zero");
    push(pop() / temp);
    push('c');
    env[pop()] = pop();

    This is the strategy we will follow. The generated code is a bit larger, and we will require a preamble to define push, pop, etc, but compilation becomes much easier. Furthermore, the fact that the generated code isn’t as human readable only highlights the benefits of using source maps!

    Creating Source Maps

    If we weren’t generating source maps along with our generated JavaScript, we could build the generated code via concatenating strings of code:

    code += "push(" + operand1.compile() + " "
      + operator.compile() + " "
      + operand2.compile() + ");n";

    However, this doesn’t work when we are creating source maps because we need to maintain line and column information. When we concatenate strings of code, we lose that information.

    The source-map library contains SourceNode for exactly this reason. If we add a new method on our base AstNode prototype, we can rewrite our example like this:

    var SourceNode = require("source-map").SourceNode;
    AstNode.prototype._sn = function (originalFilename, chunk) {
      return new SourceNode(this._line, this._column, originalFilename, chunk);
    };
     
    ...
     
    code = this._sn("foo.rpn", [code,
                                "push(",
                                operand1.compile(), " ",
                                operator.compile(), " ",
                                operand2.compile(), ");n"]);

    Once we have completed building the SourceNode structure for the whole input program, we can generate the compiled source and the source map by calling the SourceNode.prototype.toStringWithSourceMap method. This method returns an object with two properties: code, which is a string containing the generated JavaScript source code; and map, which is the source map.
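
    A minimal sketch of that last step, using a trivial SourceNode (the filename and chunk here are just placeholders):

    var SourceNode = require("source-map").SourceNode;

    var node = new SourceNode(1, 0, "example.rpn", "__rpn.push(5);\n");
    var output = node.toStringWithSourceMap({ file: "example.js" });

    console.log(output.code);            // the generated JavaScript, as a string
    console.log(output.map.toString());  // the source map, serialized to JSON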

    Implementing Compilation

    Now that we have a strategy for generating code, and understand how to maintain line and column information so that we can generate source maps easily, we can add the methods to compile our AST nodes to lib/rpn/ast.js.

    To play nice with the global JavaScript environment, we will namespace push, pop, etc, under __rpn.

    function push(val) {
      return ["__rpn.push(", val, ");n"];
    }
     
    AstNode.prototype.compile = function (data) {
      throw new Error("Not Yet Implemented");
    };
    AstNode.prototype.compileReference = function (data) {
      return this.compile(data);
    };
    AstNode.prototype._sn = function (originalFilename, chunk) {
      return new SourceNode(this._line, this._column, originalFilename, chunk);
    };
     
    exports.Number.prototype.compile = function (data) {
      return this._sn(data.originalFilename,
                      push(this._value.toString()));
    };
     
    exports.Variable.prototype.compileReference = function (data) {
      return this._sn(data.originalFilename,
                      push(["'", this._name, "'"]));
    };
    exports.Variable.prototype.compile = function (data) {
      return this._sn(data.originalFilename,
                      push(["window.", this._name]));
    };
     
    exports.Expression.prototype.compile = function (data) {
      var temp = "__rpn.temp";
      var output = this._sn(data.originalFilename, "");
     
      switch (this._operator.symbol) {
      case 'print':
        return output
          .add(this._left.compile(data))
          .add(this._right.compile(data))
          .add([temp, " = __rpn.pop();n"])
          .add(["if (", temp, " <= 0) throw new Error('argument must be greater than 0');n"])
          .add(["if (Math.floor(", temp, ") != ", temp,
                ") throw new Error('argument must be an integer');n"])
          .add([this._operator.compile(data), "(__rpn.pop(), ", temp, ");n"]);
      case '=':
        return output
          .add(this._right.compile(data))
          .add(this._left.compileReference(data))
          .add(["window[__rpn.pop()] ", this._operator.compile(data), " __rpn.pop();n"]);
      case '/':
        return output
          .add(this._left.compile(data))
          .add(this._right.compile(data))
          .add([temp, " = __rpn.pop();n"])
          .add(["if (", temp, " === 0) throw new Error('divide by zero error');n"])
          .add(push(["__rpn.pop() ", this._operator.compile(data), " ", temp]));
      default:
        return output
          .add(this._left.compile(data))
          .add(this._right.compile(data))
          .add([temp, " = __rpn.pop();n"])
          .add(push(["__rpn.pop() ", this._operator.compile(data), " ", temp]));
      }
    };
     
    exports.Operator.prototype.compile = function (data) {
      if (this.symbol === "print") {
        return this._sn(data.originalFilename,
                        "__rpn.print");
      }
      else {
        return this._sn(data.originalFilename,
                        this.symbol);
      }
    };

    Gluing it Together

    From here we have done all the difficult work, and we can run a victory lap by connecting the modules together with a public API, and by creating a command line script to call the compiler.

    The public API resides in lib/rpn.js. It also contains the preamble, to initialize __rpn:

    var jison = require("jison");
    var sourceMap = require("source-map");
    var lex = require("./rpn/lex").lex;
    var bnf = require("./rpn/bnf").bnf;
     
    var parser = new jison.Parser({
      lex: lex,
      bnf: bnf
    });
     
    parser.yy = require("./rpn/ast");
     
    function getPreamble () {
      return new sourceMap.SourceNode(null, null, null, "")
        .add("var __rpn = {};n")
        .add("__rpn._stack = [];n")
        .add("__rpn.temp = 0;n")
     
        .add("__rpn.push = function (val) {n")
        .add("  __rpn._stack.push(val);n")
        .add("};n")
     
        .add("__rpn.pop = function () {n")
        .add("  if (__rpn._stack.length > 0) {n")
        .add("    return __rpn._stack.pop();n")
        .add("  }n")
        .add("  else {n")
        .add("    throw new Error('can\'t pop from empty stack');n")
        .add("  }n")
        .add("};n")
     
        .add("__rpn.print = function (val, repeat) {n")
        .add("  while (repeat-- > 0) {n")
        .add("    var el = document.createElement('div');n")
        .add("    var txt = document.createTextNode(val);n")
        .add("    el.appendChild(txt);n")
        .add("    document.body.appendChild(el);n")
        .add("  }n")
        .add("};n");
    }
     
    exports.compile = function (input, data) {
      var expressions = parser.parse(input.toString());
      var preamble = getPreamble();
     
      var result = new sourceMap.SourceNode(null, null, null, preamble);
      result.add(expressions.map(function (exp) {
        return exp.compile(data);
      }));
     
      return result;
    };

    Create the command line script in bin/rpn.js:

    #!/usr/bin/env node
    var fs = require("fs");
    var rpn = require("rpn");
     
    process.argv.slice(2).forEach(function (file) {
      var input = fs.readFileSync(file);
      var output = rpn.compile(input, {
        originalFilename: file
      }).toStringWithSourceMap({
        file: file.replace(/\.[\w]+$/, ".js.map")
      });
      var sourceMapFile = file.replace(/\.[\w]+$/, ".js.map");
      fs.writeFileSync(file.replace(/\.[\w]+$/, ".js"),
                       output.code + "\n//# sourceMappingURL=" + sourceMapFile);
      fs.writeFileSync(sourceMapFile, output.map);
    });

    Note that our script will automatically add the //# sourceMappingURL comment directive so that the browser’s debugger knows where to find the source map.
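
    For example, once we compile examples/simple-example.rpn (introduced in the Seeing Results section below), the generated simple-example.js should end with something like:

    // ...generated stack-machine code above...
    //# sourceMappingURL=simple-example.js.map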

    After you create the script, update your package.json:

    ...
    "bin": {
      "rpn.js": "./bin/rpn.js"
    },
    ...

    And link the package again so that the script is installed on your system:

    $ npm link .

    Seeing Results

    Here is an RPN program that we can use to test our compiler. I have saved it in examples/simple-example.rpn:

    a 8 =;
    b 3 =;
    c a b 1 - / =;
    c 1 print;

    Next, compile the script:

    $ cd examples/
    $ rpn.js simple-example.rpn

    This generates simple-example.js and simple-example.js.map. When we include the JavaScript file in a web page we should see the result of the computation printed on the page:

    Screenshot of simple-example.rpn's result

    Great success!

    However, we aren’t always so lucky, and our arithmetic might have some errors. Consider the following example, examples/with-error.rpn:

    a 9 =;
    b 3 =;
    c a b / =;
    c a b c - / =;
    c 1 print;

    We can compile this script and include the resulting JavaScript in a web page, but this time we won’t see any output on the page.

    By opening the debugger, setting the pause on exceptions option, and reloading, we can see how daunting debugging without source maps can be:

    Screenshot of enabling pause on exceptions.

    Screenshot of debugging with-error.rpn without source maps.

    The generated JavaScript is difficult to read, and unfamiliar to anyone who authored the original RPN script. By enabling source maps in the debugger, we can refresh and the exact line where the error occurred in our original source will be highlighted:

    Screenshot of enabling source maps.


    Screenshot of debugging with-error.rpn with source maps.

    The debugging experience with source maps is orders of magnitude improved, and makes compiling languages to JavaScript a serious possibility.

    At the end of the day though, the debugging experience is only as good as the information encoded in the source maps by your compiler. It can be hard to judge the quality of your source maps simply by looking at the set of source location coordinates that they are mapping between, so Tobias Koppers created a tool to let you easily visualize your source maps.

    Here is the visualization of one of our source maps:


    Screenshot of the source map visualization tool.

    Good luck writing your own compiler that targets JavaScript!

  2. Detecting touch: it's the 'why', not the 'how'

    One common aspect of making a website or application “mobile friendly” is the inclusion of tweaks, additional functionality or interface elements that are particularly aimed at touchscreens. A very common question from developers is now “How can I detect a touch-capable device?”

    Feature detection for touch

    Although there used to be a few incompatibilities and proprietary solutions in the past (such as Mozilla’s experimental, vendor-prefixed event model), almost all browsers now implement the same Touch Events model (based on a solution first introduced by Apple for iOS Safari, which subsequently was adopted by other browsers and retrospectively turned into a W3C draft specification).

    As a result, being able to programmatically detect whether or not a particular browser supports touch interactions involves a very simple feature detection:

    if ('ontouchstart' in window) {
      /* browser with Touch Events
         running on touch-capable device */
    }

    This snippet works reliably in modern browsers, but older versions notoriously had a few quirks and inconsistencies which required jumping through various different detection strategy hoops. If your application is targeting these older browsers, I’d recommend having a look at Modernizr – and in particular its various touch test approaches – which smooths over most of these issues.

    I noted above that “almost all browsers” support this touch event model. The big exception here is Internet Explorer. While up to IE9 there was no support for any low-level touch interaction, IE10 introduced support for Microsoft’s own Pointer Events. This event model – which has since been submitted for W3C standardisation – unifies “pointer” devices (mouse, stylus, touch, etc) under a single new class of events. As this model does not, by design, include any separate ‘touch’, the feature detection for ontouchstart will naturally not work. The suggested method of detecting if a browser using Pointer Events is running on a touch-enabled device instead involves checking for the existence and return value of navigator.maxTouchPoints (note that Microsoft’s Pointer Events are currently still vendor-prefixed, so in practice we’ll be looking for navigator.msMaxTouchPoints). If the property exists and returns a value greater than 0, we have touch support.

    if (navigator.msMaxTouchPoints > 0) {
      /* IE with pointer events running
         on touch-capable device */
    }

    Adding this to our previous feature detect – and also including the non-vendor-prefixed version of the Pointer Events one for future compatibility – we get a still reasonably compact code snippet:

    if (('ontouchstart' in window) ||
         (navigator.maxTouchPoints > 0) ||
         (navigator.msMaxTouchPoints > 0)) {
          /* browser with either Touch Events or Pointer Events
             running on touch-capable device */
    }

    How touch detection is used

    Now, there are already quite a few commonly-used techniques for “touch optimisation” which take advantage of these sorts of feature detects. The most common use case for detecting touch is to increase the responsiveness of an interface for touch users.

    When using a touchscreen interface, browsers introduce an artificial delay (in the range of about 300ms) between a touch action – such as tapping a link or a button – and the time the actual click event is fired.

    More specifically, in browsers that support Touch Events the delay happens between touchend and the simulated mouse events that these browsers also fire for compatibility with mouse-centric scripts:

    touchstart > [touchmove]+ > touchend > delay > mousemove > mousedown > mouseup > click

    See the event listener test page to see the order in which events are being fired, code available on GitHub.
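
    If you want to reproduce the test yourself, a minimal sketch of such an event-order logger could look like this (the test element id is purely illustrative):

    var el = document.getElementById('test');
    ['touchstart', 'touchmove', 'touchend',
     'mousemove', 'mousedown', 'mouseup', 'click'].forEach(function (type) {
      // log each event with a timestamp so the delay before 'click' is visible
      el.addEventListener(type, function (e) {
        console.log(Date.now() + ' ' + e.type);
      });
    });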

    This delay has been introduced to allow users to double-tap (for instance, to zoom in/out of a page) without accidentally activating any page elements.

    It’s interesting to note that Firefox and Chrome on Android have removed this delay for pages with a fixed, non-zoomable viewport.

    <meta name="viewport" value="... user-scalable = no ...">

    See the event listener with user-scalable=no test page, code available on GitHub.

    There is some discussion of tweaking Chrome’s behavior further for other situations – see issue 169642 in the Chromium bug tracker.

    Although this affordance is clearly necessary, it can make a web app feel slightly laggy and unresponsive. One common trick has been to check for touch support and, if present, react directly to a touch event (either touchstart – as soon as the user touches the screen – or touchend – after the user has lifted their finger) instead of the traditional click:

    /* if touch supported, listen to 'touchend', otherwise 'click' */
    var clickEvent = ('ontouchstart' in window ? 'touchend' : 'click');
    blah.addEventListener(clickEvent, function() { ... });

    Although this type of optimisation is now widely used, it is based on a logical fallacy which is now starting to become more apparent.

    The artificial delay is also present in browsers that use Pointer Events.

    pointerover > mouseover > pointerdown > mousedown > pointermove > mousemove > pointerup > mouseup > pointerout > mouseout > delay > click

    Although it’s possible to extend the above optimisation approach to check navigator.maxTouchPoints and to then hook up our listener to pointerup rather than click, there is a much simpler way: setting the touch-action CSS property of our element to none eliminates the delay.

    /* suppress default touch action like double-tap zoom */
    a, button {
      -ms-touch-action: none;
          touch-action: none;
    }

    See the event listener with touch-action:none test page, code available on GitHub.

    False assumptions

    It’s important to note that these types of optimisations based on the availability of touch have a fundamental flaw: they make assumptions about user behavior based on device capabilities. More explicitly, the example above assumes that because a device is capable of touch input, a user will in fact use touch as the only way to interact with it.

    This assumption probably held some truth a few years back, when the only devices that featured touch input were the classic “mobile” and “tablet”. Here, touchscreens were the only input method available. In recent months, though, we’ve seen a whole new class of devices which feature both a traditional laptop/desktop form factor (including a mouse, trackpad, keyboard) and a touchscreen, such as the various Windows 8 machines or Google’s Chromebook Pixel.

    As an aside, even in the case of mobile phones or tablets, it was already possible – on some platforms – for users to add further input devices. While iOS only caters for pairing an additional bluetooth keyboard to an iPhone/iPad purely for text input, Android and Blackberry OS also let users add a mouse.

    On Android, this mouse will act exactly like a “touch”, even firing the same sequence of touch events and simulated mouse events, including the dreaded delay in between – so optimisations like our example above will still work fine. Blackberry OS, however, purely fires mouse events, leading to the same sort of problem outlined below.

    The implications of this change are slowly beginning to dawn on developers: that touch support does not necessarily mean “mobile” anymore, and more importantly that even if touch is available, it may not be the primary or exclusive input method that a user chooses. In fact, a user may even transition between any of their available input methods in the course of their interaction.

    The innocent code snippets above can have quite annoying consequences on this new class of devices. In browsers that use Touch Events:

    var clickEvent = ('ontouchstart' in window ? 'touchend' : 'click');

    is basically saying “if the device supports touch, only listen to touchend and not click” – which, on a multi-input device, immediately shuts out any interaction via mouse, trackpad or keyboard.

    Touch or mouse?

    So what’s the solution to this new conundrum of touch-capable devices that may also have other input methods? While some developers have started to look at complementing a touch feature detection with additional user agent sniffing, I believe that the answer – as in so many other cases in web development – is to accept that we can’t fully detect or control how our users will interact with our web sites and applications, and to be input-agnostic. Instead of making assumptions, our code should cater for all eventualities. Specifically, instead of making the decision about whether to react to click or touchend/touchstart mutually exclusive, these should all be taken into consideration as complementary.

    Certainly, this may involve a bit more code, but the end result will be that our application will work for the largest number of users. One approach, already familiar to developers who’ve strived to make their mouse-specific interfaces also work for keyboard users, would be to simply “double up” your event listeners (while taking care to prevent the functionality from firing twice by stopping the simulated mouse events that are fired following the touch events):

    blah.addEventListener('touchend', function(e) {
      /* prevent delay and simulated mouse events */
      e.preventDefault();
      someFunction()
    });
    blah.addEventListener('click', someFunction);

    If this isn’t DRY enough for you, there are of course fancier approaches, such as only defining your functions for click and then bypassing the dreaded delay by explicitly firing that handler:

    blah.addEventListener('touchend', function(e) {
      /* prevent delay and simulated mouse events */
      e.preventDefault();
      /* trigger the actual behavior we bound to the 'click' event */
      e.target.click();
    })
    blah.addEventListener('click', function() {
      /* actual functionality */
    });

    That last snippet does not cover all possible scenarios though. For a more robust implementation of the same principle, see the FastClick script from FT labs.

    Being input-agnostic

    Of course, battling with delay on touch devices is not the only reason why developers want to check for touch capabilities. Current discussions – such as this issue in Modernizr about detecting a mouse user – now revolve around offering completely different interfaces to touch users, compared to mouse or keyboard, and whether or not a particular browser/device supports things like hovering. And even beyond JavaScript, similar concepts (pointer and hover media features) are being proposed for Media Queries Level 4. But the principle is still the same: as there are now common multi-input devices, it’s not straightforward (and in many cases, impossible) anymore to determine if a user is on a device that exclusively supports touch.
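
    As a rough sketch of where that is heading, such media features could also be queried from script via window.matchMedia – though keep in mind that the exact feature names and values may still change as the specification evolves, and browser support is currently very limited:

    if (window.matchMedia && window.matchMedia('(pointer: coarse)').matches) {
      /* the primary pointer is imprecise (e.g. a finger on a touchscreen) */
    }
    if (window.matchMedia && window.matchMedia('(hover: none)').matches) {
      /* the primary input can't hover, so don't hide functionality behind :hover */
    }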

    The more generic approach taken in Microsoft’s Pointer Events specification – which is already being scheduled for implementation in other browsers such as Chrome – is a step in the right direction (though it still requires extra handling for keyboard users). In the meantime, developers should be careful not to draw the wrong conclusions from touch support detection and avoid unwittingly locking out a growing number of potential multi-input users.

  3. Serving Backbone for Robots & Legacy Browsers

    I like the Single Page Application model and Backbone.js, because I get it. As a former Java developer, I am used to object oriented coding and events for messaging. Within our HTML5 consultancy, SC5, Backbone has become almost a synonym for single page applications, and it is easy to move between projects because everybody gets the same basic development model.

    We hate the fact that we need to have server side workarounds for robots. Making applications crawlable is very reasonable business-wise, but ill-suited for the SPA model. Data-driven single page applications are typically served only an HTML page skeleton, and the actual construction of all the visual elements is done in the browser. Any other way would easily lead to double code paths (one on the browser, one on the server). Some have even considered giving up the SPA model and moving the logic and representation back to the server.

    Still, we should not let the tail wag the dog. Why sacrifice the user experience of 99.9% of the users for the sake of the remaining 0.1%? Instead, for such low traffic, a better suited solution would be to create a server side workaround.

    Solving the Crawling Problem with an App Proxy

    The obvious solution for the problem is running the same application code at both ends. Like in the digital television transition, a set-top box would fill the gap for legacy televisions by crunching the digital signal into analog form. Correspondingly, a proxy would run the application on the server side and serve the resulting HTML back to the crawlers. Smart browsers would get all the interactive candy, whereas crawlers and legacy browsers would just get the pre-processed HTML document.

    Proxy pattern explained through a TV set metaphor

    Thanks to node.js, JavaScript developers have been able to use their favourite language on both ends for some time already, and proxy-like solutions have become a plausible option.

    Implementing DOM and Browser APIs on the Server

    Single page applications typically depend heavily on DOM manipulation. Typical server applications combine several view templates into a page through concatenation, whereas Backbone applications append the views into the DOM as new elements. The developer would either need to emulate the DOM on the server side, or build an abstraction layer that would permit using the DOM on the browser and template concatenation on the server. The DOM can either be serialized into an HTML document or vice versa, but these techniques cannot easily be mixed at runtime.

    A typical Backbone application talks with the browser APIs through several different layers – either by using Backbone or jQuery APIs, or by accessing the APIs directly. Backbone itself has only minor dependencies on the layers below – jQuery is used for DOM manipulation and AJAX requests, and application state handling is done using pushState.

    Sample Backbone layers

    Node.js has ready-made modules for each level of abstraction: JSDOM offers a full DOM implementation on the server-side, whereas Cheerio provides a jQuery API on top of a fake DOM with better performance. Some of the other server-side Backbone implementations, like AirBnB Rendr and Backbone.LayoutManager, set the abstraction level to the level of the Backbone APIs (only), and hide the actual DOM manipulation under a set of conventions. Actually, Backbone.LayoutManager does offer the jQuery API through Cheerio, but the main purpose of the library itself is to ease the juggling between Backbone layouts, and hence promote a higher level of abstraction.
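
    To give a feel for the Cheerio level of abstraction, here is a small standalone sketch (not taken from any of the libraries mentioned above):

    var cheerio = require('cheerio');

    // Load a markup skeleton and get a jQuery-like handle to it
    var $ = cheerio.load('<body><div id="main"></div></body>');

    // Manipulate the fake DOM with the familiar jQuery API...
    $('#main').append('<p class="greeting">Hello from the server</p>');

    // ...and serialize it back into an HTML string for the response
    console.log($.html());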

    Introducing backbone-serverside

    Still, we went for our own solution. Our team is a pack of old dogs that do not learn new tricks easily. We believe there is no easy way of fully abstracting out the DOM without changing what Backbone applications essentially are. We like our Backbone applications without extra layers, and jQuery has always served us as a good compatibility layer to defend ourselves against browser differences in DOM manipulation. Like Backbone.LayoutManager, we chose Cheerio as our jQuery abstraction. We solved the Backbone browser API dependencies by overriding Backbone.history and Backbone.ajax with API compatible replacements. Actually, in the first draft version, these implementations remain bare minimum stubs.
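
    To illustrate what an API compatible replacement can look like, here is a deliberately simplified sketch of a server-side Backbone.ajax override – not the actual backbone-serverside adapter – that assumes the request module and only covers what Backbone’s fetch() needs:

    var request = require('request');

    // Bare-minimum stand-in for jQuery.ajax on node.js
    exports.ajax = function (options) {
      request({
        url: options.url,
        method: options.type || 'GET',
        json: true
      }, function (err, response, body) {
        if (err || response.statusCode >= 400) {
          if (options.error) { options.error(response, 'error', err); }
        } else if (options.success) {
          options.success(body, 'success', response);
        }
      });
    };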

    We are quite happy with the solution we have in the works. If you study the backbone-serverside example, it looks quite close to what a typical Backbone application might be. We do not enforce working on any particular level of abstraction; you can use either the Backbone APIs or the subset of APIs that jQuery offers. If you want to go deeper, nothing stops you from implementing a server-side version of a browser API. In such cases, the actual server side implementation may be a stub: for example, who needs touch event handling on the server?

    The current solution assumes a node.js server, but it does not necessarily mean drastic changes to an existing server stack. Existing servers for the API and static assets can remain as-is, but there should be a proxy to forward the requests of dumb clients to our server. The sample application serves the static files, API and the proxy from the same server, but they could all be decoupled with small modifications.

    backbone-serverside as a proxy

    Writing Apps That Work on backbone-serverside

    Currently the backbone-serverside core is a bare minimum set of adapters to make Backbone run on node.js. Porting your application to run on the server may require further modifications.

    If the application does not already utilise a module loader, such as RequireJS or Browserify, you need to figure out how to load the same modules on the server. In our example below, we use RequireJS and need a bit of JavaScript to use Cheerio instead of vanilla jQuery on the server. Otherwise we are pretty much able to use the same stack we typically use (jQuery, Underscore/Lo-Dash, Backbone and Handlebars). When choosing the modules, you may need to limit yourself to the ones that do not play with browser APIs directly, or be prepared to write a few stubs yourself.

    // Compose RequireJS configuration run-time by determining the execution
    // context first. We may pass different values to browser and server.
    var isBrowser = typeof(window) !== 'undefined';
     
    // Execute this for RequireJS (client or server-side, no matter which)
    requirejs.config({
     
        paths: {
            text: 'components/requirejs-text/text',
            underscore: 'components/lodash/dist/lodash.underscore',
            backbone: 'components/backbone/backbone',
            handlebars: 'components/handlebars/handlebars',
            jquery: isBrowser ? 'components/jquery/jquery' : 'emptyHack'
        },
     
        shim: {
            'jquery': {
                deps: ['module'],
                exports: 'jQuery',
                init: function (module) {
                    // Fetch the jQuery adapter parameters for server case
                    if (module && module.config) {
                        return module.config().jquery;
                    }
     
                    // Fallback to browser specific thingy
                    return this.jQuery.noConflict();
                }
            },
            'underscore': {
                exports: '_',
                init: function () {
                    return this._.noConflict();
                }
            },
            'backbone': {
                deps: ['underscore', 'jquery'],
                exports: 'Backbone',
                init: function (_, $) {
                    // Inject adapters when in server
                    if (!isBrowser) {
                        var adapters = require('../..');
                        // Add the adapters we're going to be using
                        _.extend(this.Backbone.history,
                            adapters.backbone.history);
                        this.Backbone.ajax = adapters.backbone.ajax;
                        Backbone.$ = $;
                    }
     
                    return this.Backbone.noConflict();
                }
            },
            'handlebars': {
                exports: 'Handlebars',
                init: function() {
                    return this.Handlebars;
                }
            }
        },
     
        config: {
            // The API endpoints can be passed via URLs
            'collections/items': {
                // TODO Use full path due to our XHR adapter limitations
                url: 'http://localhost:8080/api/items'
            }
        }
    });

    Once the configuration works alright, the application can be bootstrapped normally. In the example, we use the Node.js Express server stack and pass specific request paths to the Backbone Router implementation for handling. When done, we serialize the DOM into text and send it to the client. Some extra code needs to be added to deal with Backbone’s asynchronous event model. We will discuss that more thoroughly below.

    // URL Endpoint for the 'web pages'
    server.get(/\/(items\/\d+)?$/, function(req, res) {
        // Remove preceding '/'
        var path = req.path.substr(1, req.path.length);
        console.log('Routing to \'%s\'', path);
     
        // Initialize a blank document and a handle to its content
        //app.router.initialize();
     
        // If we're already on the current path, just serve the 'cached' HTML
        if (path === Backbone.history.path) {
            console.log('Serving response from cache');
            res.send($html.html());
        }
     
        // Listen to state change once - then send the response
        app.router.once('done', function(router, status) {
            // Just a simple workaround in case we timeouted or such
            if (res.headersSent) {
                console.warn('Could not respond to request in time.');
            }
     
            if (status === 'error') {
                res.send(500, 'Our framework blew it. Sorry.');
            }
            if (status === 'ready') {
                // Set the bootstrapped attribute to communicate we're done
                var $root = $html('#main');
                $root.attr('data-bootstrapped', true);
     
                // Send the changed DOM to the client
                console.log('Serving response');
                res.send($html.html());
            }
        });
     
        // Then do the trick that would cause the state change
        Backbone.history.navigate(path, { trigger: true });
    });

    Dealing with Application Events and States

    Backbone uses an asynchronous, event-driven model for communicating between the models, views and other objects. For an object oriented developer, the model is fine, but it causes a few headaches on node.js. After all, Backbone applications are data driven; pulling data from a remote API endpoint may take seconds, and once it eventually arrives, the models will notify views to repaint themselves. There is no easy way to know when all the application DOM manipulation is finished, so we needed to invent our own mechanism.

    In our example we utilise simple state machines to solve the problem. Since the simplified example does not have a separate application singleton class, we use the router object as the single point of control. The router listens for changes in the state of each view, and only notifies the express server about readiness to render when all the views are ready. At the beginning of the request, the router resets the view states to pending and does not notify the browser or server until it knows all the views are done. Correspondingly, the views do not claim to be done until they know they have been fed with valid data from their corresponding model/collection. The state machine is simple and can be consistently applied throughout the different Backbone objects.
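
    A condensed sketch of the idea – the names below are illustrative, not the exact code from the example:

    // Each view reports its state to the router; the router only fires 'done'
    // once every registered view has reached 'ready' (or any of them failed).
    var viewStates = {};

    router.resetStates = function (viewNames) {
      viewNames.forEach(function (name) { viewStates[name] = 'pending'; });
    };

    router.setViewState = function (name, state) {
      viewStates[name] = state;

      var states = Object.keys(viewStates).map(function (key) {
        return viewStates[key];
      });

      if (states.indexOf('error') !== -1) {
        router.trigger('done', router, 'error');
      } else if (states.indexOf('pending') === -1) {
        router.trigger('done', router, 'ready');
      }
    };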

    Activity diagram of a Backbone app event flow

    Beyond the Experimental Hack

    The current version is still experimental work, but it proves that Backbone applications can happily live on the server without breaking the Backbone APIs or introducing too many new conventions. Currently at SC5 we have a few projects starting that could utilise this implementation, so we will continue the effort.

    We believe the web stack community benefits from this effort, thus we have published the work on GitHub. It is far from finished and we would appreciate all community contributions in the form of ideas and code. Share the love, criticism and all in between: @sc5io #backboneserverside.

    In particular, we plan to change, and hope to get contributions for, the following:

    • The current example will likely misbehave on concurrent requests. It shares a single DOM representation for all the ongoing requests, which can easily mess up each other.
    • The state machine implementation is just one idea on how to determine when to serialize the DOM back to the client. It likely can be drastically simplified for most use cases, and it is quite possible to find a better generic solution.
    • The server-side route handling is naive. To emphasize that only the crawlers and legacy browsers might need server-side rendering, the sample could use projects like express-device to detect whether we are serving a legacy browser or a crawler.
    • The sample application is a very rudimentary master-details view application and will not likely cause any wow effect. It needs a little bit of love.

    We encourage you to fork the repository and start from modifying the example for your needs. Happy Hacking!

  4. Adding cursor swipe to the Firefox OS keyboard

    In this article we will take a look at how to approach adding features to a core component in the system such as the input keyboard. It turns out it is pretty easy!

    Before we start, take a look at this concept video from Daniel Hooper to get an idea of what we want to implement:

    Cool, huh? Making such a change for other mobile platforms would be pretty hard or just plain impossible, but in Firefox OS it is quite simple and it will take us less than 50 lines of code.

    The plan

    Conceptually, what we want to achieve is that when the user swipes her finger on the keyboard area, the cursor in the input field moves, left or right, a distance proportional to the swipe.

    Since a common scenario is that the user might be pressing a wrong key and would like to slide to a close-by key to correct it, we will only start moving the cursor when the swipe distance is longer than the width of a single key.

    Preparing your environment

    In order to start hacking Firefox OS itself, you will need a copy of Gaia (the collection of webapps that make up the frontend of Firefox OS) and B2G desktop (a build of the B2G app runtime used on devices where all apps should run as they would on a device).

    You can take a look at this previous article from Mozilla Hacks in which we guide you through setting up and hacking on Gaia. There is also a complete guide to setting up this environment at https://wiki.mozilla.org/Gaia/Hacking.

    Once you get Gaia to run in B2G, you are ready to hack!

    Ready to hack!

    Firefox OS is all HTML5, and internally it is composed of several ‘apps’. We can find the main system apps in the apps folder of the gaia repository that you cloned before, including the keyboard app that we will be modifying.
    In this post we will be editing only apps/keyboard/js/keyboard.js, which is where a big chunk of the keyboard logic lives.

    We start by initializing some extra variables at the top of the file that will help us keep track of the swiping later.

    var swipeStartMovePos = null; // Starting point of the swiping
    var swipeHappening = false; // Are we in the middle of swiping?
    var swipeLastMousex = -1; // Previous mouse position
    var swipeMouseTravel = 0; // Amount traveled by the finger so far
    var swipeStepWidth = 0; // Width of a single keyboard key

    Next we should find where the keyboard processes touch events. At
    the top of keyboard.js we see that the event handlers for touch events are
    declared:

    var eventHandlers = {
      'touchstart': onTouchStart,
      'mousedown': onMouseDown,
      'mouseup': onMouseUp,
      'mousemove': onMouseMove
    };

    Nice! Now we need to store the coordinates of the initial touch event. Both onTouchStart and onMouseDown end up calling the function startPress after they do their respective post-touch tasks, so we will take care of storing the coordinates there.

    startPress does some work for when a key is pressed, like highlighting the key or checking whether the user is pressing backspace. We will write our logic after that. A convenient thing is that one of the arguments in its signature is coords, which refers to the coordinates where the user started touching, in the context of the keyboard element. So storing the coordinates is as easy as that:

    function startPress(target, coords, touchId) {
      swipeStartMovePos = { x: coords.pageX, y: coords.pageY };
      ...

    That way we will always have the starting coordinates of the last touch event available.

    The meat of our implementation will happen during the mousemove event, though. We see that the function onMouseMove is just a simple proxy function for the bigger movePress function, where the ‘mouse’ movements are processed. Here is where we will write our cursor-swiping logic.

    We will use the width of a keyboard key as our universal measure. Since the width of keyboard keys changes from device to device, we will first have to retrieve it calling a method in IMERender, which is the object that controls how the keyboard is rendered on the screen:

    swipeStepWidth = swipeStepWidth || IMERender.getKeyWidth();

    Now we can check if swiping is happening, and whether the swiping is longer than swipeStepWidth. Conveniently enough, our movePress function also gets passed the coords object:

    if (swipeHappening || (swipeStartMovePos && Math.abs(swipeStartMovePos.x - coords.pageX) > swipeStepWidth)) {

    Most of our logic will go inside that ‘if’ block. Now that we know that swiping is happening, we have to determine what direction it is going, assigning 1 for right and -1 for left to our previously initialized variable swipeDirection. After that, we add the amount of distance traveled to the variable swipeMouseTravel, and set swipeLastMousex to the current touch coordinates:

    var swipeDirection = coords.pageX > swipeLastMousex ? 1 : -1;
     
    if (swipeLastMousex > -1) {
      swipeMouseTravel += Math.abs(coords.pageX - swipeLastMousex);
    }
    swipeLastMousex = coords.pageX;

    Ok, now we have to decide how the pixels travelled by the user’s finger will translate into cursor movement. Let’s make that half the width of a key. That means that for every swipeStepWidth / 2 pixels travelled, the cursor in the input field will move one character.

    The way we will move the cursor is a bit hacky. What we do is simulate the pressing of ‘left arrow’ or ‘right arrow’ by the user, even if these keys don’t even exist on the phone’s virtual keyboard. That allows us to move the cursor in the input field. It’s not ideal, but Mozilla is about to push a new Keyboard IME API that will give the programmer a proper API to manipulate cursor positions and selections. For now, we will just work around it:

    var stepDistance = swipeStepWidth / 2;
    if (swipeMouseTravel > stepDistance) {
      var times = Math.floor(swipeMouseTravel / stepDistance);
      swipeMouseTravel = 0;
      for (var i = 0; i < times; i++)
        navigator.mozKeyboard.sendKey(swipeDirection === -1 ? 37 : 39, undefined);
    }

    After that we just need to confirm that swiping is happening and do some cleanup of the timeouts and intervals initialized in other areas of the file, which wouldn’t otherwise get executed because of our new swiping functionality. We also call hideAlternatives to prevent the keyboard from presenting us with alternative characters while we are swiping.

    swipeHappening = true;
     
    clearTimeout(deleteTimeout);
    clearInterval(deleteInterval);
    clearTimeout(menuTimeout);
    hideAlternatives();
    return;

    The only thing left to do is to reset all the values we’ve set when the user lifts her finger off the screen. The event handler for that is onMouseUp, which calls the function endPress, at the beginning of which we will put our logic:

    // The user is releasing a key so the key has been pressed. The meat is here.
    function endPress(target, coords, touchId) {
        swipeStartMovePos = null;
        ...
        if (swipeHappening === true) {
            swipeHappening = false;
            swipeLastMousex = -1;
            return;
        }

    With this last bit, our implementation is complete. Here is a rough video I’ve made with the working implementation:

    You can see the complete implementation code changes on GitHub.

    Conclusion

    Contributing bugfixes or features to Firefox OS is as easy as getting Gaia and B2G and starting to hack in HTML5. If you are comfortable programming in JavaScript and familiar with making web pages, you can already contribute to the mobile operating system from Mozilla.

    Appendix: Finding an area to work on

    If you already know what bug you want to solve or what feature you want to implement in Firefox OS, first check if it has already been filed in Bugzilla, which is the issue repository that Mozilla uses to keep track of bugs. If it hasn’t, feel free to add it. Otherwise, if you are looking for new bugs to fix, a quick search will reveal many new ones that are still unassigned. Feel free to pick them up!

  5. Capturing – Improving Performance of the Adaptive Web

    Responsive design is now widely regarded as the dominant approach to building new websites. With good reason, too: a responsive design workflow is the most efficient way to build tailored visual experiences for different device screen sizes and resolutions.

    Responsive design, however, is only the tip of the iceberg when it comes to creating a rich, engaging mobile experience.


    Image Source: For a Future-Friendly Web by Brad Frost

    The issue of performance with responsive websites

    Performance is one of the most important features of a website, but is also frequently overlooked. Performance is something that many developers struggle with – in order to create high-performing websites you need to spend a lot of time tuning your site’s backend. Even more time is required to understand how browsers work, so that you make rendering pages as fast as possible.

    When it comes to creating responsive websites, the performance challenges are even more difficult because you have a single set of markup that is meant to be consumed by all kinds of devices. One problem you hit is the responsive image problem – how do you ensure that big images intended for your Retina Macbook Pro are not downloaded on an old Android phone? How do you prevent desktop ads from rendering on small screen devices?

    It’s easy to overlook performance as a problem because we often conduct testing under perfect conditions – using a fast computer, fast internet, and close proximity to our servers. Just to give you an idea of how evident this problem is, we conducted an analysis into some top responsive e-commerce sites which revealed that the average responsive site home page consists of 87.2 resources and is made up of 1.9 MB of data.

    It is possible to solve the responsive performance problem by making the necessary adjustments to your website manually, but performance tuning by hand involves both complexity and repetition, and that makes it a great candidate for creating tools. With Capturing, we intend to make creating high-performing adaptive web experiences as easy as possible.

    Introducing Capturing

    Capturing is a client-side API we’ve developed to give developers complete control over the DOM before any resources have started loading. With responsive sites, it is a challenge to control what resources you want to load based on the conditions of the device: all current solutions require you to make significant changes to your existing site by either using server-side user-agent detection, or by forcing you to break semantic web standards (for example, changing the src attribute to data-src).

    Our approach to giving you resource control is to capture the source markup before it has a chance to be parsed by the browser, and then reconstruct the document with resources disabled.

    The ability to control resources client-side gives you an unprecedented amount of control over the performance of your website.
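
    In practice, that means the page hands you a callback which receives the captured, not-yet-rendered document. The sketch below shows the rough shape of that flow; treat the namespace and method names as assumptions recalled from the 2.0 developer preview docs rather than a definitive reference.

    // Rough sketch; namespace/method names are assumptions, check the Mobify.js docs
    Mobify.Capture.init(function(capture) {
        // The captured document: parsed markup, but no resources loaded yet
        var capturedDoc = capture.capturedDoc;

        // Mutate capturedDoc here: swap srcs, strip scripts, rewrite ads, etc.

        // Hand the modified document back so the browser can render it
        capture.renderCapturedDoc();
    });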

    Capturing was a key feature of Mobify.js 1.1, our framework for creating mobile and tablet websites using client-side templating. We have since reworked Mobify.js in our 2.0 release to be a much more modular library that can be used in any existing website, with Capturing as the primary focus.

    A solution to the responsive image problem

    One way people have been tackling the responsive image problem is by modifying existing backend markup, changing the src of all their img elements to something like data-src, and accompanying that change with a <noscript> fallback. The reason this is done is discussed in this CSS-Tricks post:

    “a src that points to an image of a horse will start downloading as soon as that image gets parsed by the browser. There is no practical way to prevent this.”

    With Capturing, this is no longer true.

    Say, for example, you have an img element that you want to modify for devices with Retina screens, but you don’t want the original image in the src attribute to load. Using Capturing, you could do something like this:

    if (window.devicePixelRatio && window.devicePixelRatio >= 2) {
        var bannerImg = capturedDoc.getElementById("banner");
        bannerImg.src = "retinaBanner.png";
    }

    Because we have access to the DOM before any resources are loaded, we can swap the src of images on the fly before they are downloaded. This example is very basic; a better way to highlight the power of Capturing is to demonstrate a complete implementation of the picture polyfill.

    Picture Polyfill

    The Picture element is the official W3C HTML extension for dealing with adaptive images. There are polyfills that exist in order to use the Picture element in your site today, but none of them are able to do a perfect polyfill – the best polyfill implemented thus far requires a <noscript> tag surrounding an img element in order to support browsers without JavaScript. Using Capturing, you can avoid this madness completely.

    Open the example and be sure to fire up the network tab in web inspector to see which resources get downloaded:

    Here is the important chunk of code that is in the source of the example:

    <picture>
        <source src="/examples/assets/images/small.jpg">
        <source src="/examples/assets/images/medium.jpg" media="(min-width: 450px)">
        <source src="/examples/assets/images/large.jpg" media="(min-width: 800px)">
        <source src="/examples/assets/images/extralarge.jpg" media="(min-width: 1000px)">
        <img src="/examples/assets/images/small.jpg">
    </picture>

    Take note that there is an img element that uses a src attribute, but the browser only downloads the correct image. You can see the code for this example here (note that the polyfill is only available in the example, not the library itself – yet):
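
    To give a feel for how such a polyfill can work once you have the captured document, here is a simplified sketch (not the example’s actual polyfill code) that walks each picture element, picks the last source whose media query matches, and copies its src onto the inner img before the document is rendered. It only uses standard DOM APIs plus the capturedDoc from the earlier snippet.

    // Simplified sketch of the idea, not the example's actual polyfill
    var pictures = capturedDoc.querySelectorAll('picture');
    for (var i = 0; i < pictures.length; i++) {
        var sources = pictures[i].querySelectorAll('source');
        var img = pictures[i].querySelector('img');
        var chosen = null;
        for (var j = 0; j < sources.length; j++) {
            var media = sources[j].getAttribute('media');
            // A source with no media attribute is an unconditional candidate
            if (!media || window.matchMedia(media).matches) {
                chosen = sources[j].getAttribute('src');
            }
        }
        if (chosen && img) {
            img.src = chosen; // set before the captured document is rendered
        }
    }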

    Not all sites use modified src attributes and <noscript> tags to solve the responsive image problem. An alternative, if you don’t want to rely on modifying src or adding <noscript> tags for every image of your site, is to use server-side detection in order to swap out images, scripts, and other content. Unfortunately, this solution comes with a lot of challenges.

    It was easy to use server-side user-agent detection when the only device you needed to worry about was the iPhone, but with the number of new devices rolling out, keeping a dictionary of all devices containing information about their screen width, device pixel ratio, and more is a very painful task; not to mention there are certain things you cannot detect with server-side user-agent detection, such as actual network bandwidth.

    What else can you do with Capturing?

    Solving the responsive image problem is a great use-case for Capturing, but there are also many more. Here’s a few more interesting examples:

    Media queries in markup to control resource loading

    In this example, we use media queries in attributes on images and scripts to determine which ones will load, just to give you an idea of what you can do with Capturing. This example can be found here:

    Complete re-writing of a page using templating

    The primary function of Mobify.js 1.1 was client-side templating to completely rewrite the pages of your existing site when responsive doesn’t offer enough flexibility, or when changing the backend is simply too painful and tedious. It is particularly helpful when you need a mobile presence, fast. This is no longer the primary function of Mobify.js, but it is still possible using Capturing.

    Check out this basic example:

    In this example, we’ve taken parts of the existing page and used them in completely new markup rendered to the browser.

    Fill your page with grumpy cats

    And of course, there is nothing more useful than replacing all the images in a page with grumpy cats! In a high-performing way, of course ;-).

    Once again, open up web inspector to see that the original images on the site did not download.

    Performance

    So what’s the catch? Is there a performance penalty to using Capturing? Yes, there is, but we feel the performance gains you can make by controlling your resources outweigh the minor penalty that Capturing brings. On first load, the library (and the main executable, if not concatenated together) must download and execute, and the load time here will vary depending on the round-trip latency of the device (ranging from roughly 60ms to 300ms). However, the penalty of every subsequent request will be reduced by at least half due to the library being cached, and the just-in-time (JIT) compiler making the compilation much more efficient. You can run the test yourself!

    We also do our best to keep the size of the library to a minimum – at the time of publishing this blog post, the library is 4KB minified and gzipped.

    Why should you use Capturing?

    We created Capturing to give more control of performance to developers on the front-end. The reason other solutions fail to solve this problem is because the responsibilities of the front-end and backend have become increasingly intertwined. The backend’s responsibility should be to generate semantic web markup, and it should be the front-end’s responsibility to take the markup from the backend and process it in such a way that it is best visually represented on the device, and in a high-performing way. Responsive design solves the first issue (visually representing data), and Capturing helps solve the second (increasing performance on websites by using front-end techniques such as determining screen size and bandwidth to control resource loading).

    If you want to continue to obey the laws of the semantic web, and if you want an easy way to control performance at the front-end, we highly recommend that you check out Mobify.js 2.0!

    How can I get started using Capturing?

    Head over to our quick start guide for instructions on how to get setup using Capturing.

    What’s next?

    We’ve begun with an official developer preview of Mobify.js 2.0, which includes just the Capturing portion, but we will be adding more and more useful features.

    The next feature on the list to add is automatic resizing of images, allowing you to dynamically download images based on the size of the browser window without the need to modify your existing markup (aside from inserting a small JavaScript snippet)!

    We also plan to create other polyfills that are only possible with Capturing, such as the new HTML5 Template Tag, for example.

    We look forward to your feedback, and we are excited to see what other developers will do with our new Mobify.js 2.0 library!

  6. Building User-Extensible Webapps with Local

    In an interview with Andrew Binstock in 2012, Alan Kay described the browser as “a joke.” If that surprises you, you’ll be glad to know that Mr. Binstock was surprised as well.

    Part of the problem Kay pointed out is well-known: feature-set. Browsers are doing today what word-processors and presentation tools have done for decades. But that didn’t seem to be the problem that bothered him most. The real problem? Browser-makers thought they were making an application, when they were really building an OS.

    The browser tab is a very small environment. Due to the same-origin policy, the application’s world is limited to what its host reveals. Unfortunately, remote hosts are often closed networks, and users don’t control them. This stops us from doing composition (no pipe in the browser) and configuration (no swapping out backends for your frontend). You can change tabs, but you can’t combine them.

    Built out of IRON

    Despite these problems, the Web is successful, and the reasons for that are specific. In a paper published in 2011, Microsoft, UT, and Penn researchers outlined the necessary qualities (PDF): Isolated, Rich, On-demand, and Networked. Those qualities are why, on the whole, you can click around the Web and do interesting things without worrying a virus will infect your computer. As they point out, if we want to improve the Web, we have to be careful not to soften it.

    That research team proposed a less-featured core browser which downloads its high-level capabilities with the page. Their approach could improve richness and security for the Web, but it requires a “radical refactor” first. With a need for something more immediate, I’ve developed Local, an in-browser program architecture which is compatible with HTML5 APIs.

    HTTP over Web Workers

    Local uses Web Workers to run its applications. They’re the only suitable choice available, as iframes and object-capabilities tools (like Google’s Caja or Crockford’s ADsafe) share the document’s thread. Workers, however, lack access to the document, making them difficult to use. Local’s solution to this is to treat the Workers like Web hosts and dispatch requests over the postMessage API. The Workers respond in turn with HTML, which the document renders.
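
    Stripped of Local’s actual protocol, the underlying idea is just request/response messages over postMessage. Here is a bare-bones illustration; the message shape is my own invention, not HTTPL.

    // In the document (sketch only, not Local's real message format):
    var worker = new Worker('app.js');
    worker.onmessage = function(e) {
        // Treat e.data as an "HTTP response": { status: 200, body: '<p>hi</p>' }
        document.getElementById('region').innerHTML = e.data.body;
    };
    worker.postMessage({ method: 'GET', path: '/frontpage' }); // the "request"
     
    // In app.js (the Worker):
    self.onmessage = function(e) {
        if (e.data.method === 'GET' && e.data.path === '/frontpage') {
            self.postMessage({ status: 200, body: '<p>hi from the worker</p>' });
        }
    };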

    This leaves it to the document to make a lot of decisions: traffic permissions, HTML behaviors, which apps to load, and so on. Those decisions make up the page’s “environment,” and they collectively organize the apps into either a host-driven site, a pluggable web app, or a user-driven desktop environment.

    One of Local’s fundamental requirements is composition. The Internet’s strength – distributed interconnection – should be reflected in its software. REST is a unified interface to Local’s architecture, a philosophy borrowed from the Plan9 file-system. In HTML5 + Local, URIs can represent remote service endpoints, local service endpoints, and encoded chunks of data. The protocol to target JavaScript (httpl://) allows client regions to link to and target the Workers without event-binding.

    This keeps HTML declarative: there’s no application-specific setup. Additional interface primitives can be introduced by the Environment. Grimwire.com tries its own take on Web Intents, which produces a drag-and-drop-based UX. For programmatic composition, Local leans on the Link header, and provides the “navigator” prototype to follow those links in a hypermedia-friendly way.

    Security is also a fundamental requirement for Local. The Web Worker provides a secure sandbox for untrusted code (source (PDF), source). Content Security Policies allow environments to restrict inline scripts, styling, and embeds (including images). Local then provides a traffic dispatch wrapper for the environment to examine, scrub, route or deny application requests. This makes it possible to set policies (such as “local requests only”) and to intercept Cookie, Auth, and other session headers. The flexibility of those policies vary for each environment.

    Example Environment: a Markdown Viewer

    To get an idea of how this works, let’s take a quick tour through a simple environment. These snippets are from blog.grimwire.com. The page HTML, JS, and markdown are served statically. A Worker application, “markdown.js”, proxies its requests to the hosted blog posts and converts their content to HTML. The environment then renders that HTML into the Content “client region,” which is an area segmented by Local into its own browsing context (like an iframe).

    index.js

    The first file we’ll look at is “index.js,” the script which sets up the environment:

    // The Traffic Mediator
    // examines and routes all traffic in the application
    // (in our simple blog, we'll permit all requests and log the errors)
    Environment.setDispatchWrapper(function(request, origin, dispatch) {
        var response = dispatch(request);
        // dispatch() responds with a promise which is
        //   fulfilled on 2xx/3xx and rejected on 4xx/5xx
        response.except(console.log.bind(console));
        return response;
    });
     
    // The Region Post-processor
    // called after a response is rendered
    // (gives the environment a chance to add plugins or styles to new content)
    Environment.setRegionPostProcessor(function(renderTargetEl) {
        Prism.highlightAll(); // add syntax highlighting with prismjs
                              // (http://prismjs.com/)
    });
     
    // Application Load
    // start a worker and configure it to load our "markdown.js" file
    Environment.addServer('markdown.util', new Environment.WorkerServer({
        scriptUrl:'/local/apps/util/markdown.js',
        // ^^ this tells WorkerServer what app to load
        baseUrl:'/posts'
        // ^^ this tells markdown.js where to find the markdown files
    }));
     
    // Client Regions
    // creates browsing regions within the page and populates them with content
    var contentRegion = Environment.addClientRegion('content');
    contentRegion.dispatchRequest('httpl://markdown.util/frontpage.md');

    The environment here is very minimal. It makes use of two hooks: the dispatch wrapper and the region post-processor. A more advanced environment might sub-type the ClientRegion and WorkerServer prototypes, but these two hooks should provide a lot of control on their own. The dispatch wrapper is primarily used for security and debugging, while the region post-processor is there to add UI behaviors or styles after new content enters the page.

    Once the hooks are defined, the environment loads the markdown proxy and dispatches a request from the content region to load ‘frontpage.md’. Workers load asynchronously, but the WorkerServer buffers requests made during load, so the content region doesn’t have to wait to dispatch its request.

    When a link is clicked or a form is submitted within a ClientRegion, Local converts that event into a custom ‘request’ DOM event and fires it off of the region’s element. Another part of Local listens for the ‘request’ event and handles the dispatch and render process. We use dispatchRequest() to programmatically fire our own ‘request’ event at the start. After that, markdown files can link to “httpl://markdown.util/:post_name.md” and the region will work on its own.
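
    As a rough illustration of that mechanism (this is not Local’s code, and regionEl is just a stand-in for a client region’s element), a link click can be turned into a bubbling custom event that something higher up handles:

    // Sketch: turn link clicks inside a region into 'request' events
    regionEl.addEventListener('click', function(e) {
        if (e.target.tagName !== 'A') return;
        e.preventDefault();
        var detail = { method: 'get', url: e.target.getAttribute('href') };
        e.target.dispatchEvent(new CustomEvent('request', { detail: detail, bubbles: true }));
    });
     
    // Sketch: the environment listens for 'request' events and dispatches them
    regionEl.addEventListener('request', function(e) {
        console.log('dispatching', e.detail.url);
    });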

    markdown.js

    Let’s take a quick look at “markdown.js”:

    // Load Dependencies
    // (these calls are synchronous)
    importScripts('linkjs-ext/responder.js');
    importScripts('vendor/marked.js'); // https://github.com/chjj/marked
     
    // Configure Marked.js
    marked.setOptions({ gfm: true, tables: true });
     
    // Pipe Functions
    // used with `Link.Responder.pipe()` to convert the response markdown to html
    function headerRewrite(headers) {
        headers['content-type'] = 'text/html';
        return headers;
    }
    function bodyRewrite(md) { return (md) ? marked(md) : ''; }
     
    // WorkerServer Request Handler
    app.onHttpRequest(function(request, response) {
        // request the markdown file
        var mdRequest = Link.dispatch({
            method  : 'get',
            url     : app.config.baseUrl + request.path,
                                // ^^ the `baseUrl` given to us by index.js
            headers : { accept:'text/plain' }
        });
        // use helper libraries to pipe and convert the response back
        Link.responder(response).pipe(mdRequest, headerRewrite, bodyRewrite);
    });
     
    // Inform the environment that we're ready to handle requests
    app.postMessage('loaded');

    This script includes all of the necessary pieces for a Worker application. At minimum, the app must define an HTTP request handler and post the ‘loaded’ message back to the environment. (postMessage() is part of MyHouse, the low-level Worker manager which HTTPL is built on.)

    Before the application is loaded, Local nulls any APIs which might allow data leaks (such as XMLHttpRequest). When a Worker uses Link.dispatch, the message is transported to the document and given to the dispatch wrapper. This is how security policies are enforced. Local also populates the app.config object with the values given to the WorkerServer constructor, allowing the environment to pass configuration to the instance.

    With those two snippets, we’ve seen the basics of how Local works. If we wanted to create a more advanced site or desktop environment, we’d go on to create a layout manager for the client regions, UIs to load and control Workers, security policies to enforce permissions, and so on.

    You can find the complete source for the blog at github.com/pfraze/local-blog.

    User-Driven Software

    Local’s objective is to let users drive the development of the Web. In its ideal future, private data can be configured to save to private hosts, peer-to-peer traffic can go unlogged between in-browser servers with WebRTC, APIs can be mashed up on the fly, and users can choose the interfaces. Rather than fixed websites, I’d like to see hosts provide platforms built around different tasks (blogging, banking, shopping, developing, etc) and competing on services for their user’s apps. Then, services like Mint.com could stop asking for your banking credentials. Instead, they’d just host a JS file.

    You can get started with Local by reading its documentation and blog, and by trying out Grimwire, a general-purpose deployment in its early stages. The source can be found on GitHub under the MIT license.

  7. Finding Words by Synonym with Cinnamon.js

    There are only two hard things in Computer Science: cache invalidation and naming things.

    — Phil Karlton

    Naming things in web development is hard too, from evolving CSS classes to headers and links. From the perspective of information architecture, headers and links serve as visual waypoints, helping users build mental models of a site and navigate from page to page.

    But a second, underappreciated role that header and link names play is through the browser’s built-in Find function. I can only speak from personal experience — and maybe I’m the exception to the rule — but I often rely on Find to do existence checks on in-page content and quickly jump to it.

    Sometimes Find falls short though. For instance, consider a visitor that likes your site and decides to subscribe to your RSS feed. They search the page for “RSS” but nothing comes up. The problem is that you named your link “Feed” or “Subscribe”, or used the RSS symbol. They shrug their shoulders and move on — and you’ve lost a potential follower.

    I wrote Cinnamon.js to ease the pain of naming things, by having Find work with synonyms (demo).

    Try It Out

    To use Cinnamon.js, you can simply include the script on your page:

    <script src="cinnamon.js"></script>

    Then wrap your word with synonyms, separated by commas, like so:

    <span data-cinnamon="Blaze,Flame,Pyre">Fire</span>

    This is an example of a markup API, requiring only a bit of HTML to get going.

    The Basic Style

    In a nutshell, the script takes each synonym listed in the data-cinnamon attribute and creates a child element, appropriately styled.
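
    A stripped-down sketch of that idea (not the actual cinnamon.js source) looks something like this:

    // Sketch only: split the data-cinnamon attribute and append one child per synonym
    var elements = document.querySelectorAll('[data-cinnamon]');
    for (var i = 0; i < elements.length; i++) {
        var el = elements[i];
        el.style.position = 'relative'; // anchor for the absolutely positioned children
        var synonyms = el.getAttribute('data-cinnamon').split(',');
        for (var j = 0; j < synonyms.length; j++) {
            var child = document.createElement('span');
            child.textContent = synonyms[j];
            child.className = 'cinnamon-synonym'; // styled with the CSS shown below
            el.appendChild(child);
        }
    }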

    To style the synonyms, I stack them behind the original text with the following CSS. The synonym text is hidden while the original text gets highlighted.

    position: absolute;
    top: 0;
    left: 0;
    z-index: -1;
    display: inline-block;
    width: 100%;
    height: 100%;
    overflow: hidden;
    color: transparent;

    Here’s how it looks in Firefox’s 3D view. The green blocks represent the synonyms.

    Firefox 3D View

    Cross-Browser Quirks

    For the purposes of the script, when a synonym is found, the text should stay invisible while its background gets highlighted. This gives the illusion that the original word is the one being highlighted.

    In testing this, I discovered some differences in how browsers handle Find. These are edge cases that you hopefully won’t ever have to deal with, but they loomed larger in making Cinnamon.js.

    Finding Invisible Text

    If text is set to display: none;, Find doesn’t see it at all — this much is true of all browsers. Same goes for visibility: hidden; (except for Opera, where Find matches the synonym but nothing is seen).

    When opacity is set to 0, most browsers match the text, but nothing is visibly highlighted (Opera is the odd man out again, highlighting the background of the matched text).

    When text is set to color: transparent;, most browsers including Firefox and Chrome will highlight the area while the text stays transparent — just what we want for our script.

    Safari

    However, Safari does things differently. When transparent text is found, Safari will display it as black text on yellow. If the text is buried under elements with a higher z-index, Safari brings it to the top.

    Another difference: most browsers match text in the middle of a string. Safari only does so when the string is CamelCase.

    Other Issues

    Hidden text, used deceptively, can be penalized in Google’s search results. Given the techniques used, Cinnamon.js carries some small measure of risk, especially if it’s misused.

    Another issue is the impact of Cinnamon.js on accessibility. Fortunately, there’s aria-hidden="true", which is used to tell screen readers to ignore synonyms.

    Keep On Searching

    I’ve used the browser’s Find function for years without giving it much thought. But in writing Cinnamon.js, I’ve learned quite a bit about the web and how a small piece of it might be extended. You just never know what’ll inspire your next hack.

  8. Simplifying audio in the browser

    The last few years have seen tremendous gains in the capabilities of browsers, as the latest HTML5 standards continue to get implemented. We can now render advanced graphics on the canvas, communicate in real-time with WebSockets, access the local filesystem, create offline apps and more. However, the one area that has lagged behind is audio.

    The HTML5 Audio element is great for a small set of uses (such as playing music), but doesn’t work so well when you need low-latency, precision playback.

    Over the last year, a new audio standard has been developed for the browser, which gives developers direct access to the audio data. Web Audio API allows for high precision and high performing audio playback, as well as many advanced features that just aren’t possible with the HTML5 Audio element. However, support is still limited, and the API is considerably more complex than HTML5 Audio.

    Introducing howler.js

    The most obvious use-case for high-performance audio is games, but most developers have had to settle for HTML5 Audio with a Flash fallback to get browser compatibility. My company, GoldFire Studios, exclusively develops games for the open web, and we set out to find an audio library that offered the kind of audio support a game needs, without relying on antiquated technologies. Unfortunately, there were none to be found, so we wrote our own and open-sourced it: howler.js.

    Howler.js defaults to Web Audio API and uses HTML5 Audio as the fallback. The library greatly simplifies the API and handles all of the tricky bits automatically. This is a simple example to create an audio sprite (like a CSS sprite, but with an audio file) and play one of the sounds:

    var sound = new Howl({
      urls: ['sounds.mp3', 'sounds.ogg'],
      sprite: {
        blast: [0, 2000],
        laser: [3000, 700],
        winner: [5000, 9000]
      }
    });
     
    // shoot the laser!
    sound.play('laser');

    Using feature detection

    At the most basic level, this works through feature detection. The following snippet detects whether or not Web Audio API is available and creates the audio context if it is. Current support for Web Audio API includes Chrome 10+, Safari 6+, and iOS 6+. It is also in the pipeline for Firefox, Opera and most other mobile browsers.

    var ctx = null,
      usingWebAudio = true;
    if (typeof AudioContext !== 'undefined') {
      ctx = new AudioContext();
    } else if (typeof webkitAudioContext !== 'undefined') {
      ctx = new webkitAudioContext();
    } else {
      usingWebAudio = false;
    }

    Audio support for different codecs varies across browsers as well, so we detect which format is best to use from your provided array of sources with the canPlayType method:

    var audioTest = new Audio();
    var codecs = {
      mp3: !!audioTest.canPlayType('audio/mpeg;').replace(/^no$/,''),
      ogg: !!audioTest.canPlayType('audio/ogg; codecs="vorbis"').replace(/^no$/,''),
      wav: !!audioTest.canPlayType('audio/wav; codecs="1"').replace(/^no$/,''),
      m4a: !!(audioTest.canPlayType('audio/x-m4a;') || audioTest.canPlayType('audio/aac;')).replace(/^no$/,''),
      webm: !!audioTest.canPlayType('audio/webm; codecs="vorbis"').replace(/^no$/,'')
    };

    Making it easy

    These two key components of howler.js allow the library to automatically select the best method of playback and source file to load and play. From there, the library abstracts away the two different APIs and turns this (a simplified Web Audio API example without all of the extra fallback support and extra features):

    // create gain node
    var gainNode, bufferSource;
    gainNode = ctx.createGain();
    gainNode.gain.value = volume;
    loadBuffer('sound.wav');
     
    function loadBuffer(url) {
      // load the buffer from the URL
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url, true);
      xhr.responseType = 'arraybuffer';
      xhr.onload = function() {
        // decode the buffer into an audio source
        ctx.decodeAudioData(xhr.response, function(buffer) {
          if (buffer) {
            bufferSource = ctx.createBufferSource();
            bufferSource.buffer = buffer;
            bufferSource.connect(gainNode);
            bufferSource.start(0);
          }
        });
      };
      xhr.send();
    }

    (Note: createGainNode and noteOn are older, deprecated names for createGain and start; you may still see them in other examples on the web.)

    Into this:

    var sound = new Howl({
      urls: ['sound.wav'],
      autoplay: true
    });

    It is important to note that neither Web Audio API nor HTML5 Audio is the perfect solution for everything. As with anything, it is important to select the right tool for the right job. For example, you wouldn’t want to load a large background music file using Web Audio API, as you would have to wait for the entire data source to load before playing. HTML5 Audio is able to play very quickly after the download begins, which is why howler.js also implements an override feature that allows you to mix-and-match the two APIs within your app.
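
    If I remember the 1.x options correctly (treat this as an assumption and double-check the howler.js docs), that override is a per-sound flag that forces HTML5 Audio so large files can start streaming right away:

    // Assumption: howler.js 1.x exposes a `buffer` flag to force HTML5 Audio
    var music = new Howl({
      urls: ['background.mp3', 'background.ogg'],
      buffer: true,   // stream via HTML5 Audio instead of decoding the whole file
      autoplay: true,
      loop: true
    });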

    Audio in the browser is ready

    I often hear that audio in the browser is broken and won’t be useable for anything more than basic audio streaming for quite some time. This couldn’t be further from the truth. The tools are already in today’s modern browsers. High quality audio support is here today, and Web Audio API and HTML5 combine to offer truly plugin-free, cross-browser audio support. Browser audio is no longer a second-class citizen, so let’s all stop treating it like one and keep making apps for the open web.

  9. Story of a Knight: the making of

    The journey of a medieval knight through a fullscreen DOM: a ‘making of’ for the demo that won the November Dev Derby.

    Markup And Style

    Markup and style are organized in this way:

    • An external wrapper that contains everything
    • Three controls boxes with fixed positions and high z-index
    • An internal wrapper that contains the Google Maps iframe, canvas path and 8 div elements for the story

    The external wrapper and control boxes

    The external wrapper contains:

    • The audio tag with ogg and mp3 sources, on top left;
    • The div which is populated with fullscreen switcher if the browser supports it, on top right;
    • The navigation with numbers to move through the canvas path, on bottom right.

    <div id="external-wrapper">
      <audio controls="controls">
        <source src="assets/saltarello.ogg" type="audio/ogg" />
        <source src="assets/saltarello.mp3" type="audio/mp3" />
        Your browser does not support the audio element.
      </audio>
     
      <div id="fullscreen-control">
      </div>
     
      <ul class="navigation">
        <li class="active"><a href="#start">1</a></li>
        <li><a href="#description">2</a></li>
        ...
      </ul>

    The internal wrapper

    The internal wrapper contains:

    • The iframe with the big Google Map embedded, absolutely positioned with negative x and y;
    • A div of the same size and the same absolute position as the map, but with a higher z-index, which has a semi-transparent “background-size: cover” image of old paper to give a parchment effect;
    • The canvas path (once the JavaScript plugin is activated, the path will be drawn here);
    • The 8 divs that tell the story with texts and images, absolutely positioned.

    <div class="wrapper">
      <iframe width="5000" height="4500" frameborder="0" scrolling="no" marginheight="0" marginwidth="0" src="http://maps.google.it/?ie=UTF8&amp;t=k&amp;ll=44.660839,14.811584&amp;spn=8.79092,13.730164&amp;z=9&amp;output=embed"></iframe>
      <div style="position: absolute; top: -2000px; left: -1100px; background: transparent url(assets/bg_paper.jpg) no-repeat top left; width: 5000px; height: 4500px;background-size: cover;opacity: 0.4;z-index: 999;"></div>
     
      <div class="demo">
        <img src="assets/knight.png" alt="">
        <h1><span>&#8225;</span> Story of a Knight <span>&#8225;</span></h1>
        <span class="arrow">&#8224;</span> Of Venetian lagoon AD 1213 <span class="arrow">&#8224;</span>
      </div>
     
      <div class="description">
        <span class="big">He learned the profession of arms<br/>in an Apennines' fortress.</span>
        <img src="assets/weapons.png" alt="" style="position: absolute; top: 180px; left: 20px;">
      </div>
      ...
    </div>

    JavaScript

    The Scrollpath plugin

    Available at https://github.com/JoelBesada/scrollpath

    First we need to embed the jQuery library in the last part of the page:

    <script src="http://code.jquery.com/jquery-latest.pack.js"></script>

    Then we can include the scrollpath.js plugin, demo.js (where we give the instructions to draw the canvas path and initialize the plugin), and easing.js for smooth movement; we also include scrollpath.css in the head of the document.

    <script src="script/min.jquery.scrollpath.js"></script>
    <script src="script/demo.js"></script>
    <script src="script/jquery.easing.js"></script>
    <link rel="stylesheet" type="text/css" href="style/scrollpath.css" />

    Let’s see the relevant parts of the demo.js file:

    1. At the beginning there are the instructions for drawing the path, using the methods “moveTo”, “lineTo”, “arc” and declaring x/y coordinates;
    2. Then there’s the initialization of the plugin on the internal wrapper;
    3. Finally there’s the navigation implementation with smooth scrolling.

    $(document).ready(init);
     
    function init() {
      /* ========== DRAWING THE PATH AND INITIATING THE PLUGIN ============= */
     
      var path = $.fn.scrollPath("getPath");
      // Move to 'start' element
      path.moveTo(400, 50, {name: "start"});
      // Line to 'description' element
      path.lineTo(400, 800, {name: "description"});
      // Arc down and line
      path.arc(200, 1200, 400, -Math.PI/2, Math.PI/2, true);
      ...
     
      // We're done with the path, let's initiate the plugin on our wrapper element
      $(".wrapper").scrollPath({drawPath: true, wrapAround: true});
     
      // Add scrollTo on click on the navigation anchors
      $(".navigation").find("a").each(function() {
        var target = this.getAttribute("href").replace("#", "");
        $(this).click(function(e) {
          e.preventDefault();
     
          // Include the jQuery easing plugin (http://gsgd.co.uk/sandbox/jquery/easing/)
          // for extra easing functions like the one below
          $.fn.scrollPath("scrollTo", target, 1000, "easeInOutSine");
        });
      });
     
      /* ===================================================================== */
    }

    The jQuery-FullScreen plugin

    Available at https://github.com/martinaglv/jQuery-FullScreen

    To cap it all, the fullscreen. We include the jQuery-FullScreen plugin and then check with a script whether the browser supports the functionality: if it does, the script appends the switcher in the top right corner and, on click, pushes the external wrapper fullscreen.

    <script src="script/jquery.fullscreen.js"></script>
     
    <script>
      (function () {
        //Fullscreen for modern browser
        if($.support.fullscreen){
          var fullScreenButton = $('#fullscreen-control').append('<a id="goFullScreen">Watch full screen</a>');
          fullScreenButton.click(function(e){
            e.preventDefault();
            $('#external-wrapper').fullScreen();
          });
        }
      })();
    </script>

    Summary

    The hardest part was figuring out what size/zoom level to give to the Google Maps iframe and then where to position it in relation to the div with the canvas.
    The other thing that caused some problems was the loading time: I had initially placed a video of a medieval battle in slow motion along the path, but then I removed it because the page loaded too slowly.

    As you have seen, everything is very simple; a good result depends only on the right mix of technology, storytelling and aesthetics. I think that the front-end is entering a golden age, a period rich in expressive opportunities: languages and browsers are evolving rapidly, so there’s the chance to experiment by mixing different techniques and obtaining creative results.

  10. Koalas to the Max – a case study

    One day I was browsing reddit when I came across this peculiar link posted on it: http://www.cesmes.fi/pallo.swf

    The game was addictive and I loved it but I found several design elements flawed. Why did it start with four circles and not one? Why was the color split so jarring? Why was it written in flash? (What is this, 2010?) Most importantly, it was missing a golden opportunity to split into dots that form an image instead of just doing random colors.

    Creating the project

    This seemed like a fun project, and I reimplemented it (with my design tweaks) using D3 to render with SVG.

    The main idea was to have the dots split into the pixels of an image, with each bigger dot having the average color of the four dots contained inside of it recursively, and allow the code to work on any web-based image.
    The code sat in my ‘Projects’ folder for some time; Valentine’s Day was around the corner and I thought it could be a cute gift. I bought the domain name, found a cute picture, and thus koalastothemax.com (KttM) was born.

    Implementation

    While the user-facing part of KttM has changed little since its inception, the implementation has been revisited several times to incorporate bug fixes, improve performance, and bring support to a wider range of devices.

    Notable excerpts are presented below and the full code can be found on GitHub.

    Load the image

    If the image is hosted on the koalastothemax.com (same) domain, then loading it is as simple as calling new Image():

    var img = new Image();
    img.onload = function() {
     // Awesome rendering code omitted
    };
    img.src = the_image_source;

    One of the core design goals for KttM was to let people use their own images as the revealed image. Thus, when the image is on an arbitrary domain, it needs special consideration. Given the same-origin restrictions, there needs to be an image proxy that can channel the image from the arbitrary domain or send the image data as a JSONP call.

    Originally I used a library called $.getImageData, but I had to switch to a self-hosted solution after KttM went viral and brought the $.getImageData App Engine account to its limits.
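
    The JSONP pattern itself is the standard one: the page defines a global callback and injects a script tag pointing at the proxy, which responds with a call to that callback carrying the pixel data. In the sketch below, the proxy URL and payload shape are hypothetical.

    // Sketch of the JSONP approach; the proxy URL and payload shape are made up
    window.onImageData = function(payload) {
        var img = new Image();
        img.onload = function() { /* same rendering path as the same-domain case */ };
        img.src = payload.data; // e.g. a data: URI produced by the proxy
    };
     
    var script = document.createElement('script');
    script.src = 'http://proxy.example.com/imagedata?url=' +
        encodeURIComponent(the_image_source) + '&callback=onImageData';
    document.body.appendChild(script);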

    Extract the pixel data

    Once the image loads, it needs to be resized to the dimensions of the finest layer of circles (128 x 128) and its pixel data can be extracted with the help of an offscreen HTML5 canvas element.

    koala.loadImage = function(imageData) {
     // Create a canvas for image data resizing and extraction
     var canvas = document.createElement('canvas').getContext('2d');
     // Draw the image into the corner, resizing it to dim x dim
     canvas.drawImage(imageData, 0, 0, dim, dim);
     // Extract the pixel data from the same area of canvas
     // Note: This call will throw a security exception if imageData
     // was loaded from a different domain than the script.
     return canvas.getImageData(0, 0, dim, dim).data;
    };

    dim is the number of smallest circles that will appear on a side. 128 seemed to produce nice results but really any power of 2 could be used. Each circle on the finest level corresponds to one pixel of the resized image.

    Build the split tree

    Resizing the image returns the data needed to render the finest layer of the pixelization. Every successive layer is formed by grouping neighboring clusters of four dots together and averaging their color. The entire structure is stored as a (quaternary) tree so that when a circle splits it has easy access to the dots from which it was formed. During construction each subsequent layer of the tree is stored in an efficient 2D array.
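
    The array2d helper used in these excerpts isn’t shown; one plausible implementation (an assumption on my part, the real one lives in the KttM source) is a closure over a flat array that acts as a getter with two arguments and a setter with three:

    // Possible array2d implementation (assumption; see the KttM source for the real one)
    function array2d(width, height) {
        var cells = new Array(width * height);
        return function(x, y, value) {
            if (arguments.length === 3) cells[y * width + x] = value; // setter
            return cells[y * width + x];                              // getter
        };
    }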

    // Got the data now build the tree
    var finestLayer = array2d(dim, dim);
    var size = minSize;
     
    // Start off by populating the base (leaf) layer
    var xi, yi, t = 0, color;
    for (yi = 0; yi < dim; yi++) {
     for (xi = 0; xi < dim; xi++) {
       color = [colorData[t], colorData[t+1], colorData[t+2]];
       finestLayer(xi, yi, new Circle(vis, xi, yi, size, color));
       t += 4;
     }
    }

    We start by going through the color data extracted from the image and creating the finest circles.

    // Build up successive nodes by grouping
    var layer, prevLayer = finestLayer;
    var c1, c2, c3, c4, currentLayer = 0;
    while (size < maxSize) {
     dim /= 2;
     size = size * 2;
     layer = array2d(dim, dim);
     for (yi = 0; yi < dim; yi++) {
       for (xi = 0; xi < dim; xi++) {
         c1 = prevLayer(2 * xi    , 2 * yi    );
         c2 = prevLayer(2 * xi + 1, 2 * yi    );
         c3 = prevLayer(2 * xi    , 2 * yi + 1);
         c4 = prevLayer(2 * xi + 1, 2 * yi + 1);
         color = avgColor(c1.color, c2.color, c3.color, c4.color);
         c1.parent = c2.parent = c3.parent = c4.parent = layer(xi, yi,
           new Circle(vis, xi, yi, size, color, [c1, c2, c3, c4], currentLayer, onSplit)
         );
       }
     }
     splitableByLayer.push(dim * dim);
     splitableTotal += dim * dim;
     currentLayer++;
     prevLayer = layer;
    }

    After the finest circles have been created, the subsequent circles are each built by merging four dots and doubling the radius of the resulting dot.

    Render the circles

    Once the split tree is built, the initial circle is added to the page.

    // Create the initial circle
    Circle.addToVis(vis, [layer(0, 0)], true);

    This employs the Circle.addToVis function that is used whenever the circle is split. The second argument is the array of circles to be added to the page.

    Circle.addToVis = function(vis, circles, init) {
     var circle = vis.selectAll('.nope').data(circles)
       .enter().append('circle');
     
     if (init) {
       // Setup the initial state of the initial circle
       circle = circle
         .attr('cx',   function(d) { return d.x; })
         .attr('cy',   function(d) { return d.y; })
         .attr('r', 4)
         .attr('fill', '#ffffff')
           .transition()
           .duration(1000);
     } else {
       // Setup the initial state of the opened circles
       circle = circle
         .attr('cx',   function(d) { return d.parent.x; })
         .attr('cy',   function(d) { return d.parent.y; })
         .attr('r',    function(d) { return d.parent.size / 2; })
         .attr('fill', function(d) { return String(d.parent.rgb); })
         .attr('fill-opacity', 0.68)
           .transition()
           .duration(300);
     }
     
     // Transition to the respective final state
     circle
       .attr('cx',   function(d) { return d.x; })
       .attr('cy',   function(d) { return d.y; })
       .attr('r',    function(d) { return d.size / 2; })
       .attr('fill', function(d) { return String(d.rgb); })
       .attr('fill-opacity', 1)
       .each('end',  function(d) { d.node = this; });
    }

    Here the D3 magic happens. The circles in circles are added (.append('circle')) to the SVG container and animated to their position. The initial circle is given special treatment as it fades in from the center of the page while the others slide over from the position of their “parent” circle.

    In typical D3 fashion circle ends up being a selection of all the circles that were added. The .attr calls are applied to all of the elements in the selection. When a function is passed in it shows how to map the split tree node onto an SVG element.

    .attr('cx', function(d) { return d.parent.x; }) would set the X coordinate of the center of the circle to the X position of the parent.

    The attributes are set to their initial state then a transition is started with .transition() and then the attributes are set to their final state; D3 takes care of the animation.

    Detect mouse (and touch) over

    The circles need to split when the user moves the mouse (or a finger) over them; to do this efficiently, we can take advantage of the regular structure of the layout.

    The described algorithm vastly outperforms native “onmouseover” event handlers.

    // Handle mouse events
    var prevMousePosition = null;
    function onMouseMove() {
     var mousePosition = d3.mouse(vis.node());
     
     // Do nothing if the mouse point is not valid
     if (isNaN(mousePosition[0])) {
       prevMousePosition = null;
       return;
     }
     
     if (prevMousePosition) {
       findAndSplit(prevMousePosition, mousePosition);
     }
     prevMousePosition = mousePosition;
     d3.event.preventDefault();
    }
     
    // Initialize interaction
    d3.select(document.body)
      .on('mousemove.koala', onMouseMove);

    First, a body-wide mousemove event handler is registered. The event handler keeps track of the previous mouse position and calls the findAndSplit function, passing it the line segment traveled by the user’s mouse.

    function findAndSplit(startPoint, endPoint) {
     var breaks = breakInterval(startPoint, endPoint, 4);
     var circleToSplit = [];
     
     for (var i = 0; i < breaks.length - 1; i++) {
       var sp = breaks[i],
           ep = breaks[i+1];
     
       var circle = splitableCircleAt(ep);
       if (circle && circle.isSplitable() && circle.checkIntersection(sp, ep)) {
         circle.split();
       }
     }
    }

    The findAndSplit function splits a potentially large segment traveled by the mouse into a series of small segments (not bigger than 4px long). It then checks each small segment for a potential circle intersection.
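
    The breakInterval helper isn’t shown above; a plausible version (again an assumption about its implementation) simply interpolates between the two points so that no sub-segment exceeds the given maximum length:

    // Possible breakInterval implementation (assumption)
    function breakInterval(start, end, maxLength) {
        var dx = end[0] - start[0],
            dy = end[1] - start[1],
            steps = Math.max(1, Math.ceil(Math.sqrt(dx * dx + dy * dy) / maxLength)),
            points = [];
        for (var i = 0; i <= steps; i++) {
            points.push([start[0] + dx * i / steps, start[1] + dy * i / steps]);
        }
        return points;
    }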

    function splitableCircleAt(pos) {
     var xi = Math.floor(pos[0] / minSize),
         yi = Math.floor(pos[1] / minSize),
         circle = finestLayer(xi, yi);
     if (!circle) return null;
     while (circle && !circle.isSplitable()) circle = circle.parent;
     return circle || null;
    }

    The splitableCircleAt function takes advantage of the regular structure of the layout to find the one circle that the segment ending in the given point might be intersecting. This is done by finding the leaf node of the closest fine circle and traversing up the split tree to find its visible parent.

    Finally the intersected circle is split (circle.split()).

    Circle.prototype.split = function() {
     if (!this.isSplitable()) return;
     d3.select(this.node).remove();
     delete this.node;
     Circle.addToVis(this.vis, this.children);
     this.onSplit(this);
    }

    Going viral

    Sometime after Valentine’s Day I met with Mike Bostock (the creator of D3) regarding D3 syntax and I showed him KttM, which he thought was tweet-worthy – it was, after all, an early example of a pointless artsy visualization done with D3.

    Mike has a Twitter following and his tweet, which was retweeted by some members of the Google Chrome development team, started getting some momentum.

    Since the koala was out of the bag, I decided that it might as well be posted on reddit. I posted it on the programming subreddit with the title “A cute D3 / SVG powered image puzzle. [No IE]” and it got a respectable 23 points, which made me happy. Later that day it was reposted to the funny subreddit with the title “Press all the dots :D” and was upvoted to the front page.

    The traffic went exponential. Reddit was a spike that quickly dropped off, but people picked up on it and spread it to Facebook, StumbleUpon, and other social media outlets.

    The traffic from these sources decays over time but every several months KttM gets rediscovered and traffic spikes.

    Such irregular traffic patterns underscore the need to write scalable code. Conveniently KttM does most of the work within the user’s browser; the server needs only to serve the page assets and one (small) image per page load allowing KttM to be hosted on a dirt-cheap shared hosting service.

    Measuring engagement

    After KttM became popular I was interested in exploring how people actually interacted with the application. Did they even realize that the initial single circle can split? Does anyone actually finish the whole image? Do people uncover the circles uniformly?

    At first the only tracking on KttM was the vanilla GA code that tracks pageviews. This quickly became underwhelming. I decided to add custom event tracking for when an entire layer was cleared and when a given percentage of circles had been split (in increments of 5%). The event value is set to the time in seconds since page load.
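
    With the classic asynchronous GA snippet that was current at the time (an assumption about which flavor of GA KttM used), reporting such an event might look roughly like this:

    // Assumes the classic async GA snippet (_gaq) is already on the page;
    // 'loadTime' is a hypothetical timestamp captured when the page loaded.
    var loadTime = Date.now();
     
    function trackClearEvent(percent) {
        var secondsSinceLoad = Math.round((Date.now() - loadTime) / 1000);
        _gaq.push(['_trackEvent', 'clear', percent + '%', 'time-to-clear', secondsSinceLoad]);
    }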

    As you can see, such event tracking offers both insights and room for improvement. The 0% clear event is fired when the first circle is split, and the average time for that event to fire seems to be 308 seconds (5 minutes), which does not sound reasonable. In reality this happens because someone opens KttM and leaves it open for days; then, if a circle is split, the event value is huge and it skews the average. I wish GA had a histogram view.

    Even basic engagement tracking sheds vast amounts of light on how far people get through the game. These metrics proved very useful when the mouse-over algorithm was upgraded. I could, after several days of running the new algorithm, see that people were finishing more of the puzzle before giving up.

    Lessons learned

    While making, maintaining, and running KttM I learned several lessons about using modern web standards to build web applications that run on a wide range of devices.

    Some native browser utilities give you 90% of what you need, but to get your app behaving exactly as you want, you need to reimplement them in JavaScript. For example, the SVG mouseover events could not cope well with the number of circles and it was much more efficient to implement them in JavaScript by taking advantage of the regular circle layout. Similarly, the native base64 functions (atob, btoa) are not universally supported and do not work with unicode. It is surprisingly easy to support the modern Internet Explorers (9 and 10) and for the older IEs Google Chrome Frame provides a great fallback.
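
    For the base64 issue specifically, a widely used workaround (not necessarily the one KttM uses) is to route the string through the URI-encoding functions so that btoa only ever sees single-byte characters:

    // Common workaround for btoa's lack of unicode support
    function base64EncodeUnicode(str) {
        return btoa(unescape(encodeURIComponent(str)));
    }
    function base64DecodeUnicode(b64) {
        return decodeURIComponent(escape(atob(b64)));
    }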

    Despite the huge improvements in standard compliance it is still necessary to test the code on a wide variety of browsers and devices, as there are still differences in how certain features are implemented. For example, in IE10 running on the Microsoft Surface html {-ms-touch-action: none; } needed to be added to allow KttM to function correctly.

    Adding tracking and taking time to define and collect the key engagement metrics allows you to evaluate the impact of changes that get deployed to users in a quantitative manner. Having well defined metrics allows you to run controlled tests to figure out how to streamline your application.

    Finally, listen to your users! They pick up on things that you miss – even if they don’t know it. The congratulations message that appears on completion was added after I received complaints that it was not clear when a picture was fully uncovered.

    All projects are forever evolving and if you listen to your users and run controlled experiments then there is no limit to how much you can improve.