Video Articles

  1. Distributed On-the-Fly Image Processing and Open Source at Vimeo

    When you think of Vimeo, you probably think of video — after all, it’s what we do. However, we also have to handle the creation and distribution of a lot of images: thumbnails, user portraits, channel headers, and all the various awesome graphics around Vimeo, to name a few.

    For a very long time, all of this content was static and served from the CDN as-is. If we wanted to introduce a new video resolution, we would have to run batch jobs to generate new, higher resolution thumbnails for all of the videos on the site, where possible. It also meant that if we ever wanted to tweak the quality of said images, we would be out of luck, and that on mobile, or on a high DPI screen, we had to serve the same size images as on our main site, unless we wanted to store higher and/or lower resolution versions of the same images.

    Enter ViewMaster.

    About two years ago, during one of our code jams, one of our mobile site developers brought the issue to us, the transcode team, in search of a backend solution. ViewMaster was born that night, but sat idle for a long time after, due to heavy workloads, before being picked up again a few months ago.

    We’ll go into more detail below, but here’s a quick summary of what ViewMaster is and does:

    • Written in Go and C.
    • Resizes, filters, crops, and encodes (with optimizations such as pngout) to different formats on-the-fly, and entirely in memory.
    • Can be scaled out; each server is ‘dumb’.
    • Reworked thumbnailing that picks one ‘good’ thumbnail per video, during upload, and stores a master for use with the on-the-fly processing later on.
    • Migrates our existing N versions of each image to one master, high-quality image to be stored.

    This allows us to:

    • Serve smaller or larger images to different screen types, depending on DPI and OS.
    • Serve optimized images for each browser; e.g. WebP for Chrome, and JPEG-XR for IE11+.
    • Easily introduce new video resolutions and player sizes.
    • Scale thumbnail images to the size of an embed, for display.
    • Introduce new optimizations such as mozjpeg instantly, and without any significant migration problems.
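    The per-browser format selection above can be sketched as a small content-negotiation step. The helper below is an assumption for illustration, not Vimeo's actual service code (`pickFormat` is a hypothetical name, and real clients may advertise support via the Accept header or require user-agent sniffing):

    ```javascript
    // Hedged sketch: choose an output image format for a client, assuming the
    // client advertises support in its Accept header. Chrome advertises
    // image/webp; we assume a jxr-capable client advertises image/jxr.
    // Everything else falls back to plain JPEG.
    function pickFormat(acceptHeader) {
      const accept = (acceptHeader || '').toLowerCase();
      if (accept.includes('image/webp')) return 'webp';
      if (accept.includes('image/jxr')) return 'jxr';
      return 'jpeg';
    }
    ```

    A server can then run one on-the-fly encode from the stored master per request, rather than pre-generating every variant.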

    Now for the slightly more technical bits.

    General Architectural Overview and Migration

    ViewMaster Flow

    A general look at the process is given in the diagram above. If you’d like a more detailed look at the infrastructure and migration strategy (and a higher res diagram), including what some of those funny names mean, head over to the Making Vimeo Blog to check it out!

    Open Source

    The actual image processing happens entirely in memory — the disk is never hit. The main image processing service is written in Go, making somewhat liberal use of its C FFI to call several libraries and a few tiny C routines, open source or otherwise. Calling C functions from Go is known to have an overhead, but in practice this has been negligible compared to the time taken by much more intensive operations inside the libraries, such as decoding, encoding, and resizing.

    The process is rather straightforward: the video frame is seeked to, decoded, and converted to RGB (yes, JPEG is YCbCr, but it made more sense to us to store the master as RGB), and/or the image is decoded; various calculations then account for things like non-square pixels, cropping, resizing, and aspect ratios. The image is then resized, encoded, and optimized. All of this is done in memory using buffered I/O in Go (via bufio) and, where libraries are not available, piped to an external process and back to the service, as is the case with Gifsicle and pngout.

    Plenty of tricks are used to speed things up, such as detecting the image type and resolution based on MIME type, libmagic, and the libraries listed below, so we don’t need to call avformat_find_stream_info, which does a full decode of the image to get this information.
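    To illustrate the kind of cheap header inspection this relies on (a sketch in JavaScript rather than the Go of go-imgparse; `jpegSize` is a hypothetical helper, not Vimeo's code), a JPEG's resolution can be read straight from its SOF marker without decoding any image data:

    ```javascript
    // Read a baseline/progressive JPEG's dimensions from its Start-Of-Frame
    // (SOF0..SOF3) segment. Only the file header is scanned; no pixel data
    // is ever decoded.
    function jpegSize(buf) {
      if (buf.readUInt16BE(0) !== 0xffd8) throw new Error('not a JPEG'); // SOI
      let off = 2;
      while (off + 4 <= buf.length) {
        if (buf[off] !== 0xff) throw new Error('bad marker');
        const marker = buf[off + 1];
        if (marker >= 0xc0 && marker <= 0xc3) {
          // SOF layout: [length(2)][precision(1)][height(2)][width(2)]...
          return { height: buf.readUInt16BE(off + 5), width: buf.readUInt16BE(off + 7) };
        }
        off += 2 + buf.readUInt16BE(off + 2); // skip segment; length includes itself
      }
      throw new Error('no SOF marker found');
    }
    ```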

    A few of the notable open source libraries we leverage (and contribute to!) include:

    • FFmpeg & Libav – Base image decoding libraries (libavcodec), resizing (swscale), remote image access. Now supports CMYK JPEGs too!
    • FFMS2 – Frame accurate seeking using the above libraries.
    • libwebp – WebP encoding.
    • LCMS2 – ICC profile handling.

    On top of those, we’ve written several Go packages to aid in this as well, some of which we have just open sourced:

    • go-util – General utility functions for Go.
    • go-iccjpeg – ICC profile extraction from a generic io.Reader.
    • go-magic – Idiomatic Go bindings for the libmagic C API using io.Reader.
    • go-imgparse – Resolution extraction from JPEG, PNG, and WebP images optimized for I/O and convenience, again using a standard io.Reader.
    • go-taglog – Extended logging package compatible with the standard Go logging package.
    • go-mediainfo – Very basic binding for MediaInfo.


    Although we are currently optimizing quite well for PNG and WebP, there is still lots to be done. To that end, we have been involved with and contributing to a number of open source projects to create a better and faster web experience. A few are discussed below. It may not have been obvious though, since we tend to use our standard email accounts to contribute, rather than our corporate ones… Sneaky!

    mozjpeg – Very promising already, having added features such as scan optimization, trellis quantization, DHT/DQT table merging, and deringing via overshoot clipping, with future features such as optimized quantization tables for high DPI displays and globally optimal edge extension. We plan to roll this out after the plan for ABI compatibility is implemented in 3.0, and to then add support to ImageMagick to benefit the greater community, if someone else has not already.

    jxrlib – Awesome of Microsoft to open source this, but it needs a bit of work API-wise (that is, an actual API). Until fairly recently, it could not even be built as a library.

    jpeg-recompress – Alongside mozjpeg, something akin to this is very desirable for JPEG generation. Uses the excellent IQA with mozjpeg and some other metrics (one implemented (poorly) by me!).

    Open Source PNG optimization library – This was a bit of a sticking point with us. The current open source PNG optimization utilities do not support any in-memory API at all, or in fact even piping via stdin/stdout; pngout is the only tool that even supports piping. Long term, we’d like to be able to ditch the closed source tool and contribute an API to one of these projects.

    Photoshop, GIMP, etc. plugins – I plan to implement these using the above-mentioned libraries, so designers can more easily reap the benefits of better image compression.

  2. Building Interactive HTML5 Videos

    The HTML5 <video> element makes embedding videos into your site as easy as embedding images. And since all major browsers have supported <video> since 2011, it’s also the most reliable way to get your moving pictures seen by people.

    A more recent addition to the HTML5 family is the <track> element. It’s a sub-element of <video>, intended to make the video timeline more accessible. Its main use case is adding closed captions. These captions are loaded from a separate text file (a WebVTT file) and printed over the bottom of the video display. Ian Devlin has written an excellent article on the subject.

    Beyond captions though, the <track> element can be used for any kind of interaction with the video timeline. This article explores 3 examples: chapter markers, preview thumbnails, and a timeline search. By the end, you will have sufficient understanding of the <track> element and its scripting API to build your own interactive video experiences.

    Chapter Markers

    Let’s start with an example made popular by DVD disks: chapter markers. These allow viewers to quickly jump to a specific section. It’s especially useful for longer movies like Sintel:

    The chapter markers in this example reside in an external VTT file and are loaded on the page through a <track> element with its kind attribute set to chapters. The track is set to load by default:

    <video width="480" height="204" poster="assets/sintel.jpg" controls>
      <source src="assets/sintel.mp4" type="video/mp4">
      <track src="assets/chapters.vtt" kind="chapters" default>
    </video>
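    For reference, a chapters file like the one loaded above might look as follows (the timings and titles here are made up for illustration; they are not Sintel's actual chapter data):

    ```
    WEBVTT

    Chapter 1
    00:00:00.000 --> 00:02:00.000
    The Search Begins

    Chapter 2
    00:02:00.000 --> 00:04:00.000
    The Dragon Attacks
    ```

    Each cue's text becomes the chapter title, and its start time is where clicking that chapter should seek to.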

    Next, we use JavaScript to load the cues of the text track, format them, and print them in a controlbar below the video. Note we have to wait until the external VTT file is loaded:

    track.addEventListener('load', function() {
        var c = video.textTracks[0].cues;
        for (var i = 0; i < c.length; i++) {
          var s = document.createElement("span");
          s.innerHTML = c[i].text;
          s.setAttribute('data-start', c[i].startTime);
          s.addEventListener("click", seek);
          controlbar.appendChild(s);
        }
    });

    In the above code block, we’re adding 2 properties to the list entries to hook up interactivity. First, we set a data attribute to store the start position of the chapter, and second, we add a click handler for an external seek function. This function will jump the video to the start position. If the video is not (yet) playing, we’ll make that so:

    function seek() {
      video.currentTime = this.getAttribute('data-start');
      if (video.paused) {
        video.play();
      }
    };

    That’s it! You now have a visual chapter menu for your video, powered by a VTT track. Note the actual live Chapter Markers example has a little bit more logic than described, e.g. to toggle playback of the video on click, to update the controlbar with the video position, and to add some CSS styling.

    Preview Thumbnails

    This second example shows a cool feature made popular by Hulu and Netflix: preview thumbnails. When mousing over the controlbar (or dragging on mobile), a small preview of the position you’re about to seek to is displayed:

    This example is also powered by an external VTT file, loaded in a metadata track. Instead of texts, the cues in this VTT file contain links to a separate JPG image. Each cue could link to a separate image, but in this case we opted to use a single JPG sprite – to keep latency low and management easy. The cues link to the correct section of the sprite by using Media Fragment URIs, e.g. #xywh=0,0,160,90.
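    As an illustration, the metadata VTT file might contain cues like these (the sprite filename thumbs.jpg is a placeholder, not the example's actual asset):

    ```
    WEBVTT

    00:00:00.000 --> 00:00:05.000
    thumbs.jpg#xywh=0,0,160,90

    00:00:05.000 --> 00:00:10.000
    thumbs.jpg#xywh=160,0,160,90
    ```

    The xywh fragment gives the x offset, y offset, width, and height of the region of the sprite to show for that time range.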

    Next, all the important logic to get the right thumbnail and display it lives in a mousemove listener for the controlbar:

    controlbar.addEventListener('mousemove', function(e) {
      // first we convert from mouse to time position ..
      var p = (e.pageX - controlbar.offsetLeft) * video.duration / 480;
      // ..then we find the matching cue..
      var c = video.textTracks[0].cues;
      for (var i = 0; i < c.length; i++) {
        if (c[i].startTime <= p && c[i].endTime > p) {
          break;
        }
      }
      // we unravel the JPG url and fragment query..
      var url = c[i].text.split('#')[0];
      var xywh = c[i].text.substr(c[i].text.indexOf("=") + 1).split(',');
      // ..and last we style the thumbnail overlay ('thumbnail' is the overlay element)
      thumbnail.style.backgroundImage = 'url(' + url + ')';
      thumbnail.style.backgroundPosition = '-' + xywh[0] + 'px -' + xywh[1] + 'px';
      thumbnail.style.left = e.pageX - xywh[2] / 2 + 'px';
      thumbnail.style.top = controlbar.offsetTop - xywh[3] + 8 + 'px';
      thumbnail.style.width = xywh[2] + 'px';
      thumbnail.style.height = xywh[3] + 'px';
    });

    All done! Again, the actual live Preview Thumbnails example contains some additional code. It includes the same logic for toggling playback and seeking, as well as logic to show/hide the thumbnail when mousing in/out of the controlbar.

    Timeline Search

    Our last example offers yet another way to unlock your content, this time though in-video search:

    This example re-uses an existing captions VTT file, which is loaded into a captions track. Below the video and controlbar, we print a basic search form:

    <form>
        <input type="search" />
        <button type="submit">Search</button>
    </form>

    Like with the thumbnails example, all key logic resides in a single function. This time, it’s the event handler for submitting the form:

    form.addEventListener('submit', function(e) {
      // First we’ll prevent page reload and grab the cues/query..
      e.preventDefault();
      var c = video.textTracks[0].cues;
      var q = document.querySelector("input").value.toLowerCase();
      // ..then we find all matching cues..
      var a = [];
      for (var j = 0; j < c.length; j++) {
        if (c[j].text.toLowerCase().indexOf(q) > -1) {
          a.push(c[j]);
        }
      }
      // ..and last we highlight matching cues on the controlbar.
      for (var i = 0; i < a.length; i++) {
        var s = document.createElement("span");
        s.style.left = (a[i].startTime / video.duration * 480 - 2) + "px";
        controlbar.appendChild(s);
      }
    });

    Third time’s a charm! Like the other examples, the actual live Timeline Search example contains additional code for toggling playback and seeking, as well as a snippet to update the controlbar help text.

    Wrapping Up

    The above examples should provide you with enough knowledge to build your own interactive videos. For some more inspiration, see our experiments around clickable hot spots, interactive transcripts, or timeline interaction.

    Overall, the HTML5 <track> element provides an easy-to-use, cross-platform way to add interactivity to your videos. And while it definitely takes time to author VTT files and build similar experiences, you will see higher accessibility of, and engagement with, your videos. Good luck!

  3. Adding captions and subtitles to HTML5 video

    This article is also available on MDN.

    With the introduction of the <video> and <audio> elements to HTML5, we finally have a native way to add video and audio to our websites. We also have a JavaScript API that allows us to interact with this media content in different ways, be it writing our own controls or simply seeing how long a video file is. As responsible web developers, we should be constantly thinking about making our content more accessible, and this doesn’t stop with video and audio content. Making our content accessible to all, be it for someone who is hard of hearing or someone who doesn’t understand the language the content is delivered in, is an important step; inclusion can be paramount.

    Thankfully, HTML5 also provides us with a native way of making our media content more accessible, by adding subtitles and captions via the <track> element. Most major browsers support this natively to varying degrees, which the first part of this article shows, but it also provides a JavaScript API with which we can access and use the text tracks (e.g. subtitles) that are available. This article also shows how this API can be used to detect what captions/subtitles have been added to an HTML5 video, and how that data can be used to build a selectable menu of available text tracks and ultimately provide a more consistent interface across the various browsers.

    In articles on MDN, we have looked at how to build a cross browser video player using the HTMLMediaElement and Window.fullScreen APIs, and also at how to style the player. This article will take the same player and show how to add captions and subtitles to it, using Web_Video_Text_Tracks_Format and the <track> element.

    Captioned video example

    In this article, we will refer to the Video player with captions example. This example uses an excerpt from the Sintel open movie, created by the Blender Foundation.

    Video player with standard controls such as play, stop, volume, and captions on and off. The video playing shows a scene of a man holding a spear-like weapon, and a caption reads "Esta hoja tiene pasado oscuro."

    Note: You can find the source on GitHub, and also view the example live.

    HTML5 and Video Captions

    Before diving into how to add captions to the video player, there are a number of things that you should first be aware of.

    Captions versus subtitles

    Captions and subtitles are not the same thing: they have significantly different audiences and convey different information, and it is recommended that you read up on the differences if you are not sure what they are. They are, however, implemented in the same way technically, so the material in this article applies to both.

    For this article we will refer to the text tracks displayed as captions, as their content is aimed at hearing people who have difficulty understanding the language of the film, rather than deaf or hard-of-hearing people.

    The <track> element

    HTML5 allows us to specify captions for a video using the Web Video Text Tracks (WebVTT) format. The WebVTT specification is still being worked on, but major parts of it are stable so we can use it today.

    Video providers (such as the Blender Foundation) provide captions and subtitles in a text format with their videos, but they’re usually in the SubRip Text (SRT) format. These can be easily converted to WebVTT using an online converter such as srt2vtt.
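    The conversion is mostly mechanical, since SRT and WebVTT differ mainly in the file header and the decimal separator used in cue timestamps. The sketch below is an assumption about the general approach, not srt2vtt's actual code (`srtToVtt` is a hypothetical helper and ignores SRT styling edge cases):

    ```javascript
    // Convert an SRT caption string to WebVTT: add the WEBVTT header and
    // change the comma decimal separator in timestamps (00:00:01,000) to the
    // dot WebVTT requires (00:00:01.000).
    function srtToVtt(srt) {
      const body = srt
        .replace(/\r+/g, '') // normalize Windows line endings
        .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2'); // comma -> dot in cue times
      return 'WEBVTT\n\n' + body.trim() + '\n';
    }
    ```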

    Modifications to the HTML and CSS

    This section summarises the modifications made to the previous article’s code in order to facilitate the addition of subtitles to the video. If you are not interested in this and just want to get straight into the JavaScript and more relevant CSS, skip to the Caption implementation section.

    In this example we are using a different video, Sintel, as it actually has some speech in it and therefore is better for illustrating how captions work!

    HTML Markup

    As mentioned above, we need to make use of the new HTML5 <track> element to add our caption files to the HTML5 video. We actually have our captions in three different languages — English, German, and Spanish — so we will reference all three of the relevant VTT files by adding <track> elements inside our HTML5 <video> element:

    <video id="video" controls preload="metadata">
       <source src="video/sintel-short.mp4" type="video/mp4">
       <source src="video/sintel-short.webm" type="video/webm">
       <track label="English" kind="captions" srclang="en" src="captions/vtt/sintel-en.vtt" default>
       <track label="Deutsch" kind="captions" srclang="de" src="captions/vtt/sintel-de.vtt">
       <track label="Español" kind="captions" srclang="es" src="captions/vtt/sintel-es.vtt">
    </video>

    As you can see, each <track> element has the following attributes set:

    • kind is given a value of captions, indicating the type of content the files contain
    • label is given a value indicating which language that caption set is for, for example English or Deutsch; these labels will appear in the user interface to allow the user to easily select which caption language they want to see.
    • src is assigned a valid URL pointing to the relevant WebVTT caption file in each case.
    • srclang indicates what language each caption file’s contents are in.
    • The default attribute is set on the English <track> element, indicating to the browser that this is the default caption file definition to use when captions have been turned on and the user has not made a specific selection.

    In addition to adding the <track> elements, we have also added a new button to control the captions menu that we will build. As a consequence, the video controls now look as follows:

    <div id="video-controls" class="controls" data-state="hidden">
       <button id="playpause" type="button" data-state="play">Play/Pause</button>
       <button id="stop" type="button" data-state="stop">Stop</button>
       <div class="progress">
          <progress id="progress" value="0" min="0">
             <span id="progress-bar"></span>
          </progress>
       </div>
       <button id="mute" type="button" data-state="mute">Mute/Unmute</button>
       <button id="volinc" type="button" data-state="volup">Vol+</button>
       <button id="voldec" type="button" data-state="voldown">Vol-</button>
       <button id="fs" type="button" data-state="go-fullscreen">Fullscreen</button>
       <button id="captions" type="button" data-state="captions">CC</button>
    </div>

    CSS Changes

    The video controls have undergone some minor changes in order to make space for the extra button, but these are relatively straightforward.

    No image is used for the captions button, so it is simply styled as:

    .controls button[data-state="captions"] {

    There are also other CSS changes that are specific to some extra JavaScript implementation, but these will be mentioned at the appropriate place below.

    Caption implementation

    A lot of what we do to access the video captions revolves around JavaScript. Similar to the video controls, if a browser supports HTML5 video captions, there will be a button provided within the native control set to access them. However, since we have defined our own video controls, this button is hidden, and we need to define our own.

    Browsers do vary as to what they support, so we will be attempting to bring a more unified UI to each browser where possible. There’s more on browser compatibility issues later on.

    Initial setup

    As with all the other buttons, one of the first things we need to do is store a handle to the captions button:

    var captions = document.getElementById('captions');

    We also initially turn off all captions, in case the browser turns any of them on by default:

    for (var i = 0; i < video.textTracks.length; i++) {
       video.textTracks[i].mode = 'hidden';
    }

    The video.textTracks property contains an array-like list (a TextTrackList) of all the text tracks attached to the video. We loop through each one and set its mode to hidden.

    Note: The WebVTT API gives us access to all the text tracks that are defined for an HTML5 video using the <track> element.

    Building a caption menu

    Our aim is to use the captions button we added earlier to display a menu that allows users to choose which language they want the captions displayed in, or to turn them off entirely.

    We have added the button, but before we make it do anything, we need to build the menu that goes with it. This menu is built dynamically, so that languages can be added or removed later by simply editing the <track> elements within the video’s markup.

    All we need to do is to go through the video’s textTracks, reading their properties and building the menu up from there:

    var captionsMenu;
    if (video.textTracks) {
       var df = document.createDocumentFragment();
       captionsMenu = df.appendChild(document.createElement('ul'));
       captionsMenu.className = 'captions-menu';
       captionsMenu.appendChild(createMenuItem('captions-off', '', 'Off'));
       for (var i = 0; i < video.textTracks.length; i++) {
          captionsMenu.appendChild(createMenuItem('captions-' + video.textTracks[i].language,
             video.textTracks[i].language, video.textTracks[i].label));
       }
    }

    This code creates a DocumentFragment, which is used to hold an unordered list containing our captions menu. First of all an option is added to allow the user to switch all captions off, and then buttons are added for each text track, reading the language and label from each one.

    The creation of each list item and button is done by the createMenuItem() function, which is defined as follows:

    var captionMenuButtons = [];
    var createMenuItem = function(id, lang, label) {
       var listItem = document.createElement('li');
       var button = listItem.appendChild(document.createElement('button'));
       button.setAttribute('id', id);
       button.className = 'captions-button';
       if (lang.length > 0) button.setAttribute('lang', lang);
       button.value = label;
       button.setAttribute('data-state', 'inactive');
       button.addEventListener('click', function(e) {
          // Set all buttons to inactive
          captionMenuButtons.map(function(v, i, a) {
             captionMenuButtons[i].setAttribute('data-state', 'inactive');
          });
          // Find the language to activate
          var lang = this.getAttribute('lang');
          for (var i = 0; i < video.textTracks.length; i++) {
             // For the 'captions-off' button, the first condition will never match so all captions will be turned off
             if (video.textTracks[i].language == lang) {
                video.textTracks[i].mode = 'showing';
                this.setAttribute('data-state', 'active');
             }
             else {
                video.textTracks[i].mode = 'hidden';
             }
          }
          captionsMenu.style.display = 'none';
       });
       captionMenuButtons.push(button);
       return listItem;
    }

    This function builds the required <li> and <button> elements, and returns them so they can be added to the captions menu list. It also sets up the required event listeners on the button to toggle the relevant caption set on or off. This is done by simply setting the required caption’s mode attribute to showing, and setting the others to hidden.

    Once the menu is built, it is then inserted into the DOM at the bottom of the videoContainer.

    Initially the menu is hidden by default, so an event listener needs to be added to our captions button to toggle it:

    captions.addEventListener('click', function(e) {
       if (captionsMenu) {
          captionsMenu.style.display = (captionsMenu.style.display == 'block' ? 'none' : 'block');
       }
    });

    Caption menu CSS

    We also added some rudimentary styling for the newly created captions menu:

    .captions-menu {
        display: none;
    }
    .captions-menu li {
    }
    .captions-menu li button {
        padding: 2px 5px;
    }

    Styling the displayed captions

    One of the lesser known and less well supported features of WebVTT is the ability to style the individual captions (known as text cues) via CSS extensions.

    The ::cue pseudo-element is the key to targeting individual text track cues for styling, as it matches any defined cue. Only a handful of CSS properties can be applied to a text cue, such as color, background, text-shadow, and font properties.

    For example, to change the text colour of the text track cues you can write:

    ::cue {
       color: #ccc;
    }

    If the WebVTT file uses voice spans, which allow cues to be defined as having a particular “voice”:

    00:00:00.000 --> 00:00:12.000
    <v Test>[Test]</v>

    Then this specific ‘voice’ will be stylable like so:

    ::cue(v[voice='Test']) {
       color: #fff;
    }

    Note: Some of the styling of cues with ::cue currently works on Chrome, Opera, and Safari, but not yet on Firefox.

    Browser Compatibility

    Browser support for WebVTT and the <track> element is fairly good, although some browsers differ slightly in their implementation.

    Internet Explorer

    As of Internet Explorer 10, captions are enabled by default, and the default controls contain a button and a menu that offers the same functionality as the menu we just built. The default attribute is also supported.

    Note: IE will completely ignore WebVTT files unless you set up the MIME type. This can easily be done by adding an .htaccess file to an appropriate directory that contains AddType text/vtt .vtt.

    Safari

    Safari 6.1+ has similar support to Internet Explorer 11, displaying a menu with the different available options, with the addition of an “Auto” option, which allows the browser to choose.

    Chrome and Opera

    These browsers have similar implementations again: captions are enabled by default and the default control set contains a ‘cc’ button that turns captions on and off. Chrome and Opera ignore the default attribute on the <track> element and will instead try to match the browser’s language to the caption’s language.

    Firefox

    Firefox’s implementation was completely broken due to a bug, leading to Mozilla turning off WebVTT support by default (you could turn it on via the media.webvtt.enabled flag). However, this has been fixed as of Firefox 31, and everything works as it should.

    Plugins

    If, after reading through this article you decide that you can’t be bothered to do all of this and want someone else to do it for you, there are plenty of plugins out there that offer caption and subtitle support that you can use.

    • This small plugin implements subtitles, captions, and chapters, as well as both WebVTT and SRT file formats.
    • This video player is very extensive and does a lot more than simply support video captions. It supports WebVTT, SRT, and DFXP file formats.
    • Another complete video player that also supports video captions, albeit only in SRT format.
    • LeanBack Player – Yet another video player that supports WebVTT captions as well as providing other standard player functionality.
    • This player also supports captions through WebVTT and SRT files.
    • Supports WebVTT video subtitles.

    Note: You can find an excellent list of HTML5 Video Players and their current state at HTML5 Video Player Comparison.

  4. Inside the Party Bus: Building a Web App with Multiple Live Video Streams + Interactive Graphics

    Gearcloud Labs is exploring the use of open technologies to build new kinds of shared video experiences. Party Bus is a demo app that mixes multiple live video streams together with interactive graphics and synchronized audio. We built it using a combination of node.js, WebSockets, WebRTC, WebGL, and Web Audio. This article shares a few things we learned along the way.

    User experience

    First, take a ride on the Party Bus app to see what it does. You need Firefox or Chrome plus a decent GPU, but if that’s not handy you can get an idea of the app by watching the example video on YouTube.

    Since the app uses WebRTC getUserMedia(), you have to give permission for the browser to use your camera. After it starts, the first thing you’ll notice is your own video stream mounted to a seat on the 3D bus (along with live streams from any other concurrent riders). In most scenes, you can manipulate the bus in 3D using the left mouse (change camera angle), scroll wheel (zoom in/out), and right mouse (change camera position). Also try the buttons in the bottom control bar to apply effects to your own video stream: from changing your color, to flipping yourself upside down, bouncing in your seat, etc.

    How Party Bus uses WebRTC

    Party Bus uses WebRTC to set up the P2P video streams needed for the experience. WebRTC does a great job supporting native video in the browser, and punching through firewalls to enable peer connections (via STUN). But with WebRTC, you also need to provide your own signaler to coordinate which endpoints will participate in a given application session.

    The Party Bus app uses a prototype platform we built called Mixology to handle signaling and support the use of dynamic peer topologies. Note that many apps can simply use peer.js, but we are using Mixology to explore new and scalable approaches for combining large numbers of streams in a variety of different connection graphs.

    For example, if a rider joins a bus that already has other riders, the system takes care of building the necessary connection paths between the new rider and peers on the same bus, and then notifying all peers through a WebSocket that the new rider needs to be assigned a seat.

    Specifically, clients interact with the Mixology signaling server by instantiating a Mixology object

    var m = new Mixology(signalerURL);

    and then using it to register with the signaler

    m.register(['mix-in'], ['mix-out']);

    The two arguments give specific input and output stream types supported by the client. Typing inputs and outputs in this way allows Mixology to assemble arbitrary graphs of stream connections, which may vary depending on application requirements. In the case of Party Bus, we’re just using a fully connected mesh among all peers. That is, all clients register with the same input and output types.

    The signaler is implemented as a node.js application that maintains a table of registered peers and the connections among them. The signaler can thus take care of handling peer arrivals, departures, and other events — updating other peers as necessary via callback functions. All communications between peers and the signaler are implemented internally using WebSockets.

    For example, when a new peer is registered, the server updates the topology table, and uses a callback function to notify other peers that need to know about the new connection.

    m.onPeerRegistered = function(peer) { ... }

    In this function, peers designated to send streams initiate the WebRTC offer code. Peers designated to receive streams initiate the WebRTC answer code (as well as provide a callback function onAddStream() to be used when the new input stream is ready).

    In the case of Party Bus, it’s then up to the app to map the new video stream to the right seat in the 3D bus model, and from then on, apply the necessary 3D transforms using three.js. Similarly, if a rider leaves the bus, the system takes care of notifying other clients that a peer has exited, so they can take appropriate action to remove what would otherwise be a dead video stream in the display.

    Party Bus organizes the “riders” on a bus using an array of virtual screens:

    var vsArray = new Array(NUM_SEATS);

    After registering itself with Mixology, the app receives a callback whenever a new peer video stream becomes available for its bus instance:

    function onAddStream(stream, peerId) {
        var i = getNextOpenSeat();
        vsArray[i] = new VScreen(stream, peerId);
    }
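    The post doesn't show getNextOpenSeat(), so the following is a plausible sketch of the seat bookkeeping (an assumption on our part): scan the virtual-screen array for the first free slot, and free the slot again when a rider leaves.

```javascript
// Plausible sketch of seat bookkeeping; getNextOpenSeat() is not shown in
// the post, so treat this as an assumption. NUM_SEATS and vsArray are
// redeclared here so the sketch is self-contained.
var NUM_SEATS = 12;
var vsArray = new Array(NUM_SEATS);

function getNextOpenSeat() {
    for (var i = 0; i < vsArray.length; i++) {
        if (!vsArray[i]) return i;  // first unoccupied seat
    }
    return -1;  // the bus is full
}

function releaseSeat(peerId) {
    // called when a peer exits, so the seat can be reused
    for (var i = 0; i < vsArray.length; i++) {
        if (vsArray[i] && vsArray[i].peerId === peerId) vsArray[i] = null;
    }
}
```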

    The Party Bus app creates a virtual screen object for every video stream on the current bus. The incoming streams are associated with DOM video objects in the virtual screen constructor:

    function VScreen(stream, id) {
        var v = document.createElement('video');
        v.setAttribute('id', 'monitor' + id); = 'hidden';  // keep the raw video element out of the page layout
        v.src = window.URL.createObjectURL(stream);  // binds stream to DOM video object
        v.autoplay = true;
    }

    Movie or app?

    Party Bus uses three.js to draw a double-decker bus, along with virtual screens “riding” in the seats. The animation loop runs about two minutes and consists of about a dozen director “shots”. Throughout the demo, the individual video screens are live and can be manipulated by each rider. The overall sequence of shots is designed to change scene lighting and present other visual effects, such as bus thrusters, which were created with Stemkoski’s particle engine.

    Party Bus is a web app, but the animation is programmed so the user can just let it run like a movie. The curious user may try to interact with it, and find that in most scenes it’s also possible to change the 3D view. However, in shots with a moving camera or bus, we found it necessary to block certain camera translations (movements in x, y, z position), or rotations (turning on x, y, z axis) — otherwise, the mouse will “fight” the program, resulting in a jerky presentation.
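    One way to implement such per-shot locking (a sketch with assumed names, not the actual Party Bus code) is to have each shot declare which camera controls are allowed and filter the user's input deltas before applying them:

```javascript
// Sketch of per-shot camera locking; the shot flags and function name are
// assumptions, not the actual Party Bus code. Disallowed axes are zeroed
// so mouse input cannot fight a scripted camera move.
function filterCameraInput(shot, delta) {
    return {
        x: shot.allowTranslate ? delta.x : 0,
        y: shot.allowTranslate ? delta.y : 0,
        z: shot.allowTranslate ? delta.z : 0,
        rotX: shot.allowRotate ? delta.rotX : 0,
        rotY: shot.allowRotate ? delta.rotY : 0
    };
}
```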

    But most of the fun in Party Bus is just hamming it up for the camera, applying visual effects to your own stream, and looking for other live riders on the bus.

    More info

    For more information on the Party Bus app, or to stay in the loop on development of the Mixology platform, please check out

  5. It's a wrap! "App Basics for FirefoxOS" is out and ready to get you started

    A week ago we announced a series of video tutorials around creating HTML5 apps for Firefox OS. Now we have released all the videos, and you can watch the series in one go.

    Photo by Olliver Hallmann

    The series is aimed at web developers who want to build their first HTML5 application. It is specifically meant for distribution in emerging markets, where Firefox OS offers the first affordable smartphone option and a chance to start selling apps to the audiences there.

    Over the last week, we released the different videos of the series – one each day:

    Yesterday we announced the last video in the series. For all of you who asked for the whole series to watch in one go, you now have the chance to do so.

    There are various resources you can use:

    What’s next?

    There will be more videos on similar topics coming in the future and we are busy getting the videos dubbed in other languages. If you want to help us get the word out, check out the embedded versions of the videos, where we use Amara to allow for subtitles.

    Speaking of subtitles and transcripts, we are currently considering both, depending on demand. If you think this would be a very useful thing to have, please tell us in the comments.


    Many thanks to Sergi, Jan, Jakob, Ketil, Nathalie and Anne from Telenor, Brian Bondy from Khan Academy, and Paul Jarrat and Chris Heilmann of Mozilla for making all of this possible. Technologies used to make this happen were Screenflow, Amazon S3, and YouTube.

  6. App basics for Firefox OS – a screencast series to get you started

    Over the next few days we’ll release a series of screencasts explaining how to start your first Open Web App and develop for Firefox OS.

    Firefox OS - Intro and hello

    Each of the screencasts is short enough to watch in a quick break, and the whole series should not take more than an hour of your time. The series features Jan Jongboom (@janjongboom), Sergi Mansilla (@sergimansilla) of Telenor Digital and Chris Heilmann (@codepo8) of Mozilla, and was shot in three days in Oslo, Norway at the offices of Telenor Digital in February 2014.

    Here are the three of us telling you about the series and what to expect:

    Firefox OS is an operating system that brings the web to mobile devices. Instead of being a new OS with new technologies and development environments, it builds on standardised web technologies that have been in use for years. If you are a web developer and you want to build a mobile app, Firefox OS gives you the tools to do so, without having to change your workflow or learn a totally new development environment. In this series of short videos, developers from Mozilla and Telenor met in Oslo, Norway to explain in a few steps how you can get started building applications for Firefox OS. You’ll learn:

    • how to build your first application for Firefox OS
    • how to debug and test your application both on the desktop and the real device
    • how to get it listed in the marketplace
    • how to use the APIs and special interfaces Firefox OS offers a JavaScript developer to take advantage of the hardware available in smartphones.

    In addition to the screencasts, you can download the accompanying code samples from GitHub. If you want to try the code examples out for yourself, you will need to set up a very simple development environment. All you need is:

    • A current version of Firefox (which comes out of the box with the developer tools you need) – we recommend getting Firefox Aurora or Nightly if you really want to play with the state-of-the-art technology.
    • A text editor – in the screencasts we used Sublime Text, but any will do. If you want to be really web native, you can try Adobe Brackets.
    • A local server or a server to push your demo files to. A few of the demo apps need HTTP connections instead of local ones.

    sergi and chris recording

    Over the next few days we’ll cover the following topics:

    In addition to the videos, you can also go to the Wiki page of the series to get extra information and links on the subjects covered.

    Come back here to see the links appear day by day or follow us on Twitter at @mozhacks to get information when the next video is out.

    jan recording his video

    Once the series is out, there’ll be a Wiki resource to get them all in one place. Telenor are also working on getting these videos dubbed in different languages. For now, stay tuned.

    Many thanks to Sergi, Jan, Jakob, Ketil, Nathalie and Anne from Telenor for making all of this possible.

  7. Firefox OS Security: Part 1 – The Web Security Model

    When presenting Firefox OS to people, security is a big topic. Can an operating system built on web technologies be secure? What has Mozilla built in to avoid drive-by downloads and malware? In this two part video series Christian Heilmann (@codepo8), principal evangelist of Mozilla, talks to Michael Coates (@_mwc), chair of @OWASP Board about all things security in Firefox OS.

    Firefox OS was built on top of the technologies that power the Web. Following Mozilla’s security practices and knowledge from over 10 years of securing Firefox, Firefox OS is engineered as a multi-tiered system that protects users while delivering the power of the mobile web. The design ensures users are in control of their data and developers have APIs and technologies at their disposal to unlock the power of the Web.

    Watch the following video where we talk more about the security design and controls present in Firefox OS. In this, the first of two videos on Firefox OS security, we’ll cover items such as the multi-tiered architecture, the permission model, run time decision making, protection of users data and the update model. You can watch the video on YouTube.

    Additional links for more information:

  8. Make your Firefox OS app feel alive with video and audio

    Firefox OS applications aren’t just about text: there is no better way to make your app feel alive than adding some video or audio to it. Let’s explore the different ways we, as developers, can use media to enhance our mobile masterpiece.

    Audio and video HTML tags

    Since we are talking about HTML, it makes total sense to think about using the <audio> and <video> tags to play media in your Firefox OS app. If you want to add a video to your application, just use this code.

    <video src="" controls>
      Your browser does not support the video element.

    In this code example, the user will see a video player with controls, and will have the opportunity to start the video. If your application is running in a browser that doesn’t support the video tag, the user will see the text between the tags. It’s still good practice to include this fallback, even if your primary target is a Firefox OS app: since the app uses HTML5, someone may access it from another browser if it’s a hosted app. Note that you can use other attributes for this element.

    As for the audio tag, it’s basically the same.

    <audio id="demo" src="/music/audio.mp3" autoplay loop></audio>

    In this example, the audio will start automatically, and will play the audio file, in a loop, from the relative path: it’s perfect for background music if you are building a game. Note that you can add other attributes to this element too.

    Of course, using those elements without JavaScript gives you only the basic features, but no worries: you can control them programmatically with code. Once you have your HTML element, like the audio example you just saw, you can use JavaScript to play, pause, change the volume, and more.

    document.querySelector("#demo").play(); //Play the Audio
    document.querySelector("#demo").pause(); //Pause the Audio
    document.querySelector("#demo").volume+=0.1; //Increase Volume
    document.querySelector("#demo").volume-=0.1; //Decrease Volume

    You can read more about what you can do with those two elements in the Mozilla Developer Network documentation. You may also want to take a closer look at the supported formats list.

    Use audio while the screen is locked

    Maybe you are building a podcast app, or you simply need to play audio while the screen is locked? There is a way to do it using the audio tag: you simply need to add the mozaudiochannel attribute with the value content to your audio tag.

    <audio mozaudiochannel="content" preload="none"

    Actually, that’s not quite enough, as this code won’t work as is. You also need to add a permission to the manifest file.

    "permissions": {
        "description":"Use the audio channel for the music player"

    Having the manifest entry above will authorize your application to use the audio channel to play music, even when the screen is locked. Having said that, you probably realize that this code is specific to Firefox OS for now. I intentionally put the end of the last sentence in bold, as it’s one thing you need to understand about Firefox OS: we had to create some APIs, features or elements to give HTML the power it deserves for developers, but we are working with the W3C to make those standards. If the final standards differ from what we created, we’ll change our implementation to reflect them.

    Firefox OS Web activities

    Finally, something very handy for Firefox OS developers: Web Activities. They define a way for applications to delegate an activity to another (usually user-chosen) application. They aren’t standardized at the time of writing. For the case that interests us here, we’ll use the open Web Activity to open music or video files. Note that for video, you can also use the view activity, which basically does the same. Let’s say I want to open a remote video when someone clicks on a button with the id open-video: I’ll use the following code in my JavaScript to make it happen.

    var openVideo = document.querySelector("#open-video");
    if (openVideo) {
        openVideo.onclick = function () {
            var openingVideo = new MozActivity({
                name: "open",
                data: {
                    // list the video MIME types your app supports
                    type: ["video/webm", "video/mp4", "video/3gpp"],
                    url: ""
                }
            });
        };
    }

    In that situation, the video player of Firefox OS will open, and play the video: it’s that easy!

    In the end…

    You may or may not need these tricks in your app, but adding video or audio can enhance the quality of your application and make it feel alive. In the end, you have to give your users a strong experience, and that’s what will make the difference between a good app and a great one!

  9. Firefox OS Development: Web Components and Mozilla Brick

    In this edition of “Firefox OS: The platform HTML5 deserves” (the previous six videos are published here), Mozilla’s Principal Evangelist Chris Heilmann (@codepo8) grilled Mozilla’s “Senior HTML5 Engineer Angle Bracket Coordinator” Matthew Claypotch (@potch) about the exciting new possibilities of Web Components for Web App developers and how Mozilla’s Brick library, a collection of custom elements to build applications with, can help with the transition. You can watch the interview on YouTube.

    The Why of Web components

    There is a problem with the Web as a platform for applications: HTML, the language that makes it easy to mark up documents and give them meaning doesn’t have enough elements to build applications. There are quite a few new elements in the HTML5 spec, but their support is sketchy across browsers and there are still a lot of widgets missing that other platforms like Flex or iOS give developers out-of-the-box. As a result, developers build their own “widgets” like menu bars, slider controls and calendars using non-semantic HTML (mostly DIV elements) and make them interactive using JavaScript and theme-able using CSS.

    This is a great workaround, but the issue is that we add on top of the functionality of browsers instead of extending the way they already function. In other words, a browser needs to display HTML and does a great job doing that, ideally at 60 frames per second. We then add our own widget functionality on top of that and animate and change the display without notifying the browser. We constantly juggle the performance of the browser and our own code on top of it. This leads to laggy interfaces, battery drain and flickering.

    To work around that problem, a few companies and standards body members are working on the Web Components specification, which allows developers to extend the browser’s understanding of markup with their own elements. Instead of writing a slider control and making it work after the browser has already displayed the document, you define a slider element that becomes part of the normal display flow. This means our widgets get more responsive, don’t work against the browser’s rendering flow, and all in all perform better. Especially on low-spec mobile devices this is a massive win. This already happens today: if you, for example, add a video element to the document, you see a video controller with a timed slider bar, a play button and volume controls. All of these are HTML, CSS and JavaScript, and you can even see them in the debugging tools:

    Anatomy of a video element

    Firefox OS, being targeted at low-end devices, can benefit a lot from widgets that are part of the rendering flow, which is why Mozilla created Mozilla Brick, a collection of custom elements to build applications with. Earlier we introduced the concept using a library called X-Tag, which powers Brick. Using Brick, it is very simple to create, for example, a deck-based application layout using the following markup:

    <x-deck selected-index="0">
        0<span>I'm the first card!</span>
          These cards can contain any markup!<br />
          <img src="../../site/img/grounds_keeping_it_real_s3.gif">
          <img src="../../site/img/grounds_keeping_it_real_s1.gif">
          <img src="../../site/img/grounds_keeping_it_real_s2.gif">
        2 <img src="../../site/img/thumbs_up.gif">

    The resulting app consists of a deck of three cards that can be animated from one to the other without having to do anything but call the deck.shuffleNext() function.
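    Wiring navigation buttons to a deck is just as short. The index arithmetic below models the wrap-around cycling we assume shuffleNext()/shufflePrev() perform over the cards (our reading of the API, so treat it as an assumption); the browser-only DOM wiring is commented out.

```javascript
// Sketch of deck navigation. nextIndex/prevIndex model the wrap-around
// cycling we assume shuffleNext()/shufflePrev() perform over the <x-card>
// children; the DOM wiring below is browser-only and therefore commented out.
function nextIndex(current, total) { return (current + 1) % total; }
function prevIndex(current, total) { return (current - 1 + total) % total; }

// var deck = document.querySelector('x-deck');
// document.querySelector('#next').onclick = function () { deck.shuffleNext(); };
// document.querySelector('#prev').onclick = function () { deck.shufflePrev(); };
```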

    Web Components are a huge topic right now and many libraries and frameworks appear each week. We hope that by using Brick we can enable developers to build very responsive apps for Firefox OS quickly and cleanly and leave the pain of making your app perform really well up to the OS.

  10. Web Activities – Firefox OS: the platform HTML5 deserves

    In the sixth video of our “Firefox OS – the platform HTML5 deserves” series (the previous five videos are published here) we talk about how Web Activities allow you as a developer to access parts of the hardware without having to package your app.

    Firefox OS - be the future

    Check out the video featuring Chris Heilmann (@codepo8) from Mozilla and Daniel Appelquist (@torgo) from Telefónica Digital/ W3C talking about the why and how of Web Activities. You can watch the video here.

    Web Activities are a way to extend the functionality of HTML5 apps without having to access the hardware on behalf of the user. In other words, you don’t need to ask the user for access to the camera or the phone; instead, your app asks for an image or initiates a call, and the user then picks the app most appropriate for the task. In the case of a photo, the user might pick it from the gallery or the wallpapers, or shoot a new photo with the camera app. You then get the photo back as a file blob. The code is incredibly simple:

    var pick = new MozActivity({
        name: "pick",
        data: {
            type: ["image/png", "image/jpg", "image/jpeg"]
        }
    });

    You invoke the “pick” activity and you ask for an image by listing all the MIME types you require. This small script will cause a Firefox OS device or an Android device running Firefox to show the user the following dialog:

    pick dialog

    All activities have a success and failure handler. In this case you could create a new image when the user successfully picked a source image or show an alert when the user didn’t allow you to take a picture or it was the wrong format:

    pick.onsuccess = function () {
        // Create an image and set the returned blob as its src
        var img = document.createElement("img");
        img.src = window.URL.createObjectURL(this.result.blob);

        // Present that image in your app
        var imagePresenter = document.querySelector("#image-presenter");
        imagePresenter.appendChild(img);
    };

    pick.onerror = function () {
        // If an error occurred or the user canceled the activity
        alert("Can't view the image!");
    };

    Other Web Activities work in a similar fashion. For example, to ask the user to call a number, you write the following:

    var call = new MozActivity({
        name: "dial",
        data: {
            number: "+46777888999"
        }
    });

    This opens the application the user has defined as the one to make phone calls, and asks it to call the number. Once the user hangs up, your success handler is called.

    Web Activities have a few benefits:

    • They allow secure access to hardware – instead of asking the user to allow yet another app to use the camera you send the user to the application they already trust to do this.
    • They allow your app to be part of the user’s device experience – instead of building a camera interface, you send the user to the one they are already familiar with to take photos.
    • You allow apps to become an ecosystem on the device – instead of having each app do the same things, you allow them to specialise in doing one thing and doing it well.
    • You keep the user in control – they can provide you with the photo from anywhere they want, and they can store results from your app’s functionality where they want, rather than in yet another database on their device.

    We’ve covered the subject here before in detail in the Introducing Web Activities post.

    The simplest way to get started with Web Activities on a Firefox OS device (or simulator) or an Android phone running Firefox is to download the Firefox OS Boilerplate App and play with the activities and the code:

    Firefox OS Boilerplate App

    Web Activities are a simple way to enable the apps hosted on your servers to reach further into the hardware without acting on behalf of the user. Instead, you let users decide how to provide the information you want, and you concentrate on what to do with the data once you have it.