Audio Articles

Sort by:


  1. Videos and Firefox OS

    Before HTML5

    Those were dark times Harry, dark times – Rubeus Hagrid

    Before HTML5, displaying video on the Web required browser plugins and Flash.

    Luckily, Firefox OS supports HTML5 video so we don’t need to support these older formats.

    Video support on the Web

    Even though modern browsers support HTML5, the video formats they support vary:

    In summary, to support the most browsers with the fewest formats you need the MP4 and WebM video formats (Firefox prefers WebM).

    Multiple sizes

    Now that you have seen what formats you can use, you need to decide on video resolutions, as desktop users on high speed wifi will expect better quality videos than mobile users on 3G.

    At Rormix we decided on 720p for desktop, 360p for mobile connections, and 180p specially for Firefox OS to reduce the cost in countries with high data charges.

    There are no hard and fast rules — it depends on who your market audience is.


    The best streaming solution would be to automatically serve the user different videos sizes depending on their connection status (adaptive streaming) but support for this technology is poor.

    HTTP live streaming works well on Apple devices, but has poor support on Android.

    At the time of writing, the most promising technology is MPEG DASH, which is an international standard.

    In summary, we are going to have to wait before we get an adaptive streaming technology that is widely accepted (Firefox does not support HLS or MPEG DASH).

    DIY Adaptive streaming

    In the absence of adaptive streaming we need to try to work out the best video quality to load at the outset. The following is a quick guide to help you decide:

    Wifi or 3G

    Using a certified Firefox OS app you can check to see if the user is on wifi or not.

    var lock    = navigator.mozSettings.createLock();
    var setting = lock.get('wifi.enabled');
    setting.onsuccess = function () {
      console.log('wifi.enabled: ' + setting.result);
    setting.onerror = function () {
      console.warn('An error occured: ' + setting.error);

    There is some more information at the W3C Device API.

    Detecting screen size

    There is no point sending a 720p video to a user with a screen smaller than 720p. There are many ways to get the different bounds of a user’s screen; innerWidth and width allow you to get a good idea:

    function getVidSize()
      //Get the width of the phone (rotation independent)
      var min = Math.min($(window).innerHeight(),$(window).innerWidth());
      //Return a video size we have
      if(min < 320)      return '180';
      else if(min < 550) return '360';
      else               return '720';

    Determining internet speed

    It is difficult to get an accurate read of a user’s internet speed using web technologies — usually they involve loading a large image onto the user’s device and timing it. This has the disadvantage of having to send more data to the user. Some services such as: exist, but still require data downloads to the user’s device. (Stackoverflow has some more options.)

    You can be slightly more clever by using HTML5, and checking the time it takes between the user starting the video and a set amount of the video loading. This way we do not need to load any extra data on the user’s device. A quick VideoJS example follows:

    var global_speedcount = 0;
    var global_video = null;
    global_video = videojs("video", {}, function(){
    //Set up video sources
      //User has clicked play
      global_speedcount = new Date().getTime();
    function timer()
      var diff = new Date().getTime() - global_speedcount;
      //Remove this handler as it is run multiple times per second!'timeupdate',timer);

    This code starts timing when the user clicks play, and when the browser starts to play the video it sends timing information to timeupdate. You can also use this function to detect if lots of buffering is happening.

    Detect high resolution devices

    One final thing to determine is whether or not a user has a high pixel density screen. In this case even if they have a small screen it can still have a large number of pixels (and therefore require a higher resolution video).

    Modernizr has a plugin for detecting hi-res screens.

    if (Modernizr.highresdisplay)
      alert('Your device has a high resolution screen');

    WebP Thumbnails

    Not to get embroiled in an argument, but at Rormix we have seen an average decrease of 30% in file size (WebP vs JPEG) with no loss of quality (in some cases up to 50% less). And in countries with expensive data plans, the less data the better.

    We encode all of our thumbnails in multiple resolutions of WebP and send them to every device that supports them to reduce the amount of data being sent to the user.

    Mobile considerations

    If you are playing HTML5 videos on mobile devices, their behavior differs. On iOS it automatically goes to full screen on iPhones/iPods, but not on tablets.

    Some libraries such as VideoJS have removed the controls from mobile devices until their stability increases.

    Useful libraries

    There are a few useful HTML5 video libraries:

    Mozilla links

    Mozilla has some great articles on web video:

    Other useful Links

  2. Blend4Web: the Open Source Solution for Online 3D

    Half year ago Blend4Web was first released publicly. In this article I’ll show what Blend4Web is, how it is evolved and and how it can be used for web development.

    What Is Blend4Web?

    In short, Blend4Web is an open source framework for creating 3D web applications. It uses Blender – the popular open source 3D modeling suite – as the primary authoring tool. 3D graphics is rendered by means of WebGL which is also an open standard technology. The two main keywords here – Blender and Web(GL) – explain the purpose of this engine perfectly.

    The full source code of Blend4Web together with some usage examples is available under GPLv3 on GitHub (there is also a commercial licensing option).

    The 3D Web

    On June the 2nd Apple presented their new operating systems – OS X Yosemite and iOS 8 – both featuring WebGL support in their Safari browser. That marked the end of a 5 year cycle during which the WebGL technology has been evolving, starting with the first unstable browser builds (if anybody remembers Firefox 3.7 alpha?). Now, all the major browsers on all desktop and mobile systems support this open standard for rendering 3D graphics, everywhere, without any plugins.

    That was a long and difficult road, along which Blend4Web development has been following WebGL development as a shadow. Broken rendering, tab crashes, security “warnings” from some big guys, unavailability in public browser builds, all sorts of fears, uncertainty and doubts. All this didn’t matter, because we have the opportunity to do 3D graphics (and sound) in browsers!


    The first Blender 2.5x builds appeared in summer 2010. At the time we, the programming geeks, were pushed to learn the basics of 3D modeling by the beautiful Sintel from the open source movie of the same name. After choosing Blender, we could be as independent as possible, with a full open source pipeline organized on a Linux platform. Blender gave us the power to make our own 3D scenes, and later helped to attract talanted artists from its wonderful community to join us.

    Blend4Web Evolution in Demos

    Our demo scenes matured together with the development of Blend4Web. The first one was a quite low-poly and almost non-interactive demo called The Island. It was created in 2011 and polished a bit before the public release. In this demo we introduced our Blender-based pipeline in which all the assets are stored in separate files and are linked into the main file for level design and further exporting (for this reason some of Blend4Web users call it “free Unity Pro”).

    In Fashion Show we developed cloth animation techniques. Some post-processing effects, dynamic reflection and particle systems were added later. After Blend4Web has gone public we summarized these cloth-releated tricks in one of our tutorials.

    The Farm is a huge scene (in the scale of a browser): over 25 hectares of land, buildings, animated animals and foliage. We added some gamedev elements into it, including the ability of first-person walking, interacting with objects, driving a vehicle. The demo features spatial audio (via Web Audio) and physics (via Bullet and Asm.js). The Freedesktop folks tried it as a benchmark while testing the Mesa drivers (and got “massive crashes” :).

    We also tried some visualization and created Nature Morte. In this scene we used carefully crafted textures and materials, as well as post-processing effects to improve realism. However, the technology used for this demo was
    quite simple and old-school, as we had no support for visual shader editing yet.

    Things have changed when Blender’s node materials have become available to our artists. They created over 40 different materials for the Sports Car model: chromed metal, painted metal, glass, rubber, leather etc.

    In our latest release we stepped even further by adding support for the animation control by the user. Now interactivity can be implemented without any coding. In order to demonstrate the new opening possibilities we presented interactive infographic of a light helicopter.

    Among the other possible applications of this simple yet effective tool (called NLA Script) we can list the following: interactive 3D web design, product promotions, learning materials, cartoons with the ability to choose between different story lines, point-and-click games and any other applications previously created with Flash.

    Using Blend4Web

    It is very easy to start using Blend4Web – just download and install the Blender addon as shown in this video tutorial:

    The most wonderful thing is that your Blender scene can be exported into a self-contained HTML file that can be emailed, uploaded to your own website or to a cloud – in short shared however you like. This freedom is a fundamental difference from numerous 3D web publishing services as we don’t lock our users to our technology by any means.

    For those who want to create highly interactive 3D web apps we offer the SDK. Some notable examples of what is possible with the Blend4Web API are demonstrated in our programming tutorials, ranging from web design to games.

    Programming 3D web apps with Blend4Web is not much harder than building average RIAs. Unlike some other WebGL frameworks in the wild we tried to offload all graphics, animation and audio tasks to respective professionals. The programmer just loads the scene…

    var m_data = require("data");
    m_data.load("example.json", load_cb);

    …and then writes the logic which triggers the 3D scene changes that are “hard-coded” by the artists, e.g. plays the animation for the user-selected object:

    var m_scenes = require("scenes");
    var m_anim = require("animation");
    var myobj = m_scenes.pick_object(event.clientX, event.clientY);

    As you can see the APIs are structured in a CommonJS way which we believe is important for creating compact and fast web apps.

    The Future

    There are many possible ways in which the Internet and IT will be going but there is no doubt that the strong and steady development of 3D Web is already here. We expect that more and more users will change their expectations about how web content should look and feel like. We’re gonna help the web developers meet these demands with plans to improve usability and performance and to implement new interesting graphics effects.

    We also follow the development of WebGL 2.0 (thanks Mozilla for your job) and expect to create even more nice things on top of it.

    Stay Tuned

    Read our blog, join us on Twitter, Google+, Facebook and Reddit, watch the demos and tutorials on our YouTube channel, fork Blend4Web at GitHub.

  3. Introducing the Web Audio Editor in Firefox Developer Tools

    In Firefox 32, the Web Audio Editor joins the Shader Editor and Canvas Debugger in Firefox Developer Tools for debugging media-rich content on the web. When developing HTML5 games or fun synthesizers using web audio, the Web Audio Editor assists in visualizing and modifying all of the audio nodes within the web audio AudioContext.

    Visualizing the Audio Context

    When working with the Web Audio API‘s modular routing, it can be difficult to translate how all of the audio nodes are connected just by listening to the audio output. Often, it is challenging to debug our AudioContext just by listening to the output and looking at the imperative code that creates audio nodes. With the Web Audio Editor, all of the AudioNodes are rendered in a directed graph, illustrating the hierarchy and connections of all audio nodes. With the rendered graph, a developer can ensure that all of the nodes are connected in a way that they expect. This can be especially useful when the context becomes complex, with a network of nodes dedicated to manipulating audio and another for analyzing the data, and we’ve seen some pretty impressive uses of Web Audio resulting in such graphs!

    To enable the Web Audio Editor, open up the options in the Developer Tools, and check the “Web Audio Editor” option. Once enabled, open up the tool and reload the page so that all web audio activity can be monitored by the tool. When new audio nodes are created, or when nodes are connected and disconnected from one another, the graph will update with the latest representation of the context.

    Modifying AudioNode Properties

    Once the graph is rendered, individual audio nodes can be inspected. Clicking on an AudioNode in the graph opens up the audio node inspector where AudioParam‘s and specific properties on the node can be viewed and modified.

    Future Work

    This is just our first shippable release of the Web Audio Editor, and we are looking forward to making this tool more powerful for all of our audio developers.

    • Visual feedback for nodes that are playing, and time/frequency domain visualizations.
    • Ability to create, connect and disconnect audio nodes from the editor.
    • Tools for debugging onaudioprocess events and audio glitches.
    • Display additional AudioContext information and support multiple contexts.
    • Modify more than just primitives in the node inspector, like adding an AudioBuffer.

    We have many dream features and ideas that we’re excited about, and you can view all open bugs for the Web Audio Editor or submit new bugs. Be sure to check out the MDN documentation on the Web Audio Editor and we would also love feedback and thoughts at our UserVoice feedback channel and on Twitter @firefoxdevtools.

  4. Easy audio capture with the MediaRecorder API

    The MediaRecorder API is a simple construct, used inside Navigator.getUserMedia(), which provides an easy way of recording media streams from the user’s input devices and instantly using them in web apps. This article provides a basic guide on how to use MediaRecorder, which is supported in Firefox Desktop/Mobile 25, and Firefox OS 2.0.

    What other options are available?

    Capturing media isn’t quite as simple as you’d think on Firefox OS. Using getUserMedia() alone yields raw PCM data, which is fine for a stream, but then if you want to capture some of the audio or video you start having to perform manual encoding operations on the PCM data, which can get complex very quickly.

    Then you’ve got the Camera API on Firefox OS, which until recently was a certified API, but has been downgraded to privileged recently.

    Web activities are also available to allow you to grab media via other applications (such as Camera).

    the only trouble with these last two options is that they would capture only video with an audio track, and you would still have separate the audio if you just wanted an audio track. MediaRecorder provides an easy way to capture just audio (with video coming later — it is _just_ audio for now.)

    A sample application: Web Dictaphone

    An image of the Web dictaphone sample app - a sine wave sound visualization, then record and stop buttons, then an audio jukebox of recorded tracks that can be played back.

    To demonstrate basic usage of the MediaRecorder API, we have built a web-based dictaphone. It allows you to record snippets of audio and then play them back. It even gives you a visualization of your device’s sound input, using the Web Audio API. We’ll concentrate on the recording and playback functionality for this article.

    You can see this demo running live, or grab the source code on Github (direct zip file download.)

    CSS goodies

    The HTML is pretty simple in this app, so we won’t go through it here; there are a couple of slightly more interesting bits of CSS worth mentioning, however, so we’ll discuss them below. If you are not interested in CSS and want to get straight to the JavaScript, skip to the “Basic app setup” section.

    Keeping the interface constrained to the viewport, regardless of device height, with calc()

    The calc function is one of those useful little utility features that’s cropped up in CSS that doesn’t look like much initially, but soon starts to make you think “Wow, why didn’t we have this before? Why was CSS2 layout so awkward?” It allows you do a calculation to determine the computed value of a CSS unit, mixing different units in the process.

    For example, in Web Dictaphone we have theee main UI areas, stacked vertically. We wanted to give the first two (the header and the controls) fixed heights:

    header {
      height: 70px;
    .main-controls {
      padding-bottom: 0.7rem;
      height: 170px;

    However, we wanted to make the third area (which contains the recorded samples you can play back) take up whatever space is left, regardless of the device height. Flexbox could be the answer here, but it’s a bit overkill for such a simple layout. Instead, the problem was solved by making the third container’s height equal to 100% of the parent height, minus the heights and padding of the other two:

    .sound-clips {
      box-shadow: inset 0 3px 4px rgba(0,0,0,0.7);
      background-color: rgba(0,0,0,0.1);
      height: calc(100% - 240px - 0.7rem);
      overflow: scroll;

    Note: calc() has good support across modern browsers too, even going back to Internet Explorer 9.

    Checkbox hack for showing/hiding

    This is fairly well documented already, but we thought we’d give a mention to the checkbox hack, which abuses the fact that you can click on the <label> of a checkbox to toggle it checked/unchecked. In Web Dictaphone this powers the Information screen, which is shown/hidden by clicking the question mark icon in the top right hand corner. First of all, we style the <label> how we want it, making sure that it has enough z-index to always sit above the other elements and therefore be focusable/clickable:

    label {
        font-family: 'NotoColorEmoji';
        font-size: 3rem;
        position: absolute;
        top: 2px;
        right: 3px;
        z-index: 5;
        cursor: pointer;

    Then we hide the actual checkbox, because we don’t want it cluttering up our UI:

    input[type=checkbox] {
       position: absolute;
       top: -100px;

    Next, we style the Information screen (wrapped in an <aside> element) how we want it, give it fixed position so that it doesn’t appear in the layout flow and affect the main UI, transform it to the position we want it to sit in by default, and give it a transition for smooth showing/hiding:

    aside {
       position: fixed;
       top: 0;
       left: 0;
       text-shadow: 1px 1px 1px black;
       width: 100%;
       height: 100%;
       transform: translateX(100%);
       transition: 0.6s all;
       background-color: #999;
        background-image: linear-gradient(to top right, rgba(0,0,0,0), rgba(0,0,0,0.5));

    Last, we write a rule to say that when the checkbox is checked (when we click/focus the label), the adjacent <aside> element will have it’s horizontal translation value changed and transition smoothly into view:

    input[type=checkbox]:checked ~ aside {
      transform: translateX(0);

    Basic app setup

    To grab the media stream we want to capture, we use getUserMedia() (gUM for short). We then use the MediaRecorder API to record the stream, and output each recorded snippet into the source of a generated <audio> element so it can be played back.

    First, we’ll add in a forking mechanism to make gUM work, regardless of browser prefixes, and so that getting the app working on other browsers once they start supporting MediaRecorder will be easier in the future.

    navigator.getUserMedia = ( navigator.getUserMedia ||
                           navigator.webkitGetUserMedia ||
                           navigator.mozGetUserMedia ||

    Then we’ll declare some variables for the record and stop buttons, and the <article> that will contain the generated audio players:

    var record = document.querySelector('.record');
    var stop = document.querySelector('.stop');
    var soundClips = document.querySelector('.sound-clips');

    Finally for this section, we set up the basic gUM structure:

    if (navigator.getUserMedia) {
       console.log('getUserMedia supported.');
       navigator.getUserMedia (
          // constraints - only audio needed for this app
             audio: true
          // Success callback
          function(stream) {
          // Error callback
          function(err) {
             console.log('The following gUM error occured: ' + err);
    } else {
       console.log('getUserMedia not supported on your browser!');

    The whole thing is wrapped in a test that checks whether gUM is supported before running anything else. Next, we call getUserMedia() and inside it define:

    • The constraints: Only audio is to be captured; MediaRecorder only supports audio currently anyway.
    • The success callback: This code is run once the gUM call has been completed successfully.
    • The error/failure callback: The code is run if the gUM call fails for whatever reason.

    Note: All of the code below is placed inside the gUM success callback.

    Capturing the media stream

    Once gUM has grabbed a media stream successfully, you create a new Media Recorder instance with the MediaRecorder() constructor and pass it the stream directly. This is your entry point into using the MediaRecorder API — the stream is now ready to be captured straight into a Blob, in the default encoding format of your browser.

    var mediaRecorder = new MediaRecorder(stream);

    There are a series of methods available in the MediaRecorder interface that allow you to control recording of the media stream; in Web Dictaphone we just make use of two. First of all, MediaRecorder.start() is used to start recording the stream into a Blob once the record button is pressed:

    record.onclick = function() {
      console.log("recorder started"); = "red"; = "black";

    When the MediaRecorder is recording, the MediaRecorder.state property will return a value of “recording”.

    Second, we use the MediaRecorder.stop() method to stop the recording when the stop button is pressed, and finalize the Blob ready for use somewhere else in our application.

    stop.onclick = function() {
      console.log("recorder stopped"); = ""; = "";

    When recording has been stopped, the state property returns a value of “inactive”.

    Note that there are other ways that a Blob can be finalized and ready for use:

    • If the media stream runs out (e.g. if you were grabbing a song track and the track ended), the Blob is finalized.
    • If the MediaRecorder.requestData() method is invoked, the Blob is finalized, but recording then continues in a new Blob.
    • If you include a timeslice property when invoking the start() method — for example start(10000) — then a new Blob will be finalized (and a new recording started) each time that number of milliseconds has passed.

    Grabbing and using the blob

    When the blob is finalized and ready for use as described above, a dataavailable event is fired, which can be handled using a mediaRecorder.ondataavailable handler:

    mediaRecorder.ondataavailable = function(e) {
      console.log("data available");
      var clipName = prompt('Enter a name for your sound clip');
      var clipContainer = document.createElement('article');
      var clipLabel = document.createElement('p');
      var audio = document.createElement('audio');
      var deleteButton = document.createElement('button');
      audio.setAttribute('controls', '');
      deleteButton.innerHTML = "Delete";
      clipLabel.innerHTML = clipName;
      var audioURL = window.URL.createObjectURL(;
      audio.src = audioURL;
      deleteButton.onclick = function(e) {
        evtTgt =;

    Let’s go through the above code and look at what’s happening.

    First, we display a prompt asking the user to name their clip.

    Next, we create an HTML structure like the following, inserting it into our clip container, which is a <section> element.

    <article class="clip">
      <audio controls></audio>
      <p><em>your clip name</em></p>

    After that, we create an object URL pointing to the event’s data attribute, using window.URL.createObjectURL( this attribute contains the Blob of the recorded audio. We then set the value of the <audio> element’s src attribute to the object URL, so that when the play button is pressed on the audio player, it will play the Blob.

    Finally, we set an onclick handler on the delete button to be a function that deletes the whole clip HTML structure.


    And there you have it; MediaRecorder should serve to make your app media recording needs easier. Have a play around with it and let us know what you think: we are looking forward to seeing what you’ll build!

  5. Audio Tags: Web Components + Web Audio = ♥

    Article written by Soledad Penadés, edited by Angelina Fabbro.

    Last week we released Brick 1.0, our carefully curated set of web components for rapid development. Using components makes it very easy to use and integrate these UI widgets with existing code and frameworks.

    And this week we bring you Audio Tags, an experiment building Web Components that represent Web Audio blocks that let us construct a complete instrument with an interface to play it. With reusable audio blocks, developers can experiment with Web Audio without having to write a lot of boilerplate code.

    Let’s build a simple synthesiser to demonstrate how the different tags work together!

    The Audio Context

    The first thing we need is an audio context. If you’ve ever done any Canvas programming, this will sound familiar. The context is akin to a toolbox: it’s got the functions (the tools) that you need and it’s also where everything happens. All other audio tags will be placed inside a context.

    This is how an audio context looks like when using Audio Tags:


    That’s it!


    While being able to create an audio context by typing just one tag declaration is great, it is not particularly exciting if we can’t get any audible output. We need something that generates a sound, and for this we’ll star with something simple and use an oscillator. As the name implies, the output is a signal that oscillates between two values: -1 and 1, generating a periodic waveform. We will place it inside an audio context to have its output automatically routed via the context’s output to the computer’s speakers:


    Context with oscillator

    (See it live).

    In the real world, oscillators can generate different signal shapes. Likewise, in the Web Audio world we have analogous wave types that we can use: sine, square, sawtooth, and triangle. Since web components are first class DOM elements, we can specify the desired wave type by using an attribute:

    <audio-oscillator type="square"></audio-oscillator>

    You could even change it live by opening the console and typing this:

    document.querySelector('audio-oscillator').type = 'square';

    Similarly, you can also change the frequency the oscillator is running at by setting the frequency attribute:

    <audio-oscillator frequency="220"></audio-oscillator>


    Having a running oscillator is just the first step. Most synthesisers available have more than one oscillator playing at the same time to make the sound more complex and nuanced. We need some way of playing two or more sounds in parallel, while combining them into a single output.

    This is commonly known as mixing audio, and therefore we need a mixer:

            <audio-oscillator frequency="220"></audio-oscillator>
            <audio-oscillator frequency="440"></audio-oscillator>

    The mixer will take the output of each of its children, and join them together to form its own output, which is then connected to the context’s output. Note also that since we’re dealing with DOM elements, when we say “children” we literally mean the mixer’s DOM children elements.



    Chain (and oscilloscope)

    When you start adding multiple sounds it’s useful to be able to see what is going on in the synthesiser. What if we could, somehow, plug a component between the output of one child and the input of another, and display what the sound wave looks like at that point?

    We can’t do that with the mixer, because it just joins all the outputs together. We need a new abstract structure: chains. An audio chain will connect the output of its first children to the input of the second children, and the output of the second children to the input of the third one, and so on, until we reach the last children and just connect its output to the chain output.

    Or in other words: while the mixer connects things in parallel, the chain connects them serially.

    Let’s connect a new element –the oscilloscope– to the output of an oscillator, using a chain. The oscilloscope will just display what is connected to its input, and the signal will pass through to its output without being modified at all. You can change the oscillator’s wave type to square, and see how the oscilloscope changes its display accordingly.

            <audio-oscillator frequency="220"></audio-oscillator>




    Synthesisers don’t limit themselves to just running several oscillators at the same time. They often add postprocessing units to this raw generated audio, which give the synthesiser its own distinctive sound.

    There are many types of postprocessing effects, and some of the most popular are filters, which roughly work by highlighting certain frequencies or removing others. For example, we can chain a low pass filter to the output of an oscillator, and that would only allow the lower frequencies to go through. This produces a sort of dampening effect, as if we had put on some ear muffs, because higher frequencies travel through the air and we hear them with our ear pavilions, while lower frequencies tend to travel through the earth and objects too. So rather than hearing them, we feel them with our body, and it doesn’t matter whether you have something over your ears or not.

            <audio-oscillator frequency="220"></audio-oscillator>
            <audio-filter type="lowpass"></audio-filter>



    Web Audio natively implements biquad pole filters, and as happens with the audio-oscillator tag, you can alter the filter behaviour by setting its type attribute. For example:

    <audio-filter type="highpass"></audio-filter>

    You could even insert several oscilloscopes: one before and another after a filter, to see the effect the filter has on the signal:

            <audio-oscillator frequency="220"></audio-oscillator>
            <audio-filter type="lowpass"></audio-filter>

    Filter with two oscilloscopes


    And finally, the minisynth

    We have enough components to build a synthesiser now! We want two oscillators playing together (one an octave higher than the other), and a filter to make the sound a little bit less harsh and more “self-contained”. So, without further ado, this is the structure for representing our minimal synth, the <mini-synth>, using the components we’ve introduced so far:

        <audio-filter type="lowpass"></audio-filter>

    For the sake of comparison, this is more or less how we would assemble a similar setup using raw Web Audio API objects and functions:

    var mixerGain = context.createGain();
    var osc1 = context.createOscillator();
    var osc2 = context.createOscillator();
    var filter = context.createBiquadFilter();
    // and the actual output is at *filter*

    It’s not that the code is particularly complicated, it just doesn’t have the nice visual hierarchy of the declarative syntax. The visual cues from the syntax make understanding the relationship between elements quick and easy.

    We still need a few lines of JavaScript to make the <mini-synth> component behave like a synthesiser: it has to start and stop both oscillators at the same time. We can take advantage from the fact that the AudioTag prototype has some common base methods that we can overload to get specific behaviours in our custom components.

    In this particular case we’ll overload the start and stop methods to make the oscillators start and stop playing respectively when we call those methods in the synth. This way we abstract the internals of the synthesiser from the world, while still exposing a consistent interface.

    start: function(when) {
        // We want to make sure we don't clip (i.e. go under -1 or over 1),
        // so we'll divide the gain by the number of oscillators in the synth
        var oscGain = this.oscillators.length > 0 ? 1.0 / this.oscillators.length : 1.0;
        this.oscillators.forEach(function(osc) {
            osc.gain = oscGain;
    stop: function(when) {
        this.oscillators.forEach(function(osc) {

    The implementation should be fairly easy to follow.

    You might be wondering about the when parameter. It is used to tell the browser when to actually start the action, so that you can schedule various events in the future with accurate timing. It means “execute this code at when milliseconds”. In our case we’re just using a value of 0, which means “do that immediately”. I advise you to read more about when in the Web Audio spec.

    We also need to implement a method for actually telling the synthesiser which note to play, or in other words, which frequency should each oscillator be running at. So let’s implement noteOn:

    noteOn: function(noteNumber) {
        this.oscillators.forEach(function(osc, index) {
            // Each oscillator should play in a higher octave
            // Each octave is composed of 12 notes
            var oscNoteNumber = noteNumber + 12 * index;
            // We're using a library to convert note numbers to frequencies
            var frequency = MIDIUtils.noteNumberToFrequency(oscNoteNumber);
            osc.frequency = frequency;

    You don’t need to use MIDIUtils, but it comes handy if you ever want to jam with an instrument in your browser and someone else using a more traditional MIDI instrument. By using standard frequencies you can be sure that both your instruments will be tuned in the same pitch, and that is GOOD.

    We also need a way for triggering notes in the synthesiser, so what better way than having an on screen keyboard component?

    <audio-keyboard octaves="2"></audio-keyboard>

    will insert a keyboard component with 2 octaves. Once the keyboard gets focus (by clicking on it) you can tap keys on your computer’s keyboard and it will emit noteon events. If we listen to those, we can then send them to the synthesiser. And the same goes for the noteoff events:

    keyboard.addEventListener('noteon', function(e) {
      var noteIndex = e.detail.index;
      // 48 is the base note here = C-3
      minisynth.noteOn(parseInt(noteIndex, 10) + 48);
    }, false);
    keyboard.addEventListener('noteoff', function(e) {
    }, false);

    So it is DEMO time!


    And now that we have a synthesiser we can say we’re rockstars! But rockstars need to look cool. Real-life rockstars have their own signature guitars and customised cabinets. And we have… CSS! We can go as wild as we want with CSS, so just press the Become a rockstar button on the demo and watch as the synthesiser becomes something else thanks to the magic of CSS.

    Minisynth, rockstarified

    Looking behind the curtains

    So far we’ve only talked about these fancy new audio tags and assumed that they are magically available in your browser, even though it is obvious they are non-standard elements. We haven’t explained where they come from. Well, if you’ve read this far already you deserve to be shown the secrets of the kingdom!

    If you look at the source code of any the examples, you’ll notice that we’re consistently including the AudioTags.bundle.js (line 18) and AudioTags.bundle.css (line 6) files. The CSS is not particularly exciting and the real magic happens in the JavaScript. This file includes a couple of utility libraries that give us the ability to define custom tags in the browser, and then the code for defining and making available these new tags in the browser.

    On the utility side, we first include AudioContext-MonkeyPatch, for unifying Web Audio API disparities between browsers and enabling us to use a modern, consistent syntax. If you want to know more about writing portable Web Audio code, you can have a look at this article.

    The second library we’re including is X-Tag, and more specifically, its very innermost core. X-Tag is a custom elements polyfill, and custom elements are a part of the emerging Web Components spec, meaning this stuff will be built right into the browser soon. X-Tag is the same library that Mozilla Brick uses. You can learn how to make your own custom elements with this article.

    That said, if you plan to use Brick and Audio Tags in the same project, a disaster might probably ensue, since both Brick and Audio Tags include X-Tag’s core in their distribution bundles. The authors of both libraries are discussing what’s the best way to proceed about this, but we haven’t settled on any action yet, because Audio Tags is such a newcomer to the X-Tag powered library-scene. In any case, the most likely outcome is that we’ll offer an option to build Brick and Audio Tags without including X-Tag core.

    Also, here is a video of this same material at CascadiaJS, so you can watch someone build it right in front of you. It may help your understanding of the topic:

    What’s next for Audio Tags?

    Many people have been asking me what’s next for Audio Tags. What are the upcoming features? What have I planned? How do you contribute? How do we go about adding new tags?

    To be honest? I have no idea! But that’s the beauty of it. This is just a starting point, an invitation to think, play with and discuss about this notion of declarative audio components. There’s, of course, a list of things that don’t work yet, some random ideas and maybe potential features in the Audio Tags’ README file. I will probably keep extending it and filling the gaps–it is a good playground for experimenting with audio without getting too messy, and also a good test for Web Components that go past the usual “encapsulated UI widgets on steroids” notion.

    Some people have found the project inspiring in itself; others thought that it would be useful for teaching signal processing, others mixed it with accelerometer data to create physically-controlled synthesisers, and others decided to ditch the audio side of it and just build custom components for WebRTC purposes. It’s up to each one of you to contribute if you feel like doing so!

  6. Monster Madness – creating games on the web with Emscripten

    When our engineering teams at Trendy Entertainment & Nom Nom Games decided on the strategy of developing one of our new Unreal Engine 3 games — Monster Madness Online — as a cross-platform title, we knew that a frictionless multiplayer web browser version would be central to this experience. The big question, however, was determining what essential technologies to utilize in order to bring our game onto the web. As a C++ oriented developer, we determined quickly that rewriting the game engine from the ground-up was out of the question. We’d need a solution that would allow us to port our existing code in an efficient manner into a format usable in the browser…

    TL;DR? Watch the video!

    Playing the Field

    We looked hard at the various options in front of us: FlasCC (a GCC Flash compiler), Google’s NaCl, a custom native C++ extension, or Mozilla’s Emscripten & asm.js.

    In our tests, Flash ran slowly and had inconsistent behaviors between Pepper (Chrome) and Adobe’s plugin version. Combined with the increasingly onerous plugin requirement, we opted to look elsewhere for a more seamless, forward-thinking approach.

    NaCl had the issue of requiring a walled-garden distribution site that would separate us from direct contact with our users, and also being processor-specific. pNaCL eliminated the walled-garden requirement and added dynamic code compilation support, but still had the issues of being processor-specific code necessitating in our view device-specific testing, and a potentially long startup time as the code would be linked on first run. Finally, only working in Chrome would be a dealbreaker for our desire to have our game run in all major browsers.

    A custom plugin/extension with C++ would require lots of testing & maintenance efforts on our part to run across different browsers, processor architectures, and operating systems, and such an installation requirement would likely scare away many potential players.

    As it turned out, for our team’s purposes the Emscripten compiler & asm.js proved to be the best solution to these challenges, and when combined with a set of other new-ish Web API’s, enabled the browser to become a fully featured platform for instant high-end 3D gaming. This just took a little trial & error to figure out exactly how we’d piece it together…and that’s what we’ll be reviewing here!

    First Steps into a Brave New World

    We Trendy game engineers are primarily old-school C++ programmers, so it was something of a revelation that Emscripten could compile our existing application (built on Epic Game’s Unreal Engine 3) into asm.js optimized Javascript with little to no changes.

    The primary Unreal Engine 3-specific code tweaks that were necessary to get the project to compile & run with Emscripten, were essentially… 1, 2, 3:

    // Esmcripten needs 4 byte alignment
    FNameEntry* Allocate( INT Size )
        #if EMSCRIPTEN
           Size = Align( Size, 4 );
    // Script execution: llvm needs aligned data
        #define XFER(T)
            T Temp;
            if (!Ar.IsLoading())
                appMemcpy(&Temp, &Script(iCode), sizeof(T));
            Ar << Temp;
            if (!Ar.IsSaving())
                appMemcpy(&Script(iCode), &Temp, sizeof(T));
            iCode += sizeof(T);
        #define XFER(T) { Ar << *(T*)&Script(iCode); iCode += sizeof(T); }
    // This function needs to synchronously complete IO requests for single-threaded Emscripten IO to work, so added a ServiceRequestsSynchronously() to the end of it which flushes & blocks till the IO request finishes.

    No really, that was about it! Within a day of fiddling with it, we had our game’s Javascript ‘executable’ compiled & running in the browser. Crashing, due to not having a graphics API implememented — but running with log output! Thankfully, we already had Unreal Engine3’s OpenGL ES2 version of the rendering subsystem ready to utilize, so porting the renderer to WebGL only took another day.

    WebGL appeared to essentially have a superset of features compared to OpenGL ES2, so the shaders and methods used matched up by simply changing some API calls. In fact, we were able to do improvements by making use of WebGL’s floating point render targets for certain postprocessing effects, such as edge outlining and dynamic shadows.

    Postprocessing makes everything prettier!

    But how’s it run?

    Now we had something rendering in the browser, and with a quick change to capture input, we could start playing the game and analyzing its performance. What we found was very encouraging: straight ‘out of the box’, in Firefox the asm.js version of the game was getting nearly 33% of the performance of the native executable. And this was comparing a single-threaded web application to the multi-threaded native executable (so really, not a fair comparison! ;). This was about 2x the performance we saw with our quick Flash port (which we still utilize as a fallback for older browsers that don’t yet support asm.js, though we eventually hope to deprecate entirely).

    Its performance in Chrome was less astonishing, more towards 20% of native performance, but still within our target margins: namely, can it run on a 2011-model Macbook Air at 45-60 FPS (with Vsync disabled)? The answer, thankfully, was yes. We hope Google will continue to improve the performance of asm.js on their browser over time. But as it currently stands, we believe unless you’re making the browser version of ‘Crysis’ with this tech (which may not be far off), it seems you have enough performance even in Chrome to do most kinds of web games.

    60 FPS on an old Macbook Air

    Putting the Pieces into Place

    So within a week from starting, we had turned our Unreal Engine 3 PC game into a well-running, graphically-rich web game. But where to take it from here? Well, it’d still need: Audio, Networking, Streaming, and Storage. Let’s discuss the various techniques used for each of these systems.


    This was a no-brainer, as there is only really one robust standardized web audio system apart from Flash: WebAudio. Again, this API matched up pretty well to its mobile cousin, OpenSL, for which we already had an integration. So once we switched out the various calls, we had .

    There was an apparent issue in Mac Chrome where sounds flagged “looping” would sometimes never become destroyed, so we implemented a Chrome-specific hack to manually loop the sound, and filed a bug report with Google. Ah well, one thing we’ve seen with browser API’s is there’s not a 100% guarantee that every browser will implement the functionality to perfect specification, but it gets the job done!


    This proved a little trickier. First, we investigated WebRTC as used in the Bananabread demo, but WebRTC of course is for browser-to-browser communication which is actually not what we were looking to do. Our online game service uses a server-and-client architecture with centralized infrastructure, and so WebSockets is the API to utilize in that case. The tricky part is that we have to handle all the WebSockets incoming and outgoing data in JavaScript buffers, and then pass that along to the “C++” Emscripten-compiled game.

    With some callbacks, this worked out, but we also had to take our UDP game server code and place the WebSockets TCP-style layer onto it — some iteration was necessary to get the packets to be formatted in exactly the way that WebSockets expects, but once we did that, our browser game was communicating with our backend-hosted Linux dedicated game servers with no problems!

    Streaming & Storage

    One advantage to being on the web is easy access to the browser’s asynchronous downloading functionalities to stream-in content. We certainly made use of this with our game, with the initial download clocking in at under 10 MB. Everything else streams in on-demand as you play using standard browser http download requests: Textures, Sound Effects, Music, even Skeletal Meshes and Level Packages. But the bigger question is how to reliably store this content. We don’t want to just rely on the browser cache, since it’s not good for guaranteed immediate gameplay loading as we can’t pre-query whether something exists on disk in the regular browser cache or not.

    For this, we used the IndexedDB API, which lets us asynchronously save and retrieve data objects from a secure abstracted storage location. It works in both Chrome and Firefox, though it’s still finicky as the database can occasionally become corrupted (perhaps if terminated during async writes) and has to be regenerated. In the worst case, this simply results in a re-download of content the user already had received.

    We’re currently looking into this issue, but that aside, IndexedDB certainly works well and has the advantage of providing our application standard file IO functionality, useful to store content that we download. (UPDATE: Firefox Nightly build as of 12/10 seems to automatically reset the IndexedDB storage if this happens and it may not recur.)

    Play it Now, and Embrace the Future!

    While we still have more profiling and tweaking to, as we’re just now starting to use Firefox’s VTune support to symbolically profile the asm.js performance within the browser. Even still, we’re pretty pleased with where the things currently stand. But don’t take our word for it, please try it yourselves right here, no installation or sign-up required:

    Try our demo test anonymously In-browser Here!
    (Please bear with us if our game servers limit access under load, we’re still testing our backend scalability!)

    We at Trendy envision a day when anybody can play any game no matter where they are or what device they happen to have, without friction or gateways or middlemen. With the right combination of these cutting-edge web technologies, that day can be today. We hope other enterprising game developers will join us in reaching players directly through the web, which thanks to Emscripten & asm.js, may well become the most powerful and far-reaching “game console” of all!

  7. Introducing the Whiteboard Drum – WebRTC and Web Audio API magic

    Browser functionality has expanded rapidly, way beyond merely “browsing” a document. Recently, Web browsers finally gained audio processing abilities with the Web Audio API. It is powerful to the point of building serious music applications.

    Not only that, but it is also very interesting when combined with other APIs. One of these APIs is getUserMedia(), which allows us to capture audio and/or video from the local PC’s microphone / camera devices. Whiteboard Drum (code on GitHub) is a music application, and a great example of what can be achieved using Web Audio API and getUserMedia().

    I demonstrated Whiteboard Drum at the Web Music Hackathon Tokyo in October. It was a very exciting event on the subject of Web Audio API and Web MIDI API. Many instruments can collaborate with the browser, and it can also create new interfaces to the real world.

    I believe this suggests further possibilities of Web-based music applications, especially using Web Audio API in conjunction with other APIs. Let’s explain how the key features of Whiteboard Drum work, showing relevant code fragments as we go.


    First of all, let me show you a picture from the Hackathon:

    And a easy movie demo:

    As you can see, Whiteboard Drum plays a rhythm according to the matrix pattern on the whiteboard. The whiteboard has no magic; it just needs to be pointed at by a WebCam. Though I used magnets in the demo, you can draw the markers using pen if you wish. Each row represents the corresponding instruments of Cymbal, Hi-Hat, Snare-Drum and Bass-Drum, and each column represents timing steps. In this implementation, the sequence has 8 steps. The color blue will activate the grid normally, and the red will activate with accent.

    The processing flow is:

    1. The whiteboard image is captured by the WebCam
    2. The matrix pattern is analysed
    3. This pattern is fed to the drum sound generators to create the corresponding sound patterns

    Although it uses nascent browser technologies, each process itself is not so complicated. Some key points are described below.

    Image capture by getUserMedia()

    getUserMedia() is a function for capturing video/audio from webcam/microphone devices. It is a part of WebRTC and a fairly new feature in web browsers. Note that the user’s permission is required to get the image from the WebCam. If we were just displaying the WebCam image on the screen, it would be trivially easy. However, we want to access the image’s raw pixel data in JavaScript for more processing, so we need to use canvas and the createImageData() function.

    Because pixel-by-pixel processing is needed later in this application, the captured image’s resolution is reduced to 400 x 200px; that means one rhythm grid is 50 x 50 px in the rhythm pattern matrix.

    Note: Though most recent laptops/notebooks have embedded WebCams, you will get the best results on Whiteboard Drum from an external camera, because the camera needs to be precisely aimed at the picture on the whiteboard. Also, the selection of input from multiple available devices/cameras is not standardized currently, and cannot be controlled in JavaScript. In Firefox, it is selectable in the permission dialog when connecting, and it can be set up from “contents setup” option of the setup screen in Google Chrome.

    Get the WebCam video

    We don’t want to show these parts of the processing on the screen, so first we hide the video:

    <video id="video" style="display:none"></video>

    Now to grab the video:

    video = document.getElementById("video");
        function(stream) {
            video.src= window.URL.createObjectURL(stream);
        function(err) {
            alert("Camera Error");

    Capture it and get pixel values

    We also hide the canvas:

    <canvas id="capture" width=400 height=200 style="display:none"></canvas>

    Then capture our video data on the canvas:

    function Capture() {

    The video from the WebCam will be drawn onto the canvas at periodic intervals.

    Image analyzing

    Next, we need to get the 400 x 200 pixel values with getImageData(). The analyzing phase analyses the 400 x 200 image data in an 8 x 4 matrix rhythm pattern, where a single matrix grid is 50 x 50 px. All necessary input data is stored in the array in RGBA format, 4 elements per pixel.

    var pixarray =;
    var step;
    for(var x = 0; x < 8; ++x) {
        var px = x * 50;
        for(var y = 0; y < 4; ++y) {
            var py = y * 50;
            var lum = 0;
            var red = 0;
            for(var dx = 0; dx < 50; ++dx) {
                for(var dy = 0; dy < 50; ++dy) {
                    var offset = ((py + dy) * 400 + px + dx)*4;
                    lum += pixarray[offset] * 3 + pixarray[offset+1] * 6 + pixarray[offset+2];
                    red += (pixarray[offset]-pixarray[offset+2]);
            if(lum < lumthresh) {
                if(red > redthresh)

    This is honest pixel-by-pixel analysis of grid-by-grid loops. In this implementation, the analysis is done for the luminance and redness. If the grid is “dark”, the grid is activated; if it is red, it should be accented.

    The luminance calculation uses a simplified matrix — R * 3 + G * 6 + B — that will get ten times the value – meaning – which means getting the value of range 0 to 2550 for each pixel. And the redness R – B is a experimental value because all that is required is a decision of Red or Blue. The result is stored in the rhythmpat array, with a value of 0 for nothing, 1 for blue or 2 for red.

    Sound generation through the Web Audio API

    Because the Web Audio API is a very cutting edge technology, it is not yet supported by every web browser. Currently, Google Chrome/Safari/Webkit-based Opera and Firefox (25 or later) support this API. Note: Firefox 25 is the latest version released at the end of October.

    For other web browsers, I have developed a polyfill that falls back to Flash: WAAPISim, available on GitHub. It provides almost all functions of the Web Audio API to unsupported browsers, for example Internet Explorer.

    Web Audio API is a large scale specification, but in our case, the sound generation part requires just a very simple use of the Web Audio API: load one sound for each instrument and trigger them at the right times. First we create an audio context, taking care of vendor prefixes in the process. Prefixes currently used are webkit or no prefix.

    audioctx = new (window.AudioContext||window.webkitAudioContext)();

    Next we load sounds to buffers via XMLHttpRequest. In this case, different sounds for each instrument (bd.wav / sd.wav / hh.wav / cy.wav) are loaded into the buffers array:

    var buffers = [];
    var req = new XMLHttpRequest();
    var loadidx = 0;
    var files = [
    function LoadBuffers() {"GET", files[loadidx], true);
        req.responseType = "arraybuffer";
        req.onload = function() {
            if(req.response) {
                    if(++loadidx < files.length)

    The Web Audio API generates sounds by routing graphs of nodes. Whiteboard Drum uses a simple graph that is accessed via AudioBufferSourceNode and GainNode. AudioBufferSourceNode play back the AudioBuffer and route to destination(output) directly (for normal *blue* sound), or route to destination via the GainNode (for accented *red* sound). Because the AudioBufferSourceNode can be used just once, it will be newly created for each trigger.

    Preparing the GainNode as the output point for accented sounds is done like this.


    And the trigger function looks like so:

    function Trigger(instrument,accent,when) {
        var src=audioctx.createBufferSource();

    All that is left to discuss is the accuracy of the playback timing, according to the rhythm pattern. Though it would be simple to keep creating the triggers with a setInterval() timer, it is not recommended. The timing can be easily messed up by any CPU load.

    To get accurate timing, using the time management system embedded in the Web Audio API is recommended. It calculates the when argument of the Trigger() function above.

    // console.log(nexttick-audioctx.currentTime);
    while(nexttick - audioctx.currentTime < 0.3) {
        var p = rhythmpat[step];
        for(var i = 0; i < 4; ++i)
            Trigger(i, p[i], nexttick);
        if(++step >= 8)
            step = 0;
        nexttick += deltatick;

    In Whiteboard Drum, this code controls the core of the functionality. nexttick contains the accurate time (in seconds) of the next step, while audioctx.currentTime is the accurate current time (again, in seconds). Thus, this routine is getting triggered every 300ms – meaning look ahead to 300ms in the future (triggered in advance while nextticktime – currenttime < 0.3). The commented console.log will print the timing margin. While this routine is called periodically, the timing is collapsed if this value is negative.

    For more detail, here is a helpful document: A Tale of Two Clocks – Scheduling Web Audio with Precision

    About the UI

    Especially in music production software like DAW or VST plugins, the UI is important. Web applications do not have to emulate this exactly, but something similar would be a good idea. Fortunately, the very handy WebComponent library webaudio-controls is available, allowing us to define knobs or sliders with just a single HTML tag.

    NOTE: webaudio-controls uses Polymer.js, which sometimes has stability issues, causing unexpected behavior once in a while, especially when combining it with complex APIs.

    Future work

    This is already an interesting application, but it can be improved further. Obviously the camera position adjustment is an issue. Analysis can be more smarter if it has an auto adjustment of the position (using some kind of marker?), and adaptive color detection. Sound generation could also be improved, with more instruments, more steps and more sound effects.

    How about a challenge?

    Whiteboard Drum is available at, and the code is on GitHub.

    Have a play with it and see what rhythms you can create!

  8. Make your Firefox OS app feel alive with video and audio

    Firefox OS applications aren’t just about text: there is no better way to make your app feel alive than adding some videos or audio to it. Let’s explore different ways we can use as developers to enhance our mobile masterpiece.

    Audio and video HTML tags

    Since we are talking about HTML, it makes total sense to think about using the <audio>, and <video> tag to play those media in your Firefox OS app. If you want to add a video in your application, just use this code.

    <video src="" controls>
      Your browser does not support the video element.

    In this code example, the user will see a video player with controls, and will have the opportunity to start the video. If your application is running in a browser not supporting the video tag, the user will see the text between the tag. It’s still a good practice to do so, even if your primary target is a Firefox OS app, because since it uses HTML5, someone may access it from another browser if it’s a hosted app. Note that you can use other attributes for this element.

    As for the audio tag, it’s basically the same.

    <audio id="demo" src="/music/audio.mp3" autoplay loop></audio>

    In this example, the audio will start automatically, and will play the audio file, in a loop, from the relative path: it’s perfect for background music if you are building a game. Note that you can add other attributes to this element too.

    Of course, using those elements without JavaScript give you basic features, but no worries, you can programmatically control them with code. Once you have your HTML element, like the audio example you just saw, you can use JavaScript to play, pause, change the volume, and more.

    document.querySelector("#demo").play(); //Play the Audio
    document.querySelector("#demo").pause(); //Pause the Audio
    document.querySelector("#demo").volume+=0.1; //Increase Volume
    document.querySelector("#demo").volume-=0.1; //Decrease Volume

    You can read more on what you can do with those two elements in the Mozilla Developer Network documentation. You also want to give a closer look to the supported format list.

    Use audio while the screen is locked

    Maybe you are building a podcast app, or at least you need to be able to play audio while the screen is locked? There is a way to do it by using the audio tag. You simply need to add the mozaudiochannel attribute with the value of content to your actual tag.

    <audio mozaudiochannel="content" preload="none"

    Actually, it’s not quite true as this code won’t work as is. You also need to add a permission to the manifest file.

    "permissions": {
        "description":"Use the audio channel for the music player"

    Having the manifest line above will authorize your application to use the audio channel to play music, even when the screen is locked. Having said that, you probably realize that this code is specific to Firefox OS for now. I intentionally put the end of the last sentence in bold as it’s one thing you need to understand about Firefox OS: we had to create some APIs, features or elements to give the power HTML deserve for developers, but we are working with the W3C to make those standards. In the case that the standards won’t be the same as what we created, we’ll change it to reflect it.

    Firefox OS Web activities

    Finally, something very handy for Firefox OS developers: the Web Activities. They define a way for applications to delegate an activity to another (usually user-chosen) application. They aren’t standardized, at the time of writing. In the case that will be interesting to us, we’ll use the Web Activity call open, to open music or video files. Note that for video, you can also use the view activity that basically does the same. Let’s say I want to open a remote video when someone clicks on a button with the id open-video: I’ll use the following code in my JavaScript to make it happen.

    var openVideo = document.querySelector("#open-video");
    if (openVideo) {
        openVideo.onclick = function () {
            var openingVideo = new MozActivity({
                name: "open",
                data: {
                    type: [
                    url: ""

    In that situation, the video player of Firefox OS will open, and play the video: it’s that easy!

    In the end…

    You may or may not need to use those tricks in your app, but adding videos or audio can enhance the quality of your application, and make it feel alive. At the end, you have to give a strong experience to your users, and it’s what will make the difference between a good and a great app!

  9. Songs of Diridum: Pushing the Web Audio API to Its Limits

    When we at Goo Technologies heard that the Web Audio API would be supported in an upcoming version of Mozilla Firefox, we immediately started brainstorming about what we could build with that.

    We started discussing the project with the game developers behind “Legend of Diridum” (see below) and came up with the idea of a small market place and a small jazz band on a stage. The feeling we wanted to capture was that of a market place coming to life. The band is playing a song to warm up and the crowd has not gathered around yet. The evening is warm and the party is about to start.

    We call the resulting demo “Songs of Diridum”.

    What the Web Audio API can do for you

    We will take a brief look at the web audio system from three perspectives. These are game design, audio engineering and programming.

    From a game designer perspective we can use the functionality of the Web Audio API to tune the soundscape of our game. We can run a whole lot of separate sounds simultaneously while also adjusting their character to fit an environment or a game mechanic. You can have muffled sounds coming through a closed door and open the filters for these sounds to unmuffle them gradually as the door opens. In real time. We can add reflecting sounds of the environment to the footsteps of my character as we walk from a sidewalk into a church. The ambient sounds of the street will be echoing around in the church together with my footsteps. We can attach the sounds of a roaring fire to my magicians fireball, hurl it away and hear the fireball moving towards its target. We can hear the siren of a police car approaching and hear how it passes by from the pitch shift known as doppler effect. And we know we can use these features without needing to manage the production of an audio engine. Its already there and it works.

    From an audio engineering perspective we view the Web Audio API as a big patch bay with a slew of outboard gear tape stations and mixers. On a low level we feel reasonably comfortable with the fundamental aspects of the system. We can work comfortably with changing the volume of a sound while it is playing without running the risk of inducing digital distortion from the volume level changing from one sample to another. The system will make the interpolation needed for this type of adjustment. We can also build the type of effects we want and hook them up however we want. As long as we keep my system reasonably small we can make a nice studio with the Web Audio API.

    From a programmer perspective we can write the code needed for our project with ease. If we run into a problem we will usually find a good solution to it on the web. We don’t have to spend our time with learning how to work with some poorly documented proprietary audio engine. The type of problem we will be working with the most is probably related to how the code is structured. We will be figuring out how to handle the loading of the sounds and which sounds to load when. How to provide these sounds to the game designer through some suitable data structure or other design pipelines. We will also work with the team to figure out how to handle the budgeting of the sound system. How much data can we use? How many sounds can we play at the same time? How many effects can we use on the target platform? It is likely that the hardest problem, the biggest technical risk, is related to handling the diversity of hardware and browsers running on the web.

    About Legend of Diridum

    This sound demo called “Songs of Diridum” is actually a special demo based on graphics and setting from the upcoming game “LOD: Legend of Diridum”. The LOD team is led by the game design veteran Michael Stenmark.

    LOD is an easy to learn, user-friendly fantasy role playing game set on top of a so called sandbox world. It is a mix of Japanese fantasy and roleplaying games drawing inspiration from Grandia, Final Fantasy, Zelda and games like Animal Crossing and Minecraft.

    The game is set in a huge fantasy world, in the aftermath of a terrible magic war. The world is haunted by the monsters, ghosts and undead that was part of the warlocks armies and the player starts the game as the Empire’s official ghost hunter to cleanse the lands of the evil and keep the people of Diridum safe. LOD is built in the Goo Engine and can be played in almost any web browser without the need download anything.

    About the music

    The song was important to set the mood and to embrace the warm feeling of a hot summer night turning into a party. Adam Hagstrand, the composer, nailed it right away. We love the way he got it very laid back, jazzy. Just that kind of tune a band would warm up with before the crowd arrives.

    Quickly building a 3D game with Goo Engine

    We love the web, and we love HTML5. HTML5 runs in the browser on multiple devices and does not need any special client software to be downloaded and installed. This allows for games to be published on nearly every conceivable web site, and since it runs directly in the browser, it opens up unprecedented viral opportunities and social media integration.

    We wanted to build Songs of Diridum as a HTML5 browser game, but how to do that in 3D? The answer was WebGL. WebGL is a new standard in HTML5 that allows games to gain access to hardware acceleration, just like native games. The introduction of WebGL in HTML5 is a massive leap in what the browser can deliver and it allows for web games to be built with previously unseen graphics quality. WebGL powered HTML5 does not require that much bandwidth during gameplay. Since the game assets are downloaded (pre-cached) before and during gameplay, even modest speed internet connections suffice.

    But building a WebGL game from scratch is a massive undertaking. The Goo Platform from Goo Technologies is the solution for making it much easier to build WebGL games and applications. In November, Goo Create is released making it even more accessible and easy to use.

    Goo is a HTML5 and WebGL based graphics development platform capable of powering the next generation of web games and apps. From the ground up, it’s been built for amazing graphics smoothness and performance while at the same time making things easy for graphics creators. Since it’s HTML5, it enables creators to build and distribute advanced interactive graphics on the web without the need for special browser plugins or software downloads. With Goo you have the power to publish hardware accelerated games & apps on desktops, laptops, smart TVs, tablets or mobile devices. It gives instant access to smooth rich graphics and previously unimagined browser game play.

    UPDATE: Goo Technologies has just launched their interactive 3D editor, Goo Create, which radically simplifies the process of creating interactive WebGL graphics.

    Building Songs of Diridum

    We built this demo project with a rather small team working for a relatively short time. In total we have had about seven or so people involved with the production. Most team members have done sporadic updates to our data and code. Roughly speaking we have not been following any conventional development process but rather tried to get as good a result as we can without going into any bigger scale production.

    The programming of the demo has two distinct sides. One for building the world and one for wiring up the sound system. Since the user interface primarily is used for controlling the state of the sound system we let the sound system drive the user interface. It’s a simple approach but we also have a relatively simple problem to solve. Building a small 3D world like this one is mostly a matter of loading model data to various positions in 3D space. All the low level programming needed to get the scene to render with the proper colors and shapes is handled by the Goo Engine so we have not had to write any code for those parts.

    We defined a simple data format for adding model data to the scene, we also included the ability to add sounds to the world and some slightly more complex systems such as the animated models and the bubbly water sound effect.

    The little mixer panel in which you can play around with to change the mix of the jazz band is dynamically generated by the sound system:

    Since we expected this app to be used on touch screen devices we also decided to only use clickable buttons for interface. We would not have enough time to test the usability of any other type of controls when aiming at a diffuse collection of mobile devices.

    Using Web Audio in a 3D world

    To build the soundscape of a 3D world we have access to spatialization of sound sources and the ears of the player. The spatial aspects of sounds and listeners are boiled down to position, direction and velocity. Each sound can also be made to emit its sound in a directional cone, in short this emulates the difference between the front and back of loudspeaker. A piano would not really need any directionality as it sounds quite similar in all directions. A megaphone on the other hand is comparably directional and should be louder at the front than at the back.

    if (is3dSource) {
        // 3D sound source
        this.panNode = this.context.createPanner();
        this.lastPos = [0,0,0];
        this.panNode.rolloffFactor = 0.5;
    } else {
        // Stereo sound source “hack”
        this.panNode = mixNodeFactory.buildStereoChannelSplitter(this.gainNode, context);
        this.panNode.setPosition(0, this.context.currentTime);

    The position of the sound is used to determine panning or which speaker the sound is louder in and how loud the sound should be.

    The velocity of the sound and the listener together with their positions provide the information needed to doppler shift all sound sources in the world accordingly. For other worldly effects such as muffling sounds behind a door or changing the sound of the room with reverbs we’ll have to write some code and configure processing nodes to meet with our desired results.

    For adding sounds to the user interface and such direct effects, we can hook the sounds up without going through the spatialization, which makes them a bit simpler. You can still process the effects and be creative if you like. Perhaps pan the sound source based on where the mouse pointer clicked.

    Initializing Web Audio

    Setting up for using Web Audio is quite simple. The tricky parts are related to figuring out how much sound you need to preload and how much you feel comfortable with loading later. You also need to take into account that loading a sound contains two asynchronous and potentially slow operations. The first is the download, the second is the decompression from some small format such as OGG or MP3 to arraybuffer. When developing against a local machine you’ll find that the decompression is a lot slower than the download and as with download speeds in general we can expect to not know how much time this will require for any given user.

    Playing sound streams

    Once you have a sound decompressed and ready it can be used to create a sound source. A sound source is a relatively low level object which streams its sound data at some speed to its selected target node. For the simplest system this target node is the speaker output. Even with this primitive system, you can already manipulate the playback rate of the sound, this changes its pitch and duration. There is a nice feature in the Web Audio API which allows you to adjust the behaviour of interpolating a change like this to fit your desires.

    Adding sound effects: reverb, delay, and distortion

    To add an effect to our simple system you put a processor node between the source and the speakers. us audio engineers wants to split the source to have a “dry” and a “wet” component at this point. The dry component is the non-processed sound and the wet is processed. Then you’ll send both the wet and the dry to the speakers and adjust the mix between them by adding a gain node on each of these tracks. Gain is an engineery way of saying “volume”. You can keep on like this and add nodes between the source and the speakers as you please. Sometimes you’ll want effects in parallel and sometimes you want them serially. When coding your system its probably a good idea to make it easy to change how this wiring is hooked up for any given node.


    We’re quite happy with how “Songs of Diridum” turned out, and we’re impressed with the audio and 3D performance available in today’s web browsers. Hopefully the mobile devices will catch up soon performance-wise, and HTML5 will become the most versatile cross-platform game environment available.
    Now go play “Songs of Diridum”!

  10. Building a Firefox OS App for my favorite Internet radio station

    I recently created a Firefox OS app for my favourite radio station — radio paradise. It was a lot of fun making this app, so I thought it would be good to share some notes about how I built it.

    The audio tag

    It started by implementing the main functionality of the app, playing an ogg stream I got from the Internet radio station, using the HTML5 audio element

    <audio src="" controls preload></audio>

    That was easy! At this point our app is completely functional. If you don’t believe me, checkout this jsfiddle. But please continue reading, since there will be a few more sweet features added. In fact, checkout the short video below to see how it will turn out.

    Because this content belongs to radio paradise, before implementing the app, I contacted them to ask for their permission to make a Firefox OS app for their radio station; they responded:

    Thanks. We’d be happy to have you do that. Our existing web player is html5-based. That might be a place to start. Firefox should have native support for our Ogg Vorbis streams.

    I couldn’t have asked for a more encouraging response, and that was enough to set things in motion.

    Features of the app

    I wanted the app to be very minimal and simple — both in terms of user experience and the code backing it. Here is a list of the features I decided to include:

    • A single, easy to access, button to play and pause the music
    • Artist name, song title and album cover for the current song playing should fill up the interface
    • Setting option to select song quality (for situation when bandwidth is not enough to handle highest quality)
    • Setting option to start app with music playing or paused
    • Continue playing even when the app is sent to the background
    • Keep the screen on when the app is running in the forground

    Instead of using the HTML tag, I decided to create the audio element and configure it in JavaScript. Then I hooked up an event listener for a button to play or stop music.

      var btn = document.getElementById('play-btn');
      var state = 'stop';
      btn.addEventListener('click', stop_play);
      // create an audio element that can be played in the background
      var audio = new Audio();
      audio.preload = 'auto';
      audio.mozAudioChannelType = 'content';
      function play() {;
        state = 'playing';
      function stop() {
        state = 'stop';
      // toggle between play and stop state
      function stop_play() {
        (state == 'stop') ? play() : stop();

    Accessing current song information

    The first challenge I faced was accessing the current song information. Normally we should not need any special privilege to access third party API’s as long as they provide correct header information. However, the link radio paradise provided me for getting the current song information did not allow for cross origin access. Luckily FirefoxOS has a special power reserved for this kind of situation — systemXHR comes to the rescue.

    function get_current_songinfo() {
      var cache_killer = Math.floor(Math.random() * 10000);
      var playlist_url =
        '' +
      var song_info = document.getElementById('song-info-holder');
      var crossxhr = new XMLHttpRequest({mozSystem: true});
      crossxhr.onload = function() {
        var infoArray = crossxhr.responseText.split('|');
        song_info.innerHTML = infoArray[1];
        next_song = setInterval(get_current_songinfo, infoArray[0]);
      crossxhr.onerror = function() {
        console.log('Error getting current song info', crossxhr);
        nex_song = setInterval(get_current_singinfo, 200000);
      };'GET', playlist_url);

    This meant that the app would have to be privileged and thus packaged. I normally would try to keep my apps hosted, because that is very natural for a web app and has several benefits including the added bonus of being accessible to search engines. However, in cases such as this we have no other option but to package the app and give it the special privileges it needs.

      "version": "1.1",
      "name": "Radio Paradise",
      "launch_path": "/index.html",
      "description": "An unofficial app for radio paradise",
      "type": "privileged",
      "icons": {
        "32": "/img/rp_logo32.png",
        "60": "/img/rp_logo60.png",
        "64": "/img/rp_logo64.png",
        "128": "/img/rp_logo128.png"
      "developer": {
        "name": "Aras Balali Moghaddam",
        "url": ""
      "permissions": {
        "systemXHR": {
          "description" : "Access current song info on"
        "audio-channel-content": {
          "description" : "Play music when app goes into background"
      "installs_allowed_from": ["*"],
      "default_locale": "en"

    Updating song info and album cover

    That XHR call to radio paradise proides me with three important pieces of information:

    • Name of the current song playing and it’s artist
    • An image tag containing the album cover
    • Time left to the end of current song in miliseconds

    Time left to the end of current song is very nice to have. It means that I can execute the XHR call and update the song information only once for every song. I first tried using the setTimeout function like this:

    //NOT working example. Can you spot the error?
    crossxhr.onload = function() {
      var infoArray = crossxhr.responseText.split('|');
      song_info.innerHTML = infoArray[1];
      setTimeout('get_current_songinfo()', infoArray[0]);

    To my surprise, that did not work, and I got a nice error in logcat about a CSP restriction. It turns out that any attempt at dynamically executing code is banned for security reasons. All we have to do in this scenario to avoid the CSP issue is to pass a callable object, instead of a string.

      // instead of passing a string to setTimout we pass
      // a callable object to it
      setTimeout(get_current_songinfo, infoArray[0]);

    Update: Mindaugas pointed out in the comments below that using innerHTML to parse unknown content in this way, introduces some security risks. Because of these security implications, we should retrieve the remote content as text instead of HTML. One way to do this is to use song_info.textContent which does not interpret the passed content as HTML. Another option, as Frederik Braun pointed out is to use a text node which can not render HTML.

    radio paradise mobile web app running on FirefoxOS

    With a bit of CSS magic, things started to fall into place pretty quickly

    Adding a unique touch

    One of the great advantages of developing mobile applications for the web is that you are completely free to design your app in any way you want. There is no enforcement of style or restriction on interaction design innovation. Knowing that, it was hard to hold myself back from trying to explore new ideas and have some fun with the app. I decided to hide the settings behind the main content and then add a feature so user can literally cut open the app in the middle to get to setting. That way they are tucked away, but still can be discovered in an intuitive way. For UI elements in the setting page to toggle options I decided to give Brick a try., with a bit of custom styling added.

    radio paradise app settings

    User can slide open the cover image to access app settings behind it

    Using the swipe gesture

    As you saw in the video above, to open and close the cover image I use pan and swipe gestures. To implement that, I took gesture detector from Gaia. It was very easy to integrated the gesture code as a module into my app and hook it up to the cover image.

    Organizing the code

    For an app this small, we do not have to use modular code. However, since I have recently started to learn about AMD practices, I decided to use a module system. I asked James Burke about implications of using requirejs in an app like this. He suggested I use Alameda instead, since it is geared toward modern browsers.

    Saving app settings

    I wanted to let users choose stream quality as well as whether they want the app to start playing music as soon as it opens. Both of these options need to be persisted somewhere and retrieved when the app starts. I just needed to save a couple of key/value pairs. I went to #openwebapps irc channel and asked for advice. Fabrice pointed me to a nice piece of code in Gaia (again!) that is used for asynchronous storing of key/value pairs and even whole objects. That was perfect for my use case, so I took it as well. Gaia appears to be a goldmine. Here is the module I created for settings.

    define(['helper/async_storage'], function(asyncStorage) {
      var setting = {
        values: {
          quality: 'high',
          play_on_start: false
        get_quality: function() {
          return setting.values.quality;
        set_quality: function(q) {
          setting.values.quality = q;
        get_play_on_start: function() {
          return setting.values.play_on_start;
        set_play_on_start: function(p) {
          setting.values.play_on_start = p;
        save: function() {
          asyncStorage.setItem('setting', setting.values);
        load: function(callback) {
          asyncStorage.getItem('setting', function(values_obj) {
            if (values_obj) setting.values = values_obj;
      return setting;

    Splitting the cover image

    Now we get to the really fun part that is splitting the cover image in half. To achieve this effect, I made two identical overlapping canvas element both of which are sized to fit the device width. One canvas clips the image and keeps the left portion of it while the other keeps the right side.

    Each canvas clips and renders half of the image

    Each canvas clips and renders half of the image

    Here is the code for draw function where most of the action is happening. Note that this function runs only once for each song, or when user changes the orientation of the device from portrait to landscape and vice versa.

    function draw(img_src) {
      width = cover.clientWidth;
      height = cover.clientHeight;
      draw_half(left_canvas, 'left');
      draw_half(right_canvas, 'right');
      function draw_half(canvas, side) {
        canvas.setAttribute('width', width);
        canvas.setAttribute('height', height);
        var ctx = canvas.getContext('2d');
        var img = new Image();
        var clip_img = new Image();
        // opacity 0.01 is used to make any glitch in clip invisible
        ctx.fillStyle = 'rgba(255,255,255,0.01)';
        if (side == 'left') {
          ctx.moveTo(0, 0);
          // add one pixel to ensure there is no gap
          var center = (width / 2) + 1;
        } else {
          ctx.moveTo(width, 0);
          var center = (width / 2) - 1;
        ctx.lineTo(width / 2, 0);
        // Draw a wavy pattern down the center
        var step = 40;
        var count = parseInt(height / step);
        for (var i = 0; i < count; i++) {
          ctx.lineTo(center, i * step);
          // alternate curve control point 20 pixels, every other time
          ctx.quadraticCurveTo((i % 2) ? center - 20 :
            center + 20, i * step + step * 0.5, center, (i + 1) * step);
        ctx.lineTo(center, height);
        if (side == 'left') {
          ctx.lineTo(0, height);
          ctx.lineTo(0, 0);
        } else {
          ctx.lineTo(width, height);
          ctx.lineTo(width, 0);
        img.onload = function() {
          var h = width * img.height / img.width;
          ctx.drawImage(img, 0, 0, width, h);
        img.src = img_src;

    Keeping the screen on

    The last feature I needed to add was keeping the screen on when the app is running in foreground and that turned out to be very easy to implement as well. We need to request a screen wake lock

      var lock = window.navigator.requestWakeLock(resourceName);

    The screen wake lock is actually pretty smart. It will be automatically released when app is sent to the background, and then will given back to your app when it comes to the foreground. Currently in this app I have not provided an option to release the lock. If in future I get requests to add that option, all I have to do is release the lock that has been obtained before setting the option to false


    Getting the app

    If you have a FirefoxOS device and like great music, you can now install this app on your device. Search for “radio paradise” in the marketplace, or install it directly from this link. You can also checkout the full source code from github. Feel free to fork and modify the app as you wish, to create your own Internet Radio apps! I would love it if you report issues, ask for features or send pull requests.


    I am more and more impressed by how quickly we can create very functional and unique mobile apps using web technologies. If you have not build a mobile web app for Firefox OS yet, you should definitely give it a try. The future of open web apps is very exciting, and Firefox OS provides a great platform to get a taste of that excitement.

    Now it is your turn to leave a comment. What is your favourite feature of this app? What things would you have done differently if you developed this app? How could we make this app better (both code and UX)?