Mozilla

Audio Articles


  1. Songs of Diridum: Pushing the Web Audio API to Its Limits

    When we at Goo Technologies heard that the Web Audio API would be supported in an upcoming version of Mozilla Firefox, we immediately started brainstorming about what we could build with that.

    We started discussing the project with the game developers behind “Legend of Diridum” (see below) and came up with the idea of a small market place and a small jazz band on a stage. The feeling we wanted to capture was that of a market place coming to life. The band is playing a song to warm up and the crowd has not gathered around yet. The evening is warm and the party is about to start.

    We call the resulting demo “Songs of Diridum”.

    What the Web Audio API can do for you

    We will take a brief look at the web audio system from three perspectives. These are game design, audio engineering and programming.

    From a game designer perspective we can use the functionality of the Web Audio API to tune the soundscape of our game. We can run a whole lot of separate sounds simultaneously while also adjusting their character to fit an environment or a game mechanic. You can have muffled sounds coming through a closed door and open the filters for these sounds to unmuffle them gradually as the door opens. In real time. We can add reflecting sounds of the environment to the footsteps of our character as we walk from a sidewalk into a church. The ambient sounds of the street will be echoing around in the church together with our footsteps. We can attach the sound of a roaring fire to our magician’s fireball, hurl it away and hear the fireball moving towards its target. We can hear the siren of a police car approaching and hear how it passes by from the pitch shift known as the Doppler effect. And we know we can use these features without needing to manage the production of an audio engine. It’s already there and it works.

    From an audio engineering perspective we view the Web Audio API as a big patch bay with a slew of outboard gear, tape stations and mixers. On a low level we feel reasonably comfortable with the fundamental aspects of the system. We can comfortably change the volume of a sound while it is playing without running the risk of inducing digital distortion from the volume level jumping from one sample to the next. The system will make the interpolation needed for this type of adjustment. We can also build the type of effects we want and hook them up however we want. As long as we keep our system reasonably small we can make a nice studio with the Web Audio API.

    From a programmer perspective we can write the code needed for our project with ease. If we run into a problem we will usually find a good solution to it on the web. We don’t have to spend our time learning how to work with some poorly documented proprietary audio engine. The type of problem we will be working with the most is probably related to how the code is structured. We will be figuring out how to handle the loading of the sounds and which sounds to load when, and how to provide these sounds to the game designer through some suitable data structure or other design pipelines. We will also work with the team to figure out how to handle the budgeting of the sound system. How much data can we use? How many sounds can we play at the same time? How many effects can we use on the target platform? It is likely that the hardest problem, the biggest technical risk, is related to handling the diversity of hardware and browsers running on the web.

    About Legend of Diridum

    This sound demo called “Songs of Diridum” is actually a special demo based on graphics and setting from the upcoming game “LOD: Legend of Diridum”. The LOD team is led by the game design veteran Michael Stenmark.

    LOD is an easy-to-learn, user-friendly fantasy role-playing game set on top of a so-called sandbox world. It is a mix of Japanese fantasy and roleplaying games, drawing inspiration from Grandia, Final Fantasy, Zelda and games like Animal Crossing and Minecraft.

    The game is set in a huge fantasy world, in the aftermath of a terrible magic war. The world is haunted by the monsters, ghosts and undead that were part of the warlock’s armies, and the player starts the game as the Empire’s official ghost hunter, tasked with cleansing the lands of evil and keeping the people of Diridum safe. LOD is built in the Goo Engine and can be played in almost any web browser without the need to download anything.

    About the music

    The song was important to set the mood and to embrace the warm feeling of a hot summer night turning into a party. Adam Hagstrand, the composer, nailed it right away. We love the way he got it very laid back, jazzy. Just that kind of tune a band would warm up with before the crowd arrives.

    Quickly building a 3D game with Goo Engine

    We love the web, and we love HTML5. HTML5 runs in the browser on multiple devices and does not need any special client software to be downloaded and installed. This allows for games to be published on nearly every conceivable web site, and since it runs directly in the browser, it opens up unprecedented viral opportunities and social media integration.

    We wanted to build Songs of Diridum as an HTML5 browser game, but how to do that in 3D? The answer was WebGL. WebGL is a new standard in HTML5 that allows games to gain access to hardware acceleration, just like native games. The introduction of WebGL in HTML5 is a massive leap in what the browser can deliver and it allows for web games to be built with previously unseen graphics quality. WebGL-powered HTML5 does not require that much bandwidth during gameplay. Since the game assets are downloaded (pre-cached) before and during gameplay, even modest-speed internet connections suffice.

    But building a WebGL game from scratch is a massive undertaking. The Goo Platform from Goo Technologies is the solution for making it much easier to build WebGL games and applications. In November, Goo Create will be released, making it even more accessible and easy to use.

    Goo is an HTML5 and WebGL based graphics development platform capable of powering the next generation of web games and apps. From the ground up, it’s been built for amazing graphics smoothness and performance while at the same time making things easy for graphics creators. Since it’s HTML5, it enables creators to build and distribute advanced interactive graphics on the web without the need for special browser plugins or software downloads. With Goo you have the power to publish hardware accelerated games & apps on desktops, laptops, smart TVs, tablets or mobile devices. It gives instant access to smooth rich graphics and previously unimagined browser game play.

    UPDATE: Goo Technologies has just launched their interactive 3D editor, Goo Create, which radically simplifies the process of creating interactive WebGL graphics.

    Building Songs of Diridum

    We built this demo project with a rather small team working for a relatively short time. In total we have had about seven people involved with the production. Most team members have done sporadic updates to our data and code. Roughly speaking we have not been following any conventional development process but rather tried to get as good a result as we could without going into any bigger scale production.

    The programming of the demo has two distinct sides. One for building the world and one for wiring up the sound system. Since the user interface primarily is used for controlling the state of the sound system we let the sound system drive the user interface. It’s a simple approach but we also have a relatively simple problem to solve. Building a small 3D world like this one is mostly a matter of loading model data to various positions in 3D space. All the low level programming needed to get the scene to render with the proper colors and shapes is handled by the Goo Engine so we have not had to write any code for those parts.

    We defined a simple data format for adding model data to the scene, we also included the ability to add sounds to the world and some slightly more complex systems such as the animated models and the bubbly water sound effect.
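
    As a purely hypothetical illustration (these field names are invented for this article, not the demo’s actual format), a single entry in such a data format might look something like this:

    // Hypothetical scene entry: field names are illustrative only.
    var entry = {
        model: 'assets/models/jazz_drummer.json',   // mesh to load
        position: [4.0, 0.0, -2.5],                 // placement in 3D space
        rotation: [0, 90, 0],
        sound: {                                    // optional positional sound
            url: 'assets/audio/drums.ogg',
            loop: true,
            rolloffFactor: 0.5
        }
    };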

    The little mixer panel that you can play around with to change the mix of the jazz band is dynamically generated by the sound system:

    Since we expected this app to be used on touch screen devices we also decided to only use clickable buttons for the interface. We would not have enough time to test the usability of any other type of controls when aiming at a diffuse collection of mobile devices.

    Using Web Audio in a 3D world

    To build the soundscape of a 3D world we have access to spatialization of sound sources and the ears of the player. The spatial aspects of sounds and listeners boil down to position, direction and velocity. Each sound can also be made to emit its sound in a directional cone; in short, this emulates the difference between the front and back of a loudspeaker. A piano would not really need any directionality as it sounds quite similar in all directions. A megaphone on the other hand is comparably directional and should be louder at the front than at the back.

    if (is3dSource) {
        // 3D sound source
        this.panNode = this.context.createPanner();
        this.gainNode.connect(this.panNode);
        this.lastPos = [0,0,0];
        this.panNode.rolloffFactor = 0.5;
    } else {
        // Stereo sound source "hack"
        this.panNode = mixNodeFactory.buildStereoChannelSplitter(this.gainNode, context);
        this.panNode.setPosition(0, this.context.currentTime);
    }
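
    For a directional source like the megaphone mentioned above, the cone parameters of the panner node could be configured roughly like this (a minimal sketch; the angle and gain values are only illustrative, not taken from the demo):

    // Hypothetical cone setup for a directional source (angles in degrees)
    this.panNode.coneInnerAngle = 60;       // full volume inside this cone
    this.panNode.coneOuterAngle = 180;      // fades out towards the outer cone
    this.panNode.coneOuterGain = 0.25;      // residual level behind the source
    this.panNode.setOrientation(0, 0, -1);  // facing along the negative z axis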
    

    The position of the sound is used to determine the panning (which speaker the sound is louder in) and how loud the sound should be.

    The velocity of the sound and the listener, together with their positions, provide the information needed to Doppler-shift all sound sources in the world accordingly. For other in-world effects, such as muffling sounds behind a door or changing the sound of the room with reverbs, we’ll have to write some code and configure processing nodes to meet our desired results.
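
    As a rough sketch of how these spatial parameters might be pushed to the API every frame (the panner and listener methods are standard Web Audio calls; the surrounding names and data layout are assumptions made for this example):

    // Hypothetical per-frame update for one 3D source and the listener.
    // panNode is a PannerNode, context an AudioContext; pos/vel/fwd are [x, y, z].
    function updateSpatialAudio(context, panNode, source, player) {
        panNode.setPosition(source.pos[0], source.pos[1], source.pos[2]);
        panNode.setVelocity(source.vel[0], source.vel[1], source.vel[2]);

        var listener = context.listener;
        listener.setPosition(player.pos[0], player.pos[1], player.pos[2]);
        listener.setVelocity(player.vel[0], player.vel[1], player.vel[2]);
        // forward and up vectors orient the listener's "ears"
        listener.setOrientation(player.fwd[0], player.fwd[1], player.fwd[2], 0, 1, 0);
    }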

    For adding sounds to the user interface and similar direct effects, we can hook the sounds up without going through the spatialization, which makes them a bit simpler. You can still process the effects and be creative if you like. Perhaps pan the sound source based on where the mouse pointer clicked.
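
    A non-spatialized UI sound can be as simple as this sketch (assuming a decoded AudioBuffer and an AudioContext are already available):

    // Hypothetical UI click sound: buffer source -> gain -> speakers,
    // with no panner node involved.
    function playUiSound(context, buffer, volume) {
        var src = context.createBufferSource();
        var gain = context.createGain();
        src.buffer = buffer;
        gain.gain.value = volume;
        src.connect(gain);
        gain.connect(context.destination);
        src.start(0);
    }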

    Initializing Web Audio

    Setting up for using Web Audio is quite simple. The tricky parts are related to figuring out how much sound you need to preload and how much you feel comfortable with loading later. You also need to take into account that loading a sound consists of two asynchronous and potentially slow operations. The first is the download; the second is decoding the downloaded data from some small compressed format such as OGG or MP3 into raw audio data. When developing against a local machine you’ll find that the decoding is a lot slower than the download, and as with download speeds in general we can expect to not know how much time this will require for any given user.
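
    A minimal sketch of that two-step loading, assuming an existing AudioContext (`context`) and the URL of an OGG or MP3 file, could look like this:

    // Hypothetical loader: the download and the decode are both
    // asynchronous, and either one can be the slow part.
    function loadSound(context, url, onReady) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, true);
        xhr.responseType = 'arraybuffer';
        xhr.onload = function() {
            context.decodeAudioData(xhr.response, function(buffer) {
                onReady(buffer); // decoded audio data, ready to play
            }, function() {
                console.log('Failed to decode ' + url);
            });
        };
        xhr.send();
    }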

    Playing sound streams

    Once you have a sound decompressed and ready, it can be used to create a sound source. A sound source is a relatively low-level object which streams its sound data at some speed to its selected target node. For the simplest system this target node is the speaker output. Even with this primitive system you can already manipulate the playback rate of the sound, which changes its pitch and duration. There is a nice feature in the Web Audio API which allows you to adjust how a change like this is interpolated over time to fit your needs.
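
    A minimal sketch, assuming `buffer` holds a decoded AudioBuffer and `context` is an AudioContext:

    // Play the buffer at half speed: an octave lower and twice as long.
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.playbackRate.value = 0.5;
    source.connect(context.destination);
    source.start(0);

    // playbackRate is an AudioParam, so changes can be interpolated over time:
    source.playbackRate.linearRampToValueAtTime(2.0, context.currentTime + 3);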

    Adding sound effects: reverb, delay, and distortion

    To add an effect to our simple system you put a processor node between the source and the speakers. We audio engineers want to split the source into a “dry” and a “wet” component at this point. The dry component is the non-processed sound and the wet is processed. Then you’ll send both the wet and the dry to the speakers and adjust the mix between them by adding a gain node on each of these tracks. Gain is an engineery way of saying “volume”. You can keep on like this and add nodes between the source and the speakers as you please. Sometimes you’ll want effects in parallel and sometimes you want them serially. When coding your system it’s probably a good idea to make it easy to change how this wiring is hooked up for any given node.
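
    As a sketch of that wiring, assuming `context` is an AudioContext, `source` is a playing node and `effectNode` is some processor such as a ConvolverNode loaded with a reverb impulse:

    // Hypothetical dry/wet split: the dry path is the unprocessed source,
    // the wet path goes through the effect; each has its own gain (volume).
    var dryGain = context.createGain();
    var wetGain = context.createGain();

    source.connect(dryGain);
    dryGain.connect(context.destination);

    source.connect(effectNode);
    effectNode.connect(wetGain);
    wetGain.connect(context.destination);

    dryGain.gain.value = 0.7;   // mostly dry...
    wetGain.gain.value = 0.3;   // ...with a bit of effect mixed in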

    Conclusion

    We’re quite happy with how “Songs of Diridum” turned out, and we’re impressed with the audio and 3D performance available in today’s web browsers. Hopefully the mobile devices will catch up soon performance-wise, and HTML5 will become the most versatile cross-platform game environment available.
    Now go play “Songs of Diridum”!

  2. Building a Firefox OS App for my favorite Internet radio station

    I recently created a Firefox OS app for my favourite radio station — radio paradise. It was a lot of fun making this app, so I thought it would be good to share some notes about how I built it.

    The audio tag

    It started by implementing the main functionality of the app, playing an Ogg stream I got from the Internet radio station, using the HTML5 audio element:

    <audio src="http://stream-sd.radioparadise.com/rp_192m.ogg" controls preload></audio>

    That was easy! At this point our app is completely functional. If you don’t believe me, check out this jsfiddle. But please continue reading, since there will be a few more sweet features added. In fact, check out the short video below to see how it will turn out.

    Because this content belongs to radio paradise, before implementing the app, I contacted them to ask for their permission to make a Firefox OS app for their radio station; they responded:

    Thanks. We’d be happy to have you do that. Our existing web player is html5-based. That might be a place to start. Firefox should have native support for our Ogg Vorbis streams.

    I couldn’t have asked for a more encouraging response, and that was enough to set things in motion.

    Features of the app

    I wanted the app to be very minimal and simple — both in terms of user experience and the code backing it. Here is a list of the features I decided to include:

    • A single, easy to access, button to play and pause the music
    • Artist name, song title and album cover for the current song playing should fill up the interface
    • Setting option to select song quality (for situations when bandwidth is not enough to handle the highest quality)
    • Setting option to start app with music playing or paused
    • Continue playing even when the app is sent to the background
    • Keep the screen on when the app is running in the foreground

    Instead of using the HTML tag, I decided to create the audio element and configure it in JavaScript. Then I hooked up an event listener for a button to play or stop music.

      var btn = document.getElementById('play-btn');
      var state = 'stop';
      btn.addEventListener('click', stop_play);
     
      // create an audio element that can be played in the background
      var audio = new Audio();
      audio.preload = 'auto';
      audio.mozAudioChannelType = 'content';
     
      function play() {
        audio.play();
        state = 'playing';
        btn.classList.remove('stop');
        btn.classList.add('playing');
      }
     
      function stop() {
        audio.pause();
        state = 'stop';
        btn.classList.add('stop');
        btn.classList.remove('playing');
      }
     
      // toggle between play and stop state
      function stop_play() {
        (state == 'stop') ? play() : stop();
      }

    Accessing current song information

    The first challenge I faced was accessing the current song information. Normally we should not need any special privilege to access third-party APIs as long as they provide correct header information. However, the link radio paradise provided me for getting the current song information did not allow for cross-origin access. Luckily Firefox OS has a special power reserved for this kind of situation — systemXHR comes to the rescue.

    function get_current_songinfo() {
      var cache_killer = Math.floor(Math.random() * 10000);
      var playlist_url =
        'http://www.radioparadise.com/ajax_rp2_playlist.php?' +
        cache_killer;
      var song_info = document.getElementById('song-info-holder');
      var crossxhr = new XMLHttpRequest({mozSystem: true});
      crossxhr.onload = function() {
        var infoArray = crossxhr.responseText.split('|');
        song_info.innerHTML = infoArray[1];
        next_song = setInterval(get_current_songinfo, infoArray[0]);
        update_info();
      };
      crossxhr.onerror = function() {
        console.log('Error getting current song info', crossxhr);
        next_song = setInterval(get_current_songinfo, 200000);
      };
      crossxhr.open('GET', playlist_url);
      crossxhr.send();
      clearInterval(next_song);
    }

    This meant that the app would have to be privileged and thus packaged. I normally would try to keep my apps hosted, because that is very natural for a web app and has several benefits including the added bonus of being accessible to search engines. However, in cases such as this we have no other option but to package the app and give it the special privileges it needs.

    {
      "version": "1.1",
      "name": "Radio Paradise",
      "launch_path": "/index.html",
      "description": "An unofficial app for radio paradise",
      "type": "privileged",
      "icons": {
        "32": "/img/rp_logo32.png",
        "60": "/img/rp_logo60.png",
        "64": "/img/rp_logo64.png",
        "128": "/img/rp_logo128.png"
      },
      "developer": {
        "name": "Aras Balali Moghaddam",
        "url": "http://arasbm.com"
      },
      "permissions": {
        "systemXHR": {
          "description" : "Access current song info on radioparadise.com"
        },
        "audio-channel-content": {
          "description" : "Play music when app goes into background"
        }
      },
      "installs_allowed_from": ["*"],
      "default_locale": "en"
    }

    Updating song info and album cover

    That XHR call to radio paradise provides me with three important pieces of information:

    • Name of the current song playing and its artist
    • An image tag containing the album cover
    • Time left to the end of the current song in milliseconds

    Time left to the end of current song is very nice to have. It means that I can execute the XHR call and update the song information only once for every song. I first tried using the setTimeout function like this:

    //NOT working example. Can you spot the error?
    crossxhr.onload = function() {
      var infoArray = crossxhr.responseText.split('|');
      song_info.innerHTML = infoArray[1];
      setTimeout('get_current_songinfo()', infoArray[0]);
      update_info();
    };

    To my surprise, that did not work, and I got a nice error in logcat about a CSP restriction. It turns out that any attempt at dynamically executing code is banned for security reasons. All we have to do in this scenario to avoid the CSP issue is to pass a callable object, instead of a string.

      // instead of passing a string to setTimout we pass
      // a callable object to it
      setTimeout(get_current_songinfo, infoArray[0]);

    Update: Mindaugas pointed out in the comments below that using innerHTML to parse unknown content in this way, introduces some security risks. Because of these security implications, we should retrieve the remote content as text instead of HTML. One way to do this is to use song_info.textContent which does not interpret the passed content as HTML. Another option, as Frederik Braun pointed out is to use a text node which can not render HTML.
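
    In other words, the assignment from the earlier snippet would become something like this (same variables as above):

    // Treat the remote response as plain text rather than HTML
    song_info.textContent = infoArray[1];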

    radio paradise mobile web app running on FirefoxOS

    With a bit of CSS magic, things started to fall into place pretty quickly.

    Adding a unique touch

    One of the great advantages of developing mobile applications for the web is that you are completely free to design your app in any way you want. There is no enforcement of style or restriction on interaction design innovation. Knowing that, it was hard to hold myself back from trying to explore new ideas and have some fun with the app. I decided to hide the settings behind the main content and then add a feature so the user can literally cut open the app in the middle to get to the settings. That way they are tucked away, but can still be discovered in an intuitive way. For the UI elements on the settings page that toggle options, I decided to give Brick a try, with a bit of custom styling added.

    radio paradise app settings

    The user can slide open the cover image to access the app settings behind it

    Using the swipe gesture

    As you saw in the video above, to open and close the cover image I use pan and swipe gestures. To implement that, I took the gesture detector from Gaia. It was very easy to integrate the gesture code as a module into my app and hook it up to the cover image.

    Organizing the code

    For an app this small, we do not have to use modular code. However, since I have recently started to learn about AMD practices, I decided to use a module system. I asked James Burke about implications of using requirejs in an app like this. He suggested I use Alameda instead, since it is geared toward modern browsers.

    Saving app settings

    I wanted to let users choose stream quality as well as whether they want the app to start playing music as soon as it opens. Both of these options need to be persisted somewhere and retrieved when the app starts. I just needed to save a couple of key/value pairs. I went to #openwebapps irc channel and asked for advice. Fabrice pointed me to a nice piece of code in Gaia (again!) that is used for asynchronous storing of key/value pairs and even whole objects. That was perfect for my use case, so I took it as well. Gaia appears to be a goldmine. Here is the module I created for settings.

    define(['helper/async_storage'], function(asyncStorage) {
      var setting = {
        values: {
          quality: 'high',
          play_on_start: false
        },
        get_quality: function() {
          return setting.values.quality;
        },
        set_quality: function(q) {
          setting.values.quality = q;
          setting.save();
        },
        get_play_on_start: function() {
          return setting.values.play_on_start;
        },
        set_play_on_start: function(p) {
          setting.values.play_on_start = p;
          setting.save();
        },
        save: function() {
          asyncStorage.setItem('setting', setting.values);
        },
        load: function(callback) {
          asyncStorage.getItem('setting', function(values_obj) {
            if (values_obj) setting.values = values_obj;
            callback();
          });
        }
      };
      return setting;
    });

    Splitting the cover image

    Now we get to the really fun part, that is, splitting the cover image in half. To achieve this effect, I made two identical overlapping canvas elements, both of which are sized to fit the device width. One canvas clips the image and keeps the left portion of it while the other keeps the right side.

    Each canvas clips and renders half of the image

    Here is the code for the draw function where most of the action is happening. Note that this function runs only once for each song, or when the user changes the orientation of the device from portrait to landscape and vice versa.

    function draw(img_src) {
      width = cover.clientWidth;
      height = cover.clientHeight;
      draw_half(left_canvas, 'left');
      draw_half(right_canvas, 'right');
      function draw_half(canvas, side) {
        canvas.setAttribute('width', width);
        canvas.setAttribute('height', height);
        var ctx = canvas.getContext('2d');
        var img = new Image();
        var clip_img = new Image();
        // opacity 0.01 is used to make any glitch in clip invisible
        ctx.fillStyle = 'rgba(255,255,255,0.01)';
     
        ctx.beginPath();
        if (side == 'left') {
          ctx.moveTo(0, 0);
          // add one pixel to ensure there is no gap
          var center = (width / 2) + 1;
        } else {
          ctx.moveTo(width, 0);
          var center = (width / 2) - 1;
        }
     
        ctx.lineTo(width / 2, 0);
     
        // Draw a wavy pattern down the center
        var step = 40;
        var count = parseInt(height / step);
        for (var i = 0; i < count; i++) {
          ctx.lineTo(center, i * step);
     
          // alternate curve control point 20 pixels, every other time
          ctx.quadraticCurveTo((i % 2) ? center - 20 :
            center + 20, i * step + step * 0.5, center, (i + 1) * step);
        }
        ctx.lineTo(center, height);
        if (side == 'left') {
          ctx.lineTo(0, height);
          ctx.lineTo(0, 0);
        } else {
          ctx.lineTo(width, height);
          ctx.lineTo(width, 0);
        }
     
        ctx.closePath();
        ctx.fill();
        ctx.clip();
     
        img.onload = function() {
          var h = width * img.height / img.width;
          ctx.drawImage(img, 0, 0, width, h);
        };
        img.src = img_src;
      }
    }

    Keeping the screen on

    The last feature I needed to add was keeping the screen on when the app is running in the foreground, and that turned out to be very easy to implement as well. We need to request a screen wake lock:

      var lock = window.navigator.requestWakeLock(resourceName);

    The screen wake lock is actually pretty smart. It will be automatically released when the app is sent to the background, and then given back to your app when it comes to the foreground. Currently in this app I have not provided an option to release the lock. If in the future I get requests to add that option, all I have to do is release the lock that has been obtained before setting the option to false:

      lock.unlock();

    Getting the app

    If you have a Firefox OS device and like great music, you can now install this app on your device. Search for “radio paradise” in the marketplace, or install it directly from this link. You can also check out the full source code from GitHub. Feel free to fork and modify the app as you wish, to create your own Internet radio apps! I would love it if you report issues, ask for features or send pull requests.

    Conclusion

    I am more and more impressed by how quickly we can create very functional and unique mobile apps using web technologies. If you have not built a mobile web app for Firefox OS yet, you should definitely give it a try. The future of open web apps is very exciting, and Firefox OS provides a great platform to get a taste of that excitement.

    Now it is your turn to leave a comment. What is your favourite feature of this app? What things would you have done differently if you developed this app? How could we make this app better (both code and UX)?

  3. Writing Web Audio API code that works in every browser

    You probably have already read the announcement on the Web Audio API coming to Firefox, and are totally excited and ready to make your until-now-WebKit-only sites work with Firefox, which uses the unprefixed version of the spec.

    Unfortunately, Chrome, Safari and Opera still use the webkitAudioContext prefixed name. Furthermore, as a result of the spec being still in flux, some browsers use deprecated properties and method names that are not present in standards-compliant browsers: Safari uses the old method names, Firefox uses the new ones, and Chrome and Opera use both. In addition, not all features of Web Audio are already implemented in Firefox yet.

    What do we do!?

    We don’t want to maintain two or more separate code bases, and feature detection code is cumbersome! Plus we want to write code that reliably works in the future, or at least, works with a minimum amount of changes. Is there a way to satisfy all these constraints at the same time? Probably!

    Writing for today (and tomorrow)

    First, get a copy of AudioContext-MonkeyPatch by Chris Wilson. This little library will “normalise” the interfaces for you and make it look as if your code is running in a standards compliant browser, by aliasing prefixed names to the unprefixed versions. And it won’t do anything if the unprefixed versions are already present.

    Once you include it in your page, you can write in “modern Web Audio API” style, and do things such as:

    var audioContext = new AudioContext();

    everywhere, including Chrome/ium, Opera, Safari, and —of course!— Firefox.

    Also, if new methods such as start are not detected in some nodes, the library will also alias them to their old names. Thus, start is mapped to noteOn, stop to noteOff, and so on.

    If you’re porting moderately “old” code (say, a year old) it’s possible that it uses some methods that AudioContext-MonkeyPatch doesn’t alias, because it helps you to write code in the new style. For example, the way to create instances of GainNode used to be

    var gain = audioContext.createGainNode();

    but nowadays it is just

    var gain = audioContext.createGain();

    Since the old method names are not present in Firefox, existing code may crash with something like createGainNode is not a function, and you now know why.

    There’s a section in the spec that lists the old names and their updated equivalences; be sure to check it out and change your code accordingly. You can also check this article on porting which covers more cases and has many code samples.

    Things that are not ready yet

    Second, ensure that your project doesn’t use node types that are not implemented yet in Firefox: MediaStreamAudioSourceNode, MediaElementAudioSourceNode and OscillatorNode.

    If it’s using, for example, OscillatorNode, you will have to wait until it is supported, or maybe, if you’re really eager, hack in some replacement using ScriptProcessorNode, which allows you to write a node with callbacks that get called periodically, so that your JavaScript code generates or processes audio.
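
    For instance, a very rough sketch of an OscillatorNode stand-in built on ScriptProcessorNode might look like this (the buffer size and frequency are arbitrary, and this is far less efficient than a native oscillator would be):

    // Generate a 440 Hz sine wave in JavaScript via a ScriptProcessorNode.
    var sampleRate = audioContext.sampleRate;
    var frequency = 440;
    var phase = 0;
    var processor = audioContext.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = function(event) {
        var output = event.outputBuffer.getChannelData(0);
        for (var i = 0; i < output.length; i++) {
            output[i] = Math.sin(phase);
            phase += 2 * Math.PI * frequency / sampleRate;
        }
        phase %= 2 * Math.PI; // keep the phase small to avoid precision drift
    };
    processor.connect(audioContext.destination);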

    The node parameters you use must also be supported in Firefox too. If they aren’t, you might be able to change them into something “acceptable” for the time being, and count on the talented audio developers to implement those very soon.

    For example, up until a couple of days ago PannerNode did not support the default HRTF panning model yet, and attempting to use a PannerNode with that configuration simply resulted in silence or a mono output coming out from that node, depending on the build you used.

    Today the support is already present in Nightly, but not quite yet in Aurora. In the meantime, you can explicitly specify 'equalpower' instead:

    var panner = audioContext.createPanner();
    panner.panningModel = 'equalpower';

    Keep track

    The best way to know what’s going on in the Web Audio API land is to subscribe to the mailing list. Be aware that there might be a bit of high-level tech discussion from time to time, and you might not understand it all, but you will learn a lot even if only by skimming through it.

    You might also want to subscribe to the umbrella bug that tracks the Web Audio API implementation in Firefox, so that you get alerts when associated bugs get updated or resolved.

    Finally, there’s also a list of projects built with the Web Audio API, specifying which ones use the standard AudioContext and which browsers they work on. If you’re a person that learns by example, it might be interesting to have a look at their source and see how they have resolved the compatibility issues.

  4. Web Audio API comes to Firefox

    We have been working on implementing the Web Audio API in Firefox for a while now, and we currently have basic support for the API implemented on Firefox Nightly and Firefox Aurora. Web Audio provides a number of cool features that can be used in order to create music applications, games, and basically any application which requires advanced audio processing.

    Features

    Here are some examples of the features:

    • Scheduling events to happen at exact times during audio playback
    • Various types of audio filters to create effects such as echo, noise cancellation, etc.
    • Sound synthesis to create electronic music
    • 3D positional audio to simulate effects such as a sound source moving around the scene in a game
    • Integration for WebRTC to apply effects to sound coming in from external input (a WebRTC call, a guitar plugged in to your device, etc.) or to sound which is transmitted to the other party in a WebRTC call
    • Analysing the audio data in order to create sound visualizers, etc.

    Code sample

    Here is a simple example of what you can build with Web Audio. Let’s imagine that you’re working on a game, and you want to play a gunshot sound as soon as the player clicks on your game canvas. In order to make sure that you’re not affected by things like network delay, the audio decoder delay, etc., you can use Web Audio to preload the audio into a buffer as part of the loading process of your game, and schedule it precisely when you receive a click event.

    In order to create a neater sound effect, we can additionally loop the sound while the mouse is pressed, and create a fade-out effect when you release the mouse. The following code sample shows how to do that:

    // Load the sound file from the network
    var decodedBuffer;
    var ctx = new AudioContext();
    var xhr = new XMLHttpRequest();
    xhr.open("GET", "gunshot.ogg", true);
    xhr.responseType = "arraybuffer";
    xhr.send();
    xhr.onload = function() {
      // At this point, xhr.response contains the encoded data for gunshot.ogg,
      // so let's decode it into an AudioBuffer first.
      ctx.decodeAudioData(xhr.response, function onDecodeSuccess(buffer) {
        decodedBuffer = buffer;
      }, function onDecodeFailure() { alert('decode error!'); });
    };
     
    // Set up a mousedown/mouseup handler on your game canvas
    canvas.addEventListener("mousedown", function onMouseDown() {
      var src = ctx.createBufferSource();
      src.buffer = decodedBuffer;                                      // play back the decoded buffer
      src.loop = true;                                                 // set the sound to loop while the mouse is down
      var gain = ctx.createGain();                                     // create a gain node in order to create the fade-out effect when the mouse is released
      src.connect(gain);
      gain.connect(ctx.destination);
      canvas.src = src;                                                // save a reference to our nodes to use it later
      canvas.gain = gain;
      src.start(0);                                                    // start playback immediately
    }, false);
    canvas.addEventListener("mouseup", function onMouseUp() {
      var src = canvas.src, gain = canvas.gain;
      src.stop(ctx.currentTime + 0.2);                                 // set up playback to stop in 200ms
      gain.gain.setValueAtTime(1.0, ctx.currentTime);
      gain.gain.linearRampToValueAtTime(0.001, ctx.currentTime + 0.2); // set up the sound to fade out within 200ms
    }, false);

    The first WebAudio implementations and WebKit

    The Web Audio API was first implemented in Google Chrome using the webkitAudioContext prefix. We have been discussing the API on the W3C Audio Working Group and have been trying to fix some of the problems in the earlier versions of the API. In some places, doing that means that we needed to break backwards compatibility of code which targets webkitAudioContext.

    There is a guide on how to port those applications to the standard API. There is also the webkitAudioContext monkeypatch available which handles some of these changes automatically, which can help to make some of the code targeting webkitAudioContext to work in the standard API.

    The implementation in Firefox

    In Firefox, we have implemented the standard API. If you’re a web developer interested in creating advanced audio applications on the web, it would be really helpful for you to review Porting webkitAudioContext code to standards based AudioContext to get a sense of all of the non-backwards-compatible changes made to the API through the standardization process.

    We are currently hoping to release Web Audio support in Firefox 24 for desktop and Android, unless something unexpected happens that would cause us to delay the release, but you can use most parts of the API on Nightly and Aurora right now.

    There are still some missing bits and pieces, including MediaStreamAudioSourceNode, MediaElementAudioSourceNode, OscillatorNode and HRTF panning for PannerNode. We’ll add support for the remaining parts of the API in the coming weeks on Nightly and Firefox Aurora.

  5. Shiva – More than a RESTful API to your music collection

    Music for me is not only part of my daily life, it is an essential part. It helps me concentrate, improves my mood, distracts me and/or helps me relax. This is true for most (if not all) people. The lack of music or the wrong selection of tunes can have the complete opposite effect; it has a strong influence on how we feel. It also plays a key role in shaping our identity. Music, like most (if not all) types of culture, is not an accessory, it is not something we can choose to ignore; it is a need that we have as human beings.

    The Internet has become the most efficient medium ever in culture distribution. Today it’s easier than ever to have access to a huge diversity of culture, from any place in the world. At the same time you can reach the whole world with your music, instantly, just by signing up at one of the many websites you can find for music distribution. Just as “travel broadens the mind”, music sharing enriches culture, and thanks to the Internet, culture is nowadays more alive than ever.

    Not too long ago record labels were the judges of what was good music (by their standards) and what was not. They controlled the only global-scale distribution channel, so to make use of it you would need to come to an agreement with them, which usually meant giving up most of the rights over your cultural pieces. Creating and maintaining such a channel was neither easy nor cheap, so there was a need for the service they provided, and even though their goal was not to distribute culture but to be profitable (as with every company), both parties, industry and society, benefited from this.

    Times have changed and this model is obsolete now; the king is dead, so there are companies fighting to occupy this vacancy. What also changed was the business model. Now it is not just the music – it is also about restricting access to it and collecting (and selling) private information about the listeners. In other words, DRM and privacy. Here is where Shiva comes into play.

    What is Shiva?

    Shiva is, technically speaking, a RESTful API to your music collection. It indexes your music and exposes an API with the metadata of your files so you can then perform queries on it and organize it as you wish.

    On a higher level, however, Shiva aims to be a free (as in freedom and beer) alternative to popular music services. It was born with the goal of giving back the control over their music and privacy to the users, protecting them from the industry’s obsession with control.

    It’s not intended to compete directly with online music services, but to be an alternative that you can install and modify to your needs. You will own the music in your server. Nobody but you (or whoever you give permission) will be able to delete files or modify the files’ metadata to correct it when it’s wrong. And of course, it will all be available to any device with Internet connection.

    You will also have a clean, RESTful API to your music without restrictions. You can grant access to your friends and let them use the service or, if they have their own Shiva instances, let both servers talk to each other and share the music transparently.

    To sum up, Shiva is a distributed social network for sharing music.

    Your own music server

    Shiva-Server is the component that indexes your music and exposes a RESTful API. These are the available resources:

    • /artists
      • /artists/shows
    • /albums
    • /tracks
      • /tracks/lyrics

    It’s built in Python, using SQLAlchemy as the ORM and Flask for HTTP communication.

    Indexing your music

    The installation process is quite simple. There’s a very complete guide in the README file, but I’ll summarize it here:

    • Get the source
    • Install dependencies from the requirements.pip file
    • Copy /shiva/config/local.py.example to /shiva/config/local.py
    • Edit it and configure the directories to scan
    • Create the database (sqlite by default)
    • Run the indexer
    • Run the development server

    For details on any of the steps, check the documentation.

    Once the music has been indexed, all the metadata is stored in the database and queried from it. Files are only accessed by the file server for streaming. Lyrics are scraped the first time they are requested and then cached. Given the changing nature of the shows resource, this is the only one that is not cached; instead it is queried every time. At the moment of this writing it uses only one source, the BandsInTown API.

    Once the server is running you have all you need to start playing with Shiva. Point to a resource, like /artists, to see it in action.

    Scraping lyrics

    As mentioned, lyrics are scraped, and you can create your own scrapers for specific websites that have the lyrics you want. All you need is to create a python file with a class inheriting from LyricScraper in the /shiva/lyrics directory. The following template makes clear how easy it is. Let’s say we have a file /shiva/lyrics/mylyrics.py:

    from shiva.lyrics import LyricScraper

    class MyLyricsScraper(LyricScraper):
        """ Fetches lyrics from mylyrics.com """

        def fetch(self, artist, title):
            # Magic happens here

            if not lyrics:
                return False

            self.lyrics = lyrics
            self.source = lyrics_url

            return True

    After this you need to add your brand new scraper to the scrapers list in your local.py config file:

    SCRAPERS = {
        'lyrics': (
            'mylyrics.MyLyricScraper',
        )
    }

    Shiva will instantiate your scraper and call the fetch() method. If it returns True, it will then proceed to look for the lyrics in the lyrics attribute, and the URL from which they were scraped in the source attribute:

    if scraper.fetch():
        lyrics = Lyrics(text=scraper.lyrics, source=scraper.source,
                        track=track)
        g.db.session.add(lyrics)
        g.db.session.commit()
     
        return lyrics

    Check the existing scrapers for real world examples.

    Lyrics will only be fetched when you request one specific track, not when retrieving more than one. The reason behind this is that each track’s lyrics may require two or more requests, and we don’t want to DoS the website when retrieving an artist’s discography. That would not be nice.

    Setting up a file server

    The development server, as its name clearly states, should not be used for production. In fact it is almost impossible to do so, because it can only serve one request at a time, and the audio element will keep the connection open as long as the file is playing, ergo, completely blocking the API.

    Shiva provides a way to delegate the file serving to a dedicated server. For this you have to edit your /shiva/config/local.py file, and edit the MEDIA_DIRS setting. This option expects a tuple of MediaDir objects, which provide the mechanism to define directories to scan and a socket to serve the files through:

    MediaDir('/srv/music', url='http://localhost:8080')

    This way it doesn’t matter on which socket your application runs; files in the /srv/music directory will be served through the URL defined in the url attribute. This object also allows you to define subdirectories to be scanned. For example:

    MediaDir('/srv/music', dirs=('/pop', '/rock'), url='http://localhost:8080')

    In this case only the directories /srv/music/pop and /srv/music/rock will be scanned. You can define as many MediaDir objects as you need. Suppose you have the file /srv/music/rock/nofx-dinosaurs_will_die.mp3; once this is in place the track’s stream_uri attribute will be:

    {
        "slug": "dinosaurs-will-die",
        "title": "Dinosaurs Will Die",
        "uri": "/track/510",
        "id": 510,
        "stream_uri": "http://localhost:8080/nofx-dinosaurs_will_die.mp3"
    }

    Your own music player

    Once you have your music scanned and the API running, you need a client that consumes those services and plays your music, like Shiva-Client. Built as a single page application with AngularJS and HTML5 technologies, like Audio and Drag and Drop, this client will allow you to browse through your catalog, add files to a playlist and play them.

    Due to the Same-Origin-Policy you will need a server that acts as proxy between the web app and the API. For this you will find a server.py file in the repo that will do this for you. The only dependency for this file is Flask, but I assume you have that installed already. Now just execute it:

    python server.py

    This will run the server on http://localhost:9001/

    Access that URI and check the server output, you will see not only the media needed by the app (like images and javascript files) but also a /api/artists call. That’s the proxy. Any call to /api/ will be redirected by the server to http://localhost:9002/

    If you open a console, like Firebug, you will see a Shiva object in the global namespace. Inside of it you will find two main attributes, Player and Playlist. Those objects encapsulate all the logic for queueing and playing music. The Player only holds the current track and acts as a wrapper around HTML5’s Audio element. What may not seem natural at first is that normally you won’t interact with the Player, but with the Playlist, which acts as a façade because it knows all the tracks and instructs the Player which track to load and play next.

    The source for those objects is in the js/controllers.js file. There you will also find the AngularJS controllers, which perform the actual calls to the API. It consists of just two calls, one to get the list of artists and another one to get the discography for an artist. Check the code, it is quite simple.

    So once tracks are added to the playlist, you can do things like play it:

    Shiva.Playlist.play()

    Stop it:

    Shiva.Playlist.stop()

    Or skip the track:

    Shiva.Playlist.next()

    Some performance optimizations were made in order to lower the processing as much as possible. For example, you will see a progress bar when playing music, that will only be updated when it has to be shown. The events will be removed when not needed to avoid any unnecessary DOM manipulation of non-visible elements:

    Shiva.Player.audio.removeEventListener('timeupdate', Shiva.Player.audio.timeUpdateHandler, false);

    Present and future

    At the time of this writing, Shiva is in a usable state and provides the core functionality but is still young and lacks some important features. That’s also why this article doesn’t dig too much into the code, because it will rapidly change. To know the current status please check the documentation.

    If you want to contribute, there are many ways you can help the project. First of all, fork both the server and the client, play with them and send your improvements back to upstream.

    Request features. If you think “seems nice, but I wouldn’t use it” write down why and send those thoughts and ideally some ideas on how to tackle them. There is a long list of lacking features; your help is vital in order to prioritize them.

    But most importantly; build your own client. You know what you like about your favourite music player and you know what sucks. Fork and modify the existing client or create your own from scratch, bring new ideas to the old world of music players. Give new and creative uses to the API.

    There’s a lot of work to be done, in many different fronts. Some of the plans for the future are:

    • Shiva-jslib: An easy-to-use JavaScript library that encapsulates the API calls, so you can focus only on building the GUI and forget about the protocol.
    • Shiva2Shiva communication: Let two (or more) Shiva instances talk to each other to allow for transparent sharing of music between servers.
    • Shiva-FXOS: A Shiva client for Firefox OS.

    And anything else you can think of. Code, send ideas, code your clients.

    Happy hacking!

  6. Simplifying audio in the browser

    The last few years have seen tremendous gains in the capabilities of browsers, as the latest HTML5 standards continue to get implemented. We can now render advanced graphics on the canvas, communicate in real-time with WebSockets, access the local filesystem, create offline apps and more. However, the one area that has lagged behind is audio.

    The HTML5 Audio element is great for a small set of uses (such as playing music), but doesn’t work so well when you need low-latency, precision playback.

    Over the last year, a new audio standard has been developed for the browser, which gives developers direct access to the audio data. Web Audio API allows for high precision and high performing audio playback, as well as many advanced features that just aren’t possible with the HTML5 Audio element. However, support is still limited, and the API is considerably more complex than HTML5 Audio.

    Introducing howler.js

    The most obvious use-case for high-performance audio is games, but most developers have had to settle for HTML5 Audio with a Flash fallback to get browser compatibility. My company, GoldFire Studios, exclusively develops games for the open web, and we set out to find an audio library that offered the kind of audio support a game needs, without relying on antiquated technologies. Unfortunately, there were none to be found, so we wrote our own and open-sourced it: howler.js.

    Howler.js defaults to Web Audio API and uses HTML5 Audio as the fallback. The library greatly simplifies the API and handles all of the tricky bits automatically. This is a simple example to create an audio sprite (like a CSS sprite, but with an audio file) and play one of the sounds:

    var sound = new Howl({
      urls: ['sounds.mp3', 'sounds.ogg'],
      sprite: {
        blast: [0, 2000],
        laser: [3000, 700],
        winner: [5000, 9000]
      }
    });
     
    // shoot the laser!
    sound.play('laser');

    Using feature detection

    At the most basic level, this works through feature detection. The following snippet detects whether or not Web Audio API is available and creates the audio context if it is. Current support for Web Audio API includes Chrome 10+, Safari 6+, and iOS 6+. It is also in the pipeline for Firefox, Opera and most other mobile browsers.

    var ctx = null,
      usingWebAudio = true;
    if (typeof AudioContext !== 'undefined') {
      ctx = new AudioContext();
    } else if (typeof webkitAudioContext !== 'undefined') {
      ctx = new webkitAudioContext();
    } else {
      usingWebAudio = false;
    }

    Audio support for different codecs varies across browsers as well, so we detect which format is best to use from your provided array of sources with the canPlayType method:

    var audioTest = new Audio();
    var codecs = {
      mp3: !!audioTest.canPlayType('audio/mpeg;').replace(/^no$/,''),
      ogg: !!audioTest.canPlayType('audio/ogg; codecs="vorbis"').replace(/^no$/,''),
      wav: !!audioTest.canPlayType('audio/wav; codecs="1"').replace(/^no$/,''),
      m4a: !!(audioTest.canPlayType('audio/x-m4a;') || audioTest.canPlayType('audio/aac;')).replace(/^no$/,''),
      webm: !!audioTest.canPlayType('audio/webm; codecs="vorbis"').replace(/^no$/,'')
    };

    Making it easy

    These two key components of howler.js allow the library to automatically select the best method of playback and the best source file to load and play. From there, the library abstracts away the two different APIs and turns this (a simplified Web Audio API example without all of the extra fallback support and extra features):

    // create gain node
    var gainNode, bufferSource;
    gainNode = ctx.createGain();
    gainNode.gain.value = volume;
    gainNode.connect(ctx.destination);
     
    var loadBuffer = function(url) {
      // load the buffer from the URL
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url, true);
      xhr.responseType = 'arraybuffer';
      xhr.onload = function() {
        // decode the buffer into an audio source
        ctx.decodeAudioData(xhr.response, function(buffer) {
          if (buffer) {
            bufferSource = ctx.createBufferSource();
            bufferSource.buffer = buffer;
            bufferSource.connect(gainNode);
            bufferSource.start(0);
          }
        });
      };
      xhr.send();
    };
     
    loadBuffer('sound.wav');

    (Note: you may see the old, deprecated names createGainNode and noteOn in other examples on the web.)

    Into this:

    var sound = new Howl({
      urls: ['sound.wav'],
      autoplay: true
    });

    It is important to note that neither Web Audio API nor HTML5 Audio are the perfect solution for everything. As with anything, it is important to select the right tool for the right job. For example, you wouldn’t want to load a large background music file using Web Audio API, as you would have to wait for the entire data source to load before playing. HTML5 Audio is able to play very quickly after the download begins, which is why howler.js also implements an override feature that allows you to mix-and-match the two APIs within your app.
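
    For example, at the time of writing this override was exposed through a `buffer` option in the howler.js 1.x docs (check the current documentation for the exact flag name); a large background-music track might then be set up roughly like this:

    // Force HTML5 Audio for a long background track so playback can start
    // before the whole file has downloaded (option name per howler.js 1.x docs).
    var music = new Howl({
      urls: ['background.mp3', 'background.ogg'],
      buffer: true,
      loop: true,
      autoplay: true
    });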

    Audio in the browser is ready

    I often hear that audio in the browser is broken and won’t be useable for anything more than basic audio streaming for quite some time. This couldn’t be further from the truth. The tools are already in today’s modern browsers. High quality audio support is here today, and Web Audio API and HTML5 combine to offer truly plugin-free, cross-browser audio support. Browser audio is no longer a second-class citizen, so let’s all stop treating it like one and keep making apps for the open web.

  7. Defending Opus

    On January 18th, France Telecom filed an IPR disclosure against Opus citing a single patent under non-royalty free terms. This raises a key question – what impact does this have on Opus? A close evaluation indicates that it has no impact on the Opus specification in any way.

    Summary:

    A careful reading of the FT patent reveals that:

    1. The FT patent does not cover the Opus reference implementation because critical limitations of the claim are absent;
    2. The patent is directed to encoders, therefore it cannot affect the Opus specification, which only includes conformance tests for the decoder, and
    3. With a simple change, we can make non-infringement even more obvious.

    Let’s expand on those points a bit. If you don’t want to hear about patent claims, you should stop reading this article now.

    Details:

    IETF IPR disclosures are a safe course of action for patent holders: they prevent unclean hands arguments or implied license grants. However, because the IETF requires specific patent numbers in these disclosures, we can analyze the claims. The patent in question is EP0743634B1, and the corresponding U.S. and other related foreign patents: “Method of adapting the noise masking level in an analysis-by-synthesis speech coder employing a short-term perceptual weighting filter”. It has a single independent claim, Claim 1. All of the other claims are “dependent claims” built on top of Claim 1. If Opus does not infringe Claim 1, it cannot infringe any other claim.

    The FT patent doesn’t cover Opus

    To establish infringement, all of the elements of a claim must be present in an implementation. Key elements of Claim 1 are not present in the Opus reference implementation, including, among others

    • The way the bandwidth expansion coefficients are used. In Claim 1, two parameters γ1 and γ2 are used to shape the quantization noise added by the lossy compression by “minimizing the energy of an error signal resulting from the filtering of the difference between the speech signal and the synthetic signal.” Opus doesn’t do this. Instead, the Opus encoder uses a single parameter BWExp2 to shape the noise, and uses a different parameter BWExp1 to shape the input signal, and also applies an additional gain to the filtered input to match the volume of the original.
    • The optimization criterion. Opus doesn’t compute the “difference between the speech signal and the synthetic signal”. We want to code a signal that differs from the original speech, so we don’t compare what we code to the original speech. This is actually one of the main innovations in Opus: it’s the reason the SILK layer doesn’t need a post-filter like many other codecs do.

    Thus Opus doesn’t perform the steps of the claim and cannot infringe the FT patent by definition. Of course this is not a legal opinion, but it doesn’t take a lawyer to figure this out. While we don’t know why FT disclosed this patent, we welcome the opportunity to evaluate such disclosures and remove any real or perceived encumbrances. This is one of the benefits of the IETF process.

    The FT patent cannot threaten the specification

    The FT patent covers perceptual noise weighting, which is specific to an encoder. The claim is about the “difference between the speech signal and the synthetic signal”, but a decoder, by definition, doesn’t have access to the input speech signal.

    The Opus specification only demands specific behavior from decoders, leaving the encoder largely unspecified. Even if France Telecom were to continue to assert its patent against Opus, there’s no limit to what we could change in the encoder to avoid whatever theory they have. No deployed systems break. There’s no threat to the Opus standard. We can safely say that the FT patent doesn’t encumber Opus for this reason alone.

    We can always make things even safer if needed

    While we don’t believe that the Opus encoder ever infringed on this patent, we quickly realized there is a simple way to make non-infringement obvious even without analyzing complex DSP filters.

    This can be done with a simple change (patch file) to the code in silk/float/noise_shape_analysis_FLP.c (an equivalent change can be made to the fixed-point version).

    Original code:

    strength = FIND_PITCH_WHITE_NOISE_FRACTION * psEncCtrl->predGain;
    BWExp1 = BWExp2 = BANDWIDTH_EXPANSION / ( 1.0f + strength * strength );
    delta  = LOW_RATE_BANDWIDTH_EXPANSION_DELTA
           * ( 1.0f - 0.75f * psEncCtrl->coding_quality );
    BWExp1 -= delta;
    BWExp2 += delta;

    New code:

    BWExp1 = BWExp2 = BANDWIDTH_EXPANSION;
    delta  = LOW_RATE_BANDWIDTH_EXPANSION_DELTA
           * ( 1.0f - 0.75f * psEncCtrl->coding_quality );
    BWExp1 -= delta;
    BWExp2 += delta;

    Yup, that’s all of two lines changed. This makes the filter parameters depend only on the encoder’s bit-rate, which is clearly not “spectral parameters obtained in the linear prediction analysis step,” as required by Claim 1. Below is the quality comparison between the original encoder and the modified encoder (using PESQ). As you can see, the difference is so small that it’s not worth worrying about.

  8. It's Opus, it rocks and now it's an audio codec standard!

    In a great victory for open standards, the Internet Engineering Task Force (IETF) has just standardized Opus as RFC 6716.

    Opus is the first state-of-the-art, free audio codec to be standardized. We think this will help us achieve wider adoption than prior royalty-free codecs like Speex and Vorbis. This spells the beginning of the end for proprietary formats, and we are now working on doing the same thing for video.

    There was both skepticism and outright opposition to this work when it was first proposed in the IETF over 3 years ago. However, the results have shown that we can create a better codec through collaboration, rather than competition between patented technologies. Open standards benefit both open source organizations and proprietary companies, and we have been successful working together to create one. Opus is the result of a collaboration between many organizations, including the IETF, Mozilla, Microsoft (through Skype), Xiph.Org, Octasic, Broadcom, and Google.

    A highly flexible codec

    Unlike previous audio codecs, which have typically focused on a narrow set of applications (either voice or music, in a narrow range of bitrates, for either real-time or storage applications), Opus is highly flexible. It can adaptively switch among:

    • Bitrates from 6 kb/s to 512 kb/s
    • Voice and music
    • Mono and stereo
    • Narrowband (8 kHz) to Fullband (48 kHz)
    • Frame sizes from 2.5 ms to 60 ms

    Most importantly, it can adapt seamlessly within these operating points. Doing all of this with proprietary codecs would require at least six different codecs. Opus replaces all of them, with better quality.
    [Figure: quality comparison of Opus with other codecs]
    The specification is available in RFC 6716, which includes the reference implementation. Up-to-date software releases are also available.

    Some audio standards define a normative encoder, which cannot be improved after it is standardized. Others allow for flexibility in the encoder, but release an intentionally hobbled reference implementation to force you to license their proprietary encoders. For Opus, we chose to allow flexibility for future encoders, but we also made the best one we knew how and released that as the reference implementation, so everyone could use it. We will continue to improve it, and keep releasing those improvements as open source.

    Use cases

    Opus is primarily designed for use in interactive applications on the Internet, including voice over IP (VoIP), teleconferencing, in-game chatting, and even live, distributed music performances. The IETF recently decided with “strong consensus” to adopt Opus as a mandatory-to-implement (MTI) codec for WebRTC, an upcoming standard for real-time communication on the web. Despite the focus on low latency, Opus also excels at streaming and storage applications, beating existing high-delay codecs like Vorbis and HE-AAC. It’s great for internet radio, adaptive streaming, game sound effects, and much more.

    Although Opus is just out, it is already supported in many applications, such as Firefox, GStreamer, FFMpeg, foobar2000, K-Lite Codec Pack, and lavfilters, with upcoming support in VLC, rockbox and Mumble.

    For more information, visit the Opus website.

  9. Opus Support for WebRTC

    As we announced during the beta cycle, Firefox now supports the new Opus audio format. We expect Opus to be published as RFC 6716 any day now, and we’re starting to see Opus support pop up in more and more places. Momentum is really building.

    What does this mean for the web?

    Keeping the Internet an open platform is part of Mozilla’s mission. When the technology the Web needs doesn’t exist, we will invest the resources to create it, and release it royalty-free, just as we ask of others. Opus is one of these technologies.

    Mozilla employs two of the key authors and developers, and has invested significant legal resources into avoiding known patent thickets. Opus uses processes and methods that have long been known in the field and are considered patent-free. As a result, Opus is available on a royalty-free basis and can be deployed by anyone, including other open-source projects. Everyone knows this is an incredibly challenging legal environment to operate in, but we think we’ve succeeded.

    Why is Opus important?

    The Opus support in the <audio> tag we’re shipping today is great. We think it’s as good as or better than all the other codecs people use there, particularly in the voice modes, which people have long been asking for. But our goals extend far beyond building a great codec for the <audio> tag.

    Mozilla is heavily involved in the new WebRTC standards to bring real-time communication to the Web. This is the real reason we made Opus, and why its low-delay features are so important. At the recent IETF meeting in Vancouver we achieved “strong consensus” to make Opus Mandatory To Implement (MTI) in WebRTC. Interoperability is even more important here than in the <audio> tag. If two browsers ship without any codecs in common, a website still has the option of encoding their content twice to be compatible with both. But that option isn’t available when the browsers are trying to talk to each other directly. So our success here is a big step in bringing interoperable real-time communication to the Web, using native Web technologies, without plug-ins.
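
    To see what that consensus means in practice, here is a minimal sketch using the modern, unprefixed WebRTC API (not the moz-prefixed API Firefox ships today): it captures a microphone track, generates an offer, and checks that Opus appears in the SDP, since every compliant browser must offer it at 48 kHz, 2 channels.

    // Minimal sketch: confirm Opus is offered when negotiating audio.
    // getUserMedia prompts for microphone permission.
    navigator.mediaDevices.getUserMedia({ audio: true }).then(function (stream) {
      var pc = new RTCPeerConnection();
      stream.getTracks().forEach(function (track) { pc.addTrack(track, stream); });

      return pc.createOffer().then(function (offer) {
        pc.close();
        // WebRTC-compliant browsers list Opus at 48 kHz, 2 channels in the SDP.
        console.log('Opus offered:', /opus\/48000\/2/i.test(offer.sdp));
      });
    });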

    [Figure: quality comparison of Opus with other codecs]

    Opus’s flexibility to scale to both very low bitrates and very high quality, and to do all of it with very low delay, was instrumental in achieving this consensus. It would take at least six other codecs to satisfy all the use cases Opus does. So try out Opus today for your podcasts, music broadcasts, games, and more. But look out for Opus in WebRTC coming soon.

  10. Firefox Beta 15 supports the new Opus audio format

    Firefox 15 (now in the Beta channel) supports the Opus audio format, via the Opus reference implementation.

    What is it?

    Opus is a completely free audio format that was recently approved for publication as a standards-track RFC by the IETF. Opus files can play in Firefox Beta today.

    Opus offers these benefits:

    • Better compression than the MP3, Ogg Vorbis, or AAC formats
    • Good for both music and speech
    • Dynamically adjustable bitrate, audio bandwidth, and coding delay
    • Support for both interactive and pre-recorded applications

    Why should I care?

    First, Opus is free software, free for everyone, for any purpose. It’s also an IETF standard. Both the encoder and decoder are free, including the fixed-point implementation (for mobile devices). These aren’t toy demos. They’re the best we could make, ready for serious use.

    We think Opus is an incredible new format for web audio. We’re working hard to convince other browsers to adopt it, to break the logjam over a common <audio> format.

    The codec is a collaboration between members of the IETF Internet Wideband Audio Codec working group, including Mozilla, Microsoft, Xiph.Org, Broadcom, Octasic, and others.

    We designed it for high-quality, interactive audio (VoIP, teleconference) and will use it in the upcoming WebRTC standard. Opus is also best-in-class for live streaming and static file playback. In fact, it is the first audio codec to be well-suited for both interactive and non-interactive applications.

    Opus is as good or better than basically all existing lossy audio codecs, when competing against them in their sweet spots, including:

    General audio codecs (high latency, high quality)
    • MP3
    • AAC (all flavors)
    • Vorbis
    Speech codecs (low latency, low quality)
    • G.729
    • AMR-NB
    • AMR-WB (G.722.2)
    • Speex
    • iSAC
    • iLBC
    • G.722.1 (all variants)
    • G.719

    And none of those codecs have the versatility to support all the use cases that Opus does.

    Listening tests show that Opus matches or beats each of these codecs while using fewer bits. That’s a lot of bandwidth saved. It’s also much more flexible.

    Opus can stream:

    • narrowband speech at bitrates as low as 6 kbps
    • fullband music at rates up to 256 kbps per channel

    At the higher of those rates, it is perceptually lossless. It also scales between these two extremes dynamically, depending on the network bandwidth available.

    Opus compresses speech especially well. Those same test results (slide 19) show that for fullband mono speech, Opus is almost transparent at 32 kbps. For audio books and podcasts, it’s a real win.

    Opus is also great for short files (like game sound effects) and startup latency, because unlike Vorbis, it doesn’t require several kilobytes of codebooks at the start of each file. This makes streaming easier, too, since the server doesn’t have to keep extra data around to send to clients who join mid-stream. Instead, it can send them a tiny, generic header constructed on the fly.

    How do I use it in a web page?

    Opus works with the <audio> element just like any other audio format.

    For example:

     <audio src="ehren-paper_lights-64.opus" controls></audio>

    This code in a web page displays an embedded player like this:

    [Embedded audio player: “Paper Lights” by Ehren Starks, Creative Commons License]

    (Requires Firefox 15 or later)
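
    If you need to fall back to another format in browsers without Opus support, a simple feature check does the job. A minimal sketch, assuming the Opus file above and a hypothetical MP3 fallback:

    // Probe for Ogg Opus support before choosing a source.
    var probe = document.createElement('audio');
    var canOpus = probe.canPlayType('audio/ogg; codecs="opus"');  // "probably", "maybe", or ""

    var player = new Audio(canOpus !== '' ? 'ehren-paper_lights-64.opus'
                                          : 'ehren-paper_lights-64.mp3');  // hypothetical fallback
    player.controls = true;
    document.body.appendChild(player);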

    Encoding files

    For now, the best way to create Opus files is to use the opusenc tool. You can get source code, along with Mac and Windows binaries, from:

    http://www.opus-codec.org/downloads/

    While Firefox 15 is the first browser with native Opus support, playback is coming to gstreamer, libavcodec, foobar2000, and other media players.

    Streaming

    Live streaming applications benefit greatly from Opus’s flexibility. You don’t have to decide up front whether you want low bandwidth or high quality, to optimize for voice or music, etc. Streaming servers can adapt the encoding as conditions change—without breaking the stream to the player.

    Pre-encoded files can stream from a normal web server. The popular Icecast streaming media server can relay a single, live Opus stream, generated on the fly, to thousands of connected listeners. Opus is supported by the current development version of Icecast.
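
    On the listening side nothing special is required. A minimal sketch, pointing an audio element at a hypothetical Icecast mount point serving Ogg Opus:

    // Play a live Opus stream like any other <audio> source.
    var live = new Audio('http://radio.example.com:8000/live.opus');
    live.controls = true;
    document.body.appendChild(live);
    live.play();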

    More Information

    To learn more visit opus-codec.org, or join us in #opus on irc.freenode.net.