Audio Articles

  1. Shiva – More than a RESTful API to your music collection

Music for me is not only part of my daily life, it is an essential part. It helps me concentrate, improves my mood, distracts me and/or helps me relax. This is true for most (if not all) people. The lack of music, or the wrong selection of tunes, can have the complete opposite effect; it has a strong influence on how we feel. It also plays a key role in shaping our identity. Music, like most (if not all) types of culture, is not an accessory, not something we can choose to ignore; it is a need we have as human beings.

The Internet has become the most efficient medium ever for distributing culture. Today it’s easier than ever to access a huge diversity of culture from any place in the world. At the same time, you can reach the whole world with your music, instantly, just by signing up at one of the many music distribution websites. Just as “travel broadens the mind”, music sharing enriches culture, and thanks to the Internet, culture is more alive today than ever.

Not too long ago, record labels were the judges of what was good music (by their standards) and what was not. They controlled the only global-scale distribution channel, so to use it you had to come to an agreement with them, which usually meant giving up most of the rights over your work. Creating and maintaining such a channel was neither easy nor cheap; there was a real need for the service they provided, and even though their goal was not to distribute culture but to be profitable (like every company), both parties, industry and society, benefited from it.

Times have changed and this model is now obsolete; the king is dead, and companies are fighting to fill the vacancy. The business model has changed too. Now it is not just about the music; it is also about restricting access to it and collecting (and selling) private information about the listeners. In other words, DRM and privacy. This is where Shiva comes into play.

    What is Shiva?

    Shiva is, technically speaking, a RESTful API to your music collection. It indexes your music and exposes an API with the metadata of your files so you can then perform queries on it and organize it as you wish.

On a higher level, however, Shiva aims to be a free (as in freedom and beer) alternative to popular music services. It was born with the goal of giving users back control over their music and privacy, protecting them from the industry’s obsession with control.

It’s not intended to compete directly with online music services, but to be an alternative that you can install and modify to your needs. You will own the music on your server. Nobody but you (or whoever you give permission to) will be able to delete files or modify their metadata to correct it when it’s wrong. And of course, it will all be available to any device with an Internet connection.

    You will also have a clean, RESTful API to your music without restrictions. You can grant access to your friends and let them use the service or, if they have their own Shiva instances, let both servers talk to each other and share the music transparently.

    To sum up, Shiva is a distributed social network for sharing music.

    Your own music server

    Shiva-Server is the component that indexes your music and exposes a RESTful API. These are the available resources:

    • /artists
      • /artists/shows
    • /albums
    • /tracks
      • /tracks/lyrics

It’s built in Python, using SQLAlchemy as the ORM and Flask for HTTP communication.

    Indexing your music

    The installation process is quite simple. There’s a very complete guide in the README file, but I’ll summarize it here:

    • Get the source
    • Install dependencies from the requirements.pip file
    • Copy /shiva/config/local.py.example to /shiva/config/local.py
    • Edit it and configure the directories to scan
    • Create the database (sqlite by default)
    • Run the indexer
    • Run the development server

    For details on any of the steps, check the documentation.

Once the music has been indexed, all the metadata is stored in the database and queried from it; files are accessed only by the file server, for streaming. Lyrics are scraped the first time they are requested and then cached. Given the changing nature of the shows resource, it is the only one that is not cached; instead it is queried every time. At the time of this writing it uses only one source, the BandsInTown API.

Once the server is running you have everything you need to start playing with Shiva. Point at a resource, like /artists, to see it in action.
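For example, here is a minimal client-side sketch of such a query (assuming the API is reachable at http://localhost:9002, the port the client proxy uses later in this article, and that /artists returns a plain JSON array; the name field on artist objects is an assumption for illustration):

// Minimal sketch: list the artists known to a Shiva instance.
// Assumes the API is at http://localhost:9002 and /artists returns
// a JSON array; the `name` field is assumed for illustration.
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://localhost:9002/artists', true);
xhr.onload = function() {
  var artists = JSON.parse(xhr.responseText);
  artists.forEach(function(artist) {
    console.log(artist.name, artist.uri);
  });
};
xhr.send();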

    Scraping lyrics

As mentioned, lyrics are scraped, and you can create your own scrapers for specific websites that have the lyrics you want. All you need to do is create a Python file in the /shiva/lyrics directory with a class inheriting from LyricScraper. The following template shows how easy it is. Let’s say we have a file /shiva/lyrics/mylyrics.py:

from shiva.lyrics import LyricScraper

class MyLyricsScraper(LyricScraper):
    """ Fetches lyrics from mylyrics.com """

    def fetch(self, artist, title):
        # Magic happens here

        if not lyrics:
            return False

        self.lyrics = lyrics
        self.source = lyrics_url

        return True

After this you need to add your brand new scraper to the scrapers list in your local.py config file:

SCRAPERS = {
    'lyrics': (
        'mylyrics.MyLyricsScraper',
    )
}

Shiva will instantiate your scraper and call the fetch() method. If it returns True, Shiva will then look for the lyrics in the lyrics attribute, and for the URL they were scraped from in the source attribute:

    if scraper.fetch():
        lyrics = Lyrics(text=scraper.lyrics, source=scraper.source,
                        track=track)
        g.db.session.add(lyrics)
        g.db.session.commit()
     
        return lyrics

    Check the existing scrapers for real world examples.

Lyrics are only fetched when you request one specific track, not when retrieving more than one. The reason is that each track’s lyrics may require two or more requests, and we don’t want to DoS the lyrics website when retrieving an artist’s discography. That would not be nice.

    Setting up a file server

The development server, as its name clearly states, should not be used in production. In fact that’s almost impossible, because it can serve only one request at a time, and the audio element will keep the connection open for as long as the file is playing, completely blocking the API.

Shiva provides a way to delegate file serving to a dedicated server. To use it, edit your /shiva/config/local.py file and set the MEDIA_DIRS option. It expects a tuple of MediaDir objects, which define the directories to scan and the socket through which to serve the files:

MediaDir('/srv/music', url='http://localhost:8080')

This way it doesn’t matter which socket your application runs on; files in the /srv/music directory will be served through the URL defined in the url attribute. This object also lets you define subdirectories to scan. For example:

MediaDir('/srv/music', dirs=('/pop', '/rock'), url='http://localhost:8080')

In this case only the directories /srv/music/pop and /srv/music/rock will be scanned. You can define as many MediaDir objects as you need. Suppose you have the file /srv/music/rock/nofx-dinosaurs_will_die.mp3; once this is in place, the track’s stream_uri attribute will be:

    {
        "slug": "dinosaurs-will-die",
        "title": "Dinosaurs Will Die",
        "uri": "/track/510",
        "id": 510,
        "stream_uri": "http://localhost:8080/nofx-dinosaurs_will_die.mp3"
    }

    Your own music player

Once you have your music scanned and the API running, you need a client that consumes those services and plays your music, like Shiva-Client. Built as a single-page application with AngularJS and HTML5 technologies like Audio and Drag and Drop, this client lets you browse your catalog, add files to a playlist and play them.

Due to the Same-Origin Policy, you will need a server that acts as a proxy between the web app and the API. For this you will find a server.py file in the repo that does it for you. Its only dependency is Flask, which I assume you have installed already. Now just execute it:

    python server.py

    This will run the server on http://localhost:9001/

Access that URI and check the server output; you will see not only the media needed by the app (like images and JavaScript files) but also an /api/artists call. That’s the proxy. Any call to /api/ will be redirected by the server to http://localhost:9002/

If you open a console, like Firebug, you will see a Shiva object in the global namespace. Inside it you will find two main attributes, Player and Playlist. Those objects encapsulate all the logic for queueing and playing music. The Player holds only the current track and acts as a wrapper around HTML5’s Audio element. What may not seem natural at first is that you normally won’t interact with the Player, but with the Playlist, which acts as a façade: it knows all the tracks and instructs the Player which track to load and play next.

The source for those objects is in the js/controllers.js file. There you will also find the AngularJS controllers, which perform the actual calls to the API. There are just two calls: one to get the list of artists and another to get the discography for an artist. Check the code; it’s quite simple.

    So once tracks are added to the playlist, you can do things like play it:

    Shiva.Playlist.play()

    Stop it:

    Shiva.Playlist.stop()

    Or skip the track:

    Shiva.Playlist.next()

Some performance optimizations were made to keep processing as low as possible. For example, the progress bar you see when playing music is updated only while it is visible. The event listeners are removed when not needed, to avoid unnecessary DOM manipulation of non-visible elements:

Shiva.Player.audio.removeEventListener('timeupdate', Shiva.Player.audio.timeUpdateHandler, false);

    Present and future

At the time of this writing, Shiva is in a usable state and provides the core functionality, but it is still young and lacks some important features. That’s also why this article doesn’t dig too deeply into the code: it will change rapidly. For the current status, please check the documentation.

    If you want to contribute, there are many ways you can help the project. First of all, fork both the server and the client, play with them and send your improvements back to upstream.

Request features. If you think “seems nice, but I wouldn’t use it”, write down why, and send those thoughts along with (ideally) some ideas on how to tackle them. There is a long list of missing features; your help is vital for prioritizing them.

But most importantly: build your own client. You know what you like about your favourite music player and you know what sucks. Fork and modify the existing client, or create your own from scratch; bring new ideas to the old world of music players. Give new and creative uses to the API.

There’s a lot of work to be done on many different fronts. Some of the plans for the future are:

• Shiva-jslib: An easy-to-use JavaScript library that encapsulates the API calls, so you can focus on building the GUI and forget about the protocol.
    • Shiva2Shiva communication: Let two (or more) Shiva instances talk to each other to allow for transparent sharing of music between servers.
    • Shiva-FXOS: A Shiva client for Firefox OS.

    And anything else you can think of. Code, send ideas, code your clients.

    Happy hacking!

  2. Simplifying audio in the browser

    The last few years have seen tremendous gains in the capabilities of browsers, as the latest HTML5 standards continue to get implemented. We can now render advanced graphics on the canvas, communicate in real-time with WebSockets, access the local filesystem, create offline apps and more. However, the one area that has lagged behind is audio.

    The HTML5 Audio element is great for a small set of uses (such as playing music), but doesn’t work so well when you need low-latency, precision playback.

Over the last year, a new audio standard has been developed for the browser, which gives developers direct access to the audio data. Web Audio API allows for high-precision, high-performance audio playback, as well as many advanced features that just aren’t possible with the HTML5 Audio element. However, support is still limited, and the API is considerably more complex than HTML5 Audio.

    Introducing howler.js

    The most obvious use-case for high-performance audio is games, but most developers have had to settle for HTML5 Audio with a Flash fallback to get browser compatibility. My company, GoldFire Studios, exclusively develops games for the open web, and we set out to find an audio library that offered the kind of audio support a game needs, without relying on antiquated technologies. Unfortunately, there were none to be found, so we wrote our own and open-sourced it: howler.js.

    Howler.js defaults to Web Audio API and uses HTML5 Audio as the fallback. The library greatly simplifies the API and handles all of the tricky bits automatically. This is a simple example to create an audio sprite (like a CSS sprite, but with an audio file) and play one of the sounds:

    var sound = new Howl({
      urls: ['sounds.mp3', 'sounds.ogg'],
      sprite: {
        blast: [0, 2000],
        laser: [3000, 700],
        winner: [5000, 9000]
      }
    });
     
    // shoot the laser!
    sound.play('laser');

    Using feature detection

    At the most basic level, this works through feature detection. The following snippet detects whether or not Web Audio API is available and creates the audio context if it is. Current support for Web Audio API includes Chrome 10+, Safari 6+, and iOS 6+. It is also in the pipeline for Firefox, Opera and most other mobile browsers.

    var ctx = null,
      usingWebAudio = true;
    if (typeof AudioContext !== 'undefined') {
      ctx = new AudioContext();
    } else if (typeof webkitAudioContext !== 'undefined') {
      ctx = new webkitAudioContext();
    } else {
      usingWebAudio = false;
    }

    Audio support for different codecs varies across browsers as well, so we detect which format is best to use from your provided array of sources with the canPlayType method:

    var audioTest = new Audio();
    var codecs = {
      mp3: !!audioTest.canPlayType('audio/mpeg;').replace(/^no$/,''),
      ogg: !!audioTest.canPlayType('audio/ogg; codecs="vorbis"').replace(/^no$/,''),
      wav: !!audioTest.canPlayType('audio/wav; codecs="1"').replace(/^no$/,''),
      m4a: !!(audioTest.canPlayType('audio/x-m4a;') || audioTest.canPlayType('audio/aac;')).replace(/^no$/,''),
      webm: !!audioTest.canPlayType('audio/webm; codecs="vorbis"').replace(/^no$/,'')
    };

    Making it easy

These two key components of howler.js allow the library to automatically select the best playback method and source file to load and play. From there, the library abstracts away the two different APIs, turning this (a simplified Web Audio API example, without all of the fallback support and extra features):

var gainNode, bufferSource;

var loadBuffer = function(url) {
  // load the buffer from the URL
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer';
  xhr.onload = function() {
    // decode the buffer into an audio source
    ctx.decodeAudioData(xhr.response, function(buffer) {
      if (buffer) {
        bufferSource = ctx.createBufferSource();
        bufferSource.buffer = buffer;
        bufferSource.connect(gainNode);
        bufferSource.start(0);
      }
    });
  };
  xhr.send();
};

// create the gain node, wire it to the output and load the sound
gainNode = ctx.createGain();
gainNode.gain.value = volume;
gainNode.connect(ctx.destination);
loadBuffer('sound.wav');

(Note: you may still see the old, deprecated names createGainNode and noteOn in other examples on the web.)

    Into this:

    var sound = new Howl({
      urls: ['sound.wav'],
      autoplay: true
    });

It is important to note that neither Web Audio API nor HTML5 Audio is the perfect solution for everything. As with anything, it is important to select the right tool for the job. For example, you wouldn’t want to load a large background music file using Web Audio API, as you would have to wait for the entire file to download before playing. HTML5 Audio can begin playing very quickly after the download begins, which is why howler.js also implements an override feature that allows you to mix and match the two APIs within your app.
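For instance, here is a sketch of that override, based on howler.js’s buffer option as documented at the time of writing (the file names are made up for illustration):

// Sketch: force HTML5 Audio for a large background-music track so
// playback starts while the file is still downloading, instead of
// waiting for the whole buffer to load and decode.
var music = new Howl({
  urls: ['background.mp3', 'background.ogg'],
  buffer: true,  // the documented override: use HTML5 Audio, not Web Audio API
  loop: true
});
music.play();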

    Audio in the browser is ready

I often hear that audio in the browser is broken and won’t be usable for anything more than basic audio streaming for quite some time. This couldn’t be further from the truth. The tools are already in today’s modern browsers. High-quality audio support is here today, and Web Audio API and HTML5 combine to offer truly plugin-free, cross-browser audio support. Browser audio is no longer a second-class citizen, so let’s all stop treating it like one and keep making apps for the open web.

  3. Defending Opus

On January 18th, France Telecom filed an IPR disclosure against Opus citing a single patent under non-royalty-free terms. This raises a key question: what impact does this have on Opus? A close evaluation indicates that it has no impact on the Opus specification in any way.

    Summary:

    A careful reading of the FT patent reveals that:

    1. The FT patent does not cover the Opus reference implementation because critical limitations of the claim are absent;
    2. The patent is directed to encoders, therefore it cannot affect the Opus specification, which only includes conformance tests for the decoder, and
    3. With a simple change, we can make non-infringement even more obvious.

    Let’s expand on those points a bit. If you don’t want to hear about patent claims, you should stop reading this article now.

    Details:

    IETF IPR disclosures are a safe course of action for patent holders: they prevent unclean hands arguments or implied license grants. However, because the IETF requires specific patent numbers in these disclosures, we can analyze the claims. The patent in question is EP0743634B1, and the corresponding U.S. and other related foreign patents: “Method of adapting the noise masking level in an analysis-by-synthesis speech coder employing a short-term perceptual weighting filter”. It has a single independent claim, Claim 1. All of the other claims are “dependent claims” built on top of Claim 1. If Opus does not infringe Claim 1, it cannot infringe any other claim.

    The FT patent doesn’t cover Opus

To establish infringement, all of the elements of a claim must be present in an implementation. Key elements of Claim 1 are not present in the Opus reference implementation, including, among others:

    • The way the bandwidth expansion coefficients are used. In Claim 1, two parameters γ1 and γ2 are used to shape the quantization noise added by the lossy compression by “minimizing the energy of an error signal resulting from the filtering of the difference between the speech signal and the synthetic signal.” Opus doesn’t do this. Instead, the Opus encoder uses a single parameter BWExp2 to shape the noise, and uses a different parameter BWExp1 to shape the input signal, and also applies an additional gain to the filtered input to match the volume of the original.
    • The optimization criterion. Opus doesn’t compute the “difference between the speech signal and the synthetic signal”. We want to code a signal that differs from the original speech, so we don’t compare what we code to the original speech. This is actually one of the main innovations in Opus: it’s the reason the SILK layer doesn’t need a post-filter like many other codecs do.

    Thus Opus doesn’t perform the steps of the claim and cannot infringe the FT patent by definition. Of course this is not a legal opinion, but it doesn’t take a lawyer to figure this out. While we don’t know why FT disclosed this patent, we welcome the opportunity to evaluate such disclosures and remove any real or perceived encumbrances. This is one of the benefits of the IETF process.

    The FT patent cannot threaten the specification

    The FT patent covers perceptual noise weighting, which is specific to an encoder. The claim is about the “difference between the speech signal and the synthetic signal”, when a decoder — by definition — doesn’t have access to the input speech signal.

    The Opus specification only demands specific behavior from decoders, leaving the encoder largely unspecified. Even if France Telecom were to continue to assert its patent against Opus, there’s no limit to what we could change in the encoder to avoid whatever theory they have. No deployed systems break. There’s no threat to the Opus standard. We can safely say that the FT patent doesn’t encumber Opus for this reason alone.

    We can always make things even safer if needed

    While we don’t believe that the Opus encoder ever infringed on this patent, we quickly realized there is a simple way to make non-infringement obvious even without analyzing complex DSP filters.

    This can be done with a simple change (patch file) to the code in silk/float/noise_shape_analysis_FLP.c (an equivalent change can be made to the fixed-point version).

    Original code:

    strength = FIND_PITCH_WHITE_NOISE_FRACTION * psEncCtrl->predGain;
    BWExp1 = BWExp2 = BANDWIDTH_EXPANSION / ( 1.0f + strength * strength );
    delta  = LOW_RATE_BANDWIDTH_EXPANSION_DELTA
           * ( 1.0f - 0.75f * psEncCtrl->coding_quality );
    BWExp1 -= delta;
    BWExp2 += delta;

    New code:

    BWExp1 = BWExp2 = BANDWIDTH_EXPANSION;
    delta  = LOW_RATE_BANDWIDTH_EXPANSION_DELTA
           * ( 1.0f - 0.75f * psEncCtrl->coding_quality );
    BWExp1 -= delta;
    BWExp2 += delta;

Yup, that’s all of two lines changed. This makes the filter parameters depend only on the encoder’s bit-rate, which is clearly not “spectral parameters obtained in the linear prediction analysis step,” as required by Claim 1. Below is the quality comparison between the original encoder and the modified encoder (using PESQ); as you can see, the difference is so small that it’s not worth worrying about.

[Figure: PESQ quality comparison between the original and modified encoders]

  4. It’s Opus, it rocks and now it’s an audio codec standard!

    In a great victory for open standards, the Internet Engineering Task Force (IETF) has just standardized Opus as RFC 6716.

Opus is the first state-of-the-art, free audio codec to be standardized. We think this will help us achieve wider adoption than prior royalty-free codecs like Speex and Vorbis. This spells the beginning of the end for proprietary formats, and we are now working on doing the same thing for video.

    There was both skepticism and outright opposition to this work when it was first proposed in the IETF over 3 years ago. However, the results have shown that we can create a better codec through collaboration, rather than competition between patented technologies. Open standards benefit both open source organizations and proprietary companies, and we have been successful working together to create one. Opus is the result of a collaboration between many organizations, including the IETF, Mozilla, Microsoft (through Skype), Xiph.Org, Octasic, Broadcom, and Google.

    A highly flexible codec

    Unlike previous audio codecs, which have typically focused on a narrow set of applications (either voice or music, in a narrow range of bitrates, for either real-time or storage applications), Opus is highly flexible. It can adaptively switch among:

    • Bitrates from 6 kb/s to 512 kb/s
    • Voice and music
    • Mono and stereo
    • Narrowband (8 kHz) to Fullband (48 kHz)
    • Frame sizes from 2.5 ms to 60 ms

    Most importantly, it can adapt seamlessly within these operating points. Doing all of this with proprietary codecs would require at least six different codecs. Opus replaces all of them, with better quality.
[Figure: illustration of the quality of different codecs]

    The specification is available in RFC 6716, which includes the reference implementation. Up-to-date software releases are also available.

    Some audio standards define a normative encoder, which cannot be improved after it is standardized. Others allow for flexibility in the encoder, but release an intentionally hobbled reference implementation to force you to license their proprietary encoders. For Opus, we chose to allow flexibility for future encoders, but we also made the best one we knew how and released that as the reference implementation, so everyone could use it. We will continue to improve it, and keep releasing those improvements as open source.

    Use cases

    Opus is primarily designed for use in interactive applications on the Internet, including voice over IP (VoIP), teleconferencing, in-game chatting, and even live, distributed music performances. The IETF recently decided with “strong consensus” to adopt Opus as a mandatory-to-implement (MTI) codec for WebRTC, an upcoming standard for real-time communication on the web. Despite the focus on low latency, Opus also excels at streaming and storage applications, beating existing high-delay codecs like Vorbis and HE-AAC. It’s great for internet radio, adaptive streaming, game sound effects, and much more.

    Although Opus is just out, it is already supported in many applications, such as Firefox, GStreamer, FFMpeg, foobar2000, K-Lite Codec Pack, and lavfilters, with upcoming support in VLC, rockbox and Mumble.

    For more information, visit the Opus website.

  5. Opus Support for WebRTC

    As we announced during the beta cycle, Firefox now supports the new Opus audio format. We expect Opus to be published as RFC 6716 any day now, and we’re starting to see Opus support pop up in more and more places. Momentum is really building.

    What does this mean for the web?

    Keeping the Internet an open platform is part of Mozilla’s mission. When the technology the Web needs doesn’t exist, we will invest the resources to create it, and release it royalty-free, just as we ask of others. Opus is one of these technologies.

    Mozilla employs two of the key authors and developers, and has invested significant legal resources into avoiding known patent thickets. It uses processes and methods that have been long known in the field and which are considered patent-free. As a result, Opus is available on a royalty-free basis and can be deployed by anyone, including other open-source projects. Everyone knows this is an incredibly challenging legal environment to operate in, but we think we’ve succeeded.

Why is Opus important?

The Opus support in the <audio> tag we’re shipping today is great. We think it’s as good as or better than all the other codecs people use there, particularly in the voice modes, which people have been asking for for a long time. But our goals extend far beyond building a great codec for the <audio> tag.

    Mozilla is heavily involved in the new WebRTC standards to bring real-time communication to the Web. This is the real reason we made Opus, and why its low-delay features are so important. At the recent IETF meeting in Vancouver we achieved “strong consensus” to make Opus Mandatory To Implement (MTI) in WebRTC. Interoperability is even more important here than in the <audio> tag. If two browsers ship without any codecs in common, a website still has the option of encoding their content twice to be compatible with both. But that option isn’t available when the browsers are trying to talk to each other directly. So our success here is a big step in bringing interoperable real-time communication to the Web, using native Web technologies, without plug-ins.

[Figure: illustration of the quality of different codecs]

Opus’s flexibility to scale to both very low bitrates and very high quality, all with very low delay, was instrumental in achieving this consensus. It would take at least six other codecs to satisfy all the use cases Opus does. So try out Opus today for your podcasts, music broadcasts, games, and more. And look out for Opus in WebRTC, coming soon.

  6. Firefox Beta 15 supports the new Opus audio format

    Firefox 15 (now in the Beta channel) supports the Opus audio format, via the Opus reference implementation.

    What is it?

    Opus is a completely free audio format that was recently approved for publication as a standards-track RFC by the IETF. Opus files can play in Firefox Beta today.

    Opus offers these benefits:

    • Better compression than MP3, Ogg, or AAC formats
    • Good for both music and speech
    • Dynamically adjustable bitrate, audio bandwidth, and coding delay
    • Support for both interactive and pre-recorded applications

Why should I care?

    First, Opus is free software, free for everyone, for any purpose. It’s also an IETF standard. Both the encoder and decoder are free, including the fixed-point implementation (for mobile devices). These aren’t toy demos. They’re the best we could make, ready for serious use.

    We think Opus is an incredible new format for web audio. We’re working hard to convince other browsers to adopt it, to break the logjam over a common <audio> format.

    The codec is a collaboration between members of the IETF Internet Wideband Audio Codec working group, including Mozilla, Microsoft, Xiph.Org, Broadcom, Octasic, and others.

We designed it for high-quality, interactive audio (VoIP, teleconferencing), and it will be used in the upcoming WebRTC standard. Opus is also best-in-class for live streaming and static file playback. In fact, it is the first audio codec to be well-suited for both interactive and non-interactive applications.

    Opus is as good or better than basically all existing lossy audio codecs, when competing against them in their sweet spots, including:

    General audio codecs (high latency, high quality)
    • MP3
    • AAC (all flavors)
    • Vorbis
    Speech codecs (low latency, low quality)
    • G.729
    • AMR-NB
    • AMR-WB (G.722.2)
    • Speex
    • iSAC
    • iLBC
    • G.722.1 (all variants)
    • G.719

    And none of those codecs have the versatility to support all the use cases that Opus does.

Listening tests show that Opus delivers equal or better quality than these codecs at lower bitrates. That’s a lot of bandwidth saved. It’s also much more flexible.

    Opus can stream:

    • narrowband speech at bitrates as low as 6 kbps
    • fullband music at rates of 256 kbps per channel

    At the higher of those rates, it is perceptually lossless. It also scales between these two extremes dynamically, depending on the network bandwidth available.

    Opus compresses speech especially well. Those same test results (slide 19) show that for fullband mono speech, Opus is almost transparent at 32 kbps. For audio books and podcasts, it’s a real win.

    Opus is also great for short files (like game sound effects) and startup latency, because unlike Vorbis, it doesn’t require several kilobytes of codebooks at the start of each file. This makes streaming easier, too, since the server doesn’t have to keep extra data around to send to clients who join mid-stream. Instead, it can send them a tiny, generic header constructed on the fly.

    How do I use it in a web page?

    Opus works with the <audio> element just like any other audio format.

    For example:

     <audio src="ehren-paper_lights-64.opus" controls>

This code displays an embedded player in the page. (The example track is “Paper Lights” by Ehren Starks, available under a Creative Commons license; playback requires Firefox 15 or later.)

    Encoding files

    For now, the best way to create Opus files is to use the opusenc tool. You can get source code, along with Mac and Windows binaries, from:

    http://www.opus-codec.org/downloads/

    While Firefox 15 is the first browser with native Opus support, playback is coming to gstreamer, libavcodec, foobar2000, and other media players.

    Streaming

    Live streaming applications benefit greatly from Opus’s flexibility. You don’t have to decide up front whether you want low bandwidth or high quality, to optimize for voice or music, etc. Streaming servers can adapt the encoding as conditions change—without breaking the stream to the player.

    Pre-encoded files can stream from a normal web server. The popular Icecast streaming media server can relay a single, live Opus stream, generated on the fly, to thousands of connected listeners. Opus is supported by the current development version of Icecast.

    More Information

    To learn more visit opus-codec.org, or join us in #opus on irc.freenode.net.

  7. Interview: Jay Salvat, Audio Dev Derby winner

Jay Salvat won the Audio Dev Derby with his Buzz demo, a wonderful children’s game powered by the open web. Using a JavaScript library he wrote himself, Jay demonstrated that web audio can be not only useful, but also practical and even engaging.

    Recently, I had the opportunity to learn more about Jay: his work, his history, and his thoughts on the future of web development. In our chat, Jay shared insight and advice that should be useful to all web developers, newcomers and veterans alike.

    How did you become interested in web development?

I am totally self-taught. I come from sales and marketing schools. I quickly realized that I was not cut out for that life. I tried some things, first working for free as a designer and then as a layout artist in print press and magazines. At the time the internet barely existed.

With the 1997/98 internet big bang, I naturally moved from print design to web design, working in one of the first local web agencies. The agency was sold to a big international company, and I then worked on ergonomics and interface design for key accounts and managed a team of developers on these interfaces.

Seeing them work gave me a taste for development, so I started developing some personal projects. My skills as a marketing guy, designer and developer allowed me to get some interesting results by myself.

    Tell us about developing your Buzz demo. Was anything especially exciting, challenging, or rewarding?

The idea behind the Buzz library was to allow developers to creatively manage sounds on their websites. My fear was to see Buzz used to add sounds to button clicks or some unbearable background music loops: everything I hate as a user.

I wanted to be clear and create a demo to show my vision of how sounds should be used on the web in 2012. This educational HTML5 game is inspired by the games my 5-year-old daughter plays on the iPad.

    What makes the web an exciting platform for you?

What is interesting is being able to quickly test ideas, share them with the world and see them used, improved, distributed and discussed by others. It’s invaluable to get hundreds of comments from around the world. It taught me a lot.

    What up-and-coming web technologies are you most excited about?

HTML5/CSS3/JavaScript are really exciting and now make everything possible in a browser. I’m also really interested in node.js, which allows full-JavaScript client/server applications.

    If you could change one thing about the web, what would it be?

Clearly, cross-browser compatibility (I’m looking at you, Internet Explorer). It is very frustrating to work for a few weeks on an idea, finally get the desired result, and then move to the testing phase on different browsers only to see that everything is skewed or unusable. This is what happened to me with the markitup! 2.0 development, which I have never actually found the energy and time to correct.

I dream of not worrying about vendor prefixes, hacks and ridiculous compatibility barriers.

    What advice would you give to aspiring web developers?

Be curious, and be a sharer. Whenever possible, don’t hesitate to publish your work as open source projects. It is a great challenge to make your code public and have it judged by your peers. It’s exciting and rewarding.


  8. getUserMedia is ready to roll!

We blogged about some of our WebRTC efforts back in April. Today we have an exciting update on that front: getUserMedia has landed on mozilla-central! This means you can use the API in the latest Nightly versions of Firefox, and it will eventually make its way into a release build.

getUserMedia is a DOM API that allows web pages to obtain video and audio input, for instance from a webcam or microphone. We hope this will open up the possibility of building a whole new class of web pages and applications. This DOM API is one component of the WebRTC project, which also includes APIs for peer-to-peer communication channels that will enable the exchange of video streams, audio streams and arbitrary data.

    We’re still working on the PeerConnection API, but getUserMedia is a great first step in the progression towards full WebRTC support in Firefox! We’ve certainly come a long way since the first image from a webcam appeared on a web page via a DOM API. (Not to mention audio recording support in Jetpack before that.)

    We’ve implemented a prefixed version of the “Media Capture and Streams” standard being developed at the W3C. Not all portions of the specification have been implemented yet; most notably, we do not support the Constraints API (which allows the caller to request certain types of audio and video based on various parameters).

    We have also implemented a Mozilla specific extension to the API: the first argument to mozGetUserMedia is a dictionary that will also accept the property {picture: true} in addition to {video: true} or {audio: true}. The picture API is an experiment to see if there is interest in a dedicated mechanism to obtain a single picture from the user’s camera, without having to set up a video stream. This could be useful in a profile picture upload page, or a photo sharing application, for example.

Without further ado, let’s start with a simple example! First, make sure to create a pref named “media.navigator.enabled” and set it to true via about:config. We’ve put this pref in place because we haven’t yet implemented a permissions model or any UI for prompting the user to authorize access to the camera or microphone. This release of the API is aimed at developers, and we’ll enable the pref by default once we have a permission model and UI that we’re happy with.
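As a rough sketch (not the demo page’s exact code), a call looks something like this; attaching the stream via mozSrcObject is an assumption based on current Nightly behavior:

// Sketch: ask for camera access and show the stream in a <video>.
// Assumes the media.navigator.enabled pref is set as described above;
// mozSrcObject as the way to attach the stream is an assumption.
var video = document.querySelector('video');
navigator.mozGetUserMedia(
  { video: true },
  function onSuccess(stream) {
    video.mozSrcObject = stream;  // attach the MediaStream (prefixed)
    video.play();
  },
  function onError(err) {
    console.log('getUserMedia failed: ' + err);
  }
);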

    There’s also a demo page where you can test the audio, video and picture capabilities of the API. Give it a whirl, and let us know what you think! We’re especially interested in feedback from the web developer community about the API and whether it will meet your use cases. You can leave comments on this post, or on the dev-media mailing list or newsgroup.

We encourage you to get involved with the project; there’s a lot of information about our ongoing efforts on the project wiki page. Posting your questions, comments and suggestions on the mailing list is a great way to get started. We also hang out in the #media IRC channel; feel free to drop in for an informal chat.

    Happy hacking!

  9. HTML5 audio and audio sprites – this should be simple

As we’re running an HTML5 Audio developer derby this month, I thought it would be fun to play with audio again. Sadly enough, I found it pretty frustrating.

One thing I’ve proposed in a lot of talks is taking the idea of CSS sprites and applying it to HTML5 audio. You get the same benefits: loading one file in one HTTP request instead of many, avoiding failures when some files don’t load, and so on.

    To test this out I wrote the following small demo using the awesome Music Non Stop by Kraftwerk.

Clicking the different buttons should play just that part of the music file and nothing more. This works fine in Firefox, Chrome and Opera on my computer here. Safari, however, fails to preload the audio, and the setting of the current time is off. The code is simple enough that this should work:

<div id="buttons"></div>
<audio preload controls>
  <source src="boing-boomchack-peng.mp3" type="audio/mp3">
  <source src="boing-boomchack-peng.ogg" type="audio/ogg">
</audio>
    // get the audio element and the buttons container
    // define a sprite object with the names and the start and end times 
    // of the different sounds.
    var a = document.querySelector('audio'),
        buttoncontainer = document.querySelector('#buttons'),
        audiosprite = {
          'all': [ 0, 5 ],
          'boing': [ 0, 1.3 ],
          'boomtchack': [ 2, 2.5 ],
          'peng': [ 4, 5 ]
        },
        end = 0;
     
    // when the audio data is loaded, create the buttons 
    // this way non-HTML5 browsers don't get any buttons 
    a.addEventListener('loadeddata', function(ev) {
      for (var i in audiosprite) {
        buttoncontainer.innerHTML += '<button onclick="play(\'' +
                                      i + '\')">' + i + '</button>';
      }
    }, false);
     
    // If the time of the file playing is updated, compare it 
    // to the current end time and stop playing when this one 
    // is reached
    a.addEventListener('timeupdate', function(ev) {
      if (a.currentTime > end) {
        a.pause();
      }
    },false);
     
    // Play the current audio sprite by setting the currentTime
    function play(sound) {
      if ( audiosprite[sound] ) {
        a.currentTime = audiosprite[sound][0];
        end = audiosprite[sound][1];
        a.play();
      }
    }

Now, this is nothing new; Remy Sharp wrote about audio sprites in 2010 and lamented especially the buggy support in iOS (audio won’t load at all until you activate it with a touch, which sounds horribly like the “click to activate” Flash has on IE).

Other issues are looping and the latency of HTML5 audio. As reported by Robert O’Callahan, there is a workaround of cloning the audio element before playing it (with an incredibly annoying test), and this fix has been used in the Gladius HTML5 game engine.
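The heart of that workaround is tiny (a sketch only; the real fix in Gladius wraps it in the feature test mentioned above):

// Sketch of the cloning workaround: rather than rewinding and
// replaying the same element, play a fresh clone each time.
function playEffect(audioElement) {
  var clone = audioElement.cloneNode(true);
  clone.play();
}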

All in all, it seems HTML5 audio still needs a lot of work, which is why a lot of games released lately under the banner of HTML5 use Flash audio or no audio at all. This is sad and needs fixing.

Interestingly enough, there are some great projects you could be part of. Are we playing yet?, by Soundcloud and others, is a test suite for audio support in browsers. You can write your own tests on GitHub and report results to the browser makers.

    The jPlayer team has a great HTML5 Media Event Inspector showing just how many of the HTML5 media events are supported in your current browser.

    If you want to be safe, you can use SoundManager 2 by Scott Schiller to have an API that uses HTML5 when possible and falls back to Flash when the browser doesn’t have any support. It also fixes a few issues for you.

Speaking of Scott Schiller, he continually offers good insight into the state of audio. There is a 51-minute video based on his 24 ways article “Probably, Maybe, No: The State of HTML5 Audio”.

    A shorter and more recent talk on the same subject is also available:

    All in all it would be interesting to hear what you think of the state of HTML5 audio:

    • Did the companies that heralded HTML5 as the end of plugins drop the ball?
• Is it really sensible to have an API that returns “probably”, “maybe” or “” (an empty string) when you ask it whether the browser can play a certain type of media?
    • What could be done to work around these issues?

Let’s re-ignite the discussion on HTML5 audio; after all, we need it for the future of messaging and telephony in the browser, too.

Oh, and another thing. Of course there are the Audio Data API in Firefox and the Web Audio proposal from WebKit, but getting those running on mobile devices will be a much bigger change. If you want to know more about them, and about libraries that work around their differences, there is a great overview post available on Happyworm.

  10. Making the Dino roar – syncing audio and CSS transitions

It started with Brian King setting up our Google+ page using this round MDN logo by John Slater. I thought it looked cool and reminded me of the famous MGM intro, so I wondered if I could turn it into an intro for our video tutorials (not sure if we will do that, though). Some Photoshop and sound work later, with a sprinkle of HTML5 audio and CSS transitions, here we are (source on GitHub):

I started with the sound. If you need Creative Commons licensed sounds, Freesound is a good resource. I took Chinese Fanfare by Nick-Nack and Roar by CGEffex and put them together in Audacity.

Saving them as Ogg and MP3 gave me an audio element I could tie into. All I needed was to listen for the timeupdate event and compare the currentTime to trigger the animations. The animations (the rotation of the dino and the opening and closing of the jaw) are CSS transitions triggered by classes on the parent element. The main trick was to store both the dino and the jaw inside a div and transition them separately. The jaw animation also needed a change in transform origin, as we don’t rotate the image around its center.
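As a sketch, the syncing logic looks something like this (the class name and the timestamps here are made up for illustration):

// Sketch: trigger CSS transitions at points on the audio timeline.
// The 'roar' class and the 2.5s-4s window are hypothetical values.
var audio = document.querySelector('audio'),
    dino = document.querySelector('#dino');

audio.addEventListener('timeupdate', function() {
  if (audio.currentTime > 2.5 && audio.currentTime < 4) {
    dino.classList.add('roar');    // CSS transition opens the jaw
  } else {
    dino.classList.remove('roar'); // transition back to closed
  }
}, false);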

If you have seven minutes to spare, here is a blow-by-blow screencast explaining what is going on: