Mozilla

speak.js: Text-to-Speech on the Web

Text-to-Speech (TTS) can make content more accessible, but so far there is no simple, universal way to do it on the web. One possible approach is shown in this demo, which is powered by speak.js, a new 100% pure JavaScript/HTML5 TTS implementation. speak.js is a port of eSpeak, an open source speech synthesizer, from C++ to JavaScript using Emscripten.

Compiling an existing speech synthesis engine to JavaScript avoids having to write a complicated project like eSpeak from scratch. Once compiled, the eSpeak code in speak.js doesn’t know it’s running on the web: speak.js uses the Emscripten emulated filesystem to ‘fake’ the normal file reading and writing calls that the eSpeak C++ code makes (fopen, fread, etc.). This allows the normal eSpeak data files to be used, either by loading them through an XHR or by converting them to JSON and bundling them with the script file. When the compiled eSpeak code runs, it ‘writes’ a .wav file with the generated audio to the emulated filesystem. speak.js then takes that data, encodes it using base64, and creates a data URL. That URL is loaded in an HTML5 audio element, letting the browser handle playback.

(While this is a very simple way to do things, it isn’t the most efficient. speak.js has not focused on speed yet, but with some additional work it could be much faster, if that turns out to be an issue.)
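The last steps of the pipeline described above (a .wav file in memory, base64-encoded into a data URL, handed to an HTML5 audio element) can be sketched roughly as follows. This is a minimal illustration, not speak.js’s own code: the names `makeWav`, `toBase64`, `wavDataUrl` and `playWav` are hypothetical, and since there is no compiled eSpeak here, the sketch assumes you already have 16-bit mono PCM samples and builds the .wav bytes itself instead of reading them from the emulated filesystem.

```javascript
// Hypothetical sketch (not speak.js's actual code): wrap 16-bit mono PCM
// samples in a WAV (RIFF) container, base64-encode it, and build a data URL.

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Build a complete .wav file in memory from Int16 samples.
function makeWav(samples, sampleRate) {
  const dataBytes = samples.length * 2;          // 2 bytes per 16-bit sample
  const buf = new ArrayBuffer(44 + dataBytes);   // 44-byte canonical header
  const view = new DataView(buf);
  const writeAscii = (offset, str) => {
    for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);       // RIFF chunk size (little-endian)
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                  // fmt chunk size for PCM
  view.setUint16(20, 1, true);                   // audio format 1 = PCM
  view.setUint16(22, 1, true);                   // 1 channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true);      // byte rate = rate * channels * 2
  view.setUint16(32, 2, true);                   // block align = channels * 2
  view.setUint16(34, 16, true);                  // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));
  return new Uint8Array(buf);
}

// Standard base64 (RFC 4648 alphabet) with '=' padding.
function toBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const n = (b0 << 16) | (b1 << 8) | b2;     // pack 3 bytes into 24 bits
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

// Turn the .wav bytes into a URL an audio element can load directly.
function wavDataUrl(bytes) {
  return "data:audio/wav;base64," + toBase64(bytes);
}

// Browser only: let an HTML5 audio element handle playback.
function playWav(bytes) {
  const audio = new Audio(wavDataUrl(bytes));
  audio.play();
  return audio;
}
```

In a browser, something like `playWav(makeWav(samples, 22050))` would start playback. Base64 inflates the data by about a third, and the whole file must exist before playback begins, which fits the post’s point that this route is simple rather than efficient.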

Why would you want TTS in JavaScript? Well, with speak.js you can bundle a single .js file in your website, and then generating speech is about as simple as writing

speak("hello world")

(see the speak.js website for instructions). The generated speech will be exactly the same on all platforms, unlike when each user does TTS their own way (with an OS capability or a separate program). speak.js can also be used to build browser addons in a straightforward way: since it’s pure JavaScript, there is no need for platform-dependent binaries, and the addon will work the same on every OS.

A few more comments:

  • JavaScript is getting more and more capable all the time. The development versions of the top JavaScript engines today can run code compiled from C++ only 3 to 5 times slower than the output of a fast C++ compiler, and the gap keeps shrinking. As a consequence, the capabilities of the web platform can in many cases be expanded in JavaScript, or by compiling to JavaScript, instead of by adding new code to the browsers themselves, which inevitably takes longer, especially if you wait for all browsers to implement a particular feature.
  • While speak.js uses only standards-based APIs, due to browser limitations it can’t work everywhere yet. It won’t work in IE, Safari or Opera since they don’t support typed arrays, nor in Chrome since it doesn’t support WAV data URLs. So currently speak.js only works properly in Firefox. However, the missing features just mentioned are not huge and hopefully those browser makers will implement them soon. It is also possible to implement workarounds in speak.js for these issues (see next comment).
  • Help with improving speak.js is very welcome! One important thing we need is to implement workarounds for the issues that prevent speak.js from running on the browsers it currently can’t run on. Another goal is to build browser addons using speak.js. Please get in touch on github if you want to help out.
  • eSpeak supports multiple languages so speak.js can too. You do need to include the additional language files though. Here is an experimental build where you can switch between English and French support (note that it is an unoptimized build, so it will run slower).
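Because speak.js needs both typed arrays and WAV playback through an audio element, a page could check for them before offering a “speak” button. The sketch below is a hypothetical helper, not part of speak.js; the name `checkSpeakSupport` and its return shape are assumptions. It only approximates the requirements: `canPlayType("audio/wav")` reports whether the browser can play the WAV format at all, not whether it accepts WAV *data URLs*, which was the specific Chrome limitation mentioned above.

```javascript
// Hypothetical feature check (not part of speak.js). Pass a mock object as
// `env` for testing; by default it inspects the real global environment.
function checkSpeakSupport(env = globalThis) {
  // The compiled eSpeak code relies on typed arrays.
  const typedArrays =
    typeof env.Uint8Array === "function" && typeof env.Int32Array === "function";

  // canPlayType returns "", "maybe" or "probably"; "" means unsupported.
  let wavAudio = false;
  if (typeof env.Audio === "function") {
    const probe = new env.Audio();
    wavAudio =
      typeof probe.canPlayType === "function" && probe.canPlayType("audio/wav") !== "";
  }

  return { typedArrays, wavAudio, ok: typedArrays && wavAudio };
}
```

For example, a page might run `checkSpeakSupport().ok` before loading speak.js and hide its speech controls when the result is false, rather than failing silently.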


14 comments

Comments are now closed.

  1. Muhammad Tarmizi bin Kamaruddin wrote on August 17th, 2011 at 11:45:

    I really hope that speak.js uses the MIT license and not the GPL.

  2. azakai wrote on August 17th, 2011 at 11:58:

    I would prefer MIT myself, however eSpeak is GPL licensed. But I hope that isn’t a problem here. speak.js is built in a way that makes it very obvious that anything using speak.js is not a derivative work of speak.js or eSpeak – using speak.js just means writing |speak(“text”)|. That’s the entire API. So I strongly believe that using speak.js doesn’t mean your code needs to be GPL as well.

    To comply with the GPL in this case, all you should need is a link to the source code, which is on github, https://github.com/kripken/speak.js (unless you modified speak.js itself).

  3. Prestaul wrote on August 17th, 2011 at 12:02:

    Works great in Chrome 14…

    1. azakai wrote on August 17th, 2011 at 12:10:

      Thanks, good to know it’s fixed in the Chrome 14 preview version.

  4. abral wrote on August 17th, 2011 at 12:54:

    This is just wonderful!

  5. Eric Jung wrote on August 17th, 2011 at 14:22:

    I’d be pleased to write an addon which wraps this. Some ideas: speak RSS feeds, speak the current web page, speak the current web page but only elements marked with a “speak” attribute, etc.

    But after playing with the demo for a while, I just don’t see this being anything but an interesting novelty. Why? Because the speech output suffers from lack of clarity. After pasting contents of various news articles into the demo, and playing with some of the variables, I found myself really struggling to understand the speech.

    Contrast that to AT&T Natural Voices:
    http://www2.research.att.com/~ttsweb/tts/demo.php
    and you’ll quickly see what I mean.

    Is there any way to make it more clear?

    Eric Jung
    Author of FoxyProxy and other addons

    1. azakai wrote on August 17th, 2011 at 14:48:

      Well, to be honest I know very little about speech synthesis :) I just compiled eSpeak to JS and hacked it so it worked on the web.

      It’s possible that that AT&T project is simply better than eSpeak. But it is also possible that we can improve eSpeak’s output by tweaking the parameters, using different voice/dictionary datafiles, etc. I didn’t try any of that yet.

      I filed issue 2 on speak.js to track this; hopefully we can make this much better.

      Edit: Forgot to say, thanks for offering to help!

      1. Eric Jung wrote on August 17th, 2011 at 14:51:

        OK, well, if you do improve it, email me and I’ll write a killer addon for it.

    2. Paul wrote on August 17th, 2011 at 17:26:

      You should definitely write an add-on for this. I want this addon!

  6. skierpage wrote on August 17th, 2011 at 17:35:

    The demo phrase sounds pretty good, but it messes up Rick Astley’s immortal “Never gonna tell a lie and hurt you”: the speaker doesn’t say anything for “hurt”. I’ve got an OLPC laptop that I think also uses espeak and that makes a vague ‘t’ sound for hurt/curt/yurt.

    emscripten is incredible!

  7. anfemfjs wrote on August 18th, 2011 at 03:58:

    This might be caused by emscripten or something in the browser: eSpeak 1.45.04-1 from Debian unstable produces http://dl.dropbox.com/u/96013/Never%20gonna%20tell%20a%20lie%20and%20hurt%20you.wav which sounds much better.

    1. Chris wrote on September 17th, 2012 at 15:10:

      Rick roll’d?

  8. Gerardo Capiel wrote on August 27th, 2011 at 09:03:

    Chrome web app developers should look at:

    http://code.google.com/chrome/extensions/trunk/tts.html

    It’s in the beta build now and should be in the stable build soon. Unfortunately, you have to make a packaged web app or extension. One nice feature is word-level callbacks for doing things like word highlighting for dyslexic users. Check out a demo I built at:

    https://github.com/gcapiel/ChromeWebAppBookshareReader

    You can install the web app on Chrome Beta by clicking on the .crx file in the downloads section.

  9. Gerardo Capiel wrote on August 27th, 2011 at 09:05:

    I meant to add that I’d love to see speak.js extended to support word-level callbacks like the Google TTS API.
