A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

This is an update on the LPCNet project, an efficient neural speech synthesizer from Mozilla’s Emerging Technologies group. In an earlier demo from late last year, we showed how LPCNet combines signal processing and deep learning to improve the efficiency of neural speech synthesis.

This time, we turn LPCNet into a very low-bitrate neural speech codec that’s actually usable on current hardware and even on phones (as described in this paper). This is the first time a neural vocoder has been able to run in real time using just one CPU core on a phone (as opposed to a high-end GPU)! The resulting bitrate, just 1.6 kb/s, is about one tenth of what wideband codecs typically use. The quality is much better than that of existing very low-bitrate vocoders. In fact, it’s comparable to that of more traditional codecs using higher bitrates.

Screenshot of a demo player that demonstrates the quality of LPCNet-coded speech

This new codec can be used to improve voice quality in countries with poor network connectivity. It can also be used as redundancy to improve robustness to packet loss for everyone. In storage applications, it can compress an hour-long podcast to just 720 kB (so you’ll still have room left on your floppy disk). With some further work, the technology behind LPCNet could help improve existing codecs at very low bitrates.
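The podcast figure follows directly from the bitrate. As a rough sketch, assuming a constant 1.6 kb/s stream, 1 kB = 1000 bytes, and no container or framing overhead (the function name `compressed_size_bytes` is only illustrative):

```python
# Bitrate quoted in the article: 1.6 kb/s = 1600 bits per second.
BITRATE_BITS_PER_SECOND = 1600


def compressed_size_bytes(duration_seconds, bitrate_bps=BITRATE_BITS_PER_SECOND):
    """Return the payload size in bytes for speech of the given duration.

    Assumes a constant bitrate and ignores any container overhead.
    """
    return duration_seconds * bitrate_bps / 8  # 8 bits per byte


one_hour = 60 * 60  # seconds
size_kb = compressed_size_bytes(one_hour) / 1000  # assuming 1 kB = 1000 bytes
print(f"{size_kb:.0f} kB")  # prints "720 kB"
```

A real file would be slightly larger once packet headers or a container format are added.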

Learn more about our ongoing work and check out the playable demo in this article.

About Jean-Marc Valin

Jean-Marc Valin has a B.S., M.S., and PhD in Electrical Engineering from the University of Sherbrooke. He is the primary author of the Speex codec and one of the main authors of the Opus codec. His expertise includes speech and audio coding, speech recognition, echo cancellation, and other audio-related topics. He is currently employed by Mozilla to work on next-generation multimedia codecs.