Less than a year after the release of Theora 1.0, the wonderful people at Xiph have released Theora 1.1. The 1.1 release is a software-only release of the Theora encoder and decoder. It does not include any changes to the Theora format. Existing Theora videos should continue to play with the new decoder and the new Theora encoder generates bitstreams that will work in existing players that can play Theora content.
The 1.1 release is largely an improvement to the Theora encoder. This post will attempt to give people a high-level overview of the changes and what they mean to web developers and people who are thinking of deploying Theora to support HTML5 video. Theora is an important technology to web developers – it’s the only competitive codec that currently complies with the W3C patent policy.
Here’s a quick list of important things that have changed in this release. We’ll go into more detail on each of these items.
- Video quality between Theora 1.0 and Theora 1.1 has been improved.
- Rate control for live streaming now works well.
- A two-pass mode has been added to the encoder that can create rate controlled videos with very predictable bandwidth requirements.
- CPU usage during encoding is much more consistent.
- Decoder performance has been improved.
Video quality between Theora 1.0 and Theora 1.1 has been improved.
One issue that people had with the Theora 1.0 encoder was that it produced video that appeared fuzzy. The 1.1 improvements are clear in these two images provided by Monty, one of the Xiph Developers. Open each of these images in new tabs and flip between them. You can really see the difference.
This was also very visible at the edges of text. Here’s an example taken from one of our Firefox 3.5 promotional videos. The first is with the 1.0 encoder (9.0MB) and the second is with the 1.1 encoder (8.2MB). You will notice that not only are the edges more defined but there’s a lot less noise in the area around the edges of the text. Once again, if you open them in tabs and flip between them you can see the difference.
Note that the original video is nearly 17MB. That was done largely to get the text crisp. With these changes we can likely use a much lower-bandwidth version of the video, probably as small as 9.9MB. That’s a pretty big difference.
Note that we’re talking about an improvement of quality at the same video bitrate. This means that we’re either able to produce higher quality videos at the same file size or we’re able to reduce the file size and keep the same quality – either way it’s a big win.
Rate control for live streaming now works well.
Before describing this change, something important must be described. This is the difference between videos encoded with a variable bitrate (VBR) and a constant bitrate (CBR).
In variable bitrate encoding the amount of data that’s required to represent the difference between two frames in a video is allowed to grow. This happens most often when shifting from a scene where there isn’t much movement to a scene where there’s a lot of motion. You could easily go from requiring 40Kb/sec to 400Kb/sec because the entire background moves.
In constant bitrate encoding the amount of data that you’re allowed to use to represent a change from one frame to the next is pinned at some maximum value. If you’ve got a low maximum value and there’s a set of frames that requires a lot of bits to represent the changes from one to the next you will need to sacrifice something in order to stay inside of that maximum value. Very often it’s some amount of video quality or the encoder will start dropping frames in order to keep under the watermark.
This leads to a pretty simple rule: If you want the highest quality video possible, you should be using variable rate encoding. This means that when you’re encoding a video you should be using quality settings (0-99, low/medium/high, 1-10) instead of picking bitrates (60Kb/sec, 200Kb/sec.) For most use cases on the web VBR-encoded videos actually work very well because users are allowed to buffer quite a bit of video out ahead of their current position so these bursts of data don’t affect the user’s experience.
But there are some use cases where having a constant bitrate is very important. These include:
- Live, low delay streaming over HTTP with a lot of clients.
- Streaming large files where a large read-ahead buffer is not desired.
- Situations where large bursts of data result in large bursts in CPU to handle them.
For live, low-delay streaming over HTTP it’s important to realize what happens when there’s a sudden burst of data to handle. HTTP runs over TCP. In TCP it takes a while for a connection to increase its bandwidth. (And by “a while” I mean “not that long” but it’s long enough to affect the low latency connection that we want for this use case. This is why many low-latency applications don’t use TCP. But we’re talking about delivering video over HTTP.) If you’ve got a big burst of data and the TCP window takes a long time to open up you start building up a big send buffer on the server. (And remember in this use case you’ve got a lot of clients connected!) That requires a lot of memory to hold the send buffers for each client. What happens then is that servers will start closing connections en masse because it needs to save memory or because it thinks that the client has become somehow unreachable. This is made worse by the fact that even if the connection scales up and then scales back down it re-settles at the low rate and the process has to be repeated. The user’s experience is that the video stream stops and restarts or just stops working altogether when the server hangs up. The solution? Using a constant rate that doesn’t require the TCP window to open up suddenly and doesn’t require large send buffers for each client.
For the use case where you’re streaming large files it might not be reasonable for the client to cache a lot of data. You also might be serving up a lot of data to a lot of clients and you might want to avoid the large send buffer problem as well, just for different reasons.
And for the last use case where you’re in a CPU-constrained environment the bursting nature of variable bitrate videos means it often takes a large bursts of CPU to handle those bursts. While CPUs do burst up faster than TCP does, you might be talking to constrained processors (think mobile) or you might be serving up files near HD-sized content, which CPUs often struggle to decode.
In any case there are a number of use cases for constant bitrate encoding. Back to the question of what’s improved in Theora 1.1.
In Theora 1.0 the rate controlled encoding mode was very very broken. This resulted in two things:
- People trying to do live streaming ran into problems.
- People who used rate controlled settings to compare overall Theora quality to the quality of other encoders saw worse results than the format actually represented.
The first issue is clear – it was broken, it should be fixed. And it has been. The new encoder does a pretty good job of maintaining bitrates, changes quality on the fly, drops frames and even includes a “soft-target” mode so that bitrates can fluctuate a little bit to maintain quality while occasionally breaking the bandwidth rules.
The encoder also has a wonderful new piece of functionality that people will find very useful. It’s now possible to specify a maximum rate ceiling for video encoding while also specifying a minimum quality floor. What this means is that the encoder will try and maintain very crisp video frames within rate constraints. This means that it will aggressively drop frames instead of creating frame deltas that are fuzzy or low-quality. While this might sound like a poor trade-off it’s actually very useful. If you’re showing a live video of a presentation you usually want a crisp video of the slides and having a lower frame update rate is very acceptable.
The second issue that was caused by the bad rate control in Theora 1.0 is an issue of marketing. People would often use the encoder with the fixed bitrate mode instead of the quality mode and dismiss the results as a reflection of the format instead of problems with the encoder. We hope that people find better results with the new encoder.
A two-pass mode has been added to the encoder that can create rate controlled videos with very predictable bandwidth requirements.
In addition to fixing the single pass rate controlled encoder in 1.1 a two-pass encoding option has been added. This means that if you are transcoding a file (as opposed to doing a live stream) you can create a very consistent bitrate in a file if you want. This is because the encoder can look ahead in the stream to allocate bits efficiently. Monty from Xiph made a graph that shows one example of the bitrate in a file with one pass and two pass.
Above: graph of quantizer choice (which roughly corresponds to the encoding quality) when using default two-pass bitrate management as opposed to one-pass (with –soft-target) when encoding [the Matrix movie clip] at 300kbps. Both encodes resulted in the same bitrate. The quality of a one-pass encode varies considerably as the encoder has no way to plan ahead.
CPU usage during encoding is much more consistent.
People who were doing live streaming often saw huge spikes in CPU usage during high-motion events. This has been fixed and now CPU usage is much more consistent during single pass rate constrained encoding making it much easier to live stream video.
Decoder performance has been improved.
And last but not least the decoder has been made faster during the 1.1 release. How much faster depends quite a bit on the clip, but people are reporting that the new encoder is anywhere from 1.5-2x faster than the 1.0 of release of libtheora.
Coming soon to a product near you.
This release is a library release. It’s not a product in itself, but is instead something that other products include. So over the next days and weeks we’ll see other products pick up and start using this as part of their releases.