This is a re-post (with permission) of a post that Greg Maxwell wrote in response to a comment by Chris DiBona from Google on a whatwg mailing list. The codecs being discussed are the same ones we’ll be including in Firefox 3.5 and are also the same codecs that Mozilla, Wikipedia and others have been investing in.
Recent developer nightlies of Google Chrome support these codecs and a future version of Opera will also support them. Theora and Vorbis also work in Safari if you install the Xiph Qt component. We’re quickly reaching the point where all modern browsers support these open codecs with full support for the video tag.
You’ll note that Greg’s post doesn’t have the tone of a marketing document – it’s not meant to. Nor is this a comparison against HD-sized, high-bitrate video. Instead it’s an attempt to give an honest comparison of how the open codecs fare against the formats and sizes commonly used on the world’s largest video site. I think you’ll agree with Greg’s conclusions at the bottom of the document, especially with audio where Vorbis really shines.
Greg’s post follows.
Purpose
On Jun 13th 2009 Chris DiBona of Google made a remarkable claim on the WhatWG mailing list:
“If [youtube] were to switch to theora and maintain even a semblance of the current youtube quality it would take up most available bandwidth across the Internet.”
Unfortunately, open video formats have been subjected to FUD so frequently that people are willing to believe bold claims like these without demanding substantiation.
In this comparison I will demonstrate that this claim was unfair and unreasonable. Using a simple test case I show that Theora is competitive with, and even superior to, some of the files that Google is distributing today on YouTube.
Theora isn’t the most efficient video codec available right now. But it is by no means bad, and it is substantially better than many other widely used options. By conventional criteria Theora is competitive. It also has the substantial advantage of being unencumbered, reasonable in computational complexity, and entirely open source. People are often confused by the correct observation that Theora doesn’t provide the state of the art in bitrate vs quality, and take that to mean that Theora does poorly when in reality it does quite well. Also, the Theora encoder has improved a lot lately so some older problems no longer apply.
While different files may produce different results, the allegation made on WhatWG was so expansive that I believe a simple comparison can reliably demonstrate its falsehood.
I do not believe Chris intended to deceive anyone, only that he is a victim of the same outdated and/or simply inaccurate information that has fooled many others. Automotive enthusiasts may make a big deal about a 5 horsepower difference between two cars, but these kinds of raw performance differences are not relevant to most car buyers nor are they even the most important criteria to people who race. Likewise, videophiles nitpick the quality of compression formats and this nitpicking is important for the advancement of the art. But I believe that people are mistaking these kinds of small differences for something which is relevant to their own codec selection.
Results
A 499kbit/sec H.264+AAC output and a 327kbit/sec H.263(Sorensen Spark)+MP3 output were available via the download service. The YouTube-encoded files are available on the YouTube site. Because the files on YouTube may change and the web player does not disclose the underlying bitrate, I have made the two encoded files available.
~499kbit/sec comparison
YouTube
Download (H.264+AAC; 17MiB)
Ogg/Theora+Vorbis
Download / Watch (Ogg/Theora+Vorbis; 17MiB)
~327kbit/sec comparison
YouTube
Download (H.263+MP3; 12MiB)
Ogg/Theora+Vorbis
Download / Watch (Ogg/Theora+Vorbis; 12MiB)
A slightly lower bitrate was used for the Theora+Vorbis test cases to avoid any question of quality improvement resulting from larger outputs.
For a fair comparison you must compare the audio as well. Even without audio differences, still image comparisons are a poor proxy for video quality.
I provided this random frame still image comparison only because I expect that people will not bother watching the examples without evidence that the results are interesting.
Methodology
In order to avoid any possible bias in the selection of H.264 encoders and encoding options, and to maximize the relevance for this particular issue, I’ve used YouTube itself as the H.264 encoder. This is less than ideal because YouTube does not accept lossless input, but it does accept arbitrarily high bitrate inputs.
I utilized the Blender Foundation’s Big Buck Bunny as my test case because of its clear licensing status, because it’s a real world test case, and because I have it available in a lossless format. I am not aware of any reason why this particular clip would favor either Theora or H.264.
I chose to use a test case with a soundtrack because most real usage has sound. No one implements HTML5 video without audio, and no one is implementing either Theora or Vorbis without the other. Vorbis’s state-of-the-art performance is a contributor to the overall Ogg/Theora+Vorbis solution.
- Obtain the lossless 640×360 Big Buck Bunny source PNGs and FLACs from media.xiph.org.
- Resample the images to 480×270 using ImageMagick’s convert utility.
- Using gstreamer’s jpegenc, produce a quality=100 mjpeg + PCM audio stream. The result is around 1.5Gbytes with a bitrate of around 20Mbit/sec.
- Truncate the file to fit under the YouTube 1Gbyte limit, resulting in input_mjpeg.avi (706MiB).
- Upload the file to YouTube and wait for it to transcode.
- Download the FLV and H.264 files produced by YouTube using one of the many web downloading services. (I used keepvid)
- Using libtheora 1.1a2 and Vorbis aoTuv 5.7, produce a file of comparable bitrate to the YouTube 499kbit/sec output from the same file that was uploaded to YouTube (input_mjpeg.avi).
- Resample the file uploaded to YouTube to 400×226.
- Using libtheora 1.1a2 and Vorbis aoTuv 5.7, produce a file of comparable bitrate to the YouTube 327kbit/sec output from the 400×226 downsampled copy of input_mjpeg.avi.
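As a rough sanity check on the figures in the steps above, the arithmetic relating file size, bitrate, and duration can be sketched in a few lines of Python. This is only an illustration, not part of the original method: the function names are hypothetical, 1 kbit is assumed to be 1000 bits, and the 20Mbit/sec input rate is the approximate figure quoted above, so the results are estimates.

```python
MIB = 1024 * 1024  # bytes per MiB


def duration_seconds(size_bytes: float, bitrate_bps: float) -> float:
    """Approximate playing time of a file with the given average bitrate."""
    return size_bytes * 8 / bitrate_bps


def size_mib(duration_s: float, bitrate_bps: float) -> float:
    """Approximate file size in MiB for a clip of the given length and bitrate."""
    return duration_s * bitrate_bps / 8 / MIB


# Assumption: the truncated 706MiB upload averages about 20 Mbit/sec.
clip_s = duration_seconds(706 * MIB, 20_000_000)
print(f"estimated clip length: {clip_s:.0f} s")

# Estimated output sizes at the two YouTube bitrates being matched.
for kbps in (499, 327):
    print(f"{kbps} kbit/sec -> about {size_mib(clip_s, kbps * 1000):.1f} MiB")
```

Under these assumptions the estimates land in the same range as the 17MiB and 12MiB downloads listed in the results, which is all this sketch is meant to show.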
I later discovered that YouTube sometimes offers additional sizes. I tried the youtube-dl utility and it indicated that these other sizes were not available for my file. Otherwise I would have also included them in this comparison.
A keyframe interval of 250 frames was used for the Theora encoding. The Theora 1.1a2 encoder software used is available from theora.org. The Vorbis encoder used is available from the aoTuV website. No software modifications were performed.
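To put the keyframe interval in more familiar terms, it can be converted to a maximum spacing in seconds between keyframes. The sketch below is illustrative only: the helper name is hypothetical, and the 24 frames/sec rate is an assumption about the source clip that is not stated in this post.

```python
def keyframe_spacing_seconds(interval_frames: int, fps: float) -> float:
    """Maximum time between keyframes for a given interval and frame rate."""
    return interval_frames / fps


ASSUMED_FPS = 24.0  # assumption: frame rate of the test clip, not given in the post

spacing = keyframe_spacing_seconds(250, ASSUMED_FPS)
print(f"at most {spacing:.1f} s between keyframes at {ASSUMED_FPS:g} frames/sec")
```

A shorter interval would give finer seeking granularity at the cost of spending more bits on keyframes.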
My conclusions
It can be difficult to compare video at low bitrates, and even YouTube’s higher bitrate option is not high enough to achieve good quality. The primary challenge is that all files at these rates will have problems, so the reviewer is often forced to decide which of two entirely distinct flaws is worse. Sometimes people come to different conclusions.
That said, I believe that the Theora+Vorbis results are substantially better than the YouTube 327kbit/sec. Several other people have expressed the same view to me, and I expect you’ll also reach the same conclusion. This is unsurprising since we’ve been telling people that Theora is better than H.263, especially at lower bitrates, for some time now and YouTube only uses a subset of H.263.
The low bitrate case is also helped by Vorbis’ considerable superiority over MP3. For example, the crickets at the beginning are inaudible in the low rate YouTube clip but sound fine in the Ogg/Theora+Vorbis version.
In the case of the 499kbit/sec H.264 I believe that under careful comparison many people would prefer the H.264 video. However, the difference is not especially great. I expect that most casual users would be unlikely to express a preference or complain about quality if one were substituted for the other, and I’ve had several people perform a casual comparison of the files and express indifference. Since Theora+Vorbis is providing such comparable results, I think I can confidently state that reports of the internet’s impending demise are greatly exaggerated.
Of course, YouTube may be using an inferior processing chain, or encoding options which trade off quality for some other desirable characteristic (like better seeking granularity, encoding speed, or a specific rate control pattern). But even if they are, we can conclude that adopting an open unencumbered format in addition to or instead of their current offerings would not cause problems on the basis of quality or bitrate.
But please, see and hear for yourself.
About Christopher Blizzard
Making the web better, one release at a time.