Into the Depths: The Technical Details Behind AV1

AV1, the next-generation, royalty-free video codec from the Alliance for Open Media, is making waves in the broadcasting industry.

Since AOMedia officially cemented the AV1 v1.0.0 specification earlier this year, we’ve seen increasing interest from the broadcasting industry. AV1 keeps picking up steam, starting with the NAB Show (National Association of Broadcasters) in Las Vegas earlier this year, gaining momentum through IBC (International Broadcasting Convention) in Amsterdam, and continuing more recently at the NAB East Show in New York. Each of these industry events attracts over 100,000 media professionals. Mozilla attended these shows to demonstrate AV1 playback in Firefox, and showed that AV1 is well on its way to being broadly adopted in web browsers.

Continuing to advocate for AV1 in the broadcast space, Nathan Egge from Mozilla dives into the depths of AV1 at the Mile High Video Workshop in Denver, sponsored by Comcast.

AV1 leapfrogs the performance of VP9 and HEVC, making it a next-generation codec. The AV1 format is and will always be royalty-free with a permissive FOSS license.

Nathan Egge is a Senior Research Engineer at Mozilla and a member of the non-profit Xiph.Org Foundation. Nathan works on video compression research with the goal of producing best-in-class, royalty-free open standards for media on the Internet. He is a co-author of the AV1 video format from the Alliance for Open Media and contributed to the Daala project before that.

6 comments

Olivier de B.

In terms of compression, AV1 sounds good, but I get quite worried when, concerning encoding performance, I read: “..in our AV1 First Look, we encoded with cpu-used=0, and AV1 encoding time was about 1,000 times longer than VP9.” http://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=127956&PageNum=2
Is there a recent performance comparison available between AV1, HEVC, H.264 and VP9? Where does AV1 stand today?

November 8th, 2018 at 08:38
Nathan Egge
  
  That is a good question, Olivier, and one of the topics I wanted to cover at the Mile High Video workshop. My presentation, https://xiph.org/~negge/MHV2018.pdf, even included figures at the end to address this, but I ran out of time.
  
  Slides 40 and 41 show the performance of libaom over its development history (using July 15, 2016 as an anchor), in terms of both compression and complexity. As you can see, our focus had been on adding features that improve compression at the cost of increased complexity. Once the bitstream was frozen, we shifted to reducing complexity with minimal impact on compression. This is even more pronounced in the updated version of these two slides, https://xiph.org/~negge/AV1perf.pdf. Note that the bulk of this work is being done at --cpu-used=1 (keeping --cpu-used=0 as a comparison point for compression), and that is what these graphs show starting July 15, 2018.
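Compression gains relative to an anchor are commonly summarized as a Bjøntegaard delta rate (BD-rate): the average bitrate difference between two rate-quality curves at equal quality. The following is a minimal sketch, not the code behind the slides (the comment does not say how those figures were computed). It assumes each curve is a list of (bitrate, PSNR) points with distinct PSNR values, and it swaps the classic cubic-polynomial fit for simpler piecewise-linear interpolation of log-rate; the function name `bd_rate` is hypothetical.

```python
import math


def bd_rate(anchor, test, steps=1000):
    """Approximate BD-rate (percent) of `test` relative to `anchor`.

    Each curve is a list of (bitrate_kbps, psnr_db) points with distinct
    PSNR values. Simplified variant (an assumption, not the classic cubic
    fit): log-rate is linearly interpolated as a function of PSNR and
    integrated with the trapezoidal rule over the overlapping quality range.
    Negative results mean `test` needs fewer bits for the same quality.
    """

    def curve(points):
        # Sort by quality so log-rate can be looked up as a function of PSNR.
        pts = sorted((q, math.log(r)) for r, q in points)
        return [q for q, _ in pts], [lr for _, lr in pts]

    def interp(xs, ys, x):
        # Clamp to guard against floating-point overshoot at the range ends.
        x = min(max(x, xs[0]), xs[-1])
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return ys[i] + t * (ys[i + 1] - ys[i])
        return ys[-1]

    def area(xs, ys, lo, hi):
        # Trapezoidal integration of log-rate over [lo, hi].
        h = (hi - lo) / steps
        vals = [interp(xs, ys, lo + k * h) for k in range(steps + 1)]
        return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    ax, ay = curve(anchor)
    tx, ty = curve(test)
    lo, hi = max(ax[0], tx[0]), min(ax[-1], tx[-1])
    if lo >= hi:
        raise ValueError("quality ranges do not overlap")
    avg_log_diff = (area(tx, ty, lo, hi) - area(ax, ay, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up curves: the test encoder uses 10% fewer bits at every point.
    anchor = [(1000, 34.0), (2000, 37.0), (4000, 40.0), (8000, 43.0)]
    test = [(0.9 * rate, q) for rate, q in anchor]
    print(round(bd_rate(anchor, test), 2))  # -> -10.0
```

Working in the log-rate domain makes the result a relative (percentage) saving, which is why a uniform 10% bitrate reduction shows up as roughly -10% regardless of the absolute bitrates.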
  
  November 8th, 2018 at 09:30
Olivier

Thanks for sharing the slides, Nathan. Looks really complex! I wish there was a simple benchmark that would be used so everyone can understand the comparisons, e.g. how long does it take to encode a 4K video using the different video codecs? The other benchmark we can all understand is how the file sizes compare for each codec when encoding an original one-hour 4K video. I think it would be most helpful to include these benchmarks for the general public. Thanks again for your contribution.

November 8th, 2018 at 12:51
Nathan Egge
  
  Indeed, you are right. Testing video and image coding is complicated, and probably a good topic for another talk.
  
  Let me attempt to answer your very reasonable question of why there isn’t “a simple benchmark that would be used so everyone can understand the comparisons”. The short answer is that nobody can agree on what that one benchmark would be, and for good reason.
  
  You mention “file sizes”, i.e., rate, but the implicit assumption is “rate at the same quality”. How do we measure quality? There are at least a dozen different objective (deterministic) quality metrics, but even the experts agree that they are often biased, disagree with each other, and don’t correlate well with human perception. For example, a favorite objective metric, mean squared error (usually reported as PSNR), is generally useful except that you can “improve” PSNR just by blurring an image. Then there is the question of how you come to a single objective “quality” value for an individual test clip. The way you combine objective metrics matters, e.g., based on: region of interest (across the frame), human perception (across color planes), coding unit (across frame types), average stream size (across rate-controlled scenes), etc.
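To make the MSE/PSNR relationship and the “across color planes” question concrete, here is a minimal sketch. PSNR is a log-scaled MSE, 10·log10(MAX²/MSE), so anything that lowers MSE raises PSNR, whether or not it looks better. The flat lists of 8-bit samples, the function names, and the 4:1:1 luma/chroma weighting are illustrative assumptions, not a standard referenced in the comment.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equal-length sequences of samples."""
    if not ref or len(ref) != len(dist):
        raise ValueError("planes must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)


def psnr(ref, dist, max_value=255):
    """PSNR in dB; max_value is the peak sample value (255 for 8-bit video)."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf  # identical planes: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


def weighted_psnr(planes, weights=(4, 1, 1)):
    """Combine per-plane PSNR for (Y, Cb, Cr) as a weighted average.

    The default 4:1:1 weighting is an assumption for illustration; changing
    it (equal weights, luma only, ...) can change which encoder "wins".
    """
    if len(planes) != len(weights):
        raise ValueError("need one weight per plane")
    scores = [psnr(ref, dist) for ref, dist in planes]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


if __name__ == "__main__":
    # Tiny made-up planes: luma is off by 2 everywhere, chroma by 1.
    y = ([16, 32, 64, 128], [18, 30, 66, 126])
    cb = ([128, 128], [129, 127])
    cr = ([128, 128], [127, 129])
    print(f"Y PSNR: {psnr(*y):.2f} dB")
    print(f"weighted YCbCr PSNR: {weighted_psnr([y, cb, cr]):.2f} dB")
```

Averaging PSNR (a log quantity) per plane and then weighting is only one of several pooling choices; averaging MSE first and converting once gives a different number, which is part of why results from different test setups rarely line up.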
  
  You mention “how long … to encode”, i.e., complexity, and here the issue is that modern video codecs are designed to be used at a wide range of operating points. The algorithms used for video-on-demand (high latency), live streaming (some latency) and interactive video conferencing (low latency) are often different and so what you really want to compare is “quality for a given rate and complexity budget”. You can do this, but it takes time and is only fair if you really ensure it is an apples-to-apples comparison, i.e., you need to define the test conditions well, but also ensure that the encoders are configured properly.
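One way to frame “quality for a given rate and complexity budget” is to fix both budgets first, and only then ask which configuration of each encoder does best within them. The sketch below assumes you already have measured operating points (encoder, preset, average bitrate, quality score, encode time) for the same test clip; the encoder and preset labels, the `best_per_encoder` helper, and all numbers are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    encoder: str       # hypothetical encoder label
    preset: str        # hypothetical speed/quality preset label
    rate_kbps: float   # average bitrate of the encoded clip
    quality: float     # objective quality score, e.g. PSNR in dB
    encode_s: float    # wall-clock encoding time in seconds


def best_per_encoder(points, max_rate_kbps, max_encode_s):
    """For each encoder, pick the highest-quality point within both budgets.

    Encoders with no point inside the budgets are left out of the result.
    """
    best = {}
    for p in points:
        if p.rate_kbps > max_rate_kbps or p.encode_s > max_encode_s:
            continue  # outside the rate or complexity budget
        current = best.get(p.encoder)
        if current is None or p.quality > current.quality:
            best[p.encoder] = p
    return best


if __name__ == "__main__":
    # Placeholder numbers only, chosen to show how the budget changes the answer.
    points = [
        OperatingPoint("encoder-a", "slow", 1900, 40.0, 900),
        OperatingPoint("encoder-a", "fast", 1950, 39.0, 100),
        OperatingPoint("encoder-b", "medium", 2000, 39.5, 150),
        OperatingPoint("encoder-b", "slow", 2500, 41.0, 300),
    ]
    for budget_s in (200, 1000):
        winners = best_per_encoder(points, max_rate_kbps=2000, max_encode_s=budget_s)
        for name, p in sorted(winners.items()):
            print(f"{budget_s:>5}s budget: {name} {p.preset} -> {p.quality} dB")
```

With these placeholder numbers, encoder-b comes out ahead under the 200-second budget, while encoder-a's slower preset wins once the budget is relaxed to 1000 seconds, which illustrates why a single headline number can mislead.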
  
  This points to the other big problem, which is that you wanted to compare video formats, e.g., “AV1, HEVC, H.264 and VP9”, but what you are really comparing are encoders, e.g., libaom, HM, OpenH264 and libvpx. These are a mix of reference (verification model) and production (enterprise quality) software and have different levels of maturity, usability and feature completeness. Add to this the implicit biases, and the difference in effort it takes to configure software you are familiar with versus an alternative, and you can see why it is hard to get consistent evaluations of video and image formats.
  
  November 8th, 2018 at 15:00
DD

Thank you for this explanation. I found it interesting, even though I am not a programmer or working heavily with A/V technology.

By the way, I just wanted to mention this for those of us tracking how far along AV1 decoding/playback is in browsers:

– You can use it in the latest stable release of Chrome (no flags required anymore!)
– You can use it in the latest stable release of Firefox if you set “media.av1.enabled” to true in about:config
– No Safari or Edge yet? (As of early November 2018)

There are also a fair number of YouTube videos now encoded in AV1. You can go to youtube.com/testtube to enable the AV1 beta on YouTube.

And outside of browsers, you can play AV1 files with VLC and a number of other programs. You can look at the AV1 Wikipedia page for more areas where AV1 can already be used: https://en.m.wikipedia.org/wiki/AV1#Adoption

November 8th, 2018 at 13:58
DD

Mr. Egge,

I am looking ahead to AV2 (even though it seems early to do so! But I realize some speculation and planning happens well before work on AV2 even begins.)

Perhaps it is safe to assume AV2 will include the best optimizations from the AV1 development cycle at that point, with the added freedom of a bitstream spec that is not yet frozen.

In that spirit, I would be curious to know: In your opinion, are there any “lessons learned” from the AV1 development cycle so far? (e.g. Technologies that should be added? Stuff that is in AV1 but maybe too complex or under-used and should be removed? Prior assumptions that one would want to tweak?)

In any case, thank you for letting us in on the details! It’s always more fun being “hyped up” about a technology when I can understand it well.

November 9th, 2018 at 05:40


Hacks

By Nathan Egge, Michael Bebenita

