Building Interactive HTML5 Videos

The HTML5 <video> element makes embedding videos into your site as easy as embedding images. And since all major browsers have supported <video> since 2011, it’s also the most reliable way to get your moving pictures seen by people.

A more recent addition to the HTML5 family is the <track> element. It’s a sub-element of <video>, intended to make the video timeline more accessible. Its main use case is adding closed captions. These captions are loaded from a separate text file (a WebVTT file) and printed over the bottom of the video display. Ian Devlin has written an excellent article on the subject.
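To give a feel for what such a file contains, here is a minimal sketch: a tiny WebVTT document held in a string, plus a naive parser that turns it into cue-like objects. The cue texts, timings and the function names `vttTimeToSeconds` and `parseSimpleVtt` are made up for illustration. Browsers parse WebVTT natively when you use <track>, so you would not normally write this yourself, and this sketch ignores cue settings, NOTE and STYLE blocks.

```javascript
// A small WebVTT document, held in a string for illustration only.
var sampleVtt = [
  'WEBVTT',
  '',
  '00:00:00.000 --> 00:00:04.000',
  'Hello, and welcome.',
  '',
  '00:00:04.000 --> 00:01:10.500',
  'This cue spans',
  'two lines.'
].join('\n');

// Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds.
function vttTimeToSeconds(stamp) {
  var parts = stamp.split(':');
  var seconds = 0;
  for (var i = 0; i < parts.length; i++) {
    seconds = seconds * 60 + parseFloat(parts[i]);
  }
  return seconds;
}

// Naive parser: a header followed by blank-line-separated cues.
function parseSimpleVtt(text) {
  var blocks = text.replace(/\r\n?/g, '\n').split(/\n{2,}/);
  var cues = [];
  for (var i = 0; i < blocks.length; i++) {
    var lines = blocks[i].split('\n');
    // An optional cue identifier may precede the timing line.
    var t = lines[0].indexOf('-->') > -1 ? 0 : 1;
    if (!lines[t] || lines[t].indexOf('-->') === -1) { continue; }
    var times = lines[t].split('-->');
    cues.push({
      startTime: vttTimeToSeconds(times[0].trim()),
      // Drop anything after the end timestamp (cue settings).
      endTime: vttTimeToSeconds(times[1].trim().split(/\s+/)[0]),
      text: lines.slice(t + 1).join('\n')
    });
  }
  return cues;
}
```

The resulting objects mirror the startTime, endTime and text properties that the examples in this article read from real cues.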

Beyond captions though, the <track> element can be used for any kind of interaction with the video timeline. This article explores three examples: chapter markers, preview thumbnails, and a timeline search. By the end, you will understand the <track> element and its scripting API well enough to build your own interactive video experiences.

Chapter Markers

Let’s start with an example made popular by DVDs: chapter markers. These allow viewers to quickly jump to a specific section. They are especially useful for longer movies like Sintel:

The chapter markers in this example reside in an external VTT file and are loaded on the page through a <track> element with a kind of chapters. The track is set to load by default:

<video width="480" height="204" poster="assets/sintel.jpg" controls>
  <source src="assets/sintel.mp4" type="video/mp4">
  <track src="assets/chapters.vtt" kind="chapters" default>
</video>

Next, we use JavaScript to load the cues of the text track, format them, and print them in a controlbar below the video. Note we have to wait until the external VTT file is loaded:

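track.addEventListener('load',function() {
    var c = video.textTracks[0].cues;
    for (var i=0; i<c.length; i++) {
      var s = document.createElement("span");
      s.innerHTML = c[i].text;
      s.setAttribute('data-start',c[i].startTime);
      s.addEventListener('click',seek);
      controlbar.appendChild(s); // assumed: entries are appended to the controlbar
    }
});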

In the code block above, we’re adding two properties to the list entries to hook up interactivity. First, we set a data attribute to store the start position of the chapter, and second we add a click handler for an external seek function. This function will jump the video to the start position. If the video is not (yet) playing, we’ll start it:

function seek() {
  video.currentTime = this.getAttribute('data-start');
  if(video.paused){ video.play(); }
}

That’s it! You now have a visual chapter menu for your video, powered by a VTT track. Note the actual live Chapter Markers example has a little bit more logic than described, e.g. to toggle playback of the video on click, to update the controlbar with the video position, and to add some CSS styling.
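One caveat about waiting for the track’s load event: if the VTT file has already finished loading by the time the listener is attached, the event will not fire again. A minimal sketch of a guard follows, assuming `trackEl` is the <track> element. It relies on the element’s readyState property, where a value of 2 means the track has loaded. The helper name `whenTrackLoaded` and the `buildChapterMenu` callback in the usage comment are hypothetical.

```javascript
// HTMLTrackElement.readyState values: 0 none, 1 loading, 2 loaded, 3 error.
var TRACK_LOADED = 2;

// Run `callback` once the track's cues are available.
// `trackEl` is expected to be a <track> element, or any object with the
// same readyState / addEventListener shape (which keeps this easy to test).
function whenTrackLoaded(trackEl, callback) {
  if (trackEl.readyState === TRACK_LOADED) {
    // Already loaded: the load event will not fire again.
    callback();
    return;
  }
  trackEl.addEventListener('load', function onLoad() {
    trackEl.removeEventListener('load', onLoad);
    callback();
  });
}

// Hypothetical usage in the page:
// whenTrackLoaded(document.querySelector('track'), buildChapterMenu);
```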

Preview Thumbnails

This second example shows a cool feature made popular by Hulu and Netflix: preview thumbnails. When mousing over the controlbar (or dragging on mobile), a small preview of the position you’re about to seek to is displayed:

This example is also powered by an external VTT file, loaded in a metadata track. Instead of text, the cues in this VTT file contain links to a JPG image. Each cue could link to a separate image, but in this case we opted to use a single JPG sprite to keep latency low and management easy. The cues link to the correct section of the sprite by using Media Fragment URIs: the image URL is followed by a fragment of the form #xywh=x,y,width,height, for example one ending in ,0,160,90 for a 160 by 90 pixel region in the top row of the sprite.
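As a rough sketch of how such a cue text can be taken apart, the helper below splits it into the image URL and the four numbers of the fragment. The function name `parseSpriteCue` and the file name in the comment are placeholders, and only the plain pixel form of xywh is handled.

```javascript
// Split a cue text such as "thumbs.jpg#xywh=160,0,160,90" into its parts.
// Returns null when the text carries no usable #xywh= fragment.
function parseSpriteCue(text) {
  var marker = '#xywh=';
  var hash = text.indexOf(marker);
  if (hash === -1) { return null; }
  var nums = text.slice(hash + marker.length).split(',').map(Number);
  if (nums.length !== 4 || nums.some(isNaN)) { return null; }
  return {
    url: text.slice(0, hash), // image URL without the fragment
    x: nums[0],               // left offset inside the sprite
    y: nums[1],               // top offset inside the sprite
    w: nums[2],               // thumbnail width
    h: nums[3]                // thumbnail height
  };
}
```

Returning null for malformed input lets the caller simply hide the thumbnail instead of styling it with NaN values.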

Next, all important logic to get the right thumbnail and display it lives in a mousemove listener for the controlbar:

controlbar.addEventListener('mousemove',function(e) {
  // first we convert from mouse to time position ..
  var p = (e.pageX - controlbar.offsetLeft) * video.duration / 480;
  // ..then we find the matching cue..
  var c = video.textTracks[0].cues;
  for (var i=0; i<c.length; i++) {
      if(c[i].startTime <= p && c[i].endTime > p) {
          break;
      }
  }
  if (i === c.length) { return; } // no cue covers this position
  // ..next we unravel the JPG url and fragment query..
  var url = c[i].text.split('#')[0];
  var xywh = c[i].text.substr(c[i].text.indexOf("=")+1).split(',');
  // ..and last we style the thumbnail overlay ("thumbnail" is an assumed name for that element)
  thumbnail.style.backgroundImage = 'url('+url+')';
  thumbnail.style.backgroundPosition = '-'+xywh[0]+'px -'+xywh[1]+'px';
  thumbnail.style.left = e.pageX - xywh[2]/2+'px';
  thumbnail.style.top = controlbar.offsetTop - xywh[3]+8+'px';
  thumbnail.style.width = xywh[2]+'px';
  thumbnail.style.height = xywh[3]+'px';
});

All done! Again, the actual live Preview Thumbnails example contains some additional code. It includes the same logic for toggling playback and seeking, as well as logic to show/hide the thumbnail when mousing in/out of the controlbar.
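The two lookups in the mousemove listener, from mouse position to time and from time to cue, can also be pulled out into small functions that are easier to test. This is only a sketch: `positionToTime` and `findCue` are hypothetical names, and the bar width is passed in rather than hard-coded as 480.

```javascript
// Convert a horizontal mouse position into a media time,
// clamped to the range [0, duration].
function positionToTime(pageX, barLeft, barWidth, duration) {
  var fraction = (pageX - barLeft) / barWidth;
  fraction = Math.min(1, Math.max(0, fraction));
  return fraction * duration;
}

// Return the cue covering `time`, or null when no cue matches.
// `cues` can be a TextTrackCueList or a plain array of
// objects with startTime and endTime in seconds.
function findCue(cues, time) {
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].startTime <= time && cues[i].endTime > time) {
      return cues[i];
    }
  }
  return null;
}

// Hypothetical usage inside the listener:
// var p = positionToTime(e.pageX, controlbar.offsetLeft,
//                        controlbar.offsetWidth, video.duration);
// var cue = findCue(video.textTracks[0].cues, p);
```

Clamping keeps the computed time inside the video even when the pointer drifts slightly past either end of the bar.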

Timeline Search

Our last example offers yet another way to unlock your content, this time through in-video search:

This example re-uses an existing captions VTT file, which is loaded into a captions track. Below the video and controlbar, we print a basic search form:

<form>
    <input type="search" />
    <button type="submit">Search</button>
</form>

As with the thumbnails example, all key logic resides in a single function. This time, it’s the event handler for submitting the form:

form.addEventListener('submit',function(e) {
  // First we’ll prevent page reload and grab the cues/query..
  e.preventDefault();
  var c = video.textTracks[0].cues;
  var q = document.querySelector("input").value.toLowerCase();
  // ..then we find all matching cues..
  var a = [];
  for(var j=0; j<c.length; j++) {
    if(c[j].text.toLowerCase().indexOf(q) > -1) {
      a.push(c[j]);
    }
  }
  // ..and last we highlight matching cues on the controlbar.
  // (the style property and append target below are reconstructed from context)
  for (var i=0; i<a.length; i++) {
    var s = document.createElement("span");
    s.style.left = (a[i].startTime/video.duration*480-2)+"px";
    controlbar.appendChild(s);
  }
});

Third time’s a charm! As with the others, the actual live Timeline Search example contains additional code for toggling playback and seeking, as well as a snippet to update the controlbar help text.
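For reference, the matching and positioning steps can be sketched as two small functions that work on any list of cue-like objects. The names `searchCues` and `markerLeft` are hypothetical. Unlike the handler above, this sketch treats an empty query as matching nothing, so an empty search box does not highlight every cue.

```javascript
// Return the cues whose text contains `query`, ignoring case.
function searchCues(cues, query) {
  var q = query.trim().toLowerCase();
  var hits = [];
  if (!q) { return hits; } // empty query: no matches
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].text.toLowerCase().indexOf(q) > -1) {
      hits.push(cues[i]);
    }
  }
  return hits;
}

// Horizontal offset in pixels of a marker for `cue`
// on a bar that is `barWidth` pixels wide.
function markerLeft(cue, duration, barWidth) {
  return cue.startTime / duration * barWidth;
}

// Hypothetical usage:
// var hits = searchCues(video.textTracks[0].cues, input.value);
// hits.forEach(function(cue) {
//   var left = markerLeft(cue, video.duration, 480) - 2; // small offset, as in the article
// });
```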

Wrapping Up

The examples above should provide you with enough knowledge to build your own interactive videos. For some more inspiration, see our experiments around clickable hot spots, interactive transcripts, or timeline interaction.

Overall, the HTML5 <track> element provides an easy-to-use, cross-platform way to add interactivity to your videos. And while it definitely takes time to author VTT files and build similar experiences, you will see higher accessibility of and engagement with your videos. Good luck!


Comments are now closed.

  1. Elijah Lynn wrote on August 26th, 2014 at 15:13:

    The timeline search is badass, thanks for these demos!

    1. Robert Nyman [Editor] wrote on August 27th, 2014 at 03:27:

      Glad you liked it!

    2. Mathew Porter wrote on August 28th, 2014 at 08:59:

      I just thought the exact same thing!

  2. Chris Adams wrote on August 27th, 2014 at 05:52:

    Nice – I’ve been really happy with how well the suite of web video technologies is coming together. Awhile back I created a simple project to create an interactive, synchronized transcript and was happy with how little work is involved on modern browsers:

    (I think this is similar to the linked interactive transcripts demo above but can’t confirm since the demo requires Flash, which I don’t have installed)

    1. Robert Nyman [Editor] wrote on August 27th, 2014 at 06:45:

      Yes, great progress in creating much richer experiences!

  3. Heather Zhong wrote on August 27th, 2014 at 07:16:

    Thanks for the code snippets and demos. I would like to know how accurately the cues are fired based on the startTime specified in the VTT file. We implemented “interactive video” based on the video.timeupdate event by positioning various DOM elements on top of the video layer. I found that video.timeupdate events don’t fire often enough to get the accurate timing we desired. I would like to go the WebVTT route once browser support for oncuechange events is better. I’d like to assume oncuechange events are fired accurately on time in the browsers that support them so far. Does anyone know if that is the case from your experience?

    1. Chris Adams wrote on August 27th, 2014 at 11:25:

      I found cuechange to be quite accurate before I switched to timeupdate for compatibility with Firefox. My code highlighting text was imperceptibly close to the subtitles or audio (assuming, of course, that your timecodes are that precise).

      1. alexander farkas wrote on August 29th, 2014 at 06:57:

        I don’t think you are right here. Clearly cuechange is close to subtitles, but currently no browser (tested in Safari, Chrome and IE) has implemented “high precision timing” with text tracks. The accuracy is about 100-140ms. (Note the timeupdate is throttled to 250ms.) Here is a test that measures this:

        Simply play the video until 20sec and it will alert you with the precision. I used a similar test to implement high precision timing in my polyfill. You can test this by simply adding the following lines:

        webshims.setOptions('track', {
            //set to true to test webshim
            override: true
        });

        1. Chris Adams wrote on August 29th, 2014 at 08:04:

          You’re right – a better question would have been “what precision do you need?”. The human visual system latency is somewhere in the 100ms range but audio latency is an order of magnitude lower.

          For my needs, the precision from triggering on cuechange was perceptibly better than waiting for timeupdate event, making the text display synchronization close enough that I couldn’t detect a delay but I can imagine many scenarios where that would be different.

          As a data point, Firefox is consistently around 26ms on my system.

  4. happyWang wrote on August 27th, 2014 at 07:20:

    Nice! This is what i want, build a flash like video player, thanks

  5. Raymond Camden wrote on August 27th, 2014 at 07:42:

    Very cool!

    FYI, a typo:

    “through aelement with a kind of **chapters”

    probably should be “an element”. Also, the “with a kind of **chapters” didn’t quite make sense.

    1. Robert Nyman [Editor] wrote on August 27th, 2014 at 08:35:

      Thank you!
      And the <track> element reference had fallen out there. Added it now.

  6. Jx Prince wrote on August 27th, 2014 at 11:14:

    making sense of data in a video – super awesome

  7. Md Mizanur Rahman wrote on September 19th, 2014 at 02:59:


    I have need your htm5 video source code in zip for Chapter Markers, Preview Thumbnails and Timeline Search. I want to learn it .

    Best regards

    1. Robert Nyman [Editor] wrote on September 25th, 2014 at 13:02:

      All the code is available in the linked examples in the article.

  8. Kris wrote on September 22nd, 2014 at 15:10:

    Hi, thanks for a good intro to VTT & some neat functionality. I’m pretty keen on the preview thumbnails, but the demo doesn’t seem to work for me in Chrome v37 on OS X 10.9. They look great in FF & Safari, but there is no effect at all when I mouseover in Chrome.
    Anyone else seeing this? I have a site where I’d love to implement this functionality but I’m a little uneasy to use it in production if it’s not fully working. I’ll try it out for myself and will follow up if I have success.
