Building Interactive HTML5 Videos

The HTML5 <video> element makes embedding videos into your site as easy as embedding images. And since all major browsers have supported <video> since 2011, it’s also the most reliable way to get your moving pictures seen by people.

A more recent addition to the HTML5 family is the <track> element. It’s a sub-element of <video>, intended to make the video timeline more accessible. Its main use case is adding closed captions. These captions are loaded from a separate text file (a WebVTT file) and printed over the bottom of the video display. Ian Devlin has written an excellent article on the subject.
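For readers who have not seen one, a WebVTT file is plain text: a WEBVTT header followed by cues, each with a "start --> end" timing line and one or more lines of text. Browsers parse the file natively for <track>, so the sketch below is only meant to illustrate the format. The sample captions and the helper names parseTimestamp and parseVtt are made up for this illustration, and the parser deliberately ignores much of the real syntax (NOTE blocks, styling, regions).

```javascript
// Hypothetical sample file contents; not taken from the demos.
var sample = [
  'WEBVTT',
  '',
  '00:00:01.000 --> 00:00:04.500',
  'First caption line',
  '',
  '00:01:15.200 --> 00:01:18.000',
  'Second caption,',
  'on two lines'
].join('\n');

// Convert "hh:mm:ss.ttt" or "mm:ss.ttt" into seconds.
function parseTimestamp(stamp) {
  var parts = stamp.trim().split(':');
  var seconds = 0;
  for (var i = 0; i < parts.length; i++) {
    seconds = seconds * 60 + parseFloat(parts[i]);
  }
  return seconds;
}

// Very small subset of WebVTT: blocks separated by blank lines,
// each containing a timing line followed by text lines.
function parseVtt(text) {
  var blocks = text.replace(/\r\n?/g, '\n').split(/\n{2,}/);
  var cues = [];
  for (var i = 0; i < blocks.length; i++) {
    var lines = blocks[i].split('\n');
    // Skip an optional cue identifier line before the timings.
    var t = lines[0].indexOf('-->') > -1 ? 0 : 1;
    if (!lines[t] || lines[t].indexOf('-->') === -1) { continue; }
    var times = lines[t].split('-->');
    cues.push({
      startTime: parseTimestamp(times[0]),
      // Drop any cue settings that follow the end timestamp.
      endTime: parseTimestamp(times[1].trim().split(/\s+/)[0]),
      text: lines.slice(t + 1).join('\n')
    });
  }
  return cues;
}
```

Calling parseVtt(sample) yields two objects with the same startTime, endTime and text fields that the browser exposes on real cues, which is why the later examples can work with those three properties alone.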

Beyond captions though, the <track> element can be used for any kind of interaction with the video timeline. This article explores 3 examples: chapter markers, preview thumbnails, and a timeline search. By the end, you will have sufficient understanding of the <track> element and its scripting API to build your own interactive video experiences.

Chapter Markers

Let’s start with an example made popular by DVDs: chapter markers. These allow viewers to quickly jump to a specific section, which is especially useful for longer movies like Sintel:

The chapter markers in this example reside in an external VTT file and are loaded on the page through a <track> element with a kind of chapters. The track is set to load by default:

<video width="480" height="204" poster="assets/sintel.jpg" controls>
  <source src="assets/sintel.mp4" type="video/mp4">
  <track src="assets/chapters.vtt" kind="chapters" default>
</video>

Next, we use JavaScript to load the cues of the text track, format them, and print them in a controlbar below the video. Note we have to wait until the external VTT file is loaded:

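track.addEventListener('load',function() {
    var c = video.textTracks[0].cues;
    for (var i=0; i<c.length; i++) {
      var s = document.createElement("span");
      s.innerHTML = c[i].text;
      s.setAttribute('data-start', c[i].startTime);
      s.addEventListener("click", seek);
      controlbar.appendChild(s); // controlbar: the element below the video
    }
});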

In the code block above, we add two properties to the list entries to hook up interactivity. First, we set a data attribute to store the start position of the chapter, and second, we add a click handler for an external seek function. This function jumps the video to the start position. If the video is not (yet) playing, it starts playback:

function seek() {
  video.currentTime = this.getAttribute('data-start');
  if (video.paused) { video.play(); }
}

That’s it! You now have a visual chapter menu for your video, powered by a VTT track. Note the actual live Chapter Markers example has a little bit more logic than described, e.g. to toggle playback of the video on click, to update the controlbar with the video position, and to add some CSS styling.
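The extra logic mentioned here is small. As a rough sketch (not the code from the live example), the helpers below toggle playback and work out which chapter contains the current position, so the matching controlbar entry can be highlighted. The names togglePlayback and activeChapterIndex are hypothetical, and any object with the listed properties can stand in for the video element and cue list.

```javascript
// Toggle playback; `video` is anything with paused, play() and pause(),
// such as an HTMLVideoElement.
function togglePlayback(video) {
  if (video.paused) {
    video.play();
  } else {
    video.pause();
  }
}

// Return the index of the chapter cue that contains `time`,
// or -1 if no chapter covers that position.
function activeChapterIndex(cues, time) {
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].startTime <= time && time < cues[i].endTime) {
      return i;
    }
  }
  return -1;
}
```

In a page, activeChapterIndex(video.textTracks[0].cues, video.currentTime) could be called from a timeupdate listener to restyle the current entry.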

Preview Thumbnails

This second example shows a cool feature made popular by Hulu and Netflix: preview thumbnails. When mousing over the controlbar (or dragging on mobile), a small preview of the position you’re about to seek to is displayed:

This example is also powered by an external VTT file, loaded in a metadata track. Instead of text, the cues in this VTT file contain links to a JPG image. Each cue could link to a separate image, but in this case we opted to use a single JPG sprite to keep latency low and management easy. The cues link to the correct section of the sprite by using Media Fragment URIs: the image URL is followed by a fragment of the form #xywh=x,y,width,height. For example, a fragment ending in ,0,160,90 selects a region 160 pixels wide and 90 pixels high.
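A spatial Media Fragment URI of this kind is the image URL followed by "#xywh=x,y,width,height". As a minimal sketch, assuming every cue uses plain pixel values, a cue's text can be unpacked like this. The helper name parseThumbnailCue and the file name thumbs.jpg are made up for illustration.

```javascript
// Split a thumbnail cue's text into the sprite URL and the region to show.
// Assumes the form "<image url>#xywh=x,y,width,height" with pixel values.
function parseThumbnailCue(text) {
  var marker = '#xywh=';
  var hash = text.indexOf(marker);
  if (hash === -1) {
    return null; // no spatial fragment present
  }
  var nums = text.slice(hash + marker.length).split(',').map(Number);
  return {
    url: text.slice(0, hash),
    x: nums[0],
    y: nums[1],
    width: nums[2],
    height: nums[3]
  };
}

// Hypothetical cue text pointing at the second tile of a sprite.
var region = parseThumbnailCue('thumbs.jpg#xywh=160,0,160,90');
```

Returning numbers rather than strings avoids relying on implicit string-to-number conversion when the values are later used to position the overlay.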

Next, all important logic to get the right thumbnail and display it lives in a mousemove listener for the controlbar:

controlbar.addEventListener('mousemove',function(e) {
  // first we convert from mouse to time position ..
  var p = (e.pageX - controlbar.offsetLeft) * video.duration / 480;
  // ..then we find the matching cue..
  var c = video.textTracks[0].cues;
  for (var i=0; i<c.length; i++) {
      if(c[i].startTime <= p && c[i].endTime > p) { break; }
  }
  if (i === c.length) { return; } // no cue covers this position
  // we unravel the JPG url and fragment query..
  var url = c[i].text.split('#')[0];
  var xywh = c[i].text.substr(c[i].text.indexOf("=")+1).split(',');
  // ..and last we style the thumbnail overlay (assumed to be held in a variable named thumbnail)
  thumbnail.style.backgroundImage = 'url('+url+')';
  thumbnail.style.backgroundPosition = '-'+xywh[0]+'px -'+xywh[1]+'px';
  thumbnail.style.left = e.pageX - xywh[2]/2+'px';
  thumbnail.style.top = controlbar.offsetTop - xywh[3]+8+'px';
  thumbnail.style.width = xywh[2]+'px';
  thumbnail.style.height = xywh[3]+'px';
});

All done! Again, the actual live Preview Thumbnails example contains some additional code. It includes the same logic for toggling playback and seeking, as well as logic to show/hide the thumbnail when mousing in/out of the controlbar.
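The listener assumes a controlbar that is exactly 480 pixels wide. One possible variation, sketched here with hypothetical helper names (positionToTime, findCueAt), pulls the two lookups into small functions that take the bar width as a parameter and clamp positions that fall just outside the bar.

```javascript
// Map a horizontal mouse position to a time in seconds.
// barLeft and barWidth describe the controlbar in page pixels.
function positionToTime(pageX, barLeft, barWidth, duration) {
  var fraction = (pageX - barLeft) / barWidth;
  // Clamp so positions just outside the bar still map to a valid time.
  fraction = Math.max(0, Math.min(1, fraction));
  return fraction * duration;
}

// Return the first cue whose interval contains `time`, or null.
function findCueAt(cues, time) {
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].startTime <= time && cues[i].endTime > time) {
      return cues[i];
    }
  }
  return null;
}
```

In the page, barLeft and barWidth could come from controlbar.offsetLeft and controlbar.offsetWidth, and a null result from findCueAt is a natural point to hide the thumbnail.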

Timeline Search

Our last example offers yet another way to unlock your content, this time through in-video search:

This example re-uses an existing captions VTT file, which is loaded into a captions track. Below the video and controlbar, we print a basic search form:

<form>
    <input type="search" />
    <button type="submit">Search</button>
</form>

As with the thumbnails example, all key logic resides in a single function. This time, it’s the event handler for submitting the form:

form.addEventListener('submit',function(e) {
  // First we’ll prevent page reload and grab the cues/query..
  e.preventDefault();
  var c = video.textTracks[0].cues;
  var q = document.querySelector("input").value.toLowerCase();
  // ..then we find all matching cues..
  var a = [];
  for(var j=0; j<c.length; j++) {
    if(c[j].text.toLowerCase().indexOf(q) > -1) {
      a.push(c[j]);
    }
  }
  // ..and last we highlight matching cues on the controlbar.
  for (var i=0; i<a.length; i++) {
    var s = document.createElement("span");
    s.style.left = (a[i].startTime/video.duration*480-2)+"px";
    controlbar.appendChild(s);
  }
});

Third time’s a charm! As with the others, the actual live Timeline Search example contains additional code for toggling playback and seeking, as well as a snippet to update the controlbar help text.
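The matching and positioning steps do not depend on the DOM, so they can be sketched as plain functions. This is an illustration under stated assumptions rather than the demo's code: searchCues and markerOffset are hypothetical names, the empty-query guard is an addition, and the small pixel adjustment used in the handler above is left out.

```javascript
// Return the cues whose text contains the query, ignoring case.
function searchCues(cues, query) {
  var q = query.trim().toLowerCase();
  var matches = [];
  if (q === '') {
    return matches; // an empty query would otherwise match every cue
  }
  for (var i = 0; i < cues.length; i++) {
    if (cues[i].text.toLowerCase().indexOf(q) > -1) {
      matches.push(cues[i]);
    }
  }
  return matches;
}

// Horizontal offset in pixels for a marker at `startTime`
// on a controlbar that is `barWidth` pixels wide.
function markerOffset(startTime, duration, barWidth) {
  return startTime / duration * barWidth;
}
```

Each match can then be turned into a span positioned at markerOffset(match.startTime, video.duration, controlbar.offsetWidth) pixels from the left edge of the controlbar.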

Wrapping Up

The examples above should provide you with enough knowledge to build your own interactive videos. For some more inspiration, see our experiments around clickable hot spots, interactive transcripts, or timeline interaction.

Overall, the HTML5 <track> element provides an easy-to-use, cross-platform way to add interactivity to your videos. And while it definitely takes time to author VTT files and build similar experiences, you will see higher accessibility of and engagement with your videos. Good luck!

About Jeroen Wijering

Creator of the successful JW Player and co-founder of the company with the same name. He is the team's Product Evangelist, driving innovation and market awareness.


About Robert Nyman [Editor emeritus]

Technical Evangelist & Editor of Mozilla Hacks. Gives talks & blogs about HTML5, JavaScript & the Open Web. Robert is a strong believer in HTML5 and the Open Web and has been working with front-end development for the web since 1999, in Sweden and in New York City. He also blogs regularly and loves to travel and meet people.



  1. Elijah Lynn

    The timeline search is badass, thanks for these demos!

    August 26th, 2014 at 15:13
    1. Robert Nyman [Editor]

      Glad you liked it!

      August 27th, 2014 at 03:27
    2. Mathew Porter

      I just thought the exact same thing!

      August 28th, 2014 at 08:59
  2. Chris Adams

    Nice – I’ve been really happy with how well the suite of web video technologies is coming together. Awhile back I created a simple project to create an interactive, synchronized transcript and was happy with how little work is involved on modern browsers:

    (I think this is similar to the linked interactive transcripts demo above but can’t confirm since the demo requires Flash, which I don’t have installed)

    August 27th, 2014 at 05:52
    1. Robert Nyman [Editor]

      Yes, great progress in creating much richer experiences!

      August 27th, 2014 at 06:45
  3. Heather Zhong

    Thanks for the code snippets and demos. I would like to know how accurately the cues are fired based on the startTime specified in the VTT file. We implemented “interactive video” based on the video.timeupdate event by positioning various DOM elements on top of the video layer. I found that video.timeupdate events don’t fire often enough to get the accurate timing we desired. I would like to go the WebVTT route once browser support for oncuechange events is better. I’d like to assume oncuechange events are fired accurately on time in the browsers that support them so far. Does anyone know if that is the case from experience?

    August 27th, 2014 at 07:16
    1. Chris Adams

      I found cuechange to be quite accurate before I switched to timeupdate for compatibility with Firefox. My code highlighting text was imperceptibly close to the subtitles or audio (assuming, of course, that your timecodes are that precise).

      August 27th, 2014 at 11:25
      1. alexander farkas

        I don’t think you are right here. Clearly cuechange is close to subtitles, but currently no browser (tested in Safari, Chrome and IE) has implemented “high precision timing” with text tracks. The accuracy is about 100-140ms. (Note that timeupdate is throttled to 250ms.) Here is a test:

        Simply play the video until 20sec and it will alert you with the precision. I used a similar test to implement high precision timing in my polyfill. You can test this by simply adding the following line:

        webshims.setOptions('track', {
        //set to true to test webshim
        override: true
        });

        August 29th, 2014 at 06:57
        1. Chris Adams

          You’re right – a better question would have been “what precision do you need?”. The human visual system latency is somewhere in the 100ms range but audio latency is an order of magnitude lower.

          For my needs, the precision from triggering on cuechange was perceptibly better than waiting for timeupdate event, making the text display synchronization close enough that I couldn’t detect a delay but I can imagine many scenarios where that would be different.

          As a data point, Firefox is consistently around 26ms on my system.

          August 29th, 2014 at 08:04
  4. happyWang

    Nice! This is what i want, build a flash like video player, thanks

    August 27th, 2014 at 07:20
  5. Raymond Camden

    Very cool!

    FYI, a typo:

    “through aelement with a kind of **chapters”

    probably should be “an element”. Also, the “with a kind of **chapters” didn’t quite make sense.

    August 27th, 2014 at 07:42
    1. Robert Nyman [Editor]

      Thank you!
      And the <track> element reference had fallen out there. Added it now.

      August 27th, 2014 at 08:35
  6. Jx Prince

    making sense of data in a video – super awesome

    August 27th, 2014 at 11:14
  7. Md Mizanur Rahman


    I need your HTML5 video source code in a zip for Chapter Markers, Preview Thumbnails and Timeline Search. I want to learn it.

    Best regards

    September 19th, 2014 at 02:59
    1. Robert Nyman [Editor]

      All the code is available in the linked examples in the article.

      September 25th, 2014 at 13:02
  8. Kris

    Hi, thanks for a good intro to VTT & some neat functionality. I’m pretty keen on the preview thumbnails, but the demo doesn’t seem to work for me in Chrome v37 on OS X 10.9. They look great in FF & Safari, but there is no effect at all when I mouseover in Chrome.
    Anyone else seeing this? I have a site where I’d love to implement this functionality but I’m a little uneasy to use it in production if it’s not fully working. I’ll try it out for myself and will follow up if I have success.

    September 22nd, 2014 at 15:10
