The Baseline Interpreter: a faster JS interpreter in Firefox 70

Introduction

Modern web applications load and execute a lot more JavaScript code than they did just a few years ago. While JIT (just-in-time) compilers have been very successful in making JavaScript performant, we needed a better solution to deal with these new workloads.

To address this, we’ve added a new, generated JavaScript bytecode interpreter to the JavaScript engine in Firefox 70. The interpreter is available now in the Firefox Nightly channel, and will go to general release in October. Instead of writing or generating a new interpreter from scratch, we found a way to do this by sharing most code with our existing Baseline JIT.

The new Baseline Interpreter has resulted in performance improvements, memory usage reductions and code simplifications. Here’s how we got there:

Execution tiers

In modern JavaScript engines, each function is initially executed in a bytecode interpreter. Functions that are called a lot (or perform many loop iterations) are compiled to native machine code. (This is called JIT compilation.)

Firefox has an interpreter written in C++ and multiple JIT tiers:

  • The Baseline JIT. Each bytecode instruction is compiled directly to a small piece of machine code. It uses Inline Caches (ICs) both as a performance optimization and to collect type information for Ion.
  • IonMonkey (or just Ion), the optimizing JIT. It uses advanced compiler optimizations to generate fast code for hot functions (at the expense of slower compile times).

Ion JIT code for a function can be ‘deoptimized’ and thrown away for various reasons, for example when the function is called with a new argument type. This is called a bailout. When a bailout happens, execution continues in the Baseline code until the next Ion compilation.
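To illustrate the idea of a bailout, here is a minimal Python sketch. It is a toy model, not how Ion works internally: the `HotFunction` class and its methods are hypothetical, and a real bailout resumes in the middle of a running function rather than restarting a call.

```python
class HotFunction:
    """Hypothetical function with a generic path and a specialized path."""

    def __init__(self):
        self.specialized_for = None   # argument type the optimized code assumes
        self.bailouts = 0

    def optimize_for(self, arg_type):
        # Stand-in for an Ion compilation that assumes one argument type.
        self.specialized_for = arg_type

    def call(self, x):
        if self.specialized_for is not None:
            if type(x) is self.specialized_for:
                return ("optimized", x + x)
            # Guard failed: throw away the optimized code and fall back.
            self.specialized_for = None
            self.bailouts += 1
        return ("baseline", x + x)
```

After a bailout, calls keep running on the slower path until something calls `optimize_for` again, which mirrors "execution continues in the Baseline code until the next Ion compilation".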

Until Firefox 70, the execution pipeline for a very hot function looked like this:

[Diagram: timeline showing C++ Interpreter, Baseline Compilation, Baseline JIT Code, Prepare for Ion, and Ion JIT Code, with an arrow (labeled “bailout”) from Ion JIT Code back to Baseline JIT Code.]

Problems

Although this works pretty well, we ran into the following problems with the first part of the pipeline (C++ Interpreter and Baseline JIT):

  1. Baseline JIT compilation is fast, but modern web applications like Google Docs or Gmail execute so much JavaScript code that we could spend quite some time in the Baseline compiler, compiling thousands of functions.
  2. Because the C++ interpreter is so slow and doesn’t collect type information, delaying Baseline compilation or moving it off-thread would have been a performance risk.
  3. As you can see in the diagram above, optimized Ion JIT code was only able to bail out to the Baseline JIT. To make this work, Baseline JIT code required extra metadata (the machine code offset corresponding to each bytecode instruction).
  4. The Baseline JIT had some complicated code for bailouts, debugger support, and exception handling. This was especially true where these features intersect!

Solution: generate a faster interpreter

We needed type information from the Baseline JIT to enable the more optimized tiers, and we wanted to use JIT compilation for runtime speed. However, the modern web has such large codebases that even the relatively fast Baseline JIT Compiler spent a lot of time compiling. To address this, Firefox 70 adds a new tier called the Baseline Interpreter to the pipeline:

[Diagram: the same timeline of execution tiers as before, but now with the Baseline Interpreter between the C++ Interpreter and Baseline Compilation. The bailout arrow points to the Baseline Interpreter instead of Baseline JIT Code.]

The Baseline Interpreter sits between the C++ interpreter and the Baseline JIT and has elements from both. It executes all bytecode instructions with a fixed interpreter loop (like the C++ interpreter). In addition, it uses Inline Caches to improve performance and collect type information (like the Baseline JIT).

Generating an interpreter isn’t a new idea. However, we found a nice new way to do it by reusing most of the Baseline JIT Compiler code. The Baseline JIT is a template JIT, meaning each bytecode instruction is compiled to a mostly fixed sequence of machine instructions. We generate those sequences into an interpreter loop instead.

Sharing Inline Caches and profiling data

As mentioned above, the Baseline JIT uses Inline Caches (ICs) both to make it fast and to help Ion compilation. To get type information, the Ion JIT compiler can inspect the Baseline ICs.
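For readers who haven’t met Inline Caches before, here is a minimal sketch of one for a property read. It is written in Python as a model and is not SpiderMonkey’s C++ code. The names `Shape`, `Obj` and `GetPropIC` are hypothetical. The cache remembers which object layouts it has seen at one site, which gives both a fast path and a record that a later tier could inspect.

```python
class Shape:
    """Hypothetical object layout: maps property names to slot indexes."""

    def __init__(self, slots):
        self.slots = dict(slots)


class Obj:
    """Hypothetical object: a shape plus a flat list of slot values."""

    def __init__(self, shape, values):
        self.shape = shape
        self.values = list(values)


class GetPropIC:
    """Inline cache for one `obj.name` site (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.stubs = []      # (shape, slot) pairs seen so far
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        # Fast path: a cheap identity check on the shape, then a direct slot load.
        for shape, slot in self.stubs:
            if obj.shape is shape:
                self.hits += 1
                return obj.values[slot]
        # Slow path: full lookup, then attach a stub for next time.
        self.misses += 1
        slot = obj.shape.slots[self.name]
        self.stubs.append((obj.shape, slot))
        return obj.values[slot]

    def observed_shapes(self):
        # What an optimizing tier could read back as type information.
        return [shape for shape, _ in self.stubs]
```

The first read of a given shape takes the slow path and attaches a stub. Later reads of objects with the same shape hit the stub directly.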

Because we wanted the Baseline Interpreter to use exactly the same Inline Caches and type information as the Baseline JIT, we added a new data structure called JitScript. JitScript contains all type information and IC data structures used by both the Baseline Interpreter and JIT.

The diagram below shows what this looks like in memory. Each arrow is a pointer in C++. Initially, the function just has a JSScript with the bytecode that can be interpreted by the C++ interpreter. After a few calls/iterations we create the JitScript, attach it to the JSScript and can now run the script in the Baseline Interpreter.

As the code gets warmer we may also create the BaselineScript (Baseline JIT code) and then the IonScript (Ion JIT code).

[Diagram: JSScript (bytecode) points to JitScript (IC and profiling data). JitScript points to BaselineScript (Baseline JIT code) and IonScript (Ion JIT code).]

Note that the Baseline JIT data for a function is now just the machine code. We’ve moved all the inline caches and profiling data into JitScript.
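The pointer structure described above can be sketched with a few Python dataclasses. The type names come from this post, but the field names and the `available_tiers` helper are hypothetical and only model which objects exist at each stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaselineScript:
    machine_code: bytes          # Baseline JIT code only; no IC data here


@dataclass
class IonScript:
    machine_code: bytes


@dataclass
class JitScript:
    # ICs and profiling data shared by the Baseline Interpreter and JIT.
    ic_entries: List[str] = field(default_factory=list)
    baseline_script: Optional[BaselineScript] = None
    ion_script: Optional[IonScript] = None


@dataclass
class JSScript:
    bytecode: bytes
    jit_script: Optional[JitScript] = None


def available_tiers(script):
    """List the tiers that could run this script, coldest first."""
    tiers = ["C++ Interpreter"]
    jit = script.jit_script
    if jit is None:
        return tiers
    tiers.append("Baseline Interpreter")
    if jit.baseline_script is not None:
        tiers.append("Baseline JIT")
    if jit.ion_script is not None:
        tiers.append("Ion")
    return tiers
```

A script with only bytecode can run in the C++ interpreter. Attaching a `JitScript` is enough to unlock the Baseline Interpreter, with no machine code generated for that script.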

Sharing the frame layout

The Baseline Interpreter uses the same frame layout as the Baseline JIT, but we’ve added some interpreter-specific fields to the frame. For example, the bytecode PC (program counter), a pointer to the bytecode instruction we are currently executing, is not updated explicitly in Baseline JIT code. It can be determined from the return address if needed, but the Baseline Interpreter has to store it in the frame.
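Here is a rough Python model of that difference. The frame classes and the `(native_offset, bytecode_pc)` table are hypothetical stand-ins for the real frame fields and JIT metadata. The point is only that one frame type stores the pc and the other derives it.

```python
from bisect import bisect_left


class JitFrame:
    """Hypothetical Baseline JIT frame: the bytecode pc is implicit."""

    def __init__(self, return_address):
        self.return_address = return_address


class InterpreterFrame:
    """Hypothetical Baseline Interpreter frame: the pc is an explicit field."""

    def __init__(self, pc):
        self.pc = pc


def pc_from_return_address(code_offsets, return_address):
    """Map a native return address back to a bytecode pc.

    code_offsets is a sorted list of (native_offset, bytecode_pc) pairs,
    one per bytecode instruction, standing in for the JIT's metadata.
    A return address always lies after the start of the instruction that
    made the call, so we pick the last entry that starts strictly before it.
    """
    starts = [native for native, _ in code_offsets]
    index = bisect_left(starts, return_address) - 1
    if index < 0:
        raise ValueError("return address precedes the first instruction")
    return code_offsets[index][1]


def current_pc(frame, code_offsets):
    if isinstance(frame, InterpreterFrame):
        return frame.pc                      # stored in the frame
    return pc_from_return_address(code_offsets, frame.return_address)
```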

Sharing the frame layout like this has a lot of advantages. We’ve made almost no changes to C++ and IC code to support Baseline Interpreter frames: they’re just like Baseline JIT frames. Furthermore, when the script is warm enough for Baseline JIT compilation, switching from Baseline Interpreter code to Baseline JIT code is a matter of jumping from the interpreter code into JIT code.

Sharing code generation

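Because the Baseline Interpreter and JIT are so similar, a lot of the code generation code can be shared too. To do this, we added a templated BaselineCodeGen base class with two derived classes: one that the Baseline JIT uses to compile a script’s bytecode to machine code, and BaselineInterpreterGenerator, which generates the Baseline Interpreter code.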

The base class has a Handler C++ template argument that can be used to specialize behavior for either the Baseline Interpreter or JIT. A lot of Baseline JIT code can be shared this way. For example, the implementation of the JSOP_GETPROP bytecode instruction (for a property access like obj.foo in JavaScript code) is shared code. It calls the emitNextIC helper method that’s specialized for either Interpreter or JIT mode.
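The pattern can be sketched in Python, with an ordinary constructor argument standing in for the C++ template argument. The two handler classes and the pseudo-instruction strings are invented for illustration. Only the names BaselineCodeGen, JSOP_GETPROP and emitNextIC come from the real code, and this sketch does not reproduce their actual implementations.

```python
class CompilerHandler:
    """Hypothetical JIT-mode handler: the IC index is known at compile time."""

    def __init__(self):
        self.next_ic_index = 0

    def emit_next_ic(self, out):
        out.append(f"call IC entry #{self.next_ic_index}")
        self.next_ic_index += 1


class InterpreterHandler:
    """Hypothetical interpreter-mode handler: the IC entry is found at run time."""

    def emit_next_ic(self, out):
        out.append("load IC entry pointer from frame")
        out.append("call loaded IC entry")
        out.append("advance frame's IC entry pointer")


class BaselineCodeGen:
    """Shared code generation, parameterized by a handler."""

    def __init__(self, handler):
        self.handler = handler
        self.out = []

    def emit_getprop(self):
        # Shared between both modes: pop the object, run the IC, push the result.
        self.out.append("pop object into R0")
        self.handler.emit_next_ic(self.out)
        self.out.append("push R0")
        return self.out
```

Both modes produce the same first and last step. Only the part delegated to the handler differs, which is why so much of the code generator can be shared.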

Generating the Interpreter

With all these pieces in place, we were able to implement the BaselineInterpreterGenerator class to generate the Baseline Interpreter! It generates a threaded interpreter loop: the code for each bytecode instruction is followed by an indirect jump to the next bytecode instruction.

For example, on x64 we currently generate the following machine code to interpret JSOP_ZERO (bytecode instruction to push a zero value on the stack):

// Push Int32Value(0).
movabsq $-0x7800000000000, %r11
pushq  %r11
// Increment bytecode pc register.
addq   $0x1, %r14
// Patchable NOP for debugger support.
nopl   (%rax,%rax)
// Load the next opcode.
movzbl (%r14), %ecx
// Jump to interpreter code for the next instruction.
leaq   0x432e(%rip), %rbx
jmpq   *(%rbx,%rcx,8)
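The same dispatch structure can be modeled in a few lines of Python. This is a sketch only: the four opcodes and their encodings are hypothetical, and because Python has no indirect jump, each handler returns the next handler instead of jumping to it.

```python
ZERO, ONE, ADD, RETURN = range(4)   # hypothetical one-byte opcodes


class State:
    def __init__(self, bytecode):
        self.bytecode = bytecode
        self.pc = 0
        self.stack = []
        self.result = None


def dispatch_next(state):
    # Mirrors the tail of the assembly above: bump the pc, load the next
    # opcode, and look up its handler in the table.
    state.pc += 1
    return TABLE[state.bytecode[state.pc]]


def op_zero(state):
    state.stack.append(0)
    return dispatch_next(state)


def op_one(state):
    state.stack.append(1)
    return dispatch_next(state)


def op_add(state):
    rhs = state.stack.pop()
    lhs = state.stack.pop()
    state.stack.append(lhs + rhs)
    return dispatch_next(state)


def op_return(state):
    state.result = state.stack.pop()
    return None                      # no next handler: leave the loop


TABLE = [op_zero, op_one, op_add, op_return]


def run(bytecode):
    state = State(bytecode)
    handler = TABLE[bytecode[0]]
    while handler is not None:
        # Stand-in for the indirect jump: call whatever handler came back.
        handler = handler(state)
    return state.result
```

Note that there is no central `switch` on the opcode. Every handler ends with its own copy of the dispatch step, which is what makes the loop “threaded”.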

When we enabled the Baseline Interpreter in Firefox Nightly (version 70) back in July, we increased the Baseline JIT warm-up threshold from 10 to 100. The warm-up count is the number of calls to the function plus the number of loop iterations so far. The Baseline Interpreter has a threshold of 10, the same as the old Baseline JIT threshold. Functions whose warm-up count never reaches 100 now stay in the Baseline Interpreter, so the Baseline JIT has a lot less code to compile.
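As a small sketch of that policy in Python: the two thresholds are the ones given above, while the function names and the choice of `>=` over `>` are assumptions. Ion is left out because this post doesn’t give its threshold.

```python
BASELINE_INTERPRETER_THRESHOLD = 10   # from the text
BASELINE_JIT_THRESHOLD = 100          # from the text


def warm_up_count(calls, loop_iterations):
    return calls + loop_iterations


def tier_for(calls, loop_iterations):
    """Pick the tier for a function given its warm-up count.

    Whether the real comparison is > or >= is an assumption here.
    """
    count = warm_up_count(calls, loop_iterations)
    if count >= BASELINE_JIT_THRESHOLD:
        return "Baseline JIT"
    if count >= BASELINE_INTERPRETER_THRESHOLD:
        return "Baseline Interpreter"
    return "C++ Interpreter"
```

For example, a function called twice that has run eight loop iterations has a warm-up count of 10, so under these assumptions it moves to the Baseline Interpreter but is not yet JIT compiled.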

Results

Performance and memory usage

After this landed in Firefox Nightly our performance testing infrastructure detected several improvements:

  • Various 2-8% page load improvements. A lot happens during page load in addition to JS execution (parsing, style, layout, graphics). Improvements like this are quite significant.
  • Many devtools performance tests improved by 2-10%.
  • Some small memory usage wins.

Note that we’ve landed more performance improvements since the Baseline Interpreter first landed.

To measure how the Baseline Interpreter’s performance compares to the C++ Interpreter and the Baseline JIT, I ran Speedometer and Google Docs on Windows 10 64-bit on Mozilla’s Try server and enabled the tiers one by one. The following numbers reflect the best of 7 runs.

[Chart: Google Docs page load time, lower is better. C++ Interpreter: 901 ms; + Baseline Interpreter: 676 ms; + Baseline JIT: 633 ms]

On Google Docs we see that the Baseline Interpreter is much faster than just the C++ Interpreter. Enabling the Baseline JIT too makes the page load only a little bit faster.

On the Speedometer benchmark we get noticeably better results when we enable the Baseline JIT tier. Again, the Baseline Interpreter does much better than just the C++ Interpreter:

[Chart: Speedometer score, higher is better. C++ Interpreter: 31 points; + Baseline Interpreter: 52 points; + Baseline JIT: 69 points]

We think these numbers are great: the Baseline Interpreter is much faster than the C++ Interpreter, and its start-up time (JitScript allocation) is at least 10 times faster than Baseline JIT compilation.
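To put the measurements above in relative terms, here is the arithmetic as a short Python snippet. The variable names are ours, and the inputs are the numbers reported above.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Google Docs load time in ms (lower is better).
docs_interp_gain = percent_change(901, 676)
docs_jit_gain = percent_change(676, 633)

# Speedometer score in points (higher is better).
speedometer_interp_gain = percent_change(31, 52)
speedometer_jit_gain = percent_change(52, 69)
```

On Google Docs, adding the Baseline Interpreter cuts the time by about 25%, and adding the Baseline JIT on top cuts it by roughly another 6%. On Speedometer, the score rises by about 68% with the Baseline Interpreter and by about 33% more with the Baseline JIT.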

Simplifications

After this all landed and stuck, we were able to simplify the Baseline JIT and Ion code by taking advantage of the Baseline Interpreter.

For example, deoptimization bailouts from Ion now resume in the Baseline Interpreter instead of in the Baseline JIT. The interpreter can re-enter Baseline JIT code at the next loop iteration in the JS code. Resuming in the interpreter is much easier than resuming in the middle of Baseline JIT code. We now have to record less metadata for Baseline JIT code, so Baseline JIT compilation got faster too. Similarly, we were able to remove a lot of complicated code for debugger support and exception handling.

What’s next?

With the Baseline Interpreter in place, it should now be possible to move Baseline JIT compilation off-thread. We will be working on that in the coming months, and we anticipate more performance improvements in this area.

Acknowledgements

Although I did most of the Baseline Interpreter work, many others contributed to this project. In particular Ted Campbell and Kannan Vijayan reviewed most of the code changes and had great design feedback.

Also thanks to Steven DeTar, Chris Fallin, Havi Hoffman, Yulia Startsev, and Luke Wagner for their feedback on this blog post.

About Jan de Mooij

Jan is a software engineer at Mozilla where he works on SpiderMonkey, the JavaScript Engine in Firefox. He lives in the Netherlands.


12 comments

  1. Kelly

    Any chance of a simplified explanation for the idiot “sysadmin types that dream of being programmers” (me)?

    Like, I don’t understand what you mean by “C++ interpreter”. Is that effectively the classic spidermonkey JS interpreter (in other words, “a JS interpreter written in C++”)? How is “Baseline interpreter” different from the “C++ interpreter” (and wouldn’t it still be written in C++)? And how is “Baseline JIT” different from “Ion JIT”? If something is JITed, it’s JITed, right? You can’t compile something that’s been compiled already?

    Man, I sort of understood what was going on when JIT was first being added to JS engines, but I am so lost nowadays.

    August 30th, 2019 at 11:34

    1. Jan de Mooij

      True, all this can get pretty complicated! I hope the following helps:

      C++ Interpreter: you’re correct, it’s the JS bytecode interpreter written in C++.

      Baseline Interpreter: an interpreter that’s generated dynamically. The code that generates the interpreter is still written in C++. It’s faster than the C++ Interpreter, in part because it uses Inline Caches.

      Why do we have two JITs, Baseline and Ion? Because they have different goals: the Baseline JIT’s goals are: (1) compile code very quickly (the compiled code is not super fast but still faster than an interpreter) (2) collect information about the script for the Ion JIT compiler.

      The Ion JIT has different goals: (1) generate very fast code (but this may take longer so we only do it for hot code) (2) make assumptions based on things the Baseline Interpreter/JIT observed (this makes it fast) and throw away the code if these assumptions no longer hold.

      So, yes, a function can actually be JIT compiled multiple times!

      August 30th, 2019 at 23:31

  2. bob

    Thanks! This is awesome news for the web and everyone who uses web apps!

    August 30th, 2019 at 13:51

  3. Yan Luo

    Does the Baseline interpreter look like the one in LuaJIT?

    August 30th, 2019 at 20:48

    1. Jan de Mooij

      Sorry I’m not very familiar with LuaJIT, maybe one of our readers can answer this better :) (LuaJIT has a tracing JIT I think so I expect the interpreter and JIT to look quite different from what we have, but I could be wrong.)

      August 30th, 2019 at 23:58

      1. Jaen

        LuaJIT also has an interpreter written in assembly, and on a superficial level it is similar to the interpreter generated here.

        Since the higher tier is a tracing JIT, it does not eg. use/need inline caches though.

        August 31st, 2019 at 00:13

        1. Yan Luo

          Thanks!

          August 31st, 2019 at 15:32

      2. Yan Luo

        Thanks! It’s a nice read.

        August 31st, 2019 at 15:33

      3. DRSAgile

        Jan, thanks for your article.

        You mentioned that your next project is to move some compilation to secondary process threads, which is great.

        But, after that, why not also rewrite more things in assembler language, as LuaJIT does?

        The issue is, no matter how hard you try with compiler optimizations flags, in general, C++ is slower than C, and C is slower than assembler. FireFox is not run on a gazillion of CPU/ISAs nowadays, and you can only do AMD x64 for starters, so it all should be just fine, no?

        Considering the fact that “the fight” between JS engines is going mostly for marginal improvements, the old argument that rewriting things in assembler would not improve the performance by “much” does not work already: every percentage matters nowadays.

        September 3rd, 2019 at 23:00

        1. Jan de Mooij

          Thanks for your reply! In general we think most of our performance wins will come from being smarter about what/how we JIT compile. There are still a lot of potential improvements at a much higher level and with more impact than C/C++ vs assembly.

          September 3rd, 2019 at 23:39

        2. Aaron W

          I’d much prefer to keep focusing on architectural improvements instead of spending a ton of time on architecture specific optimizations.

          Firefox currently runs on arm 32/64, x86 32/64, power (pretty sure), mips, and who knows what else. Any of those architectures can have a myriad of different instruction set extensions, and supporting that matrix should be explored after we’ve explored all of the instruction set agnostic architectural improvements possible.

          Keeping the code portable as much as possible is a great feature.

          Because of its portable nature, I’ve even got libmozjs running on my mid-90’s Alpha as part of gnome shell. It’s slow, but awesome that I can do it at all.

          September 6th, 2019 at 16:20

        3. Horacio

          As an ex-embedded engineer working on multi-platform code, I’d love to see some proof that writing in assembler is an advantage, much less a sustainable one ;)

          September 8th, 2019 at 05:27
