A crash course in assembly

This is the third part in a series on WebAssembly and what makes it fast. If you haven’t read the others, we recommend starting from the beginning.

To understand how WebAssembly works, it helps to understand what assembly is and how compilers produce it.

In the article on the JIT, I talked about how communicating with the machine is like communicating with an alien.

A person holding a sign with source code on it, and an alien responding in binary

I want to take a look now at how that alien brain works: how the machine’s brain parses and understands the communication coming into it.

There’s a part of this brain that’s dedicated to the thinking, things like adding and subtracting, or logical operations. There’s also a part nearby that provides short-term memory, and another part that provides longer-term memory.

These different parts have names.

  • The part that does the thinking is the Arithmetic Logic Unit (ALU).
  • The short-term memory is provided by registers.
  • The longer-term memory is the Random Access Memory (or RAM).

A diagram showing the CPU, including ALU and Registers, and RAM
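
As a rough mental model, these three parts can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the number of registers, the size of RAM, and the operation names accepted by the hypothetical `alu` function do not describe any real CPU.

```python
# Toy model of the three parts named above. Sizes and names are
# illustrative assumptions, not taken from any real CPU.

NUM_REGISTERS = 8   # assumption: eight registers
RAM_SIZE = 256      # assumption: 256 memory cells

registers = [0] * NUM_REGISTERS   # short-term memory
ram = [0] * RAM_SIZE              # longer-term memory


def alu(operation, a, b):
    """Do the 'thinking': arithmetic and logical operations."""
    if operation == "add":
        return a + b
    if operation == "sub":
        return a - b
    if operation == "and":
        return a & b
    if operation == "or":
        return a | b
    raise ValueError(f"unknown operation: {operation}")


# Copy two values from RAM into registers, add them in the ALU,
# then store the result back in RAM.
ram[0], ram[1] = 2, 3
registers[0] = ram[0]
registers[1] = ram[1]
registers[2] = alu("add", registers[0], registers[1])
ram[2] = registers[2]
```

The point of the sketch is only the division of labor: values move between RAM and the registers, and the ALU works on the values held in registers.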

The sentences in machine code are called instructions.

What happens when one of these instructions comes into the brain? It gets split up into different parts that mean different things.

The way that this instruction is split up is specific to the wiring of this brain.

For example, a brain that is wired like this might always take the first six bits and pipe them into the ALU. The ALU will figure out, based on the location of ones and zeros, that it needs to add two things together.

This chunk is called the “opcode”, or operation code, because it tells the ALU what operation to perform.

Six bits being taken from a 16-bit instruction and piped into the ALU

Then this brain would take the next two chunks of three bits each to determine which two numbers it should add. These chunks would be the addresses of the registers that hold those numbers.

Two 3-bit chunks being decoded to determine source registers
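
The splitting described above can be sketched with a few bit shifts and masks. This is a minimal sketch for the example brain only: the function name `decode`, the sample instruction, and the choice of `000001` as the opcode for "add" are assumptions. The text does not say what the last four bits mean, so the sketch just returns them untouched.

```python
def decode(instruction):
    """Split a 16-bit instruction into the chunks described above."""
    opcode = (instruction >> 10) & 0b111111   # first six bits
    src_a = (instruction >> 7) & 0b111        # next three bits: first register address
    src_b = (instruction >> 4) & 0b111        # next three bits: second register address
    rest = instruction & 0b1111               # last four bits, not interpreted here
    return opcode, src_a, src_b, rest


# Hypothetical instruction: opcode 000001, registers 001 and 010, rest 0000.
instruction = 0b0000010010100000
opcode, src_a, src_b, rest = decode(instruction)
print(f"opcode={opcode:06b} src_a={src_a:03b} src_b={src_b:03b} rest={rest:04b}")
```

A brain wired differently would use different shift amounts and masks, which is exactly why the split is specific to the wiring.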

Note the annotations above the machine code here, which make it easier for us humans to understand what’s going on. This is what assembly is. It’s called symbolic machine code. It’s a way for humans to make sense of the machine code.

You can see here there is a pretty direct relationship between the assembly and the machine code for this machine. Because of this, there are different kinds of assembly for the different kinds of machine architectures that you can have. When you have a different architecture inside of a machine, it is likely to require its own dialect of assembly.
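
To see why each architecture ends up with its own dialect, here is a small sketch that turns the same symbolic instruction into bits for two made-up machines. Both machine descriptions are hypothetical: the opcode numbers, field widths, and the `assemble` helper are assumptions for illustration, with `MACHINE_A` following the 16-bit layout used in the example above.

```python
# Two made-up machines. Opcode numbers and field widths are assumptions.
MACHINE_A = {
    "opcodes": {"ADD": 0b000001, "SUB": 0b000010},
    "opcode_bits": 6,
    "reg_bits": 3,
    "width": 16,
}
MACHINE_B = {
    "opcodes": {"ADD": 0b1000, "SUB": 0b1001},
    "opcode_bits": 4,
    "reg_bits": 2,
    "width": 8,
}


def assemble(machine, mnemonic, reg_a, reg_b):
    """Turn a symbolic instruction into a bit string for one machine."""
    max_reg = 2 ** machine["reg_bits"]
    if not (0 <= reg_a < max_reg and 0 <= reg_b < max_reg):
        raise ValueError("register address does not fit in the register field")
    bits = format(machine["opcodes"][mnemonic], f"0{machine['opcode_bits']}b")
    bits += format(reg_a, f"0{machine['reg_bits']}b")
    bits += format(reg_b, f"0{machine['reg_bits']}b")
    # Pad any unused trailing bits with zeros.
    return bits.ljust(machine["width"], "0")


print(assemble(MACHINE_A, "ADD", 1, 2))
print(assemble(MACHINE_B, "ADD", 1, 2))
```

The symbolic form is the same in both calls, but the bits that come out differ because each machine has its own opcode table and its own layout.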

So we don’t just have one target for our translation. It’s not just one language called machine code. It’s many different kinds of machine code. Just as we speak different languages as people, machines speak different languages.

With human to alien translation, you may be going from English, or Russian, or Mandarin to Alien Language A or Alien Language B. In programming terms, this is like going from C, or C++, or Rust to x86 or to ARM.

You want to be able to translate any one of these high-level programming languages down to any one of these assembly languages (which correspond to the different architectures). One way to do this would be to create a whole bunch of different translators that can go from each language to each assembly.

Diagram showing programming languages C, C++, and Rust on the left and assembly languages x86 and ARM on the right, with arrows between every combination

That’s going to be pretty inefficient. To solve this, most compilers put at least one layer in between. The compiler will take this high-level programming language and translate it into something that’s not quite as high level, but also isn’t working at the level of machine code. And that’s called an intermediate representation (IR).

Diagram showing an intermediate representation between high level languages and assembly languages, with arrows going from high level programming languages to intermediate representation, and then from intermediate representation to assembly language

This means the compiler can take any one of these higher-level languages and translate it to the one IR language. From there, another part of the compiler can take that IR and compile it down to something specific to the target architecture.
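
A quick count shows what the intermediate layer buys. The sketch below lists every translator needed with and without an IR, using the languages and architectures named in this article. The variable names are illustrative, and the "IR" label is just a placeholder string.

```python
from itertools import product

languages = ["C", "C++", "Rust"]
targets = ["x86", "ARM"]

# Without an IR: one translator for every (language, target) pair.
direct = list(product(languages, targets))

# With an IR: one translator per language into the IR,
# plus one translator per target out of the IR.
with_ir = [(lang, "IR") for lang in languages] + [("IR", target) for target in targets]

print(len(direct), "direct translators")
print(len(with_ir), "translators with an IR")
```

With three languages and two targets the difference is small (six versus five), but the first count grows as a product and the second as a sum. Ten languages and five targets would need 50 direct translators, or 15 with an IR.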

The compiler’s front-end translates the higher-level programming language to the IR. The compiler’s back-end goes from IR to the target architecture’s assembly code.

Same diagram as above, with labels for front-end and back-end
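
The front-end and back-end split can be sketched as a toy pipeline. Nothing here reflects a real compiler: the tuple-based IR, the function names, and both output formats are assumptions. The two back-ends emit assembly-flavored text for hypothetical targets (one loosely x86-like, one loosely ARM-like), not real assembly that an assembler would accept.

```python
def front_end(source):
    """Parse a statement like 'x = a + b' into a tiny made-up IR.

    Each IR entry is a tuple: (operation, destination, left, right).
    """
    dest, expr = [part.strip() for part in source.split("=")]
    left, symbol, right = expr.split()
    operations = {"+": "add", "-": "sub"}
    return [(operations[symbol], dest, left, right)]


def back_end_two_operand(ir):
    """Hypothetical target whose instructions overwrite their first operand."""
    lines = []
    for op, dest, left, right in ir:
        lines.append(f"mov {dest}, {left}")
        lines.append(f"{op} {dest}, {right}")
    return lines


def back_end_three_operand(ir):
    """Hypothetical target with a separate destination operand."""
    return [f"{op} {dest}, {left}, {right}" for op, dest, left, right in ir]


ir = front_end("x = a + b")
print(ir)
print(back_end_two_operand(ir))
print(back_end_three_operand(ir))
```

Both back-ends consume the same IR, so supporting another source language only means writing another front-end that produces these tuples.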

Conclusion

That’s what assembly is and how compilers translate higher-level programming languages to assembly. In the next article, we’ll see how WebAssembly fits into this.

About Lin Clark

Lin works in Advanced Development at Mozilla, with a focus on Rust and WebAssembly.


6 comments

  1. Deep Pulusani

    These are really great articles and illustrations. Well done!

    March 1st, 2017 at 09:45

  2. Fernando Chavez Gomes da Silva

    This is the easiest way I could find to understand assembly. Great work and nice illustrations!

    March 2nd, 2017 at 10:15

  3. Boy this brought back memories: writing machine code for the Z-80 in my NEC “Trek” … 8KB. And from there to Assembler on C=64’s 6502.
    Elemental! :-)
    –@ITGeek

    March 2nd, 2017 at 12:35

  4. Luca

    Hi, I think you have swapped RAM with the registers when talking about the assembler parts.

    Cheers Luca

    March 4th, 2017 at 06:19

    1. Lin Clark

      I can tell you for certain that I have not swapped the registers and RAM. I have in the past designed all of these parts (registers, an ALU, a CPU, and RAM) from basic logic gates using a hardware description language. I’ve used those parts to create the design for a working computer.

      Also, just to avoid confusion for anyone reading the comments, the diagram isn’t meant to show an assembler. As Wikipedia defines it, an assembler is “a computer program which translates assembly language to an object file or machine language format”. An assembler would have been used during compilation to generate the machine code instruction, but it is not part of the description or diagram here.

      March 4th, 2017 at 06:39

  5. Pablo Barria Urenda

    Digging these articles! Thanks!

    Just wanted to comment that, maybe in the spirit of cartoon introductions, you could name drop this video game introduction to assembly:

    http://www.zachtronics.com/tis-100/

    (I’m not affiliated with the game in any way other than having bought a copy)

    March 13th, 2017 at 19:09