Moore’s Law is commonly taken to guarantee that processor speed doubles every 18 months. However, if you look at how modern processors achieve this, you realize the speedup cannot apply to every program: superscalar processors rely on heavy branch prediction and instruction pipelining, and neither works well for everyone’s favourite “always too slow” applications, namely language interpreters.
If you want to implement a language efficiently nowadays, you have two choices: write a compiler or write a fast interpreter. Here, I take the compiler approach to mean translating the source language into assembly or C before the program is run. Everything else I’ll consider a fast interpreter, for example a virtual machine, possibly compiling code just-in-time.
Either way, there is no way around machine code generation, directly or indirectly. It is just about impossible to write an architecture-independent (read: ANSI C only), efficient interpreter.
Luckily, runtime code generation does not need to be difficult. Imagine having a virtual machine, where you can extend the instruction set at runtime. Or even write the instructions in a higher-level language!
The problem with fast interpreters is that instruction dispatch takes a lot of time. Research suggests there are two code threading schemes that are reasonable to implement on modern architectures: direct threaded code and subroutine threading. (See Anton Ertl’s explanations for details.)
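For contrast, here is a minimal sketch of direct threaded code using GCC’s computed-goto extension. This is not libsew code, just an illustration of the dispatch technique: every instruction is simply the address of a label, and dispatch is a single indirect jump.

#include <stdio.h>

int
main()
{
    /* the "program": a sequence of label addresses (GCC extension) */
    void *program[] = { &&print_a, &&print_b, &&print_a, &&halt };
    void **ip = program;

    goto **ip++;                /* start dispatching */

print_a:
    printf ("A\n");
    goto **ip++;                /* dispatch: one indirect jump */
print_b:
    printf ("B\n");
    goto **ip++;
halt:
    return 0;
}

Subroutine threading removes even that indirect jump: the program becomes a plain sequence of native call instructions.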
Let me introduce libsew, a library for generating subroutine-threaded code at runtime. Here is a quick example:
#include <stdio.h>
#include "libsew.h"

void a() { printf ("A\n"); }
void b() { printf ("B\n"); }

int
main()
{
    sew_seam *s = sew_new ();

    sew_add_call (s, a);        /* thread three subroutine calls */
    sew_add_call (s, b);
    sew_add_call (s, a);
    sew_finalize (s);           /* generate the machine code */

    sew_call (s);               /* run it */
    sew_free (s);
    return 0;
}
As you can probably guess, this will output
A
B
A
Now, how does this work? Libsew internally generates this code:
0x100150: mflr r0
0x100154: stw r0,8(r1)
0x100158: stwu r1,-64(r1)
0x10015c: bla 0x2654 <a>
0x100160: bla 0x2690 <b>
0x100164: bla 0x2654 <a>
0x100168: lwz r0,72(r1)
0x10016c: addic r1,r1,64
0x100170: mtlr r0
0x100174: blr
The three bla instructions (branch and link absolute) call the subroutines mentioned above; the rest sets up and tears down the stack frame. Usually, every opcode will just be a C function; the machine state is kept in global variables.
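To make this concrete, here is a sketch of what such opcodes might look like for a little stack machine. All the names and the state layout here are made up for illustration; libsew itself only threads the calls together:

#include <stdio.h>

static long stack[256];
static long *sp = stack;        /* the machine state, in globals */

void op_push1 (void) { *sp++ = 1; }
void op_add   (void) { long b = *--sp; sp[-1] += b; }
void op_print (void) { printf ("%ld\n", sp[-1]); }

A seam threading op_push1, op_push1, op_add, op_print would then print 2.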
Of course, code generation like this is highly architecture dependent. So far I only target PPC, because that’s what I currently run. However, a port of the code generator can be done on a rainy afternoon; it is pretty minimal and short.
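To give you an idea of how small that job is: the heart of a PPC back end is encoding the bla word itself. The sketch below is illustration only (encode_bla is not part of the libsew API); it assumes the target lies within the 26-bit absolute branch range, and after writing the words you still have to flush the data cache and invalidate the instruction cache before jumping in.

#include <stdint.h>

/* I-form branch: opcode 18 in the top six bits, the absolute
   target in bits 6-29, then AA = 1 and LK = 1 to make it "bla". */
static uint32_t
encode_bla (uint32_t target)
{
    return (18u << 26) | (target & 0x03fffffcu) | 0x3u;
}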
If you make use of global register variables (which is of course only reasonable on RISC), the actual code GCC generates for the instructions is short and efficient. It will also be easy to inline it, resulting in zero overhead compared to normally compiled code (though you obviously won’t get any optimization across opcode boundaries).
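With GCC that looks like this; it pins the stack pointer of the sketch above into a real register (r14 is just a guess at a call-saved PPC register, check your ABI):

register long *sp asm ("r14");      /* global register variable, a GCC extension */

void op_push1 (void) { *sp++ = 1; } /* now only a couple of instructions */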
The big fun, however, is taking that library and hooking it onto Ruby. Thanks to Ruby/DL, this can be done without a single additional line of C. I can even define opcodes in pure Ruby!
require 'sew'

s = Sew::Seam.new
s.add_call { p "foo!" }
5.times {
  s.add_call { p "bar!" }
}
s.finalize
s.call
Putting everything together, we can do Behavior-Driven Virtual Machine Design (a.k.a. BDVMD) using RSpec. For example, have a look at this:
context "An empty stack" do
specify "should raise on drop" do
handler = Sew::Seam[lambda { raise Sew::StackUnderflow }]
lambda {
Sew::Seam[:stack_clear,
handler.address, :stack_set_handler,
:stack_drop].call
}.should.raise Sew::StackUnderflow
end
end
Truly a powerful thing to have at your fingertips, no?
More details and code to come soon.
NP: Alice In Chains—Man In The Box