Finally, the translation toolchain transfers control to one of its backends, which are responsible for generating the final executable. Throughout the text, we will write pypy-xxx to indicate the executable produced by the backend xxx. At the time of writing, there are three maintained backends:
The C backend is based on lltype and emits C source code, which is in turn compiled by gcc or Visual C++. The produced executable is the equivalent of CPython, as it is a native executable for the target platform, which currently can be Linux, Mac OS X or Microsoft Windows.
The CLI backend [Cun06] [AACM07] is based on ootype and emits IL code for the CLI, i.e. the virtual machine at the core of the .NET Framework. Currently, both Mono and the Microsoft CLR are supported. The produced executable is roughly the counterpart of IronPython, although the latter is much better integrated with the rest of the .NET Framework.
The JVM backend [AACM07] is also based on ootype and emits bytecode for the JVM. Although the backend is complete, the resulting executable is of little practical use because it still cannot access the hosting Java environment. It aims to be the equivalent of Jython, a Python implementation written in Java and fully integrated with the JVM.
3.3 JIT compiler generator
One of the most interesting components of the translation toolchain optionally generates a JIT compiler from the source code of the interpreter, in an automated way.
From the end user's point of view, the presence of the JIT compiler is completely transparent. The executable contains both the original interpreter and the generated JIT compiler: the code starts being interpreted, then the JIT automatically compiles the so-called hot spots, i.e. the parts of the program that are executed most often and are thus the most useful to optimize. The generated compiler is a tracing JIT: Chapter 5 describes the general idea behind it, and the current state of the art.
From the language implementor's point of view, the generation of the JIT compiler is also mostly transparent: the programmer only needs to annotate the source code of the interpreter with a few hints to drive the JIT compiler generator. These hints will be covered in detail in Chapter 6.
The generated code is ANSI C, but it optionally makes use of some specific compiler extensions, depending on the exact compiler we are using.
Actually, there are hooks that the end user can call to tune the various parameters of the JIT, even though this is not strictly necessary.
The JIT compiler generator is divided into a frontend and several backends: to avoid confusion with the translation backends described above, we will refer to these as JIT backends.
The frontend contains all the architecture-independent code: its job is to analyze the interpreted running program, find the hot spots, and optimize them. Its final result is a low level, architecture-independent representation of the program to compile: the actual generation of the executable machine code is done by the JIT backend. At the time of writing, there are two maintained JIT backends:
the x86 JIT backend, which generates code for the IA-32 architecture, i.e. for all the common 32-bit Intel-compatible processors. In the future, there will probably be an x86-64 JIT backend to exploit the new instruction set of 64-bit processors;
the CLI JIT backend, which emits bytecode for the CLI virtual machine of the .NET Framework. The generated bytecode will in turn be translated into machine code by the .NET's own JIT compiler (see Section 4.2 for details).
It is obvious that the choice of the JIT backend depends on the choice of the translation backend: in particular, the x86 JIT backend is usable only in conjunction with the C translation backend, and the same holds for the CLI JIT backend and its homonymous translation backend.
From an implementation point of view, the CLI JIT backend represents one of the major contributions of this thesis, and will be described in detail in Chapter 7.
3.4 Why choose PyPy
In summary, we think that PyPy is a very good choice for the kind of research we are interested in. First of all, it is a mature project with an already working infrastructure and a vibrant (although not so large) development community, which is ideal for developing new ideas and solutions.
Moreover, the architecture and the modularity of the codebase allow us to concentrate on the main issues we are interested in: in particular, we do not have to worry about the complex semantics of Python, or implement all the well known compilation techniques that form the starting point for the "interesting" work. In addition, by reusing PyPy we can apply the JIT compilation techniques described in this thesis to a real language, making the evaluation much more trustworthy than if we applied them to e.g. a toy language.
Last but not least, by working at the meta level the final result is more than just a fast implementation of Python: by providing a multiplatform JIT compiler generator, PyPy may become a very appealing framework to implement, in a simple way, all kinds of fast and portable dynamic languages: see for example [BKL+08] for a description of the implementation of Smalltalk in PyPy, and [BV09] for a description of PyGirl, a Game Boy emulator implemented in PyPy.
Characterization of the target
From an implementation point of view, this thesis is about a dynamic JIT compiler for the .NET platform, and in particular for its Virtual Machine (VM), the CLI. However, our research is not strictly limited to the CLI: it is potentially applicable to all the VMs which are similar enough.
What does "similar enough" mean? Defining the concept is hard, because even a small difference in a particular feature of the VM can have a big impact on the implementation strategy. On the other hand, some of the solutions proposed in this thesis can be applied equally well to virtual machines such as e.g. LLVM, whose underlying philosophy is very different from that of the CLI, but which still shares some commonality with it. LLVM [LA04] stands for Low Level Virtual Machine and, as the name suggests, it is a virtual machine whose instruction set is particularly close to the metal, in contrast to object oriented virtual machines whose instruction sets are more high level.
This is especially true when speaking about performance: when targeting a VM, it is hard to accurately predict the performance behavior of the compiled programs, as they are going to be executed on a number of different implementations of the virtual machine. For example, .NET programs can be run either under the CLR, i.e. the original implementation from Microsoft, or under Mono, an alternative open source implementation. Moreover, all the implementations rapidly evolve over time, each new version providing slightly different performance for each feature or construct of the VM.
However, for language implementors, this is far from ideal: during the development of a compiler, they need to take a huge number of decisions about which constructs to use (and which not to use) to implement each particular feature, but their assumptions might not be valid for alternative implementations or newer/older versions of the VM. As we will see later in this chapter, this is especially true for more esoteric features, or for constructs used in a slightly different way than they were designed for.
4.1 Dynamic loading of new bytecode
By definition, a JIT compiler emits new code during the execution of the program: thus, the first and most important requirement that a VM needs to fulfill in order to apply this research is the ability to generate and load new bytecode at runtime, although the exact details vary between the CLI and the JVM.
For the CLI, the standard library of the .NET Framework provides the necessary tools to generate and load single methods, in the namespace System.Reflection.Emit. In particular, by instantiating the class DynamicMethod it is possible to create new methods that are not bound to any particular class.
On the other hand, in the JVM the minimum unit of loading is the class: by writing a custom classloader, it is possible to generate and load new classes, and hence new methods, on the fly. Moreover, there are external libraries such as ASM [asm] and BCEL [bce] that simplify the task of generating and loading these classes.
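The JVM side of this requirement can be sketched in plain Java: a minimal classloader that turns a byte array into a live class via defineClass. To keep the sketch self-contained we reuse the bytes of an already-compiled class instead of generating fresh bytecode with a library such as ASM; all class and method names below are our own, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// A minimal classloader exposing defineClass: byte[] in, live Class out.
class ByteArrayClassLoader extends ClassLoader {
    public Class<?> load(String name, byte[] code) {
        return defineClass(name, code, 0, code.length);
    }
}

public class DynamicLoadDemo {
    // Stand-in for freshly generated bytecode: in a real JIT the byte[]
    // would come from a bytecode generator such as ASM or BCEL.
    public static class Greeter {
        public String greet() { return "hello from dynamically loaded code"; }
    }

    // Read the compiled bytes of a class from the classpath.
    static byte[] classBytes(Class<?> c) throws Exception {
        String res = c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getClassLoader().getResourceAsStream(res);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] code = classBytes(Greeter.class);
        ByteArrayClassLoader loader = new ByteArrayClassLoader();
        Class<?> cls = loader.load(Greeter.class.getName(), code);
        Object obj = cls.getDeclaredConstructor().newInstance();
        System.out.println(cls.getClassLoader() == loader);   // loaded by our loader
        System.out.println(cls.getMethod("greet").invoke(obj));
    }
}
```

Note that the newly defined class is distinct from the original Greeter, even though both were built from the same bytes: classes are identified by (name, loader) pairs, which is exactly what makes per-method hot-swapping on the JVM awkward.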
4.2 JIT layering
Another important feature shared by the CLI and the JVM is the presence of a JIT compiler that translates the intermediate code into executable machine code.
For the cases we are analyzing, the JIT compiler generated by the PyPy translation
toolchain emits code in the intermediate format proper of the hosting virtual machine.
Then, this intermediate code is in turn compiled into executable machine code by the JIT
compiler of the VM itself.
Thus, before being translated to executable machine code, the source code of our programs passes through two different JIT compilation layers:
the high level layer, implemented by the PyPy translation toolchain;
the low level layer, implemented by the VM itself.
The specifications of both the CLI and the JVM do not mandate the presence of a JIT compiler, which should be considered an implementation detail. However, all the current implementations of the CLI employ a JIT compiler, as do all the most popular implementations of the JVM.
JIT layering is a novel concept introduced with this thesis. In theory, if the low level JIT compiler were good enough, it could produce optimal code for whatever construct it encounters; in practice, however, this rarely happens, because either the low level JIT does not employ advanced techniques (as is the case for the CLI), or it cannot have a deep understanding of the language semantics, thus missing a lot of optimization opportunities (as proved by the current implementations of dynamic languages for the JVM).
By adding an additional JIT compilation layer, specialized for a specific high-level language, much better and more efficient code can be generated without modifying the underlying VM.
This has several advantages:
since the high level JIT compiler does not touch any of the internals of the VM, it is automatically portable across multiple implementations of the VM itself;
usually, the existing VMs and their corresponding JITs are very complex pieces of software, hard to modify: by writing our JIT on top of them, we avoid this problem; moreover, this way it is much easier to experiment with new features;
for the same reason, our approach is the only viable solution in cases where we do not have access to the codebase of the VM, as in the case of Microsoft .NET.
The main drawback of this approach is that the high level JIT compiler adds an overhead: even if in the long run the time spent in the compiler is negligible compared to the time saved by running the optimized code instead of the non-optimized one, in the short run programs could be slower.
As we will show in Chapter 8, this thesis proves that this approach is effective, and that the resulting implementation of the language can be much faster than a more ordinary implementation which relies only on the low level JIT. At the same time, we will also see that the overhead of the high level JIT compiler is not worth it in the case of short-running programs.
4.3 Immutable methods
As we have seen in Section 4.1, both the CLI and the JVM offer the possibility of generating and loading new code at runtime. However, once it has been loaded, it is not possible to modify the code inside methods.
In particular, neither VM offers any support for incremental lazy compilation. For example, there are cases in which we do not want to (or cannot) eagerly generate the bytecode for all the possible code paths inside a method: the usual solution is to generate the code only for the hot paths, and delay the generation of the others until they are reached for the first time, or until they have proved to be hot enough to justify the cost of compiling them.
Unfortunately, since it is not possible to modify a method, such a strategy cannot be easily implemented. As we will see in Section 7.5, this restriction is a serious limitation for language implementors who want to use adaptive techniques, and the proposed solutions (or, better, workarounds) have a negative impact either on the time spent on high level JIT compilation or on the efficiency of the generated code.
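The flavor of such a workaround can be sketched as one extra level of indirection: since loaded methods cannot be patched, the cold path is reached through a mutable reference, initially pointing at a stub that triggers compilation on first entry and then overwrites the reference. The sketch below is our own illustration in Java (the names are assumptions, and a lambda stands in for freshly generated and loaded bytecode), not PyPy's actual mechanism, which is discussed in Section 7.5.

```java
import java.util.function.IntUnaryOperator;

public class LazyPathDemo {
    static int compileCount = 0;

    // The "compiled" cold path; in a real system this would be freshly
    // generated bytecode, loaded as sketched in Section 4.1.
    static IntUnaryOperator compileColdPath() {
        compileCount++;
        return x -> x * 2;
    }

    // Mutable call target: starts as a stub that compiles the real code,
    // patches the indirection cell, and re-enters through the new target.
    static IntUnaryOperator coldPath = x -> {
        coldPath = compileColdPath();
        return coldPath.applyAsInt(x);
    };

    public static void main(String[] args) {
        System.out.println(coldPath.applyAsInt(21)); // first call compiles, then runs
        System.out.println(coldPath.applyAsInt(5));  // later calls skip the stub
        System.out.println(compileCount);            // compiled exactly once
    }
}
```

The price of this scheme is visible even in the sketch: every call to the cold path pays an indirect call through the mutable reference, which is precisely the kind of overhead on generated-code efficiency mentioned above.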
4.4 Tail calls
On the CLI, we can explicitly mark a method invocation as a tail call, assuming that it is in tail position. Tail calls behave exactly as normal calls, with the difference that the call frame of the caller is removed from the stack and replaced by the call frame of the callee: the result is that we can have an arbitrarily deep chain of tail calls without any risk of exhausting the stack, as would happen with normal calls. This process is called tail call optimization. Many functional programming languages such as Lisp, Scheme or Caml implement tail call optimization very efficiently, so that tail calls are as efficient as gotos.
As we will see in Section 7.5, the problem of immutable methods could be partly solved by the presence of efficient tail calls. However, this is not the case for the current VMs:
In the current implementations of the CLI, tail calls are too slow to be used in practice. In particular, on the Microsoft CLR tail calls are about 10 times slower than normal calls, while on Mono they are not even implemented correctly; in either case, they are orders of magnitude slower than a simple goto, making tail calls unusable for code that needs to be executed often.
At the moment Java HotSpot does not support tail call optimization (see for example
[Sch09] for a description of a possible implementation).
4.4.1 Tail calls benchmark
Figure 4.1 shows the source code used to benchmark the efficiency of tail calls. Both static methods compute the sum of the first n integers, the first using a loop, the second using tail recursion. Note that, although the code in the figure is written in C# for clarity, the language does not support the tail.call instruction, so we had to write the tail-recursive algorithm directly in IL bytecode.
A call is in tail position if it is immediately followed by a return instruction.

public static int loop(int n)
{
    int sum = 0;
    while (n >= 0) {
        sum += n;
        n--;
    }
    return sum;
}

public static int tailcall(int n, int sum)
{
    if (n < 0)
        return sum;
    // note: C# does not support the tail.call instruction
    return tailcall(n - 1, sum + n);
}

Figure 4.1: Tail call benchmark
If tail call optimization were applied correctly, we would expect both versions to have about the same performance. However, this is not the case: on Mono, the recursive version is about 2.3 times slower than the loop, while on the CLR it is about 18.3 times slower.
Moreover, the Mono implementation of tail calls is known to be buggy [bug]: each tail call leaks a bit of stack space, with the result that a deep chain of tail calls might result in a stack overflow. As we will see in Section 8.5, this might trigger a crash in the generated code.
4.5 Primitive types vs reference types
Both the CLI and the JVM support values of primitive types and of reference types:
primitive types include numeric values, such as integers and floating point numbers of various sizes;
reference types include objects, that is, class instances.
Actually, the CLI also supports value types, enumerations, generic types, etc., but they are outside the scope of this paragraph.
Although values of primitive types are not objects, it is possible to convert them through the mechanism of boxing: for each primitive type, there is a corresponding boxed type which wraps the value into a proper object. Once you have a boxed object, it is possible to get its original value back by unboxing it.
Unfortunately, arithmetic operations between boxed values are much slower than between primitive values, due to the extra level of indirection and to the fact that it is necessary to allocate a new object on the heap to hold every intermediate result. Thus, to get high performance it is very important to use primitive types instead of boxed ones whenever possible.
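The cost difference can be illustrated with a small Java sketch (the iteration count and the shape of the two loops are our own choices): the primitive accumulator stays unboxed for the whole loop, while the Long accumulator forces an unboxing, an addition, and a fresh heap allocation on every iteration.

```java
public class BoxingDemo {
    public static void main(String[] args) {
        final int N = 10_000_000;

        // Primitive accumulator: the sum stays in a register/local slot.
        long t0 = System.nanoTime();
        long sum = 0;
        for (long i = 0; i < N; i++) sum += i;
        long primitiveNs = System.nanoTime() - t0;

        // Boxed accumulator: each `boxedSum += i` unboxes, adds, and
        // allocates a new Long object for the intermediate result.
        t0 = System.nanoTime();
        Long boxedSum = 0L;
        for (long i = 0; i < N; i++) boxedSum += i;
        long boxedNs = System.nanoTime() - t0;

        System.out.println(sum == boxedSum);  // same result either way
        System.out.printf("boxed/primitive slowdown: %.1fx%n",
                          (double) boxedNs / primitiveNs);
    }
}
```

The exact slowdown depends on the VM implementation and its JIT, echoing the portability concern raised at the beginning of this chapter; only the equality of the two results is deterministic.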
4.6 Delegates (or: object oriented function pointers)
In the CLI, a delegate is a special kind of object that wraps a method (either instance or static): once created, a delegate can be freely stored and passed around, just like a normal object. The only method exposed by delegates is Invoke, which calls the wrapped method.
Delegates are type-safe: to instantiate one, we first need to create a delegate type, i.e. a class which inherits from System.MulticastDelegate and specifies the exact signature of the wrapped method. Then, the VM can check that the signature of the delegate matches the signature of the method being wrapped.
Moreover, it is possible to bind a delegate to an object, which will be automatically passed as the first argument to the method when the delegate is called. This is a limited form of closure, which is usually exploited to create delegates that invoke an instance method on a given object.
The JVM does not natively offer anything similar to delegates. However, it is possible to implement them by using the classical Command design pattern [GHJV93], i.e. by defining an interface for each signature we are interested in, and a class that implements it for each method we want to invoke.
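A minimal sketch of this emulation in Java, with hypothetical names of our own: the interface IntToInt plays the role of a delegate type fixing the signature, its invoke method is the counterpart of a delegate's Invoke, and wrapping an instance method also captures the receiver, mimicking a bound delegate.

```java
// "Delegate type" for the signature int -> int: one interface per signature.
interface IntToInt {
    int invoke(int x); // counterpart of a CLI delegate's Invoke method
}

public class DelegateDemo {
    private final int addend;

    DelegateDemo(int addend) { this.addend = addend; }

    int add(int x) { return x + addend; }      // an instance method to wrap
    static int twice(int x) { return 2 * x; }  // a static method to wrap

    public static void main(String[] args) {
        DelegateDemo obj = new DelegateDemo(10);

        // One implementing class per wrapped method (here as anonymous
        // classes). Wrapping the instance method captures `obj`, like a
        // CLI delegate bound to an object.
        IntToInt bound = new IntToInt() {
            public int invoke(int x) { return obj.add(x); }
        };
        IntToInt unbound = new IntToInt() {
            public int invoke(int x) { return twice(x); }
        };

        System.out.println(bound.invoke(5));   // 15
        System.out.println(unbound.invoke(5)); // 10
    }
}
```

Unlike CLI delegates, the signature check here happens entirely at compile time, and each wrapped method costs one extra class, which is why the pattern scales less gracefully than native delegate support.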