Mono Ahead Of Time Compiler
===========================

The Ahead of Time (AOT) compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase code sharing across multiple
running Mono applications.

To precompile an assembly use the following command:

    mono --aot -O=all assembly.exe

The `--aot' flag instructs Mono to ahead-of-time compile your
assembly, while the -O=all flag instructs Mono to use all the
available optimizations.

* Position Independent Code
---------------------------

On x86 and x86-64 the code in Ahead-of-Time compiled
images is position-independent code. This allows the same
precompiled image to be reused across multiple applications
without having different copies. This is the same way in which
ELF shared libraries work: the code produced can be relocated.

The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster with them than with
JIT-compiled code, especially with all the new optimizations
provided by Mono.

* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------

Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to go through an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT in the ELF
specification. When the runtime saves the AOT image, it saves some
information for each method describing the GOT entries
used by that method. When loading a method from an AOT image,
the runtime fills in the GOT entries needed by the method.
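
The load-time fixup described above can be sketched in C. This is a
simplified illustration, not Mono's actual data layout: the names
got, GotEntryInfo, MethodInfo, resolve_target, load_method and the
two runtime_* functions are all hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for runtime functions whose addresses are
   only known at run time. */
int runtime_add_one (int x) { return x + 1; }
int runtime_double (int x) { return x * 2; }

typedef int (*RuntimeFunc) (int);

enum { TARGET_ADD_ONE, TARGET_DOUBLE };
enum { GOT_SIZE = 4 };

/* The GOT: one slot per referenced runtime item. All slots start
   out empty (NULL) because the array has static storage. */
RuntimeFunc got [GOT_SIZE];

/* Per-method information saved in the image: which GOT slot the
   method uses and which runtime item belongs in that slot. */
typedef struct {
	int slot;
	int target;
} GotEntryInfo;

typedef struct {
	const GotEntryInfo *entries;
	int n_entries;
} MethodInfo;

/* Translate a saved target id into a run-time address. */
RuntimeFunc resolve_target (int target)
{
	switch (target) {
	case TARGET_ADD_ONE: return runtime_add_one;
	case TARGET_DOUBLE:  return runtime_double;
	default:             return NULL;
	}
}

/* Called when a method is loaded: fill only the slots it needs. */
void load_method (const MethodInfo *info)
{
	for (int i = 0; i < info->n_entries; i++) {
		const GotEntryInfo *e = &info->entries [i];
		assert (e->slot >= 0 && e->slot < GOT_SIZE);
		got [e->slot] = resolve_target (e->target);
	}
}

/* Stand-in for AOT code: it embeds no address and instead calls
   through the GOT slot recorded for it. */
int method_body (int x)
{
	return got [2] (x);
}

const GotEntryInfo method_entries [] = { { 2, TARGET_DOUBLE } };
const MethodInfo method_info = { method_entries, 1 };
```

Calling load_method (&method_info) before method_body () fills slot 2
and leaves every other slot untouched, which is the point of saving
the per-method information in the image.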

* Computing the address of the GOT

Methods which need to access the GOT first need to compute its
address. On the x86 it is done by code like this:

    call 0
    pop ebx
    add <OFFSET TO GOT>, ebx
    <save got addr to a register>

The call pushes the address of the next instruction, which the
pop then retrieves. The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
avoid some problems with branches and basic blocks.

* Referencing GOT entries

Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of calling
an absolute address directly, it needs to do:

    call *<OFFSET>(<GOT REG>)

Here, <OFFSET> can be emitted as 0; it will be fixed up by the
AOT compiler.
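
The placeholder mechanism can be illustrated with a small C sketch.
It assumes 32-bit x86 with the GOT address held in ebx, and uses the
encoding FF 93 <disp32> for an indirect call through ebx plus a
32-bit displacement. The helper names emit_got_call and
patch_got_offset are hypothetical and are not Mono APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emit `call *<disp32>(%ebx)` at code + pos with a zero displacement
   as the placeholder. Returns the position of the displacement so it
   can be patched later. The instruction occupies 6 bytes. */
size_t emit_got_call (uint8_t *code, size_t pos)
{
	code [pos++] = 0xFF;   /* opcode FF /2: call r/m32 */
	code [pos++] = 0x93;   /* ModRM: mod=10 (disp32), reg=2, rm=ebx */
	size_t disp_pos = pos;
	memset (code + pos, 0, 4);   /* <OFFSET> = 0 for now */
	return disp_pos;
}

/* Overwrite the placeholder with the byte offset of the GOT slot,
   stored little-endian as x86 expects. */
void patch_got_offset (uint8_t *code, size_t disp_pos,
                       uint32_t got_slot, uint32_t slot_size)
{
	uint32_t offset = got_slot * slot_size;
	for (int i = 0; i < 4; i++)
		code [disp_pos + i] = (uint8_t) (offset >> (8 * i));
}
```

A backend following this pattern would call emit_got_call () while
generating code and record the returned position; a later pass that
knows the final slot assignment would call patch_got_offset (), for
example with slot 3 and 4-byte slots to produce a displacement of 12.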

For more examples of the changes required, see:

    svn diff -r 37739:38213 mini-x86.c

* The Precompiled File Format
-----------------------------

We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler: we
write out a text file which is then passed to gas (the GNU
assembler) to generate the object file.

The precompiled image is stored in a file next to the original
assembly. Its name is the assembly's file name with the native
shared library extension appended (on Linux, ".so").

For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
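
The naming rule can be written down as a short C helper. This is only
a sketch of the rule as stated above; the function name
aot_image_name is hypothetical, and the suffix is passed in by the
caller because it depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the AOT image name by appending the shared library suffix to
   the full assembly file name (the original extension is kept).
   Returns 0 on success, -1 if the output buffer is too small. */
int aot_image_name (const char *assembly, const char *suffix,
                    char *out, size_t out_size)
{
	int n = snprintf (out, out_size, "%s%s", assembly, suffix);
	return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

With the suffix ".so", the input "basic.exe" yields "basic.exe.so",
matching the example above.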

The following things are saved in the object file and can be
looked up using the equivalent of dlsym:

    A copy of the assembly GUID.

    The version of the AOT file format.

    The optimization flags used to build this precompiled image.

    Contains additional information needed by the runtime for using the
    precompiled method, like the GOT entries it uses.

    Maps method indexes to offsets in the method_infos array.

    A table that lists all the internal calls
    referenced by the precompiled image.

    A list of assemblies referenced by this AOT image.

    The equivalent of a procedure linkage table.
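
The relationship between the offsets table and the method_infos array
can be sketched in C as follows. The record layout shown here (one
count byte followed by that many GOT slot indexes) is an assumption
made purely for illustration and is not the real encoding; the names
method_offsets and find_method_info are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record layout, for illustration only: one count byte
   followed by that many GOT slot indexes. */
const uint8_t method_infos [] = {
	2, 1, 4,   /* offset 0: method 0 uses GOT slots 1 and 4 */
	0,         /* offset 3: method 1 uses no GOT slots */
	1, 7       /* offset 4: method 2 uses GOT slot 7 */
};

/* Maps a method index to the offset of its record in method_infos. */
const uint32_t method_offsets [] = { 0, 3, 4 };

#define N_METHODS (sizeof (method_offsets) / sizeof (method_offsets [0]))

/* Return the record for method_index, or NULL if it is out of range. */
const uint8_t *find_method_info (uint32_t method_index)
{
	if (method_index >= N_METHODS)
		return NULL;
	return method_infos + method_offsets [method_index];
}
```

Because the records have variable length, the separate offsets table
is what makes lookup by method index a constant-time operation.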

* Performance considerations
----------------------------

Using AOT code is a trade-off which might lead to higher or lower performance,
depending on a lot of circumstances. Some of these are:

- AOT code needs to be loaded from disk before being used, so cold startup of
  an application using AOT code MIGHT be slower than using JITted code. Warm
  startup (when the code is already in the machine's cache) should be faster.
  Also, JITting code takes time, and the JIT compiler also needs to load
  additional metadata for the method from the disk, so startup can be faster
  even in the cold startup case.
- AOT code is usually compiled with all optimizations turned on, while JITted
  code is usually compiled with default optimizations, so the generated code
  in the AOT case should be faster.
- JITted code can directly access runtime data structures and helper functions,
  while AOT code needs to go through an indirection (the GOT) to access them,
  so it will be slower and somewhat bigger as well.
- When JITting code, the JIT compiler needs to load a lot of metadata about
  methods and types into memory.
- JITted code has better locality, meaning that if method A calls method B, then
  the native code for A and B is usually quite close in memory, leading to
  better cache behaviour and thus improved performance. In contrast, the native
  code of methods inside the AOT file is in a somewhat random order.
- Currently, the runtime needs to set up some data structures and fill out
  GOT entries before a method is first called. This means that even calls to
  a method whose code is in the same AOT image need to go through the GOT,
  instead of using a direct call.
- On x86, the generated code uses call 0, pop REG, add GOTOFFSET, REG to
  materialize the GOT address. Newer versions of gcc use a separate function
  to do this; we may need to do the same.
- Currently, we get vtable addresses from the GOT. Another solution would be
  to store the data from the vtables in the .bss section, so accessing them
  would involve less indirection.