The mmo object format is used exclusively together with Professor
Donald E.@: Knuth's educational 64-bit processor MMIX. The simulator
@command{mmix}, which is available at
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz},
understands this format. That package also includes a combined
assembler and linker called @command{mmixal}. The mmo format has
no advantages feature-wise compared to e.g.@: ELF. It is a simple
non-relocatable object format with no support for archives or
debugging information, except for symbol value information and
line numbers (which are not yet implemented in BFD). See
@url{http://www-cs-faculty.stanford.edu/~knuth/mmix.html} for more
information about MMIX. The ELF format is used for intermediate
object files in the BFD implementation.
@c We want to xref the symbol table node. A feature in "chew"
@c requires that "commands" do not contain spaces in the
@c arguments. Hence the hyphen in "Symbol-table".
* mmo section mapping::
@node File layout, Symbol-table, mmo, mmo
@subsection File layout
The mmo file contents are not partitioned into named sections as
with e.g.@: ELF. Memory areas are formed by specifying the
location of the data that follows. Only the memory area
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} is executable, so
it is used for code (and constants) and the area
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} is used for
writable data. @xref{mmo section mapping}.
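
The two memory areas above differ only in the top byte of the 64-bit address, so a reader can tell them apart with a shift. The following is a minimal sketch, not part of BFD; the function name @code{classify_address} and the labels it returns are assumptions made for illustration.

```python
# Hypothetical helper: map a 64-bit MMIX address to the memory area
# described in the text. Names and return labels are illustrative only.
TEXT_TOP_BYTES = (0x00, 0x01)  # 0x0000...00 to 0x01ff...ff (executable)
DATA_TOP_BYTE = 0x20           # 0x2000...00 to 0x20ff...ff (writable data)


def classify_address(addr):
    """Return 'text', 'data' or 'other' for a 64-bit address."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("address does not fit in 64 bits")
    top = addr >> 56  # the most significant byte selects the area
    if top in TEXT_TOP_BYTES:
        return "text"
    if top == DATA_TOP_BYTE:
        return "data"
    return "other"
```

Anything that falls outside both areas is reported as @code{other} here; how such areas become sections is covered under the section mapping.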
There is provision for specifying ``special data'' of 65536
different types. We use type 80 (decimal), arbitrarily chosen to be the
same as the ELF @code{e_machine} number for MMIX, filling it with
section information normally found in ELF objects. @xref{mmo
Contents are entered as 32-bit words, XORed over the previous
contents, which are always zero-initialized. A word that starts with the
byte @samp{0x98} forms a command called a @samp{lopcode}, where
the next byte distinguishes between the thirteen lopcodes. The
two remaining bytes, called the @samp{Y} and @samp{Z} fields, or
the @samp{YZ} field (a 16-bit big-endian number), are used for
purposes that differ for each lopcode. As documented in
@url{http://www-cs-faculty.stanford.edu/~knuth/mmixal-intro.ps.gz},
0x98000001. The next word is contents, regardless of whether it
starts with 0x98 or not.

0x9801YYZZ, where @samp{Z} is 1 or 2. This is a location
directive, setting the location for the next data to the next
32-bit word (for @math{Z = 1}) or 64-bit word (for @math{Z = 2}),
plus @math{Y * 2^56}. Normally @samp{Y} is 0 for the text segment
and 0x20 for the data segment.

0x9802YYZZ. Increase the current location by @samp{YZ} bytes.

0x9803YYZZ, where @samp{Z} is 1 or 2. Store the current location
as 64 bits into the location pointed to by the next 32-bit
(@math{Z = 1}) or 64-bit (@math{Z = 2}) word, plus @math{Y *

0x9804YYZZ. @samp{YZ} is stored into the current location plus

0x980500ZZ. @samp{Z} is 16 or 24. A value @samp{L} derived from
the following 32-bit word is used in a manner similar to
@samp{YZ} in lop_fixr: it is XORed into the current location
minus @math{4 * L}. The first byte of the word is 0 or 1. If it
is 1, then @math{L = (@var{lowest 24 bits of word}) - 2^Z}, if 0,
then @math{L = (@var{lowest 24 bits of word})}.

0x9806YYZZ. @samp{Y} is the file number, @samp{Z} is the count of
32-bit words. Set the file number to @samp{Y} and the line
counter to 0. The next @math{Z * 4} bytes contain the file name,
padded with zeros if the count is not a multiple of four. The
same @samp{Y} may occur multiple times, but @samp{Z} must be 0 for
all but the first occurrence.

0x9807YYZZ. @samp{YZ} is the line number. Together with
lop_file, it forms the source location for the next 32-bit word.
Note that for each non-lopcode 32-bit word, line numbers are
assumed incremented by one.

0x9808YYZZ. @samp{YZ} is the type number. Data until the next
lopcode other than lop_quote forms special data of type @samp{YZ}.
@xref{mmo section mapping}.

Types other than 80 (or type 80 with content that does not
parse) are stored in sections named @code{.MMIX.spec_data.@var{n}}
where @var{n} is the @samp{YZ}-type. The flags for such a
section say not to allocate or load the data. The vma is 0.
Contents of multiple occurrences of special data @var{n} are
concatenated to the data of the previous lop_spec @var{n}s. The
location in data or code at which the lop_spec occurred is lost.

0x980901ZZ. The first lopcode in a file. The @samp{Z} field forms the
length of header information in 32-bit words, where the first word
tells the time in seconds since @samp{00:00:00 GMT Jan 1 1970}.

0x980a00ZZ. @math{Z > 32}. This lopcode follows after all
content-generating lopcodes in a program. The @samp{Z} field
denotes the value of @samp{rG} at the beginning of the program.
The following @math{256 - Z} big-endian 64-bit words are loaded
into global registers @samp{$G} @dots{} @samp{$255}.

0x980b0000. The next-to-last lopcode in a program. Must follow
immediately after the lop_post lopcode and its data. After this
lopcode follow all symbols in a compressed format
(@pxref{Symbol-table}).

0x980cYYZZ. The last lopcode in a program. It must follow the
lop_stab lopcode and its data. The @samp{YZ} field contains the
number of 32-bit words of symbol table information after the
preceding lop_stab lopcode.

Note that the lopcode ``fixups'', @code{lop_fixr}, @code{lop_fixrx} and
@code{lop_fixo}, are not generated by BFD, but are handled. They are
generated by @command{mmixal}.
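
The word layout described above (escape byte @samp{0x98}, lopcode byte, then @samp{Y} and @samp{Z}) and the location directive @samp{0x9801YYZZ} can be sketched as follows. This is an illustration under stated assumptions, not the BFD reader: the names @code{split_word} and @code{lop_loc_address} are hypothetical, and wrapping the result to 64 bits is an assumption of this sketch.

```python
import struct

ESCAPE = 0x98   # first byte of every lopcode word, per the text
LOP_LOC = 0x01  # second byte of the location directive 0x9801YYZZ


def split_word(word):
    """Split a 32-bit word into (is_lopcode, lopcode, Y, Z, YZ)."""
    b0, op, y, z = struct.pack(">I", word)  # big-endian byte order
    return b0 == ESCAPE, op, y, z, (y << 8) | z


def lop_loc_address(words):
    """Return the location set by a location directive.

    words[0] is the 0x9801YYZZ word; it is followed by one 32-bit
    word (Z = 1) or two forming a 64-bit word (Z = 2).
    """
    is_lop, op, y, z, _ = split_word(words[0])
    if not is_lop or op != LOP_LOC or z not in (1, 2):
        raise ValueError("not a location directive")
    if len(words) < 1 + z:
        raise ValueError("operand is truncated")
    operand = 0
    for w in words[1:1 + z]:
        operand = (operand << 32) | w
    # Operand plus Y * 2^56; the 64-bit wrap is an assumption here.
    return ((y << 56) + operand) & (2**64 - 1)
```

For the words @samp{0x98010002}, @samp{0x00000000}, @samp{0x00000004}, the sketch yields the address 4.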
This trivial one-label, one-instruction file:

can be represented this way in mmo:

0x98090101 - lop_pre, one 32-bit word with timestamp.
0x98010002 - lop_loc, text segment, using a 64-bit address.
             Note that mmixal does not emit this for the file above.
0x00000000 - Address, high 32 bits.
0x00000000 - Address, low 32 bits.
0x98060002 - lop_file, 2 32-bit words for file-name.
0x2e730000 - ".s\0\0"
0x98070001 - lop_line, line 1.
0x00010203 - TRAP 1,2,3
0x980a00ff - lop_post, setting $255 to 0.
0x980b0000 - lop_stab for ":Main" = 0, serial 1.
0x203a4040   @xref{Symbol-table}.
0x980c0005 - lop_end; symbol table contained five 32-bit words.
@node Symbol-table, mmo section mapping, File layout, mmo
@subsection Symbol table format
From mmixal.w (or really, the generated mmixal.tex) in
@url{http://www-cs-faculty.stanford.edu/~knuth/programs/mmix.tar.gz}:
``Symbols are stored and retrieved by means of a @samp{ternary
search trie}, following ideas of Bentley and Sedgewick. (See
ACM--SIAM Symp.@: on Discrete Algorithms @samp{8} (1997), 360--369;
R.@: Sedgewick, @samp{Algorithms in C} (Reading, Mass.@:
Addison--Wesley, 1998), @samp{15.4}.) Each trie node stores a
character, and there are branches to subtries for the cases where
a given character is less than, equal to, or greater than the
character in the trie. There also is a pointer to a symbol table
entry if a symbol ends at the current node.''

So it's a tree encoded as a stream of bytes. The stream of bytes
acts on a single virtual global symbol, adding and removing
characters and signalling complete symbol points. Here, we read
the stream and create symbols at the completion points.

First, there's a control byte @code{m}. If any of the listed bits
in @code{m} is nonzero, we execute what stands at the right, in

0x40 - Traverse left trie.
(Read a new command byte and recurse.)

0x2f - Read the next byte as a character and store it in the
current character position; increment character position.
Test the bits of @code{m}:

0x80 - The character is 16-bit (so read another byte,
merge into current character).

0xf - We have a complete symbol; parse the type, value
and serial number and do what should be done
with a symbol. The type and length information

j == 0xf: A register variable. The following
byte tells which register.
j <= 8: An absolute symbol. Read j bytes as the
big-endian number the symbol equals.
A j = 2 with two zero bytes denotes an
j > 8: As with j <= 8, but add (0x20 << 56)
to the value in the following j - 8

Then comes the serial number, as a variant of
uleb128, but better named ubeb128:
Read bytes and shift the previous value left 7
(multiply by 128). Add in the new byte, repeat
until a byte has bit 7 set. The serial number
is the computed value minus 128.

0x20 - Traverse middle trie. (Read a new command byte
and recurse.) Decrement character position.

0x10 - Traverse right trie. (Read a new command byte and

Let's look again at the @code{lop_stab} for the trivial file
(@pxref{File layout}).

0x980b0000 - lop_stab for ":Main" = 0, serial 1.

This forms the trivial trie (note that the path between ``:'' and

016e "n" is the last character in a full symbol, and
     with a value represented in one byte.
81   The serial number is 1.
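
The serial-number encoding described earlier in this subsection (``ubeb128'') can be sketched as below. The function name @code{read_serial} and its @code{(serial, next_pos)} return convention are assumptions for illustration; only the shift-by-7, stop-on-bit-7 and minus-128 rules come from the text.

```python
def read_serial(data, pos=0):
    """Decode a ubeb128 serial number starting at data[pos].

    Returns (serial, next_pos). Raises IndexError if the data ends
    before a byte with bit 7 set is seen.
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) + byte  # shift left 7, add in the new byte
        if byte & 0x80:              # bit 7 marks the final byte
            return value - 128, pos
```

Applied to the single byte @samp{81} from the trivial file, the sketch gives serial number 1, matching the annotation above.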
@node mmo section mapping, , Symbol-table, mmo
@subsection mmo section mapping
The implementation in BFD uses special data type 80 (decimal) to
encapsulate and describe named sections, containing e.g.@: debug
information. If needed, any datum in the encapsulation will be
quoted using lop_quote. First comes a 32-bit word holding the
number of 32-bit words containing the zero-terminated zero-padded
section name. After the name there's a 32-bit word holding flags
describing the section type. Then comes a 64-bit big-endian word
with the section length (in bytes), then another with the section
start address. Depending on the type of section, the contents
might follow, zero-padded to a 32-bit boundary. For a loadable
section (such as data or code), the contents might follow at some
later point, not necessarily immediately, as a lop_loc with the
same start address as in the section description, followed by the
contents. This in effect forms a descriptor that must be emitted
before the actual contents. Sections described this way must not

For areas that don't have such descriptors, synthetic sections are
formed by BFD. Consecutive contents in the two memory areas
@samp{0x0000@dots{}00} to @samp{0x01ff@dots{}ff} and
@samp{0x2000@dots{}00} to @samp{0x20ff@dots{}ff} are entered in
sections named @code{.text} and @code{.data} respectively. If an area
is not otherwise described, but would together with a neighboring
lower area be less than @samp{0x40000000} bytes long, it is joined
with the lower area and the gap is zero-filled. For other cases,
a new section is formed, named @code{.MMIX.sec.@var{n}}. Here,
@var{n} is a number, a running count through the mmo file,

A loadable section specified as:

.section secname,"ax"
TETRA 1,2,3,4,-1,-2009

and linked to address @samp{0x4}, is represented by the sequence:

0x98080050 - lop_spec 80
0x00000002 - two 32-bit words for the section name
0x00000033 - flags CODE, READONLY, LOAD, ALLOC
0x00000000 - high 32 bits of section length
0x0000001c - section length is 28 bytes; 6 * 4 + 1 + alignment to 32 bits
0x00000000 - high 32 bits of section address
0x00000004 - section address is 4
0x98010002 - 64 bits with address of following data
0x00000000 - high 32 bits of address
0x00000004 - low 32 bits: data starts at address 4
0x50000000 - 80 as a byte, padded with zeros.

Note that the lop_spec wrapping does not include the section
contents. Compare this to a non-loaded section specified as:

This, when linked to address @samp{0x200000000000001c}, is

0x98080050 - lop_spec 80
0x00000002 - two 32-bit words for the section name
0x00000010 - flag READONLY
0x00000000 - high 32 bits of section length
0x0000000c - section length is 12 bytes; 2 * 4 + 2 + alignment to 32 bits
0x20000000 - high 32 bits of address
0x0000001c - low 32 bits of address 0x200000000000001c
0x26280000 - 38, 40 as bytes, padded with zeros

For the latter example, the section contents must not be
loaded in memory, and are therefore specified as part of the
special data. The address is usually unimportant but might
provide information for e.g.@: the DWARF 2 debugging format.
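
The fixed part of the type-80 descriptor described in this subsection (name word count, name, flags, 64-bit length, 64-bit start address) can be read with a few lines of code. The sketch below is not the BFD implementation: @code{parse_descriptor} and @code{name_words} are hypothetical names, the input is assumed to be already split into 32-bit words with any lop_quote escapes removed, and the flag bits are returned undecoded.

```python
import struct


def name_words(name):
    """Encode a name as zero-terminated, zero-padded 32-bit words."""
    raw = name.encode("ascii") + b"\0"
    raw += b"\0" * (-len(raw) % 4)  # pad to a 32-bit boundary
    return [w for (w,) in struct.iter_unpack(">I", raw)]


def parse_descriptor(words):
    """Parse the words that follow a lop_spec 80 word.

    Returns (name, flags, length, address, words_consumed).
    Any section contents after the descriptor are left to the caller.
    """
    n = words[0]  # number of 32-bit words holding the name
    raw = b"".join(struct.pack(">I", w) for w in words[1:1 + n])
    name = raw.split(b"\0", 1)[0].decode("ascii")
    # Raises ValueError if fewer than five words remain.
    flags, len_hi, len_lo, addr_hi, addr_lo = words[1 + n:6 + n]
    length = (len_hi << 32) | len_lo    # 64-bit big-endian length
    address = (addr_hi << 32) | addr_lo  # 64-bit big-endian address
    return name, flags, length, address, 6 + n


# The loadable-section example: name "secname", flags 0x33,
# length 0x1c, address 4.
words = [2] + name_words("secname") + [0x33, 0, 0x1c, 0, 4]
name, flags, length, address, used = parse_descriptor(words)
```

Returning the number of words consumed lets a caller decide whether contents follow inside the special data (the non-loaded case) or arrive later after a lop_loc (the loadable case).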