doc/The-Assembler.html

   1 <html lang="en">
   2 <head>
   3 <title>The Assembler - The XORcyst Manual</title>
   4 <meta http-equiv="Content-Type" content="text/html">
   5 <meta name="description" content="The XORcyst Manual">
   6 <meta name="generator" content="makeinfo 4.7">
   7 <link title="Top" rel="start" href="index.html#Top">
   8 <link rel="prev" href="Overview.html#Overview" title="Overview">
   9 <link rel="next" href="The-Linker.html#The-Linker" title="The Linker">
  10 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
  11 <!--
  12 This is the manual for The XORcyst version 1.4.5.
  13
  14 Copyright (C) 2004, 2005 Kent Hansen.-->
  15 <meta http-equiv="Content-Style-Type" content="text/css">
  16 <style type="text/css"><!--
  17   pre.display { font-family:inherit }
  18   pre.format  { font-family:inherit }
  19   pre.smalldisplay { font-family:inherit; font-size:smaller }
  20   pre.smallformat  { font-family:inherit; font-size:smaller }
  21   pre.smallexample { font-size:smaller }
  22   pre.smalllisp    { font-size:smaller }
  23   span.sc { font-variant:small-caps }
  24   span.roman { font-family: serif; font-weight: normal; }
  25 --></style>
  26 </head>
  27 <body>
  28 <div class="node">
  29 <p>
  30 <a name="The-Assembler"></a>Next:&nbsp;<a rel="next" accesskey="n" href="The-Linker.html#The-Linker">The Linker</a>,
  31 Previous:&nbsp;<a rel="previous" accesskey="p" href="Overview.html#Overview">Overview</a>,
  32 Up:&nbsp;<a rel="up" accesskey="u" href="index.html#Top">Top</a>
  33 <hr><br>
  34 </div>
  35
  36 <h2 class="chapter">3 The Assembler</h2>
  37
  38 <p>The XORcyst assembler takes a <dfn>plaintext file</dfn> containing a sequence of 6502 instructions and assembler
  39 directives (collectively referred to as assembler statements), and produces from this an <dfn>object file</dfn> (usually referred to as a <dfn>unit</dfn>) that can be fed on to the XORcyst linker.
  40
  41    <p>A plain 6502 binary is not produced, largely because the aim is to generate position-independent code and data segments. Specifically, in The XORcyst universe code and data labels are not meant to be assigned addresses until the final process of linking. Relocation on the 6502 isn't as simple as just adding an offset to an instruction operand; the 6502 has a special set of <dfn>zero-page instructions</dfn> which can be used when addresses fall in the range 0..255, and we want to utilize these whenever possible. So until we know whether, say, a data label will fall in the zero-page range or not, we don't know whether instructions which refer to this label will have a 1-byte or 2-byte operand. Using the non-zero-page (absolute) version of an instruction would, in the general case, ensure that the address will fit, but is too wasteful in size and processor cycles. So instead of hardcoded addresses the object code contains symbolic references, which it is up to the linker to resolve and translate. The object file can be thought of as a more compact, linker-ready version of the original assembler file.
  42
  43    <p>Another goal is to relieve the programmer of the burden of having to make sure that all variables in a large, complex program have unique addresses, by shifting as much of this responsibility onto the linker as possible. By postponing the mapping of symbol names to addresses until the link phase, variables can be added and moved to any part of the program without the risk that they will interfere with the storage allocation of another part of the program.
  44
  45    <p>The object file format also enables complex resource sharing between units. An assembler expression can be arbitrarily complex, with references to any number of constants, variables or procedures defined in another unit.
  46
  47 <h3 class="section">3.1 Invoking the assembler (<span class="command">xasm</span>)</h3>
  48
  49 <p>The basic usage is
  50
  51    <p><span class="samp">xasm </span><var>assembler-file</var>
  52
  53    <p>where <var>assembler-file</var> is the (top-level) file of assembler statements.
  54 If all goes well, this will produce a similarly named file of extension <span class="file">.o</span>.
  55
  56    <p>For example,
  57 <pre class="example">     xasm driver.asm
  58 </pre>
  59    <p>produces the object file <span class="file">driver.o</span> if no errors are encountered by the assembler.
  60
  61 <h4 class="subsection">3.1.1 Switches</h4>
  62
  63      <dl>
  64 <dt><code>--define IDENT[=VALUE]</code><dd>Enters the identifier <code>IDENT</code> into the global symbol table, optionally assigning it the value <code>VALUE</code>. The default value is integer <code>0</code>. To assign a string, the enclosing quotes must be escaped, e.g. <code>--define my_string=\"Have a nice day\"</code>.
  65
  66      <br><dt><code>--output FILE</code><dd>Directs output to the file <code>FILE</code> rather than the default file.
  67
  68      <br><dt><code>--swap-parens</code><dd>Changes the operators used to specify indirection from <code>[ ]</code> to <code>( )</code>. <code>[ ]</code> takes over <code>( )</code>'s role in arithmetic expressions.
  69
  70      <br><dt><code>--no-warn</code><dd>Suppresses assembler warning messages.
  71
  72      <br><dt><code>--verbose</code><dd>Instructs the assembler to print some messages about what it is doing.
  73
  74      <br><dt><code>--debug</code><dd>Retains file and line information, so that the linker can produce more descriptive warning and error messages.
  75
  76 </dl>
  77
  78    <p>For the full list of switches, run <code>xasm --help</code>.
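   <p>As a sketch of how a command-line definition might be picked up by the source, a symbol entered with <code>--define</code> can be tested with <code>.ifdef</code> (described under conditional assembly). The symbol name <code>DEBUG_BUILD</code>, the file name and the equate <code>border_color</code> are made up for illustration, and the example assumes that a symbol entered this way counts as defined for the purposes of <code>.ifdef</code>.

<pre class="example">     ; assumed invocation: xasm --define DEBUG_BUILD driver.asm
     .ifdef DEBUG_BUILD
       border_color = 2           ; taken when DEBUG_BUILD was defined
     .else
       border_color = 0           ; taken when it was not
     .endif
</pre>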
  79
  80 <h3 class="section">3.2 Assembler statements</h3>
  81
  82 <p><strong>Note:</strong> This is not meant to be an introductory guide to 6502 assembly. Only the XORcyst-specific features and quirks will be explained. (For readers new to the 6502 and assemblers, <a href="http://www.google.com/search?q=6502+tutorial">http://www.google.com/search?q=6502+tutorial</a> may be a good starting point.)
  83
  84    <p>Because the assembler aims to enforce completely position-independent code, it does not allow the <code>.org </code><var>address</var> or <code>.base </code><var>address</var> directives commonly employed by 6502 assemblers. Most other constructs familiar from other 6502 assemblers are in place, however. These and additional features are explained in the following sections. (For a complete list of directives, see <a href="Assembler-Directives.html#Assembler-Directives">Assembler Directives</a>.)
  85
  86    <p>In the code templates given in this section, any arguments enclosed in italic square brackets <em>[ ... ]</em> are optional.
  87
  88 <h4 class="subsection">3.2.1 A simple assembler example</h4>
  89
  90 <p>Here is a short assembler file which demonstrates basic functionality:
  91
  92 <pre class="example">     .dataseg                   ; begin data segment
  93
  94        my_variable .byte          ; define a byte variable
  95
  96        my_array .word[16]         ; define an array of 16 words
  97
  98      .codeseg                   ; begin code segment
  99
 100      .include "config.h"        ; include another source file
 101
 102      ; conditional definition of constant my_priority
 103      .ifdef HAVE_CONFIG_H
 104        my_priority = 10
 105      .else
 106        my_priority = 0
 107      .endif
 108
 109      ; declare a macro named store_const with parameters value and addr
 110      .macro store_const value, addr
 111        lda #value
 112        sta addr
 113      .endm                      ; end macro
 114
 115      ; a subroutine entrypoint is here
 116      .proc my_subroutine
 117        store_const $10, my_array+10           ; macro invocation
 118        store_const my_priority, my_variable   ; macro invocation
 119
 120        lda [$0A], y               ; NOTE: [ ] used for indirection, not ( ), unless --swap-parens switch used
 121        beq +
 122        jsr some_function          ; call external function
 123
 124      ; produce a short delay
 125      + ldx #60
 126        @@delay:
 127        dex
 128        bne @@delay
 129
 130      ; exit with my_priority in accumulator
 131        lda #my_priority
 132        rts
 133      .endp                      ; end of procedure definition
 134
 135      .public my_subroutine      ; make my_subroutine visible to other units
 136      .extrn some_function:proc  ; some_function is located in another unit
 137
 138      .end                       ; end of assembler input
 139 </pre>
 140    <p>While the example itself doesn't do anything useful, it shows the constructs you would use to write code that does.
 141
 142 <h4 class="subsection">3.2.2 Literals</h4>
 143
 144 <p>The following kinds of integer literal are understood by the assembler (examples given in parentheses):
 145
 146      <ul>
 147 <li><strong>Decimal:</strong> Non-zero decimal digit followed by zero or more decimal digits (<code>1234</code>)
 148
 149      <li><strong>Hexadecimal:</strong> <code>0x</code> or <code>$</code> followed by one or more hexadecimal digits (<code>0xFACE, $BEEF</code>); one or more hexadecimal digits followed by <code>h</code> (<code>95Ah</code>). In the latter case numbers beginning with A through F must be preceded by a 0 (otherwise, say, <code>BABEh</code> would be interpreted as an identifier).
 150
 151      <li><strong>Binary:</strong> String of binary digits either preceded by <code>%</code> or succeeded by <code>b</code> (<code>%010110, 11001100b</code>).
 152
 153      <li><strong>Octal:</strong> A string of octal digits preceded by a 0 (<code>0755</code>).
 154
 155    </ul>
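   <p>As an illustration of the rules above, each of the following literals should denote the same value, 255. This is a sketch based only on the descriptions in the list; the statements are meant to appear in a code segment, since data cannot be initialized in a data segment.

<pre class="example">     .db 255          ; decimal
     .db 0xFF, $FF    ; hexadecimal, prefix forms
     .db 0FFh         ; hexadecimal, suffix form (leading 0 required before F)
     .db %11111111    ; binary, prefix form
     .db 11111111b    ; binary, suffix form
     .db 0377         ; octal
</pre>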
 156
 157    <p>String literals must be enclosed in a pair of <code>"</code> characters (as in <code>"You are a dweeb"</code>).
 158
 159    <p>Character literals must be of the form <code>'A'</code>.
 160
 161 <h4 class="subsection">3.2.3 Identifiers</h4>
 162
 163 <p>Identifiers must conform to the regular expression <code>[[:alpha:]_][[:alnum:]_]*</code>. They are case sensitive.
 164 Examples of valid identifiers are
 165 <pre class="example">     no_brainer, schools_out, my_2nd_home, catch22, FunkyMama
 166 </pre>
 167    <p>Examples of invalid identifiers are
 168 <pre class="example">     3stooges, i-was-here, f00li$h
 169 </pre>
 170    <h4 class="subsection">3.2.4 Expressions</h4>
 171
 172 <p>Operands to assembler statements are expressions. An expression can contain any number of operators, identifiers and literals, and parentheses to group terms. The operators are the familiar arithmetic, bitwise, shift and relational ones (largely the same as in C), plus a few more which are useful when writing code for a machine which has a 16-bit address space but only 8-bit registers:
 173
 174      <ul>
 175 <li><code>&lt; </code><var>expression</var> : Get low 8 bits of <var>expression</var>
 176
 177      <li><code>&gt; </code><var>expression</var> : Get high 8 bits of <var>expression</var>
 178
 179    </ul>
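   <p>A typical use of these two operators is to split a 16-bit address into the two bytes of a zero-page pointer. The following is a minimal sketch; the label <code>my_table</code> and the zero-page locations <code>$10</code> and <code>$11</code> are chosen only for illustration.

<pre class="example">     lda #&lt;my_table            ; low 8 bits of my_table's address
     sta $10
     lda #&gt;my_table            ; high 8 bits of my_table's address
     sta $11
     ldy #0
     lda [$10],y                ; read the first byte of my_table through the pointer
</pre>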
 180
 181    <p><code>$</code> can be used in an expression to refer to the address where the current instruction is assembled.
 182
 183    <p><code>^</code><var>symbol</var> gets the bank number in which <var>symbol</var> is located (determined at link time).
 184
 185    <p><code>sizeof(</code><var>symbol</var><code>)</code> gets the size of <var>symbol</var> in bytes.
 186
 187    <p>When both operands to an operator are strings, the semantics are as follows: <var>str1</var> + <var>str2</var> concatenates; the relational operators perform string comparison; and all other operators are invalid. When one operand is a string and the other is an integer, the integer is implicitly converted to a string and concatenated with the string operand to produce a string as result.
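   <p>The string rules can be sketched with a pair of equates. The names below are hypothetical, and the result shown for the second line follows from the implicit integer-to-string conversion described above rather than from a tested run.

<pre class="example">     game_title .equ "Super " + "Pong"      ; string + string: "Super Pong"
     level_name .equ "Level " + 3           ; string + integer: "Level 3"
</pre>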
 188
 189 <h4 class="subsection">3.2.5 Global labels</h4>
 190
 191 <p>There are two ways to define a global label.
 192
 193      <ul>
 194 <li><var>identifier</var><strong>:</strong> at the beginning of a source line defines the label <var>identifier</var> and assigns it the address of the current Program Counter. The colon is mandatory.
 195
 196      <li>Using the <code>.label</code> directive. It is of the form
 197
 198      <pre class="example">          .label <var>identifier</var> <em>[= </em><var>address</var><em>]</em> <em>[ : </em><var>type</var><em>]</em>
 199      </pre>
 200      <p>The absolute address of the label can be specified. If no address is given, the address is the current Program Counter.
 201
 202      <p>The type of data that the label addresses can also be specified. The valid type specifiers are <code>byte</code>, <code>word</code>, <code>dword</code>, or an identifier, which must be the name of a user-defined type.
 203
 204    </ul>
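   <p>The two forms might be used as in the following sketch, which simply instantiates the template above. All label names are hypothetical, and the address <code>$2002</code> is only an example of an absolute location.

<pre class="example">     main_loop:                          ; colon form: address is the current Program Counter
       jmp main_loop

     .label ppu_status = $2002 : byte    ; absolute address and type given
     .label frame_data : word            ; no address: uses the current Program Counter
     .label irq_entry                    ; neither address nor type given
</pre>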
 205
 206 <h4 class="subsection">3.2.6 Local labels</h4>
 207
 208 <p>A <dfn>local label</dfn> is only visible in the scope consisting of the statements between two regular labels; or, for macros, only in the body of the macro. Just as a regular label must be unique in the whole program scope, a local label must be unique in the scope in which it is defined. The big advantage here is that the name of the local label can be reused as long as the definitions exist in different local scopes. Local labels are prefixed by <code>@@</code>. Unlike regular labels the local name itself can start with a digit, so for instance <code>@@10</code> is valid.
 209 The following example shows how a local label can exist unambiguously in two scopes.
 210 <pre class="example">     my_first_delay:        ; new local scope begins here
 211      ldx #100
 212      @@loop:                ; this label exists in my_first_delay's namespace
 213      dex
 214      bne @@loop
 215      rts
 216
 217      my_second_delay:        ; new local scope begins here
 218      ldy #200
 219      @@loop:                 ; this label exists in my_second_delay's namespace
 220      dey
 221      bne @@loop
 222      rts
 223 </pre>
 224    <p>As mentioned, the same local cannot be redefined within a scope. So having, say, two labels called <code>@@loop</code> in the same scope would produce an assembler error. Also, something like the following would produce an error:
 225 <pre class="example">     adc #10
 226      bvs @@handle_overflow
 227      barrier:
 228      rts
 229      @@handle_overflow:
 230      ; ...
 231 </pre>
 232    <p>since the branch instruction refers to a local label defined in a different scope (because of the strategic placement of the label <code>barrier</code>).
 233
 234 <h4 class="subsection">3.2.7 Forward/backward branches</h4>
 235
 236 <p>Forward and backward branch labels are &ldquo;anonymous&rdquo; labels that can be redefined as many times as you want. A reference to a forward/backward label is resolved to the closest matching definition in the succeeding assembly statements (forward branches) or preceding assembly statements (backward branches).
 237
 238    <p>A forward branch consists of one or more (up to eight) consecutive <code>+</code> (plus) symbols. A backward branch consists of one or more (up to eight) consecutive <code>-</code> (minus) symbols. The following examples illustrate use of forward and backward branches.
 239
 240 <pre class="example">        lda $50
 241         bmi ++
 242         lda $40
 243         bne +         ; branches to first forward label
 244         ; do something ...
 245      +  dex           ; first forward label
 246         beq +         ; branches to second forward label
 247         ; do something more ...
 248      +  sta $40       ; second forward label
 249      ++ rts
 250
 251 </pre>
 252    <pre class="example">        lda $60
 253         bmi +
 254       - lda $2002      ; first backward label
 255         bne -          ; branches to first backward label
 256       - lda $2002      ; second backward label
 257         bne -          ; branches to second backward label
 258       + rts
 259 </pre>
 260    <h4 class="subsection">3.2.8 Equates</h4>
 261
 262 <p>There are three ways to define equates.
 263      <ul>
 264 <li>With the <code>=</code> operator. An equate defined this way can be redefined, and it obeys program order.
 265
 266      <pre class="example">          i = 10
 267           ldx #i
 268           i = i + 1
 269           ldy #i
 270      </pre>
 271      <p>In the example above, the assembler will substitute <code>10</code> for the first occurrence of <code>i</code> and <code>11</code> for the last.
 272
 273      <li>With the <code>.equ</code> directive. An equate defined this way can only be defined once, and it does not obey program order (that is, it can be defined at a later point than where it is used). An equate of this type can be exported, so that it may be accessed by other units (more on exporting symbols later).
 274
 275      <pre class="example">          lib_version .equ $10
 276           lib_author .equ "The Godfather"
 277      </pre>
 278      <li>With the <code>.define</code> directive. This directive is semantically equivalent to <code>.equ</code>, but the value is optional, so you can write more compact, CPP-like defines. When no value is given, the symbol is defined as integer 0.
 279
 280      <pre class="example">          .ifndef MYHEADER_H
 281           .define MYHEADER_H
 282           ; ...
 283           .endif     ; !MYHEADER_H
 284      </pre>
 285      </ul>
 286
 287 <h4 class="subsection">3.2.9 Conditional assembly</h4>
 288
 289 <p>There are two ways to go about doing conditional assembly. One way is to test if a certain identifier has been defined (that is, equated) using the <code>.ifdef</code> directive, as shown in the next two templates.
 290
 291 <pre class="example">     .ifdef <var>identifier</var>
 292      <var>statements</var>
 293      .endif
 294 </pre>
 295    <pre class="example">     .ifdef <var>identifier</var>
 296      <var>true-statements</var>
 297      .else
 298      <var>false-statements</var>
 299      .endif
 300 </pre>
 301    <p>The other way is to test a full-fledged expression, as shown in the next template.
 302
 303 <pre class="example">     .if <var>expression</var>
 304      <var>statements</var>
 305      .elif <var>expression-II</var>
 306      <var>statements-II</var>
 307      .else
 308      <var>other-statements</var>
 309      .endif
 310 </pre>
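   <p>A concrete instance of the last template might look like the sketch below. The equate names are hypothetical, and the use of <code>==</code> assumes that the relational operators follow C, as stated in the section on expressions.

<pre class="example">     video_lines = 240
     .if video_lines == 240
       last_line = 239            ; assembled, since the first test is true
     .elif video_lines == 224
       last_line = 223
     .else
       last_line = 199
     .endif
</pre>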
 311    <h4 class="subsection">3.2.10 Macros</h4>
 312
 313 <p>Macro definitions are of the form
 314
 315 <pre class="example">     .macro <var>name</var> <em>[</em><var>parameter1</var><em>, </em><var>parameter2</var><em>, ...]</em>
 316      <var>statements</var>
 317      .endm
 318 </pre>
 319    <p>The parameters must be legal identifiers.
 320
 321    <p>To invoke (expand) the statements (body) of a macro in your program, issue the assembler statement <var>name</var>, where <var>name</var> is the macro name, followed by a comma-separated list of actual arguments, if the macro has any. The arguments will be substituted for the respective parameter names in the resulting statements.
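   <p>Using the <code>store_const</code> macro from the simple example earlier in this chapter, the substitution can be pictured as follows. The expansion is shown in comments as an illustration of the rule above, not as actual assembler output.

<pre class="example">     .macro store_const value, addr
       lda #value
       sta addr
     .endm

     store_const $10, my_array+10
     ; the invocation above is assembled as if you had written:
     ;   lda #$10
     ;   sta my_array+10
</pre>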
 322
 323    <p>You can use local labels in the body of a macro. These labels will be completely local and unique to each expanded macro instance; any local labels defined outside the expanded body are not &ldquo;seen&rdquo;. For example, if you have the following macro definition
 324 <pre class="example">     .macro my_macro
 325      @@loop:
 326      dey
 327      bne @@loop
 328      .endm
 329 </pre>
 330    <p>and then use the macro as shown in the following
 331 <pre class="example">     @@loop:
 332      my_macro
 333      my_macro
 334      dex
 335      bne @@loop
 336 </pre>
 337    <p>each expansion of <code>my_macro</code> will have its own local label <code>@@loop</code>, neither of which interferes with the local label <code>@@loop</code> in the scope where the macro is invoked.
 338
 339    <p>Macros can be nested to arbitrary depth.
 340
 341 <h4 class="subsection">3.2.11 Anonymous macros</h4>
 342
 343 <p>An anonymous REPT (REPeaT) macro is of the form
 344
 345 <pre class="example">     i = 1
 346      <strong>.rept 8</strong>
 347      .db i
 348      i = i*2
 349      <strong>.endm</strong>
 350 </pre>
 351    <p>The statements between <code>rept</code> and <code>endm</code> will be repeated as many times as specified by the argument to <code>rept</code>. In the preceding example, the resulting expansion is equivalent to
 352
 353 <pre class="example">     .db 1, 2, 4, 8, 16, 32, 64, 128
 354 </pre>
 355    <p>Similarly, an anonymous WHILE macro is of the form
 356
 357 <pre class="example">     i = 1
 358      <strong>.while i &lt;= 128</strong>
 359      .db i
 360      i = i*2
 361      <strong>.endm</strong>
 362 </pre>
 363    <p>The statements between <code>while</code> and <code>endm</code> will be repeated while the expression given as argument to <code>while</code> is true (non-zero). The code inside the macro body is responsible for updating the variables involved in the expression, so that it will eventually become false. In the preceding example, the resulting expansion is equivalent to
 364
 365 <pre class="example">     .db 1, 2, 4, 8, 16, 32, 64, 128
 366 </pre>
 367    <h4 class="subsection">3.2.12 Including files</h4>
 368
 369 <p>There are two directives for including files.
 370
 371      <ul>
 372 <li><code>.incsrc "</code><var>src-file</var><code>"</code> (can also be written <code>.include</code>) interprets the specified file as textual assembler statements.
 373
 374      <li><code>.incbin "</code><var>bin-file</var><code>"</code> interprets the specified file as a binary buffer.
 375
 376    </ul>
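   <p>The two directives might be combined as in the following sketch. Both file names and the label are hypothetical, and the comment on <code>.incbin</code> assumes that the file's contents are emitted as raw bytes at the point of inclusion.

<pre class="example">     .codeseg
     .incsrc "constants.asm"    ; statements in this file are assembled as source text
     tile_data:
     .incbin "tiles.bin"        ; assumed: contents placed here as raw bytes
</pre>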
 377
 378 <h4 class="subsection">3.2.13 Defining native data</h4>
 379
 380 <p>There is a class of directives for defining data storage and values.
 381
 382      <ul>
 383 <li><code>.db</code> <em>[</em><var>expression</var><em>, ...]</em> : Defines a string of bytes
 384 <li><code>.dw</code> <em>[</em><var>expression</var><em>, ...]</em> : Defines a string of words
 385 <li><code>.dd</code> <em>[</em><var>expression</var><em>, ...]</em> : Defines a string of doublewords
 386 <li><code>.char</code> <em>[</em><var>expression</var><em>, ...]</em> : Defines a string of characters (explained later)
 387 <li><code>.dsb</code> <em>[</em><var>expression</var><em>]</em> : Defines a storage of size <var>expression</var> bytes
 388 <li><code>.dsw</code> <em>[</em><var>expression</var><em>]</em> : Defines a storage of size <var>expression</var> words
 389 <li><code>.dsd</code> <em>[</em><var>expression</var><em>]</em> : Defines a storage of size <var>expression</var> doublewords
 390
 391    </ul>
 392
 393    <p>If no argument is given to the directive, a single item of the respective datatype is allocated, i.e.
 394 <pre class="example">     .db
 395 </pre>
 396    <p>is equivalent to
 397 <pre class="example">     .dsb 1
 398 </pre>
 399    <p>Alternatively, data arrays can be allocated using square brackets [ ] like in C:
 400
 401 <pre class="example">     .db[100]
 402 </pre>
 403    <p>which is equivalent to
 404 <pre class="example">     .dsb 100
 405 </pre>
 406    <p><code>.byte</code>, <code>.word</code> and <code>.dword</code> are more verbose aliases for <code>.db</code>, <code>.dw</code> and <code>.dd</code>, respectively.
 407
 408    <p>Note that data cannot be initialized in a data segment; only storage for the data can be allocated there.
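   <p>The distinction between allocating storage and defining values can be sketched as follows, using only the directive forms described above. All names are hypothetical.

<pre class="example">     .dataseg                     ; storage only, no initial values
       frame_count .byte          ; one byte
       score       .word          ; one word
       name_buffer .byte[16]      ; 16 bytes of storage

     .codeseg                     ; initialized data is allowed here
     powers_of_two:
       .db 1, 2, 4, 8
     screen_bases:
       .dw $2000, $2400
     greeting:
       .db "HELLO", 0
</pre>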
 409
 410 <h3 class="heading">Defining non-ASCII text data</h3>
 411
 412 <p>Use the <code>.charmap</code> directive to specify a map file describing the mapping from regular ASCII-coded characters to your custom set. See <a href="Custom-Character-Maps.html#Custom-Character-Maps">Custom Character Maps</a> for a description of the format of such a custom character map file. Once the character map has been set, you can define your textual data by using the <code>.char</code> directive. The information in the character map is applied to the given data by the assembler in order to transform it to a regular <code>.db</code> directive internally. The <code>.charmap</code> directive obeys program order, meaning you can use different character maps at different points in your code. If no character map has been set, <code>.char</code> is equivalent to <code>.db</code>. A simple example of the use of <code>.charmap</code> and <code>.char</code> follows.
 413
 414 <pre class="example">     .charmap "my_map.tbl"          ; set the custom character map to the one defined in my_map.tbl
 415      .char "It is a delight for me to be encoded in non-ASCII form", 0
 416 </pre>
 417    <h4 class="subsection">3.2.14 User-defined types</h4>
 418
 419 <p>There are currently four kinds of types that can be defined by the user. For further information on the concepts of their use, consult a C manual.
 420
 421      <ul>
 422 <li><strong>Structures</strong>.
 423
 424      <pre class="example">          .struc my_struc
 425           my_1st_field .db
 426           my_2nd_field .dw
 427           my_3rd_field .type my_other_struc
 428           .ends
 429      </pre>
 430      <p>Using &ldquo;flat&rdquo; addressing, structure members are accessed just like in C.
 431
 432      <pre class="example">          lda the_player.inventory.sword
 433      </pre>
 434      <p>For indirect addressing, the scope operator can be used to get the offset of the field.
 435
 436      <pre class="example">          ldy #(player_struct::inventory + inventory_struct::sword)
 437           lda [$00],y     ; load ($00).inventory.sword
 438      </pre>
 439      <li><strong>Unions</strong>.
 440
 441      <pre class="example">          .union my_union
 442           byte_value .db
 443           word_value .dw
 444           string_value .char[32]
 445           .ends
 446      </pre>
 447      <p>In a union, the fields are &ldquo;overlaid&rdquo;; that is, they share the same storage, and in general only one of the fields is used (at a time) for a particular instance of the union. A typical usage is to define a structure with two members: An enumerated type that selects one of the union fields, and the actual union containing the fields.
 448
 449      <p>Anonymous unions can be defined &ldquo;inline&rdquo; as part of a structure, as shown in the following example:
 450
 451      <pre class="example">          .struc my_struc
 452           type  .byte
 453           <strong>    .union</strong>
 454           <strong>    byte_value .byte[4]</strong>
 455           <strong>    word_value .word[2]</strong>
 456           <strong>    dword_value .dword</strong>
 457           <strong>    .ends</strong>
 458           .ends
 459      </pre>
 460      <p><code>byte_value</code>, <code>word_value</code> and <code>dword_value</code> may then be accessed as top-level members of the structure, but do in fact share storage.
 461
 462      <li><strong>Records</strong> (bitfields).
 463
 464      <pre class="example">          .record my_record top_bits:3, middle_bits:2, bottom_bits:3
 465      </pre>
 466      <p>A record can be at most 8 bits (1 byte) wide. The bitfields are arranged from high to low; for example, in the record shown above, <code>top_bits</code> would occupy bits 7:5, <code>middle_bits</code> 4:3 and <code>bottom_bits</code> 2:0. Lower bits are padded if necessary to fill the byte.
 467
 468      <p>The scope operator (<code>::</code>) returns the number of right shifts necessary to bring the LSb of a bitfield into the LSb of the accumulator. The <code>MASK</code> operator returns a bitfield's logical AND mask. For example, using the record definition shown above,
 469
 470      <pre class="example">          my_record::middle_bits
 471      </pre>
 472      <p>returns <code>3</code>, and
 473      <pre class="example">          MASK my_record::middle_bits
 474      </pre>
 475      <p>returns <code>%00011000</code>. These are the two basic operations necessary to manipulate bitfields. The following macro shows how a field can be extracted:
 476
 477      <pre class="example">          ; IN:  ACC = instance of record `rec'
 478           ;      rec = record type identifier
 479           ;      fld = bitfield identifier
 480           ; OUT: ACC = field `fld' of `rec' in lower bits; upper bits zero
 481           .macro get_field rec, fld
 482               and #(mask rec::fld)       ; ditch other fields
 483               .rept rec::fld             ; shift down to bit 0
 484               lsr
 485               .endm
 486           .endm
 487      </pre>
 488      <li><strong>Enumerations</strong>.
 489
 490      <pre class="example">          .enum my_enum
 491           option_1 = 1
 492           option_2
 493           option_3
 494           option_4
 495           .ende
 496      </pre>
 497      <p>Note that an enumerated value is encoded as a <code>byte</code>.
 498
 499    </ul>
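   <p>The typical usage mentioned under unions, a structure that pairs an enumerated selector with the union it selects from, might be sketched as follows. All type and field names are hypothetical; the sketch only combines the definition forms shown in the list above.

<pre class="example">     .enum value_kind             ; selects which union field is in use
     kind_byte = 1
     kind_word
     kind_string
     .ende

     .union value_data            ; the fields share the same storage
     byte_value .db
     word_value .dw
     string_value .char[32]
     .ends

     .struc tagged_value
     kind    .type value_kind     ; encoded as a byte
     payload .type value_data     ; interpreted according to kind
     .ends
</pre>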
 500
 501 <h4 class="subsection">3.2.15 Defining data of user-defined types</h4>
 502
 503 <p>The general syntax is
 504
 505 <pre class="example">     .type <var>identifier</var>
 506 </pre>
 507    <p>or just
 508
 509 <pre class="example">     .<var>identifier</var>
 510 </pre>
 511    <p>where <var>identifier</var> is the name of a user-defined type. This allocates <code>sizeof(</code><var>identifier</var><code>)</code> bytes of storage. Optionally, a value initializer can be specified (only in code segments). The form of this initializer depends on the type of data.
 512
 513      <ul>
 514 <li><strong>Structure</strong>. The initializer is of the form
 515
 516      <pre class="example">          { <var>field1-value</var>, <em>[</em><var>field2-value</var><em>, ..., ]</em> }
 517      </pre>
 518      <p>The field initializers must match the order of the fields in the type definition. To leave a field blank, leave its initializer empty. For example
 519
 520      <pre class="example">          my_array .type my_struc { 10, , "hello" }, { , , "cool!" }, { 45 }
 521      </pre>
 522      <p>defines three instances of type <code>my_struc</code>, with various fields explicitly initialized and others implicitly padded by the assembler.
 523
 524      <p>Since structures can contain sub-structures, so can a structure initializer. To initialize a sub-structure, simply start a new pair of { } and specify field values, recursively.
 525
 526      <li><strong>Union</strong>. The initializer is of the same form as a structure initializer, except only one of the fields in the union can be initialized.
 527
 528      <li><strong>Record</strong>. The initializer is of the same form as a structure initializer, but cannot contain sub-structure initializers (each bitfield is a &ldquo;simple&rdquo; value).
 529
 530      <li><strong>Enum</strong>. The initializer is simply an identifier that must be one of the identifiers appearing in the type definition.
 531
 532    </ul>
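   <p>A nested structure initializer might look like the following sketch. The type and field names are hypothetical, and the placement of the type definitions before the code segment is an assumption.

<pre class="example">     .struc point
     x_pos .db
     y_pos .db
     .ends

     .struc sprite
     position .type point
     tile     .db
     .ends

     .codeseg
     player_sprite .type sprite { { 80, 120 }, 1 }   ; x_pos = 80, y_pos = 120, tile = 1
     enemy_sprite  .type sprite { { , 40 }, 2 }      ; x_pos left blank, y_pos = 40, tile = 2
</pre>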
 533
 534    <p>To define an array of (uninitialized) values of a user-defined type, use the C-style method, for example:
 535
 536 <pre class="example">     my_array .my_struc<strong>[100]</strong>        ; array of 100 values of type my_struc
 537 </pre>
 538    <h4 class="subsection">3.2.16 Indexing symbols statically</h4>
 539
 540 <p>A symbol can be indexed statically using the C-style syntax
 541
 542 <pre class="example">     <var>identifier</var><strong>[</strong><var>expression</var><strong>]</strong>
 543 </pre>
 544    <p>For byte arrays, this is simply equivalent to the expression
 545
 546 <pre class="example">     <var>identifier</var> + <var>expression</var>
 547 </pre>
 548    <p>In general, it is equivalent to
 549
 550 <pre class="example">     <var>identifier</var> + <var>expression</var> * sizeof <var>identifier-type</var>
 551 </pre>
 552    <p>where <var>identifier-type</var> is the type of <var>identifier</var>.
 553
 554    <p>An example:
 555
 556 <pre class="example">     my_array .my_struc[10]        ; array of 10 values of type my_struc
 557      lda #1
 558      i = 0
 559      .while i &lt; 10
 560      sta my_array[i].my_field               ; initialize my_field to 1
 561      i = i + 1
 562      .endm
 563
 564 </pre>
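   <p>As a sketch of the general rule above, the following two operands should refer to the same address, assuming <code>my_array</code> is declared with type <code>my_struc</code> as in the example:

<pre class="example">     lda my_array[3]                       ; indexed form
     lda my_array + 3 * sizeof my_struc    ; explicit form, per the rule above
</pre>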
 565    <h4 class="subsection">3.2.17 Procedures</h4>
 566
 567 <p>A procedure is of the form
 568
 569 <pre class="example">     .proc <var>name</var>
 570      <var>statements</var>
 571      .endp
 572 </pre>
 573    <p>Currently, the assembler makes no internal distinction between a procedure and a label. However, <code>.proc</code> states the intent more precisely than a bare label, which makes the source easier to read.
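
   <p>For illustration, a small procedure might look as follows. The name <code>clear_a</code> is hypothetical, chosen for this sketch only; since a procedure is treated like a label, it can be called with <code>jsr</code>:

<pre class="example">     .proc clear_a
     lda #0                  ; load zero into the accumulator
     rts                     ; return to the caller
     .endp

     jsr clear_a             ; call the procedure by name
</pre>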
 574
 575 <h4 class="subsection">3.2.18 Importing and exporting symbols</h4>
 576
 577 <p>To specify that a symbol used in your code is defined in a different unit, use the <code>.extrn</code> directive. This way you can call procedures or access constants exported by that unit. When you use the linker to create a final executable you also have to link in the unit(s) where the external symbols you use are defined.
 578
 579    <p>The <code>.extrn</code> directive takes as arguments a comma-separated list of identifiers, followed by a colon (:), followed by a <var>symbol type</var>. The symbol type must be one of <code>BYTE</code>, <code>WORD</code>, <code>DWORD</code>, <code>LABEL</code>, <code>PROC</code>, or the name of a user-defined type, such as a structure or union.
 580
 581    <p>To export a symbol defined in your own code, thereby making it accessible to other units, use the <code>.public</code> directive. The next example shows how both directives may be used.
 582
 583 <pre class="example">     .extrn proc1, proc2, proc3 : proc  ; these are defined somewhere else
 584      my_proc:
 585      jsr proc1
 586      jsr proc2
 587      jsr proc3
 588      rts
 589      .public my_proc                ; make my_proc accessible to the outside world
 590
 591 </pre>
 592    <p>You can also specify the <code>.public</code> keyword directly when defining a variable, so you don't need a separate directive to make it public:
 593
 594 <pre class="example">     .public my_public_variable .word
 595 </pre>
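   <p>As a sketch of the other side, a different unit could then import that variable with <code>.extrn</code>, using the symbol type that matches its definition. This assumes the lowercase type name is accepted, as in the <code>proc</code> example above:

<pre class="example">     .extrn my_public_variable : word   ; defined and made public in another unit
     lda my_public_variable             ; low byte
     ldx my_public_variable + 1         ; high byte
</pre>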
 596    <h4 class="subsection">3.2.19 Controlling data mapping</h4>
 597
 598 <p>By default, the linker takes the members of data segments and maps them to the best free RAM locations it finds. However, there are times when you want to place constraints on the mapping. For example, you may want a variable to always be mapped to the 6502's zero page. Or you may have a large array that should be aligned to a suitable boundary, so that indexed accesses don't incur page-crossing penalties.
 599
 600    <p>The XORcyst assembler provides the following ways to communicate mapping constraints to the linker.
 601
 602      <ul>
 603 <li>To specify that a data segment variable should always be mapped to zero page, precede its definition by the <code>.zeropage</code> keyword:
 604
 605      <pre class="example">          .zeropage my_zeropage_variable .byte
 606      </pre>
 607      <p>Alternatively, specify the <code>.zeropage</code> keyword as argument to the <code>.dataseg</code> directive:
 608
 609      <pre class="example">          .dataseg .zeropage       ; turn on .zeropage constraint
 610           my_1st_var .byte         ; .zeropage constraint will be set automatically
 611           my_2nd_var .word         ; ditto
 612           .dataseg                 ; turn off .zeropage constraint
 613      </pre>
 614      <li>To specify that one or more data variables should be aligned, use the <code>.align</code> directive. It takes a list of identifiers followed by the alignment boundary, for example:
 615
 616      <pre class="example">          .dataseg
 617           my_array .byte[64]
 618           .align my_array 64       ; my_array should be aligned on a 64-byte boundary
 619      </pre>
 620      </ul>
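
<p>Both constraints can appear in the same data segment. The sketch below uses the hypothetical names <code>ptr</code> and <code>table</code>; the 256-byte boundary is chosen so that the table starts on a page boundary and indexed reads within it never cross a page:

<pre class="example">     .dataseg
     .zeropage ptr .word      ; must be mapped to zero page
     table .byte[256]
     .align table 256         ; table should start on a page boundary
</pre>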
 621
 622 <h4 class="subsection">3.2.20 An important note on indirect addressing</h4>
 623
 624 <p>If you're familiar with 6502 assembly, you know that parentheses ( ) are normally used to indicate indirect addressing modes. Unfortunately, this clashes with the use of parentheses in operand expressions. I couldn't get Bison (the parser generator) to deal with this context dependency. As I'm used to coding Intel x86 assembly, which uses brackets for indirection, I opted for [ ] as the default indirection operators. This can be a source of bugs: if you write indirection the &ldquo;old&rdquo; way, the parentheses are treated as plain expression grouping, so <code>LDA ($FA),Y</code> is equivalent to <code>LDA $FA,Y</code> &ndash; which probably isn't what you wanted. However, by specifying the switch
 625
 626 <pre class="example">     --swap-parens
 627 </pre>
 628    <p>upon invoking the assembler, the behaviour of [ ] and ( ) is reversed. That is, indirection is written the &ldquo;normal&rdquo; way, e.g. <code>LDA ($00),Y</code>, while expression operands are grouped with [ ], e.g. <code>A/[B+C]</code>.
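
   <p>As an illustration, here are the same two instructions written under each convention. The operands are arbitrary values chosen for this sketch:

<pre class="example">     ; default syntax
     lda [$FA],y             ; [ ] indicate indirect addressing
     lda #(2+3)*4            ; ( ) group the expression

     ; with --swap-parens
     lda ($FA),y             ; ( ) indicate indirect addressing
     lda #[2+3]*4            ; [ ] group the expression
</pre>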
 629
 630    </body></html>
 631