xorcyst.texinfo

   1 % $Id: xorcyst.texinfo,v 1.10 2005/01/09 11:20:49 kenth Exp $
   2 % $Log: xorcyst.texinfo,v $
   3 % Revision 1.10  2005/01/09 11:20:49  kenth
   4 % xorcyst 1.4.5
   5 %
   6 % Revision 1.9  2005/01/05 09:40:48  kenth
   7 % xorcyst 1.4.4
   8 %
   9 % Revision 1.8  2005/01/05 02:29:50  kenth
  10 % xorcyst 1.4.3
  11 %
  12 % Revision 1.7  2004/12/29 21:51:58  kenth
  13 % xorcyst 1.4.2
  14 %
  15 % Revision 1.6  2004/12/25 02:25:51  kenth
  16 % xorcyst 1.4.1
  17 %
  18 % Revision 1.5  2004/12/19 20:56:25  kenth
  19 % xorcyst 1.4.0
  20 %
  21 % Revision 1.4  2004/12/16 13:26:22  kenth
  22 % xorcyst 1.3.5
  23 %
  24 % Revision 1.3  2004/12/14 01:51:07  kenth
  25 % xorcyst 1.3.0
  26 %
  27 % Revision 1.2  2004/12/11 02:12:02  kenth
  28 % xorcyst 1.2.0
  29 %
  30 % Revision 1.1  2004/12/10 20:49:34  kenth
  31 % Initial revision
  32 %
  33
  34 \input texinfo @c -*-texinfo-*-
  35 @c %**start of header
  36 @setfilename xorcyst.info
  37 @settitle The XORcyst Manual
  38 @c %**end of header
  39
  40 @copying
  41 This is the manual for The XORcyst version 1.4.5.
  42
  43 Copyright @copyright{} 2004, 2005 Kent Hansen.
  44 @end copying
  45
  46 @titlepage
  47 @title The XORcyst Manual
  48
  49 @c The following two commands start the copyright page.
  50 @page
  51 @vskip 0pt plus 1filll
  52 @insertcopying
  53 @end titlepage
  54
  55 @c Output the table of contents at the beginning.
  56 @contents
  57
  58 @ifnottex
  59 @node Top
  60 @top The XORcyst Manual
  61
  62 @insertcopying
  63 @end ifnottex
  64
  65 @menu
  66 * What's New::       An overview of the latest improvements.
  67 * Overview::         What is this thing?
  68 * The Assembler::    Describes the use and operation of the XORcyst assembler.
  69 * The Linker::       Describes the use and operation of the XORcyst linker.
  70 * Implementation Details:: Nice-to-know technical details concerning The XORcyst's implementation.
  71 * Known Bugs and Limitations:: Known bugs and limitations.
  72 * Assembler Directives:: Assembler directives.
  73 * Linker Script Commands:: Linker script commands.
  74 * Object Code Format:: Describes the format of the assembler's output.
  75 * Custom Character Maps:: Describes the valid contents of custom character maps.
  76 * Error and Warning Messages:: Alphabetical listing.
  77 @end menu
  78
  79 @node What's New
  80 @chapter What's New
  81
  82 @heading Version 1.4.5
  83
  84 @strong{Assembler}
  85
  86 @itemize
  87
  88 @item Fixed bug that prevented local labels from being used as operand to @code{DB}, @code{DW}, @code{DD} directives.
  89
  90 @item Fixed bug in processing of array of operands to @code{DB}, @code{DW}, @code{DD} directives (some high-level constructs were only reduced in first item).
  91
  92 @item Negative immediate operand no longer gives truncation warning as long as it fits in signed byte (@code{DB}, immediate mode instructions) or word (@code{DW}).
  93
  94 @item Added @code{BLT}, @code{BGE} as aliases for @code{BCC}, @code{BCS} (unsigned comparison).
  95
  96 @end itemize
  97
  98 @strong{Linker}
  99
 100 @itemize
 101
 102 @item Prints physical addresses of relocated public symbols when @code{--verbose}.
 103
 104 @end itemize
 105
 106 @heading Version 1.4.4
 107
 108 @strong{Linker}
 109
 110 @itemize
 111
 112 @item Fixed bug in RAM allocator.
 113
 114 @item Prints statistics on RAM management (total, used, left) when @code{--verbose}.
 115
 116 @end itemize
 117
 118 @heading Version 1.4.3
 119
 120 @strong{Assembler}
 121
 122 @itemize
 123
 124 @item Support for anonymous unions.
 125
 126 @item Fixed bug in result of @code{sizeof} operator when applied to an initialized structure variable.
 127
 128 @item Returns error code so that, for example, Make stops after the first erroneous invocation.
 129
 130 @end itemize
 131
 132 @strong{Linker}
 133
 134 @itemize
 135
 136 @item Returns error code so that, for example, Make stops after the first erroneous invocation.
 137
 138 @end itemize
 139
 140 @heading Version 1.4.2
 141
 142 @strong{Assembler}
 143
 144 @itemize
 145
 146 @item Symbols can be indexed statically, C-style; see section 3.2.16, ``Indexing symbols statically''.
 147
 148 @item @code{sizeof} operator now works correctly when applied to an array.
 149
 150 @item Fixed bug that led to a dysfunctional symbol table when using `=' equates.
 151
 152 @end itemize
 153
 154 @strong{Linker}
 155
 156 @itemize
 157
 158 @item Fixed bug in RAM allocator.
 159
 160 @item Fixed line number bug in error messages.
 161
 162 @item Removed duplicate error message (unresolved symbols).
 163
 164 @end itemize
 165
 166 @heading Version 1.4.1
 167
 168 This is a bugfix release.
 169
 170 @strong{Assembler}
 171
 172 @itemize
 173
 174 @item Fixed bug in processing of declaration of array of user-defined type (!).
 175
 176 @item Fixed bug that led to no error message when declaring an uninitialized variable of non-existing user-defined type.
 177
 178 @end itemize
 179
 180 @strong{Linker}
 181
 182 @itemize
 183
 184 @item Fixed imperfection in allocation of alignment-constrained data.
 185
 186 @item Fixed memory leak in RAM allocator.
 187
 188 @end itemize
 189
 190 @heading Version 1.4.0
 191
 192 @strong{Assembler}
 193
 194 @itemize
 195
 196 @item Added @code{--debug} switch (short form: @code{-g}). When this switch is given, the assembler will retain file and line information in the object file, which the linker can use to produce more descriptive link-time warning and error messages.
 197
 198 @item @code{LABEL} directive can take a specific address as argument, so that ``pointers'' can be made to any part of memory (for example, you can address memory location @code{$200} as a structure (or array of structures) without having to explicitly define storage for it).
 199
 200 @item Constraints can be communicated to the linker on how contents of data segments should be mapped to RAM; see section 3.2.19, ``Controlling data mapping''.
 201
 202 @item PUBLIC modifier can be specified directly when defining a variable.
 203
 204 @item Fixed a bug in code generation of exported string constants.
 205
 206 @end itemize
 207
 208 @strong{Linker}
 209
 210 @itemize
 211
 212 @item Uses the information generated from the assembler @code{--debug} switch to produce descriptive warning and error messages.
 213
 214 @item Rewrote data segment mapping function to take zeropage and alignment constraints into account.
 215
 216 @item Improved code relocation; as a result, the current PC ($) can be used freely in any expression, and the @code{origin} argument for the @code{pad} command works.
 217
 218 @item Fixed a linker script parsing bug.
 219
 220 @end itemize
 221
 222 @heading Version 1.3.5
 223
 224 @strong{Assembler}
 225
 226 @itemize
 227
 228 @item Added ability to declare storage for array of user-defined types, C-style (works for native types too).
 229
 230 @item Added ability to specify the type of data that a label addresses.
 231
 232 @item Fixed bug in code generation of storage of user-defined types.
 233
 234 @item Fixed some error detection and parsing woes.
 235
 236 @item Added @code{DEFINE} directive (same semantics as @code{EQU}, but potentially more compact).
 237
 238 @end itemize
 239
 240 @strong{Linker}
 241
 242 @itemize
 243
 244 @item Fixed a bad code relocation bug.
 245
 246 @item Implemented bank operator (@code{^}).
 247
 248 @item @code{--verbose} switch now gives helpful info on what the linker is doing.
 249
 250 @end itemize
 251
 252 @heading Version 1.3.0
 253
 254 @itemize
 255
 256 @item Added support for user-defined records (@code{RECORD} directive, @code{MASK} operator).
 257
 258 @item Added @code{WHILE} directive.
 259
 260 @item Implemented @code{ELIF} directive.
 261
 262 @item Improved @code{--define} switch: A value can now be assigned to the identifier (for example, @code{--define a=10}).
 263
 264 @item @code{SIZEOF} operator now works on variable identifiers too.
 265
 266 @item Fixed bug that prevented single-character identifiers from working.
 267
 268 @item Added @code{--no-warn} switch to suppress assembler warnings.
 269
 270 @item Early support for @code{--verbose} switch.
 271
 272 @end itemize
 273
 274 @heading Version 1.2.0
 275
 276 @itemize
 277
 278 @item Added support for forward/backward branches (@code{-}, @code{--}, @code{+}, @code{++}, and so on, up to eight levels (@code{++++++++})).
 279
 280 @item Fixed bug that caused the assembler to run out of file handles when including a large number of files.
 281
 282 @item Fixed bug that caused @code{.db <a, >b} and other lines with @code{< >} to be parsed erroneously.
 283
 284 @end itemize
 285
 286 @heading Version 1.1.0
 287
 288 @itemize
 289
 290 @item Full support for user-defined types: Structures, unions and enums.
 291
 292 @item Better separation of symbol types. In the previous versions, @emph{everything} was a label. The assembler now distinguishes properly between labels, procedures, variables, constants and user-defined types.
 293
 294 @item Support for anonymous macros (@code{rept} directive).
 295
 296 @item Crash-bug fixes (@code{if} directive, @code{incbin} directive).
 297
 298 @item Preliminary support for @code{--define=@var{IDENT}} assembler switch (can be used in @code{ifdef} and @code{ifndef} directives).
 299
 300 @item Added @code{message} directive.
 301
 302 @item Improved literal expression folding. @code{"hello " + 123} will now be folded to @code{"hello 123"}.
 303
 304 @item Added assembler switch @code{--swap-parens}, which swaps the operators used for indirection from [ ] to ( ).
 305
 306 @item Syntax of @code{extrn} directive changed slightly: Must now specify the symbol type.
 307
 308 @item Relaxed syntax of @code{db}, @code{dsb} and similar directives. If no expression is given as argument, a single item is allocated.
 309
 310 @end itemize
 311
 312 @node Overview
 313 @chapter Overview
 314
 315 The XORcyst is a set of languages and command-line tools for assembling and linking code to be run on a 6502 processor.
 316
 317 @node The Assembler
 318 @chapter The Assembler
 319
 320 The XORcyst assembler takes a @dfn{plaintext file} containing a sequence of 6502 instructions and assembler
 321 directives (collectively referred to as assembler statements), and produces from this an @dfn{object file} (usually referred to as a @dfn{unit}) that can be fed on to the XORcyst linker.
 322
 323 The reason for not producing a plain 6502 binary is largely due to the aim of producing position-independent code- and data-segments. Specifically, in The XORcyst universe code and data labels are not meant to be assigned addresses until the final process of linking. Relocation on the 6502 isn't as simple as just adding an offset to an instruction operand; the 6502 has a special set of @dfn{zero-page instructions} which can be used when addresses fall in the range 0..255, and we want to utilize these whenever possible. So until we know whether, say, a data label will fall in the zero-page range or not, we don't know whether instructions which refer to this label will have a 1-byte or 2-byte operand. Using the non-zero-page (absolute) version of an instruction would, in the general case, ensure that the address will fit, but is too wasteful in size and processor cycles. So instead of hardcoded addresses the object code contains symbolic links which are up to the linker to resolve and translate. The object file can be thought of as a more compact, linker-ready version of the original assembler file.
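
To illustrate the size difference, consider a load from a hypothetical data label @code{counter}. Depending on where the linker eventually places @code{counter}, the same source statement corresponds to one of two standard 6502 encodings:

@example
lda counter     ; counter in $00..$FF: zero-page form, opcode $A5, 2 bytes, 3 cycles
lda counter     ; counter elsewhere:   absolute form,  opcode $AD, 3 bytes, 4 cycles
@end example

Since the choice cannot be made until the address is known, the object file keeps the reference symbolic.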
 324
 325 Another goal is to relieve the programmer of the burden of having to make sure that all variables in a large, complex program have unique addresses, by shifting as much of this responsibility onto the linker as possible. By postponing the mapping of symbol names to addresses until the link phase, variables can be added and moved to any part of the program without risking that it will interfere with the storage allocation of another part of the program.
 326
 327 The object file format also enables complex resource sharing between units. An assembler expression can be arbitrarily complex, with references to any number of constants, variables or procedures defined in another unit.
 328
 329 @section Invoking the assembler (@command{xasm})
 330
 331 The basic usage is
 332
 333 @samp{@command{xasm} @var{assembler-file}}
 334
 335 where @var{assembler-file} is the (top-level) file of assembler statements.
 336 If all goes well, this will produce a similarly named file with the extension @file{.o}.
 337
 338 For example,
 339 @example
 340 xasm driver.asm
 341 @end example
 342 produces the object file @file{driver.o} if no errors are encountered by the assembler.
 343
 344 @subsection Switches
 345
 346 @table @code
 347
 348 @item --define IDENT[=VALUE]
 349 Enters the identifier @code{IDENT} into the global symbol table, optionally assigning it the value @code{VALUE}. The default value is integer @code{0}. To assign a string, escape sequences must be used, for example @code{--define my_string=\"Have a nice day\"}.
 350
 351 @item --output FILE
 352 Directs output to the file @code{FILE} rather than the default file.
 353
 354 @item --swap-parens
 355 Changes the operators used to specify indirection from @code{[ ]} to @code{( )}. @code{[ ]} takes over @code{( )}'s role in arithmetic expressions.
 356
 357 @item --no-warn
 358 Suppresses assembler warning messages.
 359
 360 @item --verbose
 361 Instructs the assembler to print some messages about what it is doing.
 362
 363 @item --debug
 364 Retains file and line information, so that the linker can produce more descriptive warning and error messages.
 365
 366 @end table
 367
 368 For the full list of switches, run @code{xasm --help}.
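
As a sketch, several of the switches above can be combined in one invocation. The file names and the identifier @code{DEBUG_BUILD} are made up for illustration, and placing the switches before the input file is an assumption; @code{xasm --help} remains the authority on the accepted forms.

@example
xasm --define DEBUG_BUILD=1 --output driver_debug.o --debug driver.asm
@end example

The source file can then test @code{DEBUG_BUILD} with the conditional assembly directives.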
 369
 370 @section Assembler statements
 371 (@strong{Note:} This is not meant to be an introductory guide to 6502 assembly. Only the XORcyst-specific features and quirks will be explained. For readers new to the 6502 and assemblers, @uref{http://www.google.com/search?q=6502+tutorial} may be a good starting point.)
 372
 373 Because the assembler aims to enforce completely position-independent code, it does not allow the @code{.org @var{address}} or @code{.base @var{address}} directives commonly employed by 6502 assemblers. Most other constructs familiar from other 6502 assemblers are in place, however. These and additional features are explained in the following sections. (For a complete list of directives, see @ref{Assembler Directives}.)
 374
 375 In the code templates given in this section, any arguments enclosed in italic square brackets @emph{[ ... ]} are optional.
 376
 377 @subsection A simple assembler example
 378
 379 Here is a short assembler file which demonstrates basic functionality:
 380
 381 @example
 382 .dataseg                   ; begin data segment
 383
 384   my_variable .byte          ; define a byte variable
 385
 386   my_array .word[16]         ; define an array of 16 words
 387
 388 .codeseg                   ; begin code segment
 389
 390 .include "config.h"        ; include another source file
 391
 392 ; conditional definition of constant my_priority
 393 .ifdef HAVE_CONFIG_H
 394   my_priority = 10
 395 .else
 396   my_priority = 0
 397 .endif
 398
 399 ; declare a macro named store_const with parameters value and addr
 400 .macro store_const value, addr
 401   lda #value
 402   sta addr
 403 .endm                      ; end macro
 404
 405 ; a subroutine entrypoint is here
 406 .proc my_subroutine
 407   store_const $10, my_array+10           ; macro invocation
 408   store_const my_priority, my_variable   ; macro invocation
 409
 410   lda [$0A], y               ; NOTE: [ ] used for indirection, not ( ), unless --swap-parens switch used
 411   beq +
 412   jsr some_function          ; call external function
 413
 414 ; produce a short delay
 415 + ldx #60
 416   @@@@delay:
 417   dex
 418   bne @@@@delay
 419
 420 ; exit with my_priority in accumulator
 421   lda #my_priority
 422   rts
 423 .endp                      ; end of procedure definition
 424
 425 .public my_subroutine      ; make my_subroutine visible to other units
 426 .extrn some_function:proc  ; some_function is located in another unit
 427
 428 .end                       ; end of assembler input
 429 @end example
 430
 431 While the example itself doesn't do anything useful, it shows the constructs you can use to write something that does.
 432
 433 @subsection Literals
 434
 435 The following kinds of integer literal are understood by the assembler (examples given in parentheses):
 436
 437 @itemize
 438
 439 @item @strong{Decimal:} Non-zero decimal digit followed by zero or more decimal digits (@code{1234})
 440
 441 @item @strong{Hexadecimal:} @code{0x} or @code{$} followed by one or more hexadecimal digits (@code{0xFACE, $BEEF}); one or more hexadecimal digits followed by @code{h} (@code{95Ah}). In the latter case numbers beginning with A through F must be preceded by a 0 (otherwise, say, @code{BABEh} would be interpreted as an identifier).
 442
 443 @item @strong{Binary:} String of binary digits either preceded by @code{%} or succeeded by @code{b} (@code{%010110, 11001100b}).
 444
 445 @item @strong{Octal:} A string of octal digits preceded by a 0 (@code{0755}).
 446
 447 @end itemize
 448
 449 String literals must be enclosed between a pair of @code{"} (as in @code{"You are a dweeb"}).
 450
 451 Character literals must be of the form @code{'A'}.
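
As an illustration of the integer forms listed above, each operand in the following statement denotes the same value, 200:

@example
.db 200, $C8, 0xC8, 0C8h, %11001000, 11001000b, 0310
@end example

Note the leading @code{0} in @code{0C8h}, which is required because the number begins with the hexadecimal digit C.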
 452
 453 @subsection Identifiers
 454
 455 Identifiers must conform to the regular expression @code{[[:alpha:]_][[:alnum:]_]*}. They are case sensitive.
 456 Examples of valid identifiers are
 457 @example
 458 no_brainer, schools_out, my_2nd_home, catch22, FunkyMama
 459 @end example
 460 Examples of invalid identifiers are
 461 @example
 462 3stooges, i-was-here, f00li$h
 463 @end example
 464
 465 @subsection Expressions
 466
 467 Operands to assembler statements are expressions. An expression can contain any number of operators, identifiers and literals, and parentheses to group terms. The operators are the familiar arithmetic, binary, shift and relational ones (same as in C, pretty much), plus a few more which are useful when writing code for a machine which has a 16-bit address space but only 8-bit registers:
 468
 469 @itemize
 470
 471 @item @code{< @var{expression}} : Get low 8 bits of @var{expression}
 472
 473 @item @code{> @var{expression}} : Get high 8 bits of @var{expression}
 474
 475 @end itemize
 476
 477 @code{$} can be used in an expression to refer to the address where the current instruction is assembled.
 478
 479 @code{^@var{symbol}} gets the bank number in which @var{symbol} is located (determined at link time).
 480
 481 @code{sizeof(@var{symbol})} gets the size of @var{symbol} in bytes.
 482
 483 When both operands to an operator are strings, the semantics are as follows: @var{str1} + @var{str2} concatenates; the relational operators perform string comparison; and all other operators are invalid. When one operand is a string and the other is an integer, the integer is implicitly converted to a string and concatenated with the string operand to produce a string as result.
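
A typical use of the low and high byte operators is to store a 16-bit address in two consecutive zero-page locations, so that it can be used for indirect addressing. In this sketch, @code{message_text} is a hypothetical label and @code{$00} is an arbitrarily chosen location:

@example
lda #<message_text     ; low 8 bits of the address
sta $00
lda #>message_text     ; high 8 bits of the address
sta $01
ldy #0
lda [$00],y            ; fetch the first byte at message_text
@end example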
 484
 485 @subsection Global labels
 486
 487 There are two ways to define a global label.
 488
 489 @itemize
 490
 491 @item @code{@var{identifier}@strong{:}} at the beginning of a source line defines the label @var{identifier} and assigns it the address of the current Program Counter. The colon is mandatory.
 492
 493 @item Using the @code{.label} directive. It is of the form
 494
 495 @example
 496 .label @var{identifier} @emph{[= @var{address}]} @emph{[ : @var{type}]}
 497 @end example
 498
 499 The absolute address of the label can be specified. If no address is given, the address is the current Program Counter.
 500
 501 The type of data that the label addresses can also be specified. The valid type specifiers are @code{byte}, @code{word}, @code{dword}, or an identifier, which must be the name of a user-defined type.
 502
 503 @end itemize
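
For example, the @code{.label} form can attach a name to a fixed memory location without defining storage for it. The name @code{sprite_table} and the address are hypothetical:

@example
.label sprite_table = $200 : byte    ; sprite_table refers to the byte at $200
lda sprite_table                     ; reads memory location $200
@end example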
 504
 505 @subsection Local labels
 506
 507 A @dfn{local label} is only visible in the scope consisting of the statements between two regular labels; or, for macros, only in the body of the macro. Just as a regular label must be unique in the whole program scope, a local label must be unique in the scope in which it is defined. The big advantage here is that the name of the local label can be reused as long as the definitions exist in different local scopes. Local labels are prefixed by @code{@@@@}. Unlike regular labels the local name itself can start with a digit, so for instance @code{@@@@10} is valid.
 508 The following example shows how a local label can exist unambiguously in two scopes.
 509 @example
 510 my_first_delay:        ; new local scope begins here
 511 ldx #100
 512 @@@@loop:                ; this label exists in my_first_delay's namespace
 513 dex
 514 bne @@@@loop
 515 rts
 516
 517 my_second_delay:        ; new local scope begins here
 518 ldy #200
 519 @@@@loop:                 ; this label exists in my_second_delay's namespace
 520 dey
 521 bne @@@@loop
 522 rts
 523 @end example
 524
 525 As mentioned, the same local label cannot be redefined within a scope. So having, say, two labels called @code{@@@@loop} in the same scope would produce an assembler error. Also, something like the following would produce an error:
 526 @example
 527 adc #10
 528 bvs @@@@handle_overflow
 529 barrier:
 530 rts
 531 @@@@handle_overflow:
 532 ; ...
 533 @end example
 534 since the branch instruction refers to a local label defined in a different scope (because of the strategic placement of the label @code{barrier}).
 535
 536 @subsection Forward/backward branches
 537
 538 These are ``anonymous'' labels that can be redefined as many times as you want. A reference to a forward/backward label is resolved to the closest matching definition in the succeeding assembly statements (forward branches) or preceding assembly statements (backward branches).
 539
 540 A forward branch consists of one or more (up to eight) consecutive @code{+} (plus) symbols. A backward branch consists of one or more (up to eight) consecutive @code{-} (minus) symbols. The following examples illustrate use of forward and backward branches.
 541
 542 @example
 543    lda $50
 544    bmi ++
 545    lda $40
 546    bne +         ; branches to first forward label
 547    ; do something ...
 548 +  dex           ; first forward label
 549    beq +         ; branches to second forward label
 550    ; do something more ...
 551 +  sta $40       ; second forward label
 552 ++ rts
 553
 554 @end example
 555
 556 @example
 557    lda $60
 558    bmi +
 559  - lda $2002      ; first backward label
 560    bne -          ; branches to first backward label
 561  - lda $2002      ; second backward label
 562    bne -          ; branches to second backward label
 563  + rts
 564 @end example
 565
 566 @subsection Equates
 567
 568 There are three ways to define equates.
 569 @itemize
 570
 571 @item With the @code{=} operator. An equate defined this way can be redefined, and it obeys program order.
 572
 573 @example
 574 i = 10
 575 ldx #i
 576 i = i + 1
 577 ldy #i
 578 @end example
 579
 580 In the example above, the assembler will substitute @code{10} for the first occurrence of @code{i} and @code{11} for the last.
 581
 582 @item With the @code{.equ} directive. An equate defined this way can only be defined once, and it does not obey program order (that is, it can be defined at a later point than where it is used). An equate of this type can be exported, so that it may be accessed by other units (more on exporting symbols later).
 583
 584 @example
 585 lib_version .equ $10
 586 lib_author .equ "The Godfather"
 587 @end example
 588
 589 @item With the @code{.define} directive. This directive is semantically equivalent to @code{.equ}, but the value is optional, so you can write compact CPP-like defines. When no value is given, the symbol is defined as integer 0.
 590
 591 @example
 592 .ifndef MYHEADER_H
 593 .define MYHEADER_H
 594 ; ...
 595 .endif     ; !MYHEADER_H
 596 @end example
 597
 598 @end itemize
 599
 600 @subsection Conditional assembly
 601
 602 There are two ways to go about doing conditional assembly. One way is to test if a certain identifier has been defined (that is, equated) using the @code{.ifdef} directive, as shown in the next two templates.
 603
 604 @example
 605 .ifdef @var{identifier}
 606 @var{statements}
 607 .endif
 608 @end example
 609
 610 @example
 611 .ifdef @var{identifier}
 612 @var{true-statements}
 613 .else
 614 @var{false-statements}
 615 .endif
 616 @end example
 617
 618 The other way is to test a full-fledged expression, as shown in the next template.
 619
 620 @example
 621 .if @var{expression}
 622 @var{statements}
 623 .elif @var{expression-II}
 624 @var{statements-II}
 625 .else
 626 @var{other-statements}
 627 .endif
 628 @end example
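
As a sketch of the expression form, the following selects a constant based on an identifier that is assumed to have been given a value on the command line (for instance with @code{--define VIDEO_MODE=1}). The identifier names are hypothetical, and the @code{==} operator is assumed to be available since the relational operators follow C:

@example
.if VIDEO_MODE == 1
  screen_columns = 32
.elif VIDEO_MODE == 2
  screen_columns = 40
.else
  screen_columns = 20
.endif
@end example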
 629
 630 @subsection Macros
 631
 632 Macro definitions are of the form
 633
 634 @example
 635 .macro @var{name} @emph{[@var{parameter1}, @var{parameter2}, ...]}
 636 @var{statements}
 637 .endm
 638 @end example
 639 The parameters must be legal identifiers.
 640
 641 To invoke (expand) the statements (body) of a macro in your program, issue the assembler statement @code{@var{name}}, where @var{name} is the macro name, followed by a comma-separated list of actual arguments, if the macro has any. The arguments will be substituted for the respective parameter names in the resulting statements.
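
For example, given the following definition and invocation (the macro name and the address are hypothetical)

@example
.macro add_const value, addr
  clc
  lda addr
  adc #value
  sta addr
.endm

add_const 5, $40
@end example

the invocation on the last line expands to

@example
clc
lda $40
adc #5
sta $40
@end example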
 642
 643 You can use local labels in the body of a macro. These labels will be completely local and unique to each expanded macro instance; any local labels defined outside the expanded body are not ``seen''. For example, if you have the following macro definition
 644 @example
 645 .macro my_macro
 646 @@@@loop:
 647 dey
 648 bne @@@@loop
 649 .endm
 650 @end example
 651 and then use the macro as shown in the following
 652 @example
 653 @@@@loop:
 654 my_macro
 655 my_macro
 656 dex
 657 bne @@@@loop
 658 @end example
 659 each expansion of @code{my_macro} will have its own local label @code{@@@@loop}, neither of which interfere with the local label @code{@@@@loop} in the scope where the macro is invoked.
 660
 661 Macros can be nested to arbitrary depth.
 662
 663 @subsection Anonymous macros
 664
 665 An anonymous REPT (REPeaT) macro is of the form
 666
 667 @example
 668 i = 1
 669 @strong{.rept 8}
 670 .db i
 671 i = i*2
 672 @strong{.endm}
 673 @end example
 674
 675 The statements between @code{rept} and @code{endm} will be repeated as many times as specified by the argument to @code{rept}. In the preceding example, the resulting expansion is equivalent to
 676
 677 @example
 678 .db 1, 2, 4, 8, 16, 32, 64, 128
 679 @end example
 680
 681 Similarly, an anonymous WHILE macro is of the form
 682
 683 @example
 684 i = 1
 685 @strong{.while i <= 128}
 686 .db i
 687 i = i*2
 688 @strong{.endm}
 689 @end example
 690
 691 The statements between @code{while} and @code{endm} will be repeated while the expression given as argument to @code{while} is true (non-zero). The code inside the macro body is responsible for updating the variables involved in the expression, so that it will eventually become false. In the preceding example, the resulting expansion is equivalent to
 692
 693 @example
 694 .db 1, 2, 4, 8, 16, 32, 64, 128
 695 @end example
 696
 697 @subsection Including files
 698
 699 There are two directives for including files.
 700
 701 @itemize
 702
 703 @item @code{.incsrc "@var{src-file}"} (can also be written @code{.include}) interprets the specified file as textual assembler statements.
 704
 705 @item @code{.incbin "@var{bin-file}"} interprets the specified file as a binary buffer.
 706
 707 @end itemize
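
A brief sketch of both directives follows. The file names and the label are hypothetical, and it is assumed that the bytes of the binary file are emitted at the point of the directive:

@example
.incsrc "macros.asm"       ; assembled as if its statements appeared here
tile_data:
.incbin "tiles.bin"        ; contents of the file are placed at tile_data
@end example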
 708
 709 @subsection Defining native data
 710
 711 There is a class of directives for defining data storage and values.
 712
 713 @itemize
 714
 715 @item @code{.db} @emph{[@var{expression}, ...]} : Defines a string of bytes
 716 @item @code{.dw} @emph{[@var{expression}, ...]} : Defines a string of words
 717 @item @code{.dd} @emph{[@var{expression}, ...]} : Defines a string of doublewords
 718 @item @code{.char} @emph{[@var{expression}, ...]} : Defines a string of characters (explained later)
 719 @item @code{.dsb} @emph{[@var{expression}]} : Defines a storage of size @var{expression} bytes
 720 @item @code{.dsw} @emph{[@var{expression}]} : Defines a storage of size @var{expression} words
 721 @item @code{.dsd} @emph{[@var{expression}]} : Defines a storage of size @var{expression} doublewords
 722
 723 @end itemize
 724
 725 If no argument is given to the directive, a single item of the respective datatype is allocated, i.e.
 726 @example
 727 .db
 728 @end example
 729 is equivalent to
 730 @example
 731 .dsb 1
 732 @end example
 733
 734 Alternatively, data arrays can be allocated using square brackets [ ] like in C:
 735
 736 @example
 737 .db[100]
 738 @end example
 739 which is equivalent to
 740 @example
 741 .dsb 100
 742 @end example
 743
 744 @code{.byte}, @code{.word} and @code{.dword} are more verbose aliases for @code{.db}, @code{.dw} and @code{.dd}, respectively.
 745
 746 Note that data cannot be initialized in a data segment; only storage for the data can be allocated there.
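
Putting these directives together, here is a sketch with hypothetical names: storage is reserved in the data segment, while initialized values are placed in the code segment.

@example
.dataseg
  score     .dw              ; storage for one word
  name_buf  .db[16]          ; storage for 16 bytes

.codeseg
powers_of_two:
  .db 1, 2, 4, 8             ; four initialized bytes
vectors:
  .dw $8000, $8003           ; two initialized words
@end example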
 747
 748 @heading Defining non-ASCII text data
 749
 750 Use the @code{.charmap} directive to specify a map file describing the mapping from regular ASCII-coded characters to your custom set. See @ref{Custom Character Maps} for a description of the format of such a custom character map file. Once the character map has been set, you can define your textual data by using the @code{.char} directive. The assembler applies the information in the character map to the given data in order to transform it to a regular @code{.db} directive internally. The @code{.charmap} directive obeys program order, meaning you can use different character maps at different points in your code. If no character map has been set, @code{.char} is equivalent to @code{.db}. A simple example of the use of @code{.charmap} and @code{.char} follows.
 751
 752 @example
 753 .charmap "my_map.tbl"          ; set the custom character map to the one defined in my_map.tbl
 754 .char "It is a delight for me to be encoded in non-ASCII form", 0
 755 @end example
 756
 757 @subsection User-defined types
 758
 759 There are currently four kinds of types that can be defined by the user. For further information on the concepts of their use, consult a C manual.
 760
 761 @itemize
 762
 763 @item @strong{Structures}.
 764
 765 @example
 766 .struc my_struc
 767 my_1st_field .db
 768 my_2nd_field .dw
 769 my_3rd_field .type my_other_struc
 770 .ends
 771 @end example
 772
 773 Using ``flat'' addressing, structure members are accessed just like in C.
 774
 775 @example
 776 lda the_player.inventory.sword
 777 @end example
 778
 779 For indirect addressing, the scope operator can be used to get the offset of the field.
 780
 781 @example
 782 ldy #(player_struct::inventory + inventory_struct::sword)
 783 lda [$00],y     ; load ($00).inventory.sword
 784 @end example
 785
 786 @item @strong{Unions}.
 787
 788 @example
 789 .union my_union
 790 byte_value .db
 791 word_value .dw
 792 string_value .char[32]
 793 .ends
 794 @end example
 795
 796 In a union, the fields are ``overlaid''; that is, they share the same storage, and in general only one of the fields is used (at a time) for a particular instance of the union. A typical usage is to define a structure with two members: An enumerated type that selects one of the union fields, and the actual union containing the fields.
 797
 798 Anonymous unions can be defined ``inline'' as part of a structure, as shown in the following example:
 799
 800 @example
 801 .struc my_struc
 802 type    .byte
 803 @strong{    .union}
 804 @strong{    byte_value .byte[4]}
 805 @strong{    word_value .word[2]}
 806 @strong{    dword_value .dword}
 807 @strong{    .ends}
 808 .ends
 809 @end example
 810
 811 @code{byte_value}, @code{word_value} and @code{dword_value} may then be accessed as top-level members of the structure, but do in fact share storage.
 812
 813 @item @strong{Records} (bitfields).
 814
 815 @example
 816 .record my_record top_bits:3, middle_bits:2, bottom_bits:3
 817 @end example
 818
A record can be at most 8 bits (1 byte) wide. The bitfields are arranged from high to low; for example, in the record shown above, @code{top_bits} would occupy bits 7:5, @code{middle_bits} 4:3 and @code{bottom_bits} 2:0. Lower bits are padded if necessary to fill the byte.
 820
 821 The scope operator (@code{::}) returns the number of right shifts necessary to bring the LSb of a bitfield into the LSb of the accumulator. The @code{MASK} operator returns a bitfield's logical AND mask. For example, using the record definition shown above,
 822
 823 @example
 824 my_record::middle_bits
 825 @end example
 826 returns @code{3}, and
 827 @example
 828 MASK my_record::middle_bits
 829 @end example
 830 returns @code{%00011000}. These are the two basic operations necessary to manipulate bitfields. The following macro shows how a field can be extracted:
 831
 832 @example
 833 ; IN:  ACC = instance of record `rec'
 834 ;      rec = record type identifier
 835 ;      fld = bitfield identifier
 836 ; OUT: ACC = field `fld' of `rec' in lower bits; upper bits zero
 837 .macro get_field rec, fld
 838     and #(mask rec::fld)       ; ditch other fields
 839     .rept rec::fld             ; shift down to bit 0
 840     lsr
 841     .endm
 842 .endm
 843 @end example
 844
 845 @item @strong{Enumerations}.
 846
 847 @example
 848 .enum my_enum
 849 option_1 = 1
 850 option_2
 851 option_3
 852 option_4
 853 .ende
 854 @end example
 855
 856 Note that an enumerated value is encoded as a @code{byte}.
 857
 858 @end itemize
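
The typical union usage mentioned above (an enumerated selector plus the union itself) might be sketched as follows. The names @code{value_kind} and @code{tagged_value} are made up for this illustration, and the sketch assumes that an enumeration can be used as a field type through @code{.type}, in the same way as a structure.

@example
.enum value_kind
kind_byte = 0
kind_word
.ende

.struc tagged_value
kind .type value_kind          ; selects which union field is in use
    .union
    byte_value .db
    word_value .dw
    .ends
.ends
@end example

Code that reads an instance would first inspect @code{kind} and then access either @code{byte_value} or @code{word_value} accordingly.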
 859
 860 @subsection Defining data of user-defined types
 861
 862 The general syntax is
 863
 864 @example
 865 .type @var{identifier}
 866 @end example
 867
 868 or just
 869
 870 @example
 871 .@var{identifier}
 872 @end example
 873
where @var{identifier} is the name of a user-defined type. This allocates @code{sizeof(@var{identifier})} bytes of storage. Optionally, a value initializer can be specified (only in code segments). The form of this initializer depends on the type of data.
 875
 876 @itemize
 877
 878 @item @strong{Structure}. The initializer is of the form
 879
 880 @example
 881 @{ @var{field1-value}, @emph{[@var{field2-value}, ..., ]} @}
 882 @end example
 883
 884 The field initializers must match the order of the fields in the type definition. To leave a field blank, leave its initializer empty. For example
 885
 886 @example
 887 my_array .type my_struc @{ 10, , "hello" @}, @{ , , "cool!" @}, @{ 45 @}
 888 @end example
 889
 890 defines three instances of type @code{my_struc}, with various fields explicitly initialized and others implicitly padded by the assembler.
 891
 892 Since structures can contain sub-structures, so can a structure initializer. To initialize a sub-structure, simply start a new pair of @{ @} and specify field values, recursively.
 893
 894 @item @strong{Union}. The initializer is of the same form as a structure initializer, except only one of the fields in the union can be initialized.
 895
 896 @item @strong{Record}. The initializer is of the same form as a structure initializer, but cannot contain sub-structure initializers (each bitfield is a ``simple'' value).
 897
 898 @item @strong{Enum}. The initializer is simply an identifier that must be one of the identifiers appearing in the type definition.
 899
 900 @end itemize
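
As a rough illustration of these initializer forms, the lines below reuse the types defined earlier in this section. They are a sketch rather than tested code; in particular, the nested initializer assumes that @code{my_other_struc} has two byte-sized fields, and the variable names are hypothetical.

@example
.codeseg
my_var    .type my_struc @{ 1, $1234, @{ 5, 6 @} @}   ; third field is a sub-structure
my_flags  .type my_record @{ 5, 1, 2 @}              ; top_bits=5, middle_bits=1, bottom_bits=2
my_choice .type my_enum option_2                    ; one of the names from my_enum
@end example

If the bitfields are packed as described for records, @code{my_flags} would presumably be encoded as the single byte @code{%10101010}.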
 901
 902 To define an array of (uninitialized) values of a user-defined type, use the C-style method, for example:
 903
 904 @example
 905 my_array .my_struc@strong{[100]}        ; array of 100 values of type my_struc
 906 @end example
 907
 908 @subsection Indexing symbols statically
 909
 910 A symbol can be indexed statically using the C-style syntax
 911
 912 @example
 913 @var{identifier}@strong{[}@var{expression}@strong{]}
 914 @end example
 915
 916 For byte arrays, this is simply equivalent to the expression
 917
 918 @example
 919 @var{identifier} + @var{expression}
 920 @end example
 921
 922 In general, it is equivalent to
 923
 924 @example
 925 @var{identifier} + @var{expression} * sizeof @var{identifier-type}
 926 @end example
 927
 928 where @var{identifier-type} is the type of @var{identifier}.
 929
 930 An example:
 931
 932 @example
 933 my_array .my_struc[10]        ; array of 10 values of type my_struc
 934 lda #1
 935 i = 0
 936 .while i < 10
 937 sta my_array[i].my_field               ; initialize my_field to 1
 938 i = i + 1
 939 .endm
 940
 941 @end example
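
The same rule can be checked by hand for an array of words. In this sketch (the name @code{my_words} is hypothetical), each element occupies 2 bytes, so index 3 refers to the address 6 bytes past the start of the array.

@example
.dataseg
my_words .word[8]        ; array of 8 words
.codeseg
lda my_words[3]          ; same address as my_words + 3 * 2
lda my_words + 6         ; the equivalent expression, written out by hand
@end example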
 942
 943 @subsection Procedures
 944
 945 A procedure is of the form
 946
 947 @example
 948 .proc @var{name}
 949 @var{statements}
 950 .endp
 951 @end example
 952
Currently, the assembler does not differentiate internally between a procedure and a label. However, @code{.proc} states the intent more clearly than a bare label, so it makes the source easier to read.
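
A minimal sketch of a procedure and a call to it follows; the name @code{add_three} is made up for this example.

@example
.proc add_three
    clc
    adc #3          ; ACC = ACC + 3
    rts
.endp

    lda #10
    jsr add_three   ; called just like a label
@end example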
 954
 955 @subsection Importing and exporting symbols
 956
 957 To specify that a symbol used in your code is defined in a different unit, use the @code{.extrn} directive. This way you can call procedures or access constants exported by that unit. When you use the linker to create a final executable you also have to link in the unit(s) where the external symbols you use are defined.
 958
The @code{.extrn} directive takes as arguments a comma-separated list of identifiers, followed by a colon (:), followed by a @var{symbol type}. The symbol type must be one of @code{BYTE}, @code{WORD}, @code{DWORD}, @code{LABEL}, @code{PROC}, or the name of a user-defined type, such as a structure or union.
 960
 961 To export a symbol defined in your own code, thereby making it accessible to other units, use the @code{.public} directive. The next example shows how both directives may be used.
 962
 963 @example
 964 .extrn proc1, proc2, proc3 : proc  ; these are defined somewhere else
 965 my_proc:
 966 jsr proc1
 967 jsr proc2
 968 jsr proc3
 969 rts
 970 .public my_proc                ; make my_proc accessible to the outside world
 971
 972 @end example
 973
 974 You can also specify the @code{.public} keyword directly when defining a variable, so you don't need a separate directive to make it public:
 975
 976 @example
 977 .public my_public_variable .word
 978 @end example
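
The two directives also work for data. The following sketch shows how two units might share a variable; the file names @file{unit_a.asm} and @file{unit_b.asm} and the variable @code{frame_counter} are hypothetical.

@example
; unit_a.asm: defines and exports the variable
.dataseg
.public frame_counter .byte

; unit_b.asm: imports and uses it
.extrn frame_counter : byte
.codeseg
    inc frame_counter
@end example

Both units must then be linked into the same output for the reference to be resolved.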
 979
 980 @subsection Controlling data mapping
By default, the linker takes the members of data segments and maps them to the best free RAM locations it finds. However, there are times when you want to place constraints on the mapping. For example, you may want a variable to always be mapped to the 6502's zero page. Or you may have a large array that should be aligned to a suitable boundary, so that indexed accesses don't incur page-crossing penalties.
 982
 983 The XORcyst assembler provides the following ways to communicate mapping constraints to the linker.
 984
 985 @itemize
 986
 987 @item To specify that a data segment variable should always be mapped to zero page, precede its definition by the @code{.zeropage} keyword:
 988
 989 @example
 990 .zeropage my_zeropage_variable .byte
 991 @end example
 992
 993 Alternatively, specify the @code{.zeropage} keyword as argument to the @code{.dataseg} directive:
 994
 995 @example
 996 .dataseg .zeropage       ; turn on .zeropage constraint
 997 my_1st_var .byte         ; .zeropage constraint will be set automatically
 998 my_2nd_var .word         ; ditto
 999 .dataseg                 ; turn off .zeropage constraint
1000 @end example
1001
1002 @item To specify that one or more data variables should be aligned, use the @code{.align} directive. It takes a list of identifiers followed by the alignment boundary, for example
1003
1004 @example
1005 .dataseg
1006 my_array .byte[64]
1007 .align my_array 64       ; my_array should be aligned on a 64-byte boundary
1008 @end example
1009
1010 @end itemize
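
Since @code{.align} accepts a list of identifiers, several variables can be given the same boundary in one statement. In this sketch the table names are hypothetical; aligning each 256-byte table to a page boundary means that an indexed access into it never crosses a page.

@example
.dataseg
sprite_table .byte[256]
sound_table  .byte[256]
.align sprite_table, sound_table 256    ; both tables start on a page boundary
@end example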
1011
1012 @subsection An important note on indirect addressing
1013
If you're familiar with 6502 assembly, you know that parentheses ( ) are normally used to indicate indirect addressing modes. Unfortunately, this clashes with the use of parentheses in operand expressions. I couldn't get Bison (the parser generator) to deal with this context dependency. As I'm used to coding Intel x86 assembly, which uses brackets for indirection, I opted for [ ] as the default indirection operators. This could be a source of bugs, since if you type it the ``old'' way, @code{LDA ($FA),Y} is equivalent to @code{LDA $FA,Y} -- which probably isn't what you wanted. However, by specifying the switch
1015
1016 @example
1017 --swap-parens
1018 @end example
1019
upon invoking the assembler, the behaviour of [ ] and ( ) will be reversed. That is, the ``normal'' way of specifying indirection is used, e.g. @code{LDA ($00),Y}, while expression operands are grouped with [ ], e.g. @code{A/[B+C]}.
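
To make the difference concrete, here is the same pair of instructions written for each mode. The operand values are arbitrary.

@example
; default: [ ] for indirection, ( ) for grouping
lda [$FA],y
lda #(1+2)*4

; with --swap-parens: ( ) for indirection, [ ] for grouping
lda ($FA),y
lda #[1+2]*4
@end example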
1021
1022 @node The Linker
1023 @chapter The Linker
1024
1025 The main job of the linker is to take object code files (units) created by the assembler, resolve any dependencies among them and reduce them to pure 6502 binaries.
1026
1027 The XORcyst linker takes as input a linker script. The linker script is a plaintext file containing a sequence of commands which describe the layout and contents of the linker output. (For a complete list and description of script commands, see @ref{Linker Script Commands}.) The final output of the linker process is a single binary file containing all the 6502 code properly relocated and resolved, plus any other data specified in the linker script.
1028
1029 @section Invoking the linker (@command{xlnk})
1030
1031 The basic usage is
1032
1033 @samp{@command{xlnk} @var{script-file}}
1034
1035 where @var{script-file} is the linker script file containing commands to be processed by the linker.
1036
1037 To have the linker print some information on what it is doing, give the @code{--verbose} switch.
1038
1039 @section A simple linker script example
1040
1041 The example below shows what a very simple linker script may look like. It is the simplest case, where you have a single unit @file{my_unit.o} (created by the assembler, presumably from @file{my_unit.asm}), and want to create executable 6502 code from it. For small, single-source projects you won't need much more than this.
1042
1043 @example
1044 ram@{start=0x0000,end=0x0800@}           # define an available range of 6502 RAM
1045 output@{file=program.bin@}               # set the output file
1046 link@{file=my_unit.o, origin=0xC000@}    # relocate my_unit.o to 0xC000 and write it to output
1047 @end example
1048
Commands in the script are of the form @code{@var{command-name}@{@emph{[@var{arg-name}=@var{value}, @var{arg-name}=@var{value}, ...]}@}}. The kind and number of valid arguments depend on the particular command. Some arguments are optional while others are mandatory, again depending on the particular command. (Even if the command has no arguments, you have to have a pair of empty braces.)
1050
1051 The @code{ram}-command tells the linker that it has available a chunk of RAM in the 6502's memory starting at address 0x0000 and ending at 0x0800. The linker will map the contents of data segments to physical addresses in this region.
1052
1053 The @code{output}-command is used to tell the linker which file to direct its output to.
1054
1055 The @code{link}-command tells the linker to relocate the given unit and output the resulting binary representation.
1056
1057 As you can see, a line comment in the script is initiated with a @code{#}-character.
1058
1059 @section Linking multiple units
1060
In principle, linking more than one unit into the same output file is simple: just add appropriate @code{link}-commands to the linker script. For example, say you have written a small library of functions you commonly use across all your projects and assembled it to @file{my_lib.o}. Assume that your main program, say @file{my_unit.o}, depends on @file{my_lib.o}; it calls one or more functions exported from the library. You would then add an additional line to the previous example script:
1062
1063 @example
1064 ram@{start=0x0000,end=0x0800@}
1065 output@{file=program.bin@}
1066 link@{file=my_unit.o, origin=0xC000@}
1067 link@{file=my_lib.o@}                     # my_lib will be relocated to directly after my_unit.o
1068 @end example
1069
1070 Note that there is no @option{origin}-argument to the latter @code{link}-command. This is because we generally don't know how much space the code from @file{my_unit.o} will occupy. So we let the linker take care of it; when no origin is specified, the unit will be relocated to the location where the previous entity processed by the linker ended (the linker manages a ``pseudo-Program Counter'' internally to keep track of where it is in 6502 memory). So if the code for @file{my_unit.o} was @code{0x0ABC} bytes in size, @file{my_lib.o} would be relocated to @code{0xCABC}.
1071
1072 @section Separating units into banks
1073
You will get an error during linking if the Program Counter exceeds 64K. To write larger programs you normally have to divide the program into banks and manually switch them in and out of 6502 memory as they are needed. How the switching is done is very system-specific, so The XORcyst doesn't concern itself with that. However, it does allow you to manage banks.
1075
1076 The linker script command @code{bank} is used to start a new bank. There are two (semi-optional) arguments to @code{bank}:
1077 @itemize
1078
1079 @item @option{size}, which specifies the bank size in bytes. If a size is not specified, the size of the previous bank is used; and
1080
1081 @item @option{origin}, which specifies the bank's origin in 6502 memory. This is the address where the bank must be located when it resides in memory during program execution. If an origin is not specified, the origin of the previous bank is used.
1082
1083 @end itemize
1084
1085 For example,
1086 @example
1087 bank@{size=0x4000, origin=0x8000@}
1088 @end example
indicates the start of a bank that is 16KBytes in size and whose contents are linked relative to address 0x8000.
1090
1091 So to build on our previous example script, say that you for some reason want to put the library in a separate bank from your main program:
1092
1093 @example
1094 ram@{start=0x0000,end=0x0800@}
1095 output@{file=program.bin@}
1096 bank@{size=0x4000, origin=0x8000@}
1097 link@{file=my_unit.o@}
1098 bank@{size=0x4000, origin=0xC000@}
link@{file=my_lib.o@}                     # my_lib will be relocated to the origin of the second bank (0xC000)
1100 @end example
1101
1102 This will create an output 32KBytes in size, the first 16KBytes being the bank containing the code from @file{my_unit.o} and the latter 16KBytes containing @file{my_lib.o}'s code.
1103
1104 A couple of things worth mentioning:
1105 @enumerate
1106
1107 @item It isn't necessary to specify the origin in any of the @code{link}-commands anymore, since an origin is specified in the owning bank instead. If you do specify an origin in the @code{link}-command, it will override the internal linker origin.
1108
1109 @item When you start a new bank, the previous bank may not have been completely ``filled up'' with code and/or data; in this case the output is automatically padded with zeroes so that the size of the output matches the given bank size. (In addition to the 16-bit Program Counter, the linker also keeps track of the current 0-relative bank offset, and advances it as stuff is added to output.)
1110
1111 @end enumerate
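
Since @option{size} and @option{origin} default to the values of the previous bank, a script with several banks only needs to state what changes. The following is a sketch with hypothetical unit names:

@example
ram@{start=0x0000,end=0x0800@}
output@{file=program.bin@}
bank@{size=0x4000, origin=0x8000@}
link@{file=bank0.o@}
bank@{@}                                # same size and origin as the previous bank
link@{file=bank1.o@}
bank@{origin=0xC000@}                   # same size, new origin
link@{file=fixed.o@}
@end example

By the same reasoning as in the two-bank example, this should produce an output of three 16KByte banks, the first two linked relative to 0x8000 and the last relative to 0xC000.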
1112
1113 @section Partitioning 6502 RAM
1114
Usually you don't want to let the linker have @emph{all} the 6502 RAM at its disposal for data mapping; some regions of memory have special meaning and should generally be off-limits to the linker. For example, the 6502 has a stack which grows down from address 0x01FF. So it would be a good idea to reserve some space there for the stack.
1116
1117 Partitioning the RAM is easy. Just put multiple @code{ram}-commands in the linker script, leaving out the reserved regions. For example, this is a typical configuration I use for NES game programming:
1118 @example
1119 ram@{start=0x0000, end=0x0180@}
1120 ram@{start=0x0300, end=0x0800@}
ram@{start=0x6000, end=0x8000@}    # only if the board has WRAM
1122 @end example
1123
1124 Here I have left out the region @code{0x0180}...@code{0x0300}. Address @code{0x0180} up to and including @code{0x01FF} is where the stack lives, while the page starting at @code{0x0200} is used to hold game sprite data; this address is hard-coded in the assembly source.
1125
1126 The order of @code{ram}-commands is significant. The order defines the order in which the linker will attempt to map data segments' symbols to RAM. This is why the region containing the zeropage should preferably come first, since we generally want as much data as possible to be mapped here. Only when the linker runs out of space in the first region will it try the next one, and so on.
1127
1128 @section Copying files to linker output
1129
1130 Analogous to the assembler directive @code{.incbin}, the linker script command @code{copy} allows you to copy a file straight to the linker's output file. For example, you might like to prepend your 6502 executable with a header. So you create a custom header file called, say, @file{header.bin} and, prior to the @code{bank}-commands, you issue the command
1131
1132 @example
1133 copy@{file=header.bin@}
1134 @end example
1135
1136 You can also use the @code{copy}-command inside banks of course, anywhere you like. In this case the internal Program Counter and bank offset will be advanced in the same manner as when a unit is linked and copied to the output with the @code{link}-command. The only difference is that you can tell in advance how much the offsets will be increased (by looking at the size of the file that is copied).
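
Putting the header case together, a complete script might look like the sketch below. The file names are the hypothetical ones used earlier in this chapter; the @code{copy}-command comes before the first @code{bank}-command, so the header is written ahead of the bank contents.

@example
ram@{start=0x0000,end=0x0800@}
output@{file=program.bin@}
copy@{file=header.bin@}                  # header goes first, before any bank
bank@{size=0x4000, origin=0xC000@}
link@{file=my_unit.o@}
@end example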
1137
1138 @section Padding the output
1139
1140 You can pad the output explicitly with the @code{pad}-command. This will write an appropriate number of zero-bytes to the output file. The following are the (mutually exclusive) arguments to the command.
1141 @itemize
1142
1143 @item @code{size} : Pad as many bytes as indicated
1144
1145 @c @item @code{origin} : Pad until Program Counter equals the given origin
1146
1147 @item @code{offset} : Pad until bank offset equals the given offset
1148
1149 @end itemize
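
One plausible use of the @code{offset} argument is to place data at a fixed position near the end of a bank. The sketch below assumes a hypothetical 6-byte file @file{vectors.bin} holding the 6502 NMI, RESET and IRQ vectors. With a 16KByte bank at origin 0xC000, bank offset 0x3FFA corresponds to address 0xFFFA, which is where the 6502 expects the vectors to start.

@example
bank@{size=0x4000, origin=0xC000@}
link@{file=my_unit.o@}
pad@{offset=0x3FFA@}                     # zero-fill up to bank offset 0x3FFA
copy@{file=vectors.bin@}                 # 6 bytes, ending exactly at the end of the bank
@end example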
1150
1151 @c @section Specifying options
1152
1153 @node Known Bugs and Limitations
1154 @chapter Known Bugs and Limitations
1155
1156 Every source file must end with a newline.
1157
1158 @node Implementation Details
1159 @appendix Implementation Details
1160
Some deep discussion will eventually go here; in the meantime, have a look at the source code, which is full of comments.
1162
1163 @node Assembler Directives
1164 @appendix Assembler Directives
1165
1166 It is considered good practice to prepend a period to a directive when invoking it (to differentiate it from identifiers), but this is not a strict requirement.
1167
1168 The following is an alphabetical listing of the directives supported by the assembler and the arguments they may take. Arguments enclosed in square brackets [ ] are optional.
1169
1170 @table @code
1171
1172 @item align @var{identifier} @emph{[, @var{identifier-2}, ...]} @var{boundary}
1173 Specifies alignment constraints for a list of data variables.
1174
1175 @item asc (@emph{Alias for} char)
1176
1177 @item byte (@emph{Alias for} db)
1178
1179 @item char @var{expression} @emph{[, @var{expression}, ...]}
1180 Define (array of) character transformed by custom character map
1181
1182 @item charmap "@var{filename}"
1183 Set custom character map
1184
1185 @item codeseg, code
1186 Switch to code segment
1187
1188 @item dataseg, data @emph{[zeropage]}
1189 Switch to data segment
1190
1191 @item db @var{expression} @emph{[, @var{expression}, ...]}
1192 Define (array of) byte
1193
1194 @item dd @var{expression} @emph{[, @var{expression}, ...]}
1195 Define (array of) doubleword
1196
1197 @item define @var{identifier} @emph{[@var{expression}]}
1198 See @code{equ} directive
1199
1200 @item dsb @var{expression}
1201 Define storage of bytes
1202
1203 @item dsd @var{expression}
1204 Define storage of doublewords
1205
1206 @item dsw @var{expression}
1207 Define storage of words
1208
1209 @item dw @var{expression} @emph{[, @var{expression}, ...]}
1210 Define (array of) word
1211
1212 @item dword (@emph{Alias for} dd)
1213
1214 @item elif
1215 Used in conjunction with if
1216
1217 @item else
1218 Used in conjunction with if, ifdef or ifndef
1219
@item end
Ends the assembly unit

@item ende
Ends an enum definition

@item endif
Ends a statement block preceded by an if, ifdef or ifndef
1228
1229 @item endm
1230 Ends a macro definition
1231
1232 @item endp
1233 Ends a procedure definition
1234
1235 @item ends
1236 Ends a structure or union definition
1237
1238 @item enum @var{identifier}
1239 Begins an enum definition
1240
@item @var{identifier} equ @var{expression}
Define equate

@item extrn @var{identifier} @emph{[, @var{identifier}, ...]} : @var{type}
Flag identifier(s) as external (imported) of type @var{type}
1246
1247 @item if @var{expression}
1248 Assemble the following statement block only if @var{expression} evaluates to non-zero
1249
1250 @item ifdef @var{identifier}
1251 Assemble the following statement block only if @var{identifier} is defined
1252
1253 @item ifndef @var{identifier}
1254 Assemble the following statement block only if @var{identifier} is not defined
1255
1256 @item incbin "@var{filename}"
1257 Include contents of @var{filename} as binary data
1258
1259 @item include (@emph{Alias for} incsrc)
1260
1261 @item incsrc "@var{filename}"
1262 Include contents of @var{filename} as assembler statements
1263
1264 @item label @var{identifier} @emph{[= @var{address}]} @emph{[ : @var{type}]}
1265 Defines a global label
1266
1267 @item macro @var{identifier} @emph{[@var{identifier}, ...]}
1268 Begins a macro definition
1269
1270 @item message @var{expression}
1271 Prints a message to stdout during assembly
1272
1273 @item pad @var{expression}  (@emph{Alias for} dsb)
1274
1275 @item proc @var{identifier}
1276 Begins a procedure definition
1277
1278 @item public @var{identifier} @emph{[, @var{identifier}, ...]}
1279 Flag identifier(s) as public (exported)
1280
1281 @item record @var{identifier} @var{identifier}:@var{width} @emph{[, @var{identifier}:@var{width}, ...]}
1282 Defines a record consisting of bitfields.
1283
1284 @item rept @var{count}
1285 Begins an anonymous macro to be repeated @var{count} times
1286
1287 @item struc @var{identifier}
1288 Begins a structure definition
1289
1290 @item type @var{identifier} @emph{[@var{expression}, ...]}
1291 Define data of user-defined type @var{identifier}
1292
1293 @item union @var{identifier}
1294 Begins a union definition
1295
1296 @item while @var{expression}
1297 Begins an anonymous macro to be repeated while @var{expression} is true (non-zero)
1298
1299 @item word (@emph{Alias for} dw)
1300
1301 @end table
1302
1303 @node Linker Script Commands
1304 @appendix Linker Script Commands
1305
1306 The following is an alphabetical listing of the script commands recognized by the linker and the arguments they may take. Note that not all arguments are mandatory and some are mutually exclusive.
1307
1308 @table @code
1309
1310 @item bank @{ size=@var{size}, origin=@var{origin-address} @}
1311 Start a new bank of size @var{size} bytes and set initial relocation address to @var{origin-address}.
1312
1313 @item copy @{ file=@var{filename} @}
1314 Copy contents of @var{filename} to output.
1315
1316 @item link @{ file=@var{filename}, origin=@var{origin-address} @}
1317 Relocate code in the unit @var{filename} to @var{origin-address} and copy the result to output. (If an origin is not specified, the internally managed linker origin is used.)
1318
1319 @c @item options
1320
1321 @item output @{ file=@var{filename} @}
1322 Set the linker output file.
1323
1324 @item pad @{ origin=@var{origin-address}, offset=@var{offset}, size=@var{size} @}
1325 Pad to the given origin or bank offset, or pad @var{size} bytes (only one of the arguments should be given).
1326
1327 @item ram @{ start=@var{start-address}, end=@var{end-address} @}
1328 Specify that 6502 RAM in the range @var{start-address}...@var{end-address} (non-inclusive) may be used by the linker to map contents of data segments.
1329
1330 @end table
1331
1332 @node Object Code Format
1333 @appendix Object Code Format
1334
1335 An object code file, or unit, produced by the assembler has the following major sections:
1336
1337 @itemize
1338
1339 @item Magic number and assembler version
1340
1341 @item Definitions of exported constants
1342
1343 @item Descriptors for imported symbols
1344
1345 @item Data segment bytecodes
1346
1347 @item Code segment bytecodes
1348
1349 @item Definitions of expressions referred to by bytecodes
1350
1351 @end itemize
1352
Each of these is described in the sections that follow.
1354
1355 @section Magic number and assembler version
1356
1357 The magic number is a 16-bit constant (@code{0xCAFE}, if you must know), used to validate the object file. It is followed by 1 byte which denotes the version of the assembler that was used to build the file; the major version in the upper nibble and minor version in the lower nibble (should be @code{0x10}).
1358
1359 @section Definitions of exported constants
1360
1361 This is a series of triplets @var{(identifier, type, value)}, each describing a constant made publicly available.
1362
1363 @section Descriptors for imported symbols
1364
1365 This is a list of descriptors for the symbols used by this unit which are not defined in the unit itself; that is, they are external dependencies.
1366
1367 @section Data segment bytecodes
1368
1369 Statements in the original assembler file are encoded in a compact bytecode form. The bytecodes here define the labels and data storages located in the unit's @code{dataseg} section(s). The bytecode commands are a subset of the ones described in the next section.
1370
1371 @section Code segment bytecodes
1372
1373 These bytecodes are a compact representation of the contents of the unit's @code{codeseg} section(s). The commands and their arguments are as follows:
1374
1375 @table @code
1376
1377 @item CMD_END
1378 Indicates the end of the segment.
1379
1380 @item CMD_BIN8 @var{count} @var{byte1, byte2, ...}
1381 The next @var{count} bytes are binary data which needn't be processed in any special way. @var{count} is an 8-bit quantity.
1382
1383 @item CMD_BIN16 @var{count} @var{byte1, byte2, ...}
1384 The next @var{count} bytes are binary data which needn't be processed in any special way. @var{count} is a 16-bit quantity.
1385
1386 @item CMD_LABEL @var{flag} @var{identifier}
1387 Define a label. If bit 0 of the byte @var{flag} is set, this is a public variable and its identifier follows.
1388
1389 @item CMD_INSTR @var{opcode} @var{expression-id}
1390 An instruction whose operand must ultimately be resolved. @var{opcode} is the 6502 operation code. @var{expression-id} is a 16-bit quantity which refers to the expression which is the (symbolic) operand of the instruction (see the next section).
1391
1392 @item CMD_DB @var{expression-id}
1393 Define a byte symbolically. @var{expression-id} refers to the expression which is the operand.
1394
1395 @item CMD_DW @var{expression-id}
1396 Define a word symbolically. @var{expression-id} refers to the expression which is the operand.
1397
1398 @item CMD_DD @var{expression-id}
1399 Define a doubleword symbolically. @var{expression-id} refers to the expression which is the operand.
1400
1401 @item CMD_DSI8 @var{size}
1402 Define data storage of @var{size} bytes. @var{size} is an 8-bit quantity.
1403
@item CMD_DSI16 @var{size}
1405 Define data storage of @var{size} bytes. @var{size} is a 16-bit quantity.
1406
1407 @item CMD_DSB @var{expression-id}
1408 Define data storage of bytes, the size of which is determined by the expression referred to by @var{expression-id}.
1409
1410 @end table
1411
1412 @section Expressions
1413
As you may have noticed in the preceding section, many bytecodes have an expression identifier as argument. This is just an index into the list of expressions defined in the final part of the object file. The main advantages of separating the @emph{use} of an expression (through its identifier) from its @emph{definition} are ease of parsing and processing (each bytecoded instruction will always occupy 4 bytes), and the ability to share the same expression among several instructions without having to redefine it every time. In most cases, the expression will just be a stand-alone reference to a symbol (local or external). But any expression that the assembler understands can be encoded here.
1415
1416 @node Custom Character Maps
1417 @appendix Custom Character Maps
1418
1419 @c Using custom character maps is a convenient way of mapping ASCII characters to a different encoding.
1420 A custom character map is a plaintext file containing statements of the form
1421 @example
1422 @var{key} = @var{value}
1423 @end example
1424
where @var{key} is a character or escape sequence and @var{value} is the integer literal to which instances of this character are mapped when they occur as an argument to the @code{.char} assembler directive.
1426
There is also a shorthand form for mapping a range of characters at once:
1428 @example
1429 @var{low_key}-@var{high_key} = @var{value}
1430 @end example
1431
This is most useful when mapping the decimal digits and the alphabet. Instead of typing monotonous statements like
1433 @example
1434 0=0x10
1435 1=0x11
1436 ...
1437 9=0x19
1438 @end example
1439 you can achieve the same result from the statement
1440 @example
1441 0-9=0x10
1442 @end example
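
A small but complete map file might look like the following sketch. The values are arbitrary, and it assumes (as the digit example above suggests) that a range assigns consecutive values starting at @var{value}.

@example
0-9=0x00
A-Z=0x0A
a-z=0x24
@end example

With such a map in effect, @code{.char "AB1"} would presumably be equivalent to @code{.db $0A, $0B, $01}.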
1443
1444 @node Error and Warning Messages
1445 @appendix Error and Warning Messages
1446
1447 @section Assembler Error Messages
1448
1449 @table @samp
1450
1451 @item cannot expand `@var{identifier}'; not a macro
1452 Make sure @var{identifier} is a macro and not a label, constant or other type of symbol.
1453
1454 @item conditional expression does not evaluate to literal
Conditional assembly with the @code{.if}-directive requires that the expression tested can be evaluated immediately, so it can't contain references to labels and such (these aren't computed by the assembler; that's the linker's job).
1456
1457 @item could not open `@var{filename}' for reading
1458 Check that the file exists and that you have read privileges.
1459
1460 @item duplicate symbol `@var{identifier}'
1461 You tried to define the same symbol more than once. Global labels must be unique across the entire program. Local labels must be unique within the relevant local scope.
1462
1463 @item field declaration expected
1464 A structure or union field must be of the form @code{@var{identifier} @var{datatype} @var{[count]}}; for example, @code{my_field .db}, @code{my_2nd_field .dsw 10}, @code{my_3rd_field .type other_struc}.
1465
1466 @item data initialization not allowed here
1467 A structure or union declaration cannot contain initialization of its fields.
1468
1469 @item initializer does not evaluate to integer literal
1470 A member of an enumerated datatype must be assigned a constant value.
1471
1472 @item initializer for field `@var{identifier}' exceeds field size
1473 When defining the value of a structure or union field, the value must not exceed the number of bytes of storage allocated for that field. For example, if the field definition is @code{a_string .dsb 4}, then the value @code{"too long"} won't fit since it is 8 bytes long.
1474
1475 @item instructions not allowed in data segment
1476 Instructions can only be contained in a code segment (after a @code{codeseg} directive).
1477
1478 @item invalid addressing mode
1479 The combination of mnemonic and addressing mode for the 6502 instruction is invalid. For example, @code{LDX $00,X} is invalid since the @code{LDX} instruction does not have a ZeroPage,X-mode version. Consult a 6502 manual to see what modes are valid for each instruction.
1480
1481 @item invalid dataseg statement
1482 A data segment only supports a subset of the statements allowed in code segments. You can't put instructions or initialized data in a data segment.
1483
1484 @item invalid operand
1485 You supplied an invalid operand to a statement, for example a string as operand to the @code{LDA} instruction.
1486
1487 @item macro `@var{identifier}' does not take @var{count} argument(s)
You supplied the wrong number of arguments to the macro. Check the macro definition if you're unsure how many arguments it takes, and try again.
1489
1490 @item member `@var{sub-struct-identifier}' of `@var{struct-identifier}' is not a structure
1491 The expression @code{@var{struct-identifier}.@var{sub-struct-identifier}.@var{some-member}} did not resolve because @var{sub-struct-identifier} is not a structure.
1492
1493 @item member '@var{field-identifier}' of '@var{struct-identifier}' is of unknown type (`@var{type-identifier}')
1494 @var{type-identifier} is an undefined type.
1495
1496 @item only one field of union can be initialized
1497 When defining an instance of a union, only one of the possible fields can be given a value between the pair of enclosing braces @{ @}.
1498
1499 @item operand out of range
A 6502 instruction has either an 8-bit or a 16-bit operand, and the value you supplied is too large to fit in that many bits.
1501
1502 @item procedures not allowed in data segment
1503 A procedure contains code. Code cannot be contained in a data segment.
1504
1505 @item repeat count does not evaluate to literal
1506 Anonymous macros are expanded as soon as they are encountered. Thus, the argument of a @code{rept} directive must be an immediate expression.
1507
1508 @item size of `@var{identifier}' is unknown
1509 The operand to the @code{sizeof} operator must be one of @code{BYTE}, @code{WORD}, @code{DWORD}, or the name of a structure definition.
1510
1511 @item string or integer argument expected
1512 The @code{message} directive takes a string or integer as its argument.
1513
1514 @item structure initializer expected
1515 When defining data that is an instance of a structure or union, the field value(s) must be enclosed in a pair of braces @{ @}.
1516
1517 @item too many field initializers
1518 There are too many values given compared to the actual number of fields in the structure or union.
1519
1520 @item union member must be of constant size
1521 The size of a union member must be known at assembly time. This restriction does not apply to structs.
1522
1523 @item unknown macro or directive `@var{identifier}'
1524 You attempted to invoke a macro or directive that the assembler doesn't recognize. Check your spelling (remember that identifiers are case sensitive) and/or your macro definitions.
1525
1526 @item unknown namespace `@var{identifier}'
1527 The expression @code{@var{identifier}::@var{symbol}} did not resolve because @var{identifier} is not a namespace.
1528
1529 @item unknown symbol `@var{identifier}'
Your code refers to a symbol that has neither been defined locally nor been declared external.
1531
1532 @item value not allowed in data segment
1533 Data cannot be initialized in a data segment; the data segment can only specify how many bytes of storage will be needed at runtime.
1534
1535 @item `@var{identifier}' declared as extrn but is defined locally
A symbol cannot be both defined in your own code and declared as an external symbol. Remove either the local definition or the external declaration.
1537
1538 @item `@var{identifier}' already declared extrn
1539 An identifier already specified in a @code{public} directive cannot at the same time be external.
1540
1541 @item `@var{identifier}' is of non-exportable type
1542 Macros and other volatile symbols cannot be exported.
1543
1544 @item `@var{field-identifier}' is not a member of `@var{struct-identifier}'
1545 The expression @code{@var{struct-identifier}.@var{field-identifier}} did not resolve.
1546
1547 @end table
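
To illustrate the @samp{invalid addressing mode} error, the following sketch contrasts the rejected form with two accepted ones. It assumes the conventional @code{;} comment syntax, which this appendix does not confirm; the addressing modes themselves are standard 6502.
@example
LDX $00,X    ; rejected: LDX has no ZeroPage,X form
LDX $00,Y    ; accepted: LDX indexes zero page with Y
LDA $00,X    ; accepted: LDA does have a ZeroPage,X form
@end example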
1548
1549 @section Assembler Warning Messages
1550
1551 @table @samp
1552
1553 @item `@var{identifier}' declared as public but is not defined
1554 You cannot export a symbol that isn't defined in your code.
1555
1556 @item `@var{identifier}' defined but not used
1557 Usually there is a reason for defining a symbol, so the assembler will warn you if there are no references to it in the code.
1558
1559 @item operand out of range; truncated
1560 Operand exceeds 8 or 16 bits, so the upper bits are chopped off.
1561
1562 @item redefinition of `@var{identifier}' is not identical; ignored
When using the @code{.equ} directive you can only define each identifier once. (Use the @code{=} operator instead if appropriate.)
1564
1565 @end table
1566
1567 @section Linker Error Messages
1568
1569 @table @samp
1570
1571 @item branch out of range
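The target of a relative branch instruction is too far away; a branch can only reach about 128 bytes backward or forward. Shorten the code between the branch and its target, or use an inverse branch followed by a jump instead.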
1573
1574 @item duplicate symbol `@var{identifier}'
1575 A symbol with the same name is exported from two or more of the units being linked. When linking, exported names must be unique across all units.
1576
1577 @item incompatible operand(s) to `@var{operator}' in expression
1578
1579 @item instruction operand doesn't fit in 1 byte
1580 A rather fatal error. A zeropage instruction's operand address won't fit.
1581
1582 @item instruction operand doesn't fit in 2 bytes
1583 A rather fatal error which shouldn't even occur.
1584
1585 @item invalid instruction operand (string)
1586 6502 instructions only take integer operands.
1587
1588 @item negative count
1589 A storage directive must have a positive integer operand.
1590
1591 @item out of 6502 RAM while allocating unit `@var{unit}'
1592 The linker couldn't map the data segments to 6502 RAM because there was too little of it available. Check your @code{ram}-commands in the script or reduce your program's memory requirements.
1593
1594 @item PC went beyond 64K when linking `@var{unit}'
1595
1596 @item unexpected string operand (`@var{string}') to storage directive
1597 Storage directives only take integer operands.
1598
1599 @item unknown symbol `@var{identifier}' referenced from @var{unit}
1600 The external symbol couldn't be resolved. You need to link the unit containing the symbol.
1601
1602 @end table
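
As a sketch of the workaround for @samp{branch out of range}, suppose a @code{BEQ} cannot reach its target. The labels @code{far_target} and @code{skip} are hypothetical, and the @code{name:} label form and @code{;} comments are assumed rather than taken from this appendix.
@example
; Original, fails when far_target is out of branch range:
;   BEQ far_target
; Rewritten with the inverse condition and an absolute jump:
BNE skip          ; condition false: hop over the jump
JMP far_target    ; condition true: JMP takes a full 16-bit address
skip:
@end example
The short branch now only has to clear the three-byte @code{JMP} instruction, so it is always in range.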
1603
1604 @subsection Linker Script Error Messages
1605
1606 @table @samp
1607
1608 @item bank size (@var{size}) exceeded by @var{count} bytes
The bank output exceeded the size of the current bank. Either the bank size is wrong, your code is too large, or the files you are copying to the bank are too large.
1610
1611 @item cannot pad backwards
If you start a bank, copy a 2K file to it, and then attempt to pad to offset 1K, you will get this error. Padding can only be done from a smaller offset to a larger or equal offset. Either your pad offset is wrong or the data preceding it is too large.
1613
1614 @item could not open `@var{filename}' for reading
The file named in the script could not be opened. Check that it exists and that you have read privileges.
1616
1617 @item could not open `@var{filename}' for writing
1618 The specified output file could not be created.
1619
1620 @item `end' is smaller than `start'
The end address should be larger than or equal to the start address, not the other way around.
1622
1623 @item failed to load `@var{unit}'
The object file could not be loaded from storage. Either the file is missing, you don't have access to it, or it is corrupted.
1625
1626 @item invalid size
1627 The size must be a positive (larger than zero) quantity.
1628
1629 @item missing argument `@var{name}'
1630 The script command requires an argument which you did not supply.
1631
1632 @item no bank size set
1633 At a minimum, the first @code{bank}-command in the script must supply a bank size.
1634
1635 @item no output open
1636 When executing a script command which writes to the linker's output, an output file must have been specified first. Make sure that all @code{link}-, @code{copy}-, @code{pad}-commands etc. are preceded by the proper @code{output}-command.
1637
1638 @item value of argument `@var{name}' is out of range
1639 The script command argument's value is outside the expected range. For example, an argument which specifies a 6502 address should be between 0 and 64K.
1640
1641 @end table
1642
1643 @section Linker Warning Messages
1644
1645 @table @samp
1646
1647 @item `.D(B|W)' operand @var{integer} out of range; truncated
1648 Operand exceeds 8 or 16 bits, so the upper bits are chopped off.
1649
1650 @end table
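
As a worked example of the truncation warning, the sketch below assumes that @code{.db} and @code{.dw} store 8-bit and 16-bit values respectively and that @code{;} starts a comment; neither is spelled out in this appendix.
@example
.db $1234     ; too wide for 8 bits: only the low byte, $34, is kept
.dw $12345    ; too wide for 16 bits: only the low word, $2345, is kept
@end example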
1651
1652 @bye