level0/README

   1 ========================================================================
   2 ***WARNING*** UrForth level 0 IS *COMPLETE*! DO NOT USE IT FOR ANYTHING,
   3 DO NOT REPORT BUGS FOR IT, AND SO ON. UrForth level 1 IS THE WAY TO GO.
   4 ========================================================================
   5
   6 this is a direct-threaded 32-bit x86 Forth system for GNU/Linux.
   7
   8 UrForth level 0 needs fasm to compile. after i finish some necessary
   9 parts (assembler with normal syntax), UrForth level 1 will be created,
  10 and it will be self-hosting.
  11
  12 some notes about standards compliance: i don't fuckin' care. pre-ANS
  13 standards are too restrictive, and the ANS standard is fubared with
  14 idiotic "portability" and other crap. also, there is no useful document
  15 that summarizes all changes between the different standards. and without
  16 a full list of words with changed semantics, i won't try to implement
  17 them in a standards-compliant manner. this is because UrForth started as
  18 a mostly dsForth-compatible system (for historical reasons), and dsForth
  19 is a mishmash of FIG/F83/ANS. of course, i could manually compare all the
  20 standards to find out what differs, but fuck it. i don't expect anybody
  21 to use UrForth anyway.
  22
  23 also, about ANS "portability" crap: i strongly believe that the default
  24 architecture should be 32-bit two's-complement, without any stupid data
  25 alignment rules. on non-conformant systems, an implementation should
  26 emulate the abovementioned arch. this represents most architectures out
  27 there, and if your arch is different, your Forth system will take care of
  28 that, running standards-compliant code without any changes. and library
  29 authors can avoid error-prone jumping through hoops.
  30
  31 if you think that speed is more important than compatibility, you can
  32 always fix the code for speed manually.
  33
  34 also, i'm not planning to create 64-bit versions of UrForth. "64-bitness"
  35 is another thing i consider idiotic. also, PIC code will not be supported
  36 (too much idiocy around!).
  37
  38
  39 more ANS idiocy: 2! and 2@ store 64-bit numbers in big-dword-endian
  40 order (high dword at the lower address). fuckin' morons. i introduced
  41 "2!LE" and "2@LE", and left the "standard" words intact.
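a minimal sketch of that layout difference, using only words mentioned in this README (N-ALLOT, CONSTANT, 2!, 2!LE) plus plain @. two things are assumed rather than documented here: that 2!LE takes the same ( lo hi addr -- ) arguments as 2!, and that it puts the low dword at the lower address. `dbuf` is just a name made up for the example.

```forth
8 n-allot constant dbuf          ( 8 bytes of dictionary space for one 64-bit value )
$55667788 $11223344 dbuf 2!      ( low dword pushed first, high dword on top )
dbuf @ $11223344 = .             ( ANS layout: the high dword sits at the lower address )
$55667788 $11223344 dbuf 2!LE    ( assumed: same arguments as 2! )
dbuf @ $55667788 = .             ( assumed little-endian layout: the low dword comes first )
```

if both assumptions hold, each comparison leaves a true flag.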
  42
  43 also, "cell" was another idiotic word choice. does "c@" operate on cells,
  44 or on chars? i wonder if anybody there had even one working brain cell.
  45
  46
  47 some advanced UrForth features, in no particular order:
  48
  49 * vocabularies have hash tables for word names (256 bytes per vocabulary).
  50   this makes searches ~50-100 times faster:
  51     FORTH -- 799 words, 64 of 64 buckets used, 4 min items, 21 max items, average: 12 words per bucket
  52   since each word is first checked for a matching hash and length,
  53   the searcher usually does only one full string comparison.
  54   in other words: vocabulary searches are lightning fast.
  55
  56 * number prefixes:
  57     $,#,0x,&h -- hex number (note that the 2012 standard wants "#" for decimal)
  58     %,0b,&b -- binary number
  59     0o,&o -- octal number
  60     0d,&d -- decimal number
  61
  62 * number postfixes:
  63     nnnH -- hex
  64     nnnO -- octal
  65     nnnB -- binary
  66
  67 * underscores in numbers are ignored:
  68     0x8000_00_00 is a valid number
  69
  70 * extended word search:
  71     you can use "a:b" to find word "b" in vocabulary "a".
  72     of course, "a:b:c" and such are allowed too.
  73
  74 * UrForth has fully working BREAK and CONTINUE.
  75     they can be used in BEGIN loops and in DO...LOOP.
  76     also, they know about CASE, so you can use
  77     BREAK/CONTINUE in OF/OTHERWISE clauses.
  78
  79 * BEGIN loops can contain an arbitrary number of WHILE/NOT-WHILE parts,
  80   and they can be terminated with UNTIL/NOT-UNTIL even if WHILE is present.
  81   AGAIN is allowed too, for any kind of BEGIN loop.
  82
  83 * there is IFNOT in addition to IF.
  84
  85 * the segfault handler will show you a stack dump and a backtrace.
  86
  87 * ANS wordlists are not supported. instead, there is the F83 ONLY/ALSO mechanism.
  88
  89 * vocabularies support public and hidden words. it is possible to create a
  90   "nested" vocabulary, which will see all of its parent's hidden words by default.
  91
  92 * multiline comments: normal (* ... *), and nested (+ ... +).
  93   a nested multiline comment may contain other nested comments.
  94
  95 * x86 assembler with deferred plug-in interfaces to memory r/w and label manager
  96   (can be used to create metacompilers, or even standalone assemblers).
  97   it uses normal Intel syntax, not yoda-style. also, you don't need to
  98   separate operands with spaces, because the assembler does its own input
  99   stream parsing.
 100
 101 * there are `[:` and `;]` to create cblocks. this feature can be used like this:
 102     : foreach-do ( cfa -- )  10 0 do i over execute loop drop ;
 103     : a  [: . cr ;] foreach-do ;
 104   internally, it compiles a headerless word and leaves its CFA on the stack.
 105
 106 * cblocks can be used in the interpreter too, i.e. you can type this at the REPL:
 107     [: 10 0 do i . cr loop ;] execute
 108   and it will work. and even this will work:
 109     [: ." hey!" cr [: 65 emit cr ;] execute ;] execute
 110   note that you cannot assign such cblocks to DEFERed words, because they
 111   will not live long enough.
 112
 113 * internally, there is a DP-TEMP variable. when it is 0, the normal DP is used
 114   for HERE. but when it is non-zero, all HERE-based words will use it instead.
 115   this is what makes cblocks work in the interpreter, and it can also be
 116   used to create metacompilers. no other words access "DP" directly.
 117
 118 * "OVERRIDE" word can be used to override Forth words with other Forth words.
 119   it works like this:
 120     : newdot  ( n old-xtoken )
 121       check-some-condition if
 122         OVERRIDE-EXECUTE
 123       else
 124         2drop
 125       endif
 126     ;
 127     OVERRIDE . newdot
 128    or
 129     ' . ' newdot (OVERRIDE)
 130
 131   note that the overridden word can be called as usual, and the override forces
 132   even previously compiled words to call `newdot`. also note that `newdot` cannot
 133   be called as a normal forth word anymore (no checks are made, it will just
 134   make your system unusable due to UB).
 135
 136   currently, there is no way to "unoverride" the word. it may be added later.
 137
 138   also, remember that "old-xtoken" cannot be called with "EXECUTE". but you
 139   can freely pass "old-xtoken" around and call it with "OVERRIDE-EXECUTE" at
 140   any moment, and at any place.
 141
 142   this can be used to create things like metacompilers, for example, without
 143   duplicating all compiler word definitions again.
 144
 145   if you override an already overridden word, the old override will be
 146   replaced with the new one (i.e. overrides are not chained automatically).
 147
 148   you can chain overrides like this, if you want to:
 149     0 value (prev-ovr)  (hidden)
 150     : ovr-new  ( ... xtoken -- )
 151       (prev-ovr) ?drop override-execute
 152     ;
 153     get-override . to (prev-ovr)
 154     override . ovr-new
 155
 156 * "REPLACE" word can be used to perform system-wide word replacement:
 157     replace oldword newword
 158   WARNING! you can replace any word with any other word (including constants,
 159   code words, and so on, on both sides; the desired effect is up to you). i.e.
 160   this will work:
 161     69 constant fuck
 162     : hell 666 ;
 163     replace fuck hell
 164     fuck 666 = .  ( true )
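the number syntax from the feature list above can be checked at the REPL with something like this. it is only a sketch: it assumes BASE is decimal, and that uppercase digits and postfix letters are accepted exactly as spelled in the list. every comparison is expected to leave a true flag.

```forth
( hex spellings of fifteen )
$F 15 = .   #F 15 = .   0xF 15 = .   &hF 15 = .   0FH 15 = .
( binary spellings )
%1111 15 = .   0b1111 15 = .   &b1111 15 = .   1111B 15 = .
( octal spellings )
0o17 15 = .   &o17 15 = .   17O 15 = .
( explicit decimal spellings )
0d15 15 = .   &d15 15 = .
( underscores are only visual separators )
0x8000_00_00 $80000000 = .
```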
 165
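BREAK, CONTINUE and IFNOT from the list above could be combined like this. a sketch, not tested against a real build; it assumes that ENDIF closes IFNOT the same way it closes IF, and `odds-below-7` is a made-up name.

```forth
: odds-below-7  ( -- )
  10 0 do
    i 7 = if break endif            ( leave the loop for good )
    i 1 and ifnot continue endif    ( even number: go straight to the next iteration )
    i .
  loop cr ;
odds-below-7
```

the intent is to print the odd numbers below 7 and then stop, without any flag juggling around LEAVE.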
 166
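a BEGIN loop with more than one exit, as described in the list above, might look like this. the sketch assumes that NOT-WHILE leaves the loop when the flag is true (the mirror image of WHILE), and that both exits continue right after AGAIN; `countdown` is a made-up name.

```forth
: countdown  ( n -- )
  begin
    dup 0 > while          ( first exit: n has reached zero )
    dup 13 = not-while     ( second exit, assuming NOT-WHILE leaves on a true flag )
    dup . 1-
  again
  drop cr ;
20 countdown
```

under those assumptions the loop counts down from 20 and stops as soon as it reaches 13.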
 167
 168 some notable differences from ANS:
 169
 170 * TRUE is 1, not -1; there are LOGAND and LOGOR; NOT is logical, bitwise not is BITNOT
 171 * TIB is still there, no ANS SOURCE and such.
 172 * TIB size is in variable #TIB
 173 * the current TIB line is in variable TIB-LINE# (if it is 0, there is no line counting and no debug info).
 174 * DOES> words use extra bytes after PFA.
 175 * there is [COMPILE], and no CHAR (use [CHAR] instead).
 176 * [CHAR] will error on any string that is not one-char.
 177 * there is no FIND, only WFIND-STR ( addr count -- cfa 1 // 0 ).
 178 * COUNT expects cell-counted string; for byte-counted, use CCOUNT.
 179 * WORD returns cell-counted string (at HERE).
 180 * there is a low-level word N-ALLOT ( size -- start-addr ), which is used by ALLOT
 181   and others. it returns the starting address of the allocated dictionary memory
 182   (which can be in transient DP-TEMP).
 183 * i don't fuckin' know what "address unit" is, and why it is necessary. so MOVE
 184   works with bytes.
 185 * i am not interested in adding "CHARS" and other such crap.
 186 * FIG-style "CFA->NFA" and such are retained. no support for "BODY>", etc. is planned.
 187 * S" always unescapes the string, because i see no reason not to. also, a terminating
 188   zero byte is always there (but it is not included in the count).
 189 * VARIABLE is FIG-style, and requires an initial value (change with `(SHIT-2012-IDIOCITY)`)
 190 * PAD is not relative to HERE; it resides in a separate memory area
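the flag conventions from the first item of this list can be poked at like this. a sketch only: it assumes that plain AND is the bitwise operation, and that LOGAND/LOGOR treat any non-zero operand as true.

```forth
true 1 = .           ( TRUE is 1 here, not -1 )
1 2 = not .          ( logical NOT of a false flag )
0 bitnot -1 = .      ( BITNOT flips every bit, so 0 becomes -1 )
2 4 and 0 = .        ( assumed bitwise AND: 2 and 4 share no bits )
2 4 logand .         ( assumed: both operands are non-zero, so the result is true )
0 4 logor .          ( assumed: one operand is non-zero, so the result is true )
```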
 191
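since there is no FIND, looking a word up by name goes through WFIND-STR. a small sketch built on the stack effect given above, ( addr count -- cfa 1 // 0 ); `run-by-name` and `demo` are made-up names, and it assumes S" leaves ( addr count ) inside a definition.

```forth
: run-by-name  ( addr count -- )
  wfind-str if
    execute                      ( found: the CFA is on the stack )
  else
    ." no such word" cr          ( not found: only the 0 flag came back, and IF consumed it )
  endif ;
: demo  ( -- )
  s" cr" run-by-name
  s" no-such-word-here" run-by-name ;
demo
```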
 192
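FIG-style VARIABLE in practice, assuming the default behaviour described in the list above (initial value on the stack before VARIABLE); `lives` is a made-up name.

```forth
3 variable lives        ( FIG-style: the initial value goes first )
lives @ .               ( read the initial value back )
lives @ 1- lives !      ( decrement it in place )
lives @ .
```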
 193
 194 WARNING: use 'fasm -m 32768 urforth0.asm' to compile UrForth Level 0! the code is quite
 195          macro-heavy, so it needs a lot of memory; fasm's default 16MB is not enough.
 196          also, you should use fasm1, not fasm-g!
 197
 198
 199 FAQ ;-)
 200
 201 Q: i am calling functions from .so, and they are segfaulting at random!
 202 it doesn't happen with C code, UrForth has a bug there!
 203
 204 A: nope. what is happening is that your system is broken, and doesn't
 205 follow the ABI. the 32-bit ABI doesn't require the stack to be aligned in
 206 any particular way (except being dword-aligned), but modern GCC not only
 207 aligns the stack at 16 bytes, but generates code that expects the stack
 208 to be always aligned like that. it is a violation of the ABI, and a bug
 209 in GCC. rebuild your system with the "-mstackrealign" GCC flag to fix it.
 210
 211 Q: but everybody else is happy to do what GCC dumbfucks command them to
 212 do! why can't you simply add stack aligning code to UrForth?!
 213
 214 A: because i see no reason to work around GCC bugs in my code. the longer
 215 we tolerate GCC ABI breakage, the longer it will last.
 216
 217
 218 have fun, and happy hacking
 219 Ketmar Dark // Invisible Vector