This directory contains data needed by Bison.

# Directory Content
## Skeletons
Bison skeletons: the general shapes of the different parser kinds, which are
specialized for specific grammars by the bison program.

Currently, the supported skeletons are:

- yacc.c
  It used to be named bison.simple: it corresponds to C Yacc
  compatible LALR(1) parsers.

- lalr1.cc
  Produces a C++ parser class.

- lalr1.java
  Produces a Java parser class.

- glr.c
  A Generalized LR C parser based on Bison's LALR(1) tables.

- glr.cc
  A Generalized LR C++ parser.  Actually a C++ wrapper around glr.c.

These skeletons are the only ones supported by the Bison team.  Because the
interface between skeletons and the bison program is not finished, *we are
not bound to it*.  In particular, Bison is not mature enough for us to
consider that "foreign skeletons" are supported.

## m4sugar
This directory contains M4sugar, sort of an extended library for M4, which
is used by Bison to instantiate the skeletons.

## xslt
This directory contains XSLT programs that transform Bison's XML output into
various formats.

- bison.xsl
  A library of routines used by the other XSLT programs.

- xml2dot.xsl
  Conversion into GraphViz's dot format.

- xml2text.xsl
  Conversion into text.

- xml2xhtml.xsl
  Conversion into XHTML.

# Implementation Notes About the Skeletons

"Skeleton" in Bison parlance means "backend": a skeleton is fed by the bison
executable with LR tables, facts about the symbols, etc., and it generates
the output (say parser.cc, parser.hh, location.hh, etc.).  Skeletons are
only in charge of generating the parser and its auxiliary files; they do not
generate the XML output, the parser.output reports, or the graphical
rendering.

The bits of information passed from bison to the backend are named
"muscles".  Muscles are passed to M4 via its standard input, as a set of
m4 definitions.  To see them, use `--trace=muscles`.
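
As a rough illustration of "a set of m4 definitions", the C sketch below
formats one key/value pair as an M4 definition.  It is not Bison's actual
code: the helper `emit_muscle`, the `b4_` prefix it adds and the exact
quoting are assumptions made for this example.  Use `--trace=muscles` to see
what bison really emits.

```c
#include <stdio.h>

/* Hypothetical helper (not Bison's actual code): format one "muscle" as an
   M4 definition into BUF.  The `b4_` prefix and the double quoting of the
   value are assumptions made for this illustration.  Returns what snprintf
   returns, i.e., the length of the formatted text.  */
static int
emit_muscle (char *buf, size_t size, const char *name, const char *value)
{
  return snprintf (buf, size, "m4_define([b4_%s], [[%s]])\n", name, value);
}
```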

Except for muscles, whose names are generated by bison, the skeletons have
no constraint at all on the macro names: there is no technical/theoretical
limitation; as long as you generate the output, you can do what you want.
However, it would of course be a bad idea if, say, the C and C++
skeletons used different approaches and had completely different
implementations.  That would be a maintenance nightmare.

Below, we document some of the macros that we use in several of the
skeletons.  If you are to write a new skeleton, please implement them for
your language.  Overall, be sure to follow the same patterns as the existing
skeletons.

## Vocabulary

We use "formal arguments", or "formals" for short, to denote the declared
parameters of a function (e.g., `int argc, const char **argv`).  Yes, this
is somewhat contradictory with `param` in the `%param` directives.

We use "effective arguments", or "args" for short, to denote the values
passed in function calls (e.g., `argc, argv`).
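
The distinction can be seen in a few lines of C.  The function names
`count_nonempty` and `forward` are made up for this example.

```c
/* The "formals" are the declared parameters: `int argc, const char **argv`.
   Count the strings of ARGV that are not empty.  */
static int
count_nonempty (int argc, const char **argv)
{
  int res = 0;
  for (int i = 0; i < argc; ++i)
    if (argv[i][0] != '\0')
      ++res;
  return res;
}

/* The "args" are the values passed in the call below: `argc, argv`.  */
static int
forward (int argc, const char **argv)
{
  return count_nonempty (argc, argv);
}
```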

## Symbols

### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), the bison executable defines one macro per
(symbol, field), where field can be `has_id`, `id`, etc.: see
`prepare_symbol_definitions()` in `src/output.c`.

NUM can be:
- `empty` to denote the "empty" pseudo-symbol when it exists,
- `eof`, `error`, or `undef`,
- a symbol number.

FIELD can be:

- `has_id`: 0 or 1
  Whether the symbol has an `id`.

- `id`: string (e.g., `exp`, `NUM`, or `TOK_NUM` with api.token.prefix)
  If `has_id`, the name of the token kind (prefixed by api.token.prefix if
  defined), otherwise empty.  Guaranteed to be usable as a C identifier.
  This is used to define the token kind (i.e., the enum used by the return
  value of yylex).  Should be named `token_kind`.

- `tag`: string
  A human-readable representation of the symbol.  Can be `'foo'`,
  `'foo.id'`, `'"foo"'` etc.

- `code`: integer
  The token code associated to the token kind `id`.
  The external number as used by yylex.  Can be the ASCII code for a
  character token, some number chosen by bison, or some user number in the
  case of `%token FOO <NUM>`.  Corresponds to `yychar` in `yacc.c`.

- `is_token`: 0 or 1
  Whether this is a terminal symbol.

- `kind_base`: string (e.g., `YYSYMBOL_exp`, `YYSYMBOL_NUM`)
  The base of the symbol kind, i.e., the enumerator of this symbol (token or
  nonterminal) which is mapped to its `number`.

- `kind`: string
  Same as `kind_base`, but possibly with a prefix in some languages.  E.g.,
  EOF's `kind_base` and `kind` are `YYSYMBOL_YYEOF` in C, but are
  `S_YYEOF` and `symbol_kind::S_YYEOF` in C++.

- `number`: integer
  The code associated to the `kind`.
  The internal number (computed from the external number by yytranslate).
  Corresponds to `yytoken` in `yacc.c`.  This is the same number that serves
  as key in `b4_symbol(NUM, FIELD)`.

  In bison, symbols are first assigned increasing numbers in order of
  appearance (but tokens first, then nterms).  After grammar reduction,
  unused nterms are then renumbered to appear last (i.e., first tokens, then
  used nterms and finally unused nterms).  This final number NUM is the one
  contained in this field, and it is the one used as key in `b4_symbol(NUM,
  FIELD)`.

  The code of the rule actions, however, is emitted before we know what
  symbols are unused, so the actions use the original numbers.  To avoid
  confusion, they actually use "orig NUM" instead of just "NUM".  bison also
  emits definitions for `b4_symbol(orig NUM, number)` that map from original
  numbers to the new ones.  `b4_symbol` actually resolves `orig NUM` in the
  other cases, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
  symbol whose original number was 42.

- `has_type`: 0 or 1
  Whether the symbol has a semantic value.

- `type_tag`: string
  When api.value.type=union, the generated name for the union member:
  `yytype_INT` etc. for symbols that have an `id`, otherwise `yytype_1` etc.

- `type`: string
  If the symbol has a semantic value, its type tag, or, if variants are
  used, its type.
  In the case of api.value.type=union, type is the real type (e.g. int).

- `slot`: string
  If the symbol has a semantic value, the name of the union member (i.e.,
  bounces to either `type_tag` or `type`).  It would be better to fix our
  mess and always use `type` for the true type of the member, and `type_tag`
  for the name of the union member.

- `has_printer`: 0 or 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
- `printer_loc`: location
  If the symbol has a printer, everything about it.

- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`, `destructor_loc`
  Likewise.
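
The difference between `code` (external number) and `number` (internal
number), and the role of the `orig NUM` mapping, can be sketched in C for a
tiny made-up grammar.  Everything here is an assumption for illustration:
the enumerator names, their values, and the helpers `translate` and
`resolve_orig` are not Bison's generated code, and real grammars contain
more symbols.

```c
/* Token codes: the external numbers, as returned by the scanner.  The
   values are made up for this sketch.  */
enum token_code
{
  TOK_EOF = 0,
  TOK_PLUS = '+',  /* A character token uses its character code.  */
  TOK_NUM = 258    /* Some number chosen by the generator.  */
};

/* Symbol kinds: the final internal numbers, tokens first, then used
   nonterminals, then unused ones.  */
enum symbol_kind
{
  SYMBOL_YYEOF = 0,
  SYMBOL_YYerror = 1,
  SYMBOL_YYUNDEF = 2,
  SYMBOL_PLUS = 3,
  SYMBOL_NUM = 4,
  SYMBOL_exp = 5,     /* Used nonterminal.  */
  SYMBOL_unused = 6   /* Unused nonterminal, renumbered last.  */
};

/* From token code to symbol number, in the spirit of yytranslate.  Codes
   that match no token map to the "undef" symbol.  */
static enum symbol_kind
translate (int code)
{
  switch (code)
    {
    case TOK_EOF:  return SYMBOL_YYEOF;
    case TOK_PLUS: return SYMBOL_PLUS;
    case TOK_NUM:  return SYMBOL_NUM;
    default:       return SYMBOL_YYUNDEF;
    }
}

/* From original number to final number, playing the role of
   `b4_symbol(orig NUM, number)`.  Here `unused` appeared before `exp` in
   the grammar, so the two swapped places after grammar reduction.  */
static const enum symbol_kind final_number[] =
{
  SYMBOL_YYEOF, SYMBOL_YYerror, SYMBOL_YYUNDEF, SYMBOL_PLUS, SYMBOL_NUM,
  SYMBOL_unused,  /* orig 5 */
  SYMBOL_exp      /* orig 6 */
};

static enum symbol_kind
resolve_orig (int orig)
{
  return final_number[orig];
}
```

In this sketch, `translate ('+')` yields `SYMBOL_PLUS`, and an action that
was emitted with "orig 6" ends up referring to `SYMBOL_exp`.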

### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc.

The semantic value from a given VAL.
- `VAL`: some semantic value storage (typically a union), e.g., `yylval`.
- `SYMBOL-NUM`: the symbol number from which we extract the type tag.
- `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`.

The result can be used safely: it is put in parens to avoid nasty precedence
issues.
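
To see why the parentheses matter, consider the C sketch below.  The types
`struct pair` and `union value`, the macro `VALUE_AS` and the function
`second_of` are hypothetical stand-ins chosen for this example; they are not
what `b4_symbol_value` actually expands to.

```c
/* Hypothetical stand-in for the semantic value storage.  */
struct pair
{
  int first;
  int second;
};

union value
{
  int ival;
  struct pair pval;
};

/* Hypothetical analogue of a "symbol value" expansion: view STORAGE as a
   TYPE.  The outer parentheses make the result usable as a single operand.
   Without them, `VALUE_AS (v, struct pair).second` would expand to
   `*(struct pair *) &(v).second`, where `.second` binds to `(v)` first and
   fails to compile.  */
#define VALUE_AS(Storage, Type) (*(Type *) &(Storage))

/* Read a member through the macro, as `$1.second` would in an action.  */
static int
second_of (union value *v)
{
  return VALUE_AS (*v, struct pair).second;
}
```

Because the expansion is fully parenthesized, it can also be used as the
target of an assignment, e.g., `VALUE_AS (v, int) = 7;`.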

### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on RHS.

### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on RHS.
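
Both `RULE-LENGTH` and `POS` are needed because, in a stack-based parser,
the right-hand side symbols are typically addressed relative to the top of
the stack.  The C sketch below assumes one common layout, where the last RHS
symbol sits on top of the value stack.  The helpers `rhs_offset` and
`rhs_value` and the use of a plain `int` stack are assumptions made for this
example, not the code of any particular skeleton.

```c
/* Offset, relative to the top of the value stack, of the POS-th right-hand
   side symbol (1-based) of a rule with RULE_LENGTH symbols.  The last
   symbol is at offset 0, the first one at 1 - RULE_LENGTH.  */
static int
rhs_offset (int rule_length, int pos)
{
  return pos - rule_length;
}

/* Value of `$POS`, given VSP, a pointer to the top of a stack of ints.  */
static int
rhs_value (const int *vsp, int rule_length, int pos)
{
  return vsp[rhs_offset (rule_length, pos)];
}
```

For a rule with three RHS symbols, `$1` is then found two slots below the
top of the stack, and `$3` on the top itself.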

<!--

Local Variables:
mode: markdown
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2002, 2008-2015, 2018-2021 Free Software Foundation, Inc.

This file is part of GNU Bison.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

-->