TODO

   1 * Soon
   2 ** POSIX updates
   3 See the recent changes about function prototypes in POSIX Yacc.  Implement
   4 them.
   5
   6 ** Missing tests
   7 commit c22902e360e0fbbe9fd5657dcf107e03166da309
   8 Author: Akim Demaille <akim.demaille@gmail.com>
   9 Date:   Sat Jan 23 18:40:15 2021 +0100
  10
  11     tables: fix handling for useless tokens
  12
  13 See https://github.com/akimd/bison/issues/72#issuecomment-766153154
  14
  15 commit 2c294c132528ede23d8ae4959783a67e9ff05ac5
  16 Author: Vincent Imbimbo <vmi6@cornell.edu>
  17 Date:   Sat Jan 23 13:25:18 2021 -0500
  18
  19     cex: fix state-item pruning
  20
  21 See https://lists.gnu.org/r/bug-bison/2021-01/msg00002.html
  22
  23 ** pos_set_set
The current approach is correct, but it performs poorly.  Bitsets need to
support 'assign' and 'shift'.  And instead of extending POS_SET just enough
for the new out-of-range values, we need something like doubling the size.
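
A minimal sketch of the doubling strategy, under loud assumptions: the
names grow_set, grow_set_reserve, grow_set_add and grow_set_test are
hypothetical, the set only holds non-negative values, and it uses one byte
per member instead of a gnulib bitset.  It only illustrates how the
capacity could grow geometrically rather than value by value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable set of non-negative integers.  */
typedef struct
{
  unsigned char *bits;  /* One byte per member, for simplicity.  */
  size_t capacity;      /* Number of representable values.  */
} grow_set;

/* Make sure VALUE is representable.  The capacity is doubled until it
   fits, so a long series of insertions triggers few reallocations.  */
static void
grow_set_reserve (grow_set *s, size_t value)
{
  if (value < s->capacity)
    return;
  size_t new_cap = s->capacity ? s->capacity : 8;
  while (new_cap <= value)
    new_cap *= 2;
  unsigned char *p = realloc (s->bits, new_cap);
  if (!p)
    abort ();
  /* Clear only the newly added part.  */
  memset (p + s->capacity, 0, new_cap - s->capacity);
  s->bits = p;
  s->capacity = new_cap;
}

static void
grow_set_add (grow_set *s, size_t value)
{
  grow_set_reserve (s, value);
  s->bits[value] = 1;
}

static int
grow_set_test (const grow_set *s, size_t value)
{
  return value < s->capacity && s->bits[value];
}
```

Starting from an empty set, adding 3 and then 20 reallocates twice only,
however many values in between are added afterwards.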
  27
  28 ** glr
  29 There is no test with "Parse on stack %ld rejected by rule %d" in it.
  30
  31 ** yyrline etc.
  32 Clarify that rule numbers in the skeletons are 1-based.
  33
  34 ** Macros in C++
  35 There are many macros that should obey api.prefix: YY_CPLUSPLUS, YY_MOVE,
  36 etc.
  37
  38 ** YYDEBUG etc. in C++
  39 Discourage the use of YYDEBUG in C++ (see thread with Jot).
  40
  41 ** yyerrok in Java
  42 And add tests in calc.at, to prepare work for D.
  43
  44 ** YYERROR and yynerrs
  45 We are missing some cases.  Write a test case, and check all the skeletons.
  46
  47 ** Cex
  48 *** Improve gnulib
  49 Don't do this (counterexample.c):
  50
// This is the fastest way to get the tail node from the gl_list API.
gl_list_node_t
list_get_end (gl_list_t list)
{
  gl_list_node_t sentinel = gl_list_add_last (list, NULL);
  gl_list_node_t res = gl_list_previous_node (list, sentinel);
  gl_list_remove_node (list, sentinel);
  return res;
}
  60
  61 *** Ambiguous rewriting
If the grammar contains identical rules, then the derivations are harder to
read:
  64
  65     Reduce/reduce conflict on tokens $end, "+", "⊕":
  66         2 exp: exp "+" exp .
  67         3 exp: exp "+" exp .
  68       Example                  exp "+" exp •
  69       First derivation         exp ::=[ exp "+" exp • ]
  70       Example                  exp "+" exp •
  71       Second derivation        exp ::=[ exp "+" exp • ]
  72
Do we care about this?  In color, we use the same color twice here, but we
could try to always use the same color for a given rule.
  75
  76 *** XML reports
  77 Show the counterexamples.  This is going to be really hard and/or painful.
  78 Unless we play it dumb (little structure).
  79
  80 ** Bistromathic
  81 - How about not evaluating incomplete lines when the text is not finished
  82   (as shells do).
  83
  84 ** Questions
  85 *** Java
- Should i18n be part of the Lexer?  Currently it's a static method of
  Lexer.

- Is there a migration path that would allow using TokenKinds in yylex?

- Define the tokens as an enum too.

- Promote YYEOF rather than EOF.
  95
  96 ** YYerror
  97 https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gettext-runtime/intl/plural.y;h=a712255af4f2f739c93336d4ff6556d932a426a5;hb=HEAD
  98
  99 should be updated to not use YYERRCODE.  Returning an undef token is good
 100 enough.
 101
 102 ** Java
 103 *** calc.at
 104 Stop hard-coding "Calc".  Adjust local.at (look for FIXME).
 105
 106 ** A dev warning for b4_
 107 Maybe we should check for m4_ and b4_ leaking out of the m4 processing, as
 108 Autoconf does.  It would have caught over-quotation issues.
 109
 110 ** doc
 111 I feel it's ugly to use the GNU style to declare functions in the doc.  It
 112 generates tons of white space in the page, and may contribute to bad page
 113 breaks.
 114
 115 ** consistency
 116 token vs terminal.
 117
 118 ** api.token.raw
 119 The YYUNDEFTOK could be assigned a semantic value so that yyerror could be
 120 used to report invalid lexemes.
 121
 122 ** push parsers
Consider deprecating impure push parsers.  They add a lot of complexity for
a bad feature.  On the other hand, that would make it much harder to sit
push parsers on top of pull parsers.  This is currently not relevant, since
push parsers are measurably slower.
 127
 128 ** %define parse.error formatted
 129 How about pushing Bistromathic's yyreport_syntax_error as another standard
 130 way to generate the error message, and leave to the user the task of
 131 providing the message formats?  Currently in bistro, it reads:
 132
    const char *
    error_format_string (int argc)
    {
      switch (argc)
        {
        default: /* Avoid compiler warnings. */
        case 0: return _("%@: syntax error");
        case 1: return _("%@: syntax error: unexpected %u");
          // TRANSLATORS: '%@' is a location in a file, '%u' is an
          // "unexpected token", and '%0e', '%1e'... are expected tokens
          // at this point.
          //
          // For instance on the expression "1 + * 2", you'd get
          //
          // 1.5: syntax error: expected - or ( or number or function or variable before *
        case 2: return _("%@: syntax error: expected %0e before %u");
        case 3: return _("%@: syntax error: expected %0e or %1e before %u");
        case 4: return _("%@: syntax error: expected %0e or %1e or %2e before %u");
        case 5: return _("%@: syntax error: expected %0e or %1e or %2e or %3e before %u");
        case 6: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e before %u");
        case 7: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e before %u");
        case 8: return _("%@: syntax error: expected %0e or %1e or %2e or %3e or %4e or %5e or %6e before %u");
        }
    }
 157
 158 The message would have to be generated in a string, and pushed to yyerror.
 159 Which will be a pain in the neck in yacc.c.
 160
 161 If we want to do that, we should think very carefully about the syntax of
 162 the format string.
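
To make the discussion concrete, here is a minimal sketch of what expanding
such a format into a string could look like.  The function name
expand_error_format and its signature are hypothetical (nothing like it
exists in the skeletons), and it only understands the three directives used
above: '%@', '%u' and '%Ne' with N a single digit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand FORMAT into OUT (of size OUTSIZE).  "%@" is replaced by LOC,
   "%u" by UNEXPECTED, and "%Ne" by EXPECTED[N].  Any other character is
   copied verbatim.  Return 0 on success, -1 if the result does not fit
   or if FORMAT refers to an expected token that was not provided.  */
static int
expand_error_format (char *out, size_t outsize, const char *format,
                     const char *loc, const char *unexpected,
                     const char *const *expected, int nexpected)
{
  if (outsize == 0)
    return -1;
  size_t len = 0;
  for (const char *cp = format; *cp; ++cp)
    {
      char literal[2] = { *cp, '\0' };
      const char *piece = literal;
      if (cp[0] == '%' && cp[1] == '@')
        {
          piece = loc;
          ++cp;
        }
      else if (cp[0] == '%' && cp[1] == 'u')
        {
          piece = unexpected;
          ++cp;
        }
      else if (cp[0] == '%' && '0' <= cp[1] && cp[1] <= '9' && cp[2] == 'e')
        {
          int idx = cp[1] - '0';
          if (nexpected <= idx)
            return -1;
          piece = expected[idx];
          cp += 2;
        }
      size_t piece_len = strlen (piece);
      /* Keep one byte for the terminating NUL.  */
      if (outsize <= len + piece_len)
        return -1;
      memcpy (out + len, piece, piece_len);
      len += piece_len;
    }
  out[len] = '\0';
  return 0;
}
```

The single-digit limit and the absence of an escape for a literal '%' are
exactly the kind of syntax questions that would need to be settled first.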
 163
 164 ** yyclearin does not invoke the lookahead token's %destructor
 165 https://lists.gnu.org/r/bug-bison/2018-02/msg00000.html
 166 Rici:
 167
 168 > Modifying yyclearin so that it calls yydestruct seems like the simplest
 169 > solution to this issue, but it is conceivable that such a change would
 170 > break programs which already perform some kind of workaround in order to
 171 > destruct the lookahead symbol. So it might be necessary to use some kind of
 172 > compatibility %define, or to create a new replacement macro with a
 173 > different name such as yydiscardin.
 174 >
 175 > At a minimum, the fact that yyclearin does not invoke the %destructor
 176 > should be highlighted in the documentation, since it is not at all obvious.
 177
 178 ** Issues in i18n
 179
 180 Les catégories d'avertissements incluent :
 181   conflicts-sr      conflits S/R (activé par défaut)
 182   conflicts-rr      conflits R/R (activé par défaut)
 183   dangling-alias    l'alias chaîne n'est pas attaché à un symbole
 184   deprecated        construction obsolète
 185   empty-rule        règle vide sans %empty
 186   midrule-values    valeurs de règle intermédiaire non définies ou inutilisées
 187   precedence        priorité et associativité inutiles
 188   yacc              incompatibilités avec POSIX Yacc
 189   other             tous les autres avertissements (activé par défaut)
 190   all               tous les avertissements sauf « dangling-alias » et « yacc »
 191   no-CATEGORY       désactiver les avertissements dans CATEGORIE
 192   none              désactiver tous les avertissements
 193   error[=CATEGORY]  traiter les avertissements comme des erreurs
 194
Lines -1 and -3 (counting from the end) should mention CATEGORIE, not
CATEGORY.
 196
 197 * Bison 3.8
 198 ** Rewrite glr.cc (currently glr2.cc)
 199 *** Remove jumps
We can probably replace setjmp/longjmp with exceptions.  That would
tremendously help other languages such as D and Java that probably have no
similar feature.  If we remove jumps, we probably no longer need _Noreturn,
so simplify `b4_attribute_define([noreturn])` into `b4_attribute_define`.
 204
 205 After discussing with Valentin, it was decided that it's better to stay with
 206 jumps, since in some places exceptions are ruled out from C++.
 207
 208 *** Coding style
 209 Move to our coding conventions.  In particular names such as yy_glr_stack,
 210 not yyGLRStack.
 211
 212 *** yydebug
 213 It should be a member of the parser object, see lalr1.cc.  Let the parser
 214 object decide what the debug stream is, rather than open coding std::cerr.
 215
 216 *** Avoid pointers
 217 There are many places where pointers should be replaced with references.
 218 Some occurrences were fixed, but now some have improper names:
 219
 220 -yygetToken (int *yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)
 221 +yygetToken (int& yycharp, ]b4_namespace_ref[::]b4_parser_class[& yyparser][]b4_pure_if([, glr_stack* yystackp])[]b4_user_formals[)
 222
yycharp is no longer a pointer.  And yystackp should probably also be a
reference.
 224
 225 *** parse.assert
 226 Currently all the assertions are enabled.  Once we are confident in glr2.cc,
 227 let parse.assert use the same approach as in lalr1.cc.
 228
 229 *** debug_stream
 230 Stop using std::cerr everywhere.
 231
 232 *** glr.c
 233 When glr2.cc fully replaces glr.cc, get rid of the glr.cc scaffolding in
 234 glr.c.
 235
 236 * Chains
 237 ** Unit rules / Injection rules (Akim Demaille)
 238 Maybe we could expand unit rules (or "injections", see
 239 https://homepages.cwi.nl/~daybuild/daily-books/syntax/2-sdf/sdf.html), i.e.,
 240 transform
 241
 242         exp: arith | bool;
 243         arith: exp '+' exp;
 244         bool: exp '&' exp;
 245
 246 into
 247
 248         exp: exp '+' exp | exp '&' exp;
 249
 250 when there are no actions.  This can significantly speed up some grammars.
 251 I can't find the papers.  In particular the book 'LR parsing: Theory and
 252 Practice' is impossible to find, but according to 'Parsing Techniques: a
 253 Practical Guide', it includes information about this issue.  Does anybody
 254 have it?
 255
 256 ** clean up (Akim Demaille)
 257 Do not work on these items now, as I (Akim) have branches with a lot of
 258 changes in this area (hitting several files), and no desire to have to fix
 259 conflicts.  Addressing these items will happen after my branches have been
 260 merged.
 261
 262 *** lalr.c
 263 Introduce a goto struct, and use it in place of from_state/to_state.
 264 Rename states1 as path, length as pathlen.
 265 Introduce inline functions for things such as nullable[*rp - ntokens]
 266 where we need to map from symbol number to nterm number.
 267
A significant part of the relations management should probably be migrated
on top of a bitsetv.
 270
 271 *** closure
 272 It should probably take a "state*" instead of two arguments.
 273
 274 *** traces
 275 The "automaton" and "set" categories are not so useful.  We should probably
 276 introduce lr(0) and lalr, just the way we have ielr categories.  The
 277 "closure" function is too verbose, it should probably have its own category.
 278
 279 "set" can still be used for summarizing the important sets.  That would make
 280 tests easy to maintain.
 281
 282 *** complain.*
 283 Rename these guys as "diagnostics.*" (or "diagnose.*"), since that's the
 284 name they have in GCC, clang, etc.  Likewise for the complain_* series of
 285 functions.
 286
 287 *** ritem
 288 states/nstates, rules/nrules, ..., ritem/nritems
 289 Fix the latter.
 290
 291 *** m4: slot, type, type_tag
The meaning of type_tag varies depending on api.value.type.  We should avoid
that and use clear definitions with stable semantics.
 294
 295 * D programming language
A number of features are missing, listed here in _suggested_ order of
implementation.
 298
 299 When copying code from other skeletons, keep the comments exactly as they
 300 are.  Keep the same variable names.  If you change the wording in one place,
 301 do it in the others too.  In other words: make sure to keep the
 302 maintenance *simple* by avoiding any gratuitous difference.
 303
 304 ** CI
Check with gdc and ldc.
 306
 307 ** GLR Parser
This is very ambitious.  That's the final boss.  There is currently no
"clean" implementation to get inspiration from.
 310
 311 glr.c is very clean but:
 312 - is low-level C
 313 - is a different skeleton from yacc.c
 314
glr.cc is (currently) an ugly hack: a C++ shell around glr.c.  Valentin
Tolmer is currently rewriting glr.cc to be clean C++, but he is not
finished.  There will be a lot of common code between lalr1.cc and glr.cc,
so eventually I would like them to be fused into a single skeleton,
supporting both deterministic and generalized parsing.
 320
 321 It would be great for D to also support this.
 322
 323 The basic ideas of GLR are explained here:
 324
 325 https://www.codeproject.com/Articles/5259825/GLR-Parsing-in-Csharp-How-to-Use-The-Most-Powerful
 326
 327 * Better error messages
 328 The users are not provided with enough tools to forge their error messages.
 329 See for instance "Is there an option to change the message produced by
 330 YYERROR_VERBOSE?" by Simon Sobisch, on bison-help.
 331
 332 See also
 333 https://www.cs.tufts.edu/~nr/cs257/archive/clinton-jefferey/lr-error-messages.pdf
 334 https://research.swtch.com/yyerror
 335 http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-cc2016.pdf
 336
 337 * Modernization
 338 Fix data/skeletons/yacc.c so that it defines YYPTRDIFF_T properly for modern
 339 and older C++ compilers.  Currently the code defaults to defining it to
 340 'long' for non-GCC compilers, but it should use the proper C++ magic to
 341 define it to the same type as the C ptrdiff_t type.
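
One possible shape for this, as a hedged sketch only: the C++ branch below
is an assumption about what "the proper C++ magic" could be (C++11 and
<cstddef>), not what yacc.c does.  The unconditional includes at the top
are there to keep the fragment self-contained; a real skeleton would have
to be more careful about the user name space.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#ifndef YYPTRDIFF_T
# if defined __PTRDIFF_TYPE__ && defined __PTRDIFF_MAX__
   /* GCC and compatible compilers predefine these.  */
#  define YYPTRDIFF_T __PTRDIFF_TYPE__
#  define YYPTRDIFF_MAXIMUM __PTRDIFF_MAX__
# elif defined __cplusplus && 201103L <= __cplusplus
   /* Assumed addition: rely on the standard C++11 headers.  */
#  include <cstddef>
#  include <cstdint>
#  define YYPTRDIFF_T std::ptrdiff_t
#  define YYPTRDIFF_MAXIMUM PTRDIFF_MAX
# elif defined PTRDIFF_MAX
#  define YYPTRDIFF_T ptrdiff_t
#  define YYPTRDIFF_MAXIMUM PTRDIFF_MAX
# else
   /* Last resort, as described above.  */
#  define YYPTRDIFF_T long
#  define YYPTRDIFF_MAXIMUM LONG_MAX
# endif
#endif

/* Compile-time check that the chosen type matches ptrdiff_t in size.  */
static_assert (sizeof (YYPTRDIFF_T) == sizeof (ptrdiff_t),
               "YYPTRDIFF_T should have the size of ptrdiff_t");
```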
 342
 343 * Completion
 344 Several features are not available in all the back-ends.
 345
 346 - push parsers: glr.c, glr.cc, lalr1.cc (not very difficult)
 347 - token constructors: Java, C, D (a bit difficult)
 348 - glr: D, Java (super difficult)
 349
 350 * Bugs
 351 ** Autotest has quotation issues
 352 tests/input.at:1730:AT_SETUP([%define errors])
 353
 354 ->
 355
 356 $ ./tests/testsuite -l | grep errors | sed q
 357   38: input.at:1730      errors
 358
 359 * Short term
 360 ** Better design for diagnostics
The current implementation of diagnostics is ad hoc: it grew organically.
It works as a series of calls to several functions, where later calls depend
on earlier ones.  For instance:
 364
      complain (&sym->location,
                sym->content->status == needed ? complaint : Wother,
                _("symbol %s is used, but is not defined as a token"
                  " and has no rules; did you mean %s?"),
                quote_n (0, sym->tag),
                quote_n (1, best->tag));
      if (feature_flag & feature_caret)
        location_caret_suggestion (sym->location, best->tag, stderr);
 373
 374 We should rewrite this in a more FP way:
 375
 376 1. build a rich structure that denotes the (complete) diagnostic.
 377    "Complete" in the sense that it also contains the suggestions, the list
 378    of possible matches, etc.
 379
2. send this to the pretty-printing routine.  The diagnostic structure
   should be sufficient so that we can generate all the formats of
   diagnostics, including the fixits.
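
The two steps above can be sketched as follows.  Every name here
(diagnostic, diag_location, diag_severity, diagnostic_render) is
hypothetical and none of them exists in Bison; the point is only that the
printing routine receives one complete value instead of being driven by a
sequence of calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types, for illustration only.  */
typedef struct { const char *file; int line; int column; } diag_location;
typedef enum { diag_warning, diag_error } diag_severity;

/* Step 1: the complete diagnostic, suggestion included.  */
typedef struct
{
  diag_severity severity;
  diag_location loc;
  const char *message;     /* Already formatted text.  */
  const char *suggestion;  /* "Did you mean" candidate, or NULL.  */
} diagnostic;

/* Step 2: a single pretty-printing routine.  Render D into OUT (of size
   OUTSIZE).  Return the length written, or -1 if it does not fit.  */
static int
diagnostic_render (char *out, size_t outsize, const diagnostic *d)
{
  int n = snprintf (out, outsize, "%s:%d.%d: %s: %s",
                    d->loc.file, d->loc.line, d->loc.column,
                    d->severity == diag_error ? "error" : "warning",
                    d->message);
  if (n < 0 || outsize <= (size_t) n)
    return -1;
  if (d->suggestion)
    {
      size_t room = outsize - (size_t) n;
      int m = snprintf (out + n, room, "; did you mean '%s'?",
                        d->suggestion);
      if (m < 0 || room <= (size_t) m)
        return -1;
      n += m;
    }
  return n;
}
```

Another renderer (say, one that emits fixits or a machine-readable format)
could consume the same structure without touching the code that detects
the error.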
 383
 384 If properly done, this diagnostic module can be detached from Bison and be
 385 put in gnulib.  It could be used, for instance, for errors caught by
 386 xgettext.
 387
There's certainly already something similar in GCC.  At least that's the
impression I get from reading the "-fdiagnostics-format=FORMAT" part of this
page:
 391
 392 https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html
 393
 394 ** Graphviz display code thoughts
The code for the --graph option is spread over two files: print_graph and
graphviz.  This is because Bison used to also produce VCG graphs, but since
this is no longer true, maybe we could consider merging these files.

Another consideration worth noting is that print_graph.c (correct me if I
am wrong) should contain generic functions, whereas graphviz.c and other
potential files should contain just the code specific to that output
format.  It will probably prove difficult to tell whether the implementation
is actually generic while supporting only a single format, but it would be
nice to keep things a bit tidier: right now, the construction of the bitset
used to show reductions is in the graphviz-specific code, and conversely
there is some use of \l, which is graphviz-specific, in what should be
generic code.

Little effort seems to have gone into factoring these files and their
print{,-xml} counterparts.  We would very much like to reuse the pretty
format of states from .output for the graphs, etc.

Since graphviz dies on medium-to-big grammars, maybe consider another tool?
 414
 415 ** push-parser
 416 Check it too when checking the different kinds of parsers.  And be
 417 sure to check that the initial-action is performed once per parsing.
 418
 419 ** m4 names
 420 b4_shared_declarations is no longer what it is.  Make it
 421 b4_parser_declaration for instance.
 422
 423 ** yychar in lalr1.cc
 424 There is a large difference bw maint and master on the handling of
 425 yychar (which was removed in lalr1.cc).  See what needs to be
 426 back-ported.
 427
 428
    /* User semantic actions sometimes alter yychar, and that requires
       that yytoken be updated with the new translation.  We take the
       approach of translating immediately before every use of yytoken.
       One alternative is translating here after every semantic action,
       but that translation would be missed if the semantic action
       invokes YYABORT, YYACCEPT, or YYERROR immediately after altering
       yychar.  In the case of YYABORT or YYACCEPT, an incorrect
       destructor might then be invoked immediately.  In the case of
       YYERROR, subsequent parser actions might lead to an incorrect
       destructor call or verbose syntax error message before the
       lookahead is translated.  */

    /* Make sure we have latest lookahead translation.  See comments at
       user semantic actions for why this is necessary.  */
    yytoken = yytranslate_ (yychar);
 444
 445
 446 ** Get rid of fake #lines [Bison: ...]
 447 Possibly as simple as checking whether the column number is nonnegative.
 448
 449 I have seen messages like the following from GCC.
 450
 451 <built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
 452
 453
** Discuss %printer/%destructor in the case of C++
It would be very nice to provide the symbol classes with an operator<<
and a destructor.  Unfortunately the syntax we have chosen for
%destructor and %printer makes them hard to reuse.  For instance, the user
is invited to write something like

   %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it wants to use
"debug_stream()" to find the stream to use.  The same applies to
%destructor: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser, it's just a
standalone symbol.
 468
 469 * Various
 470 ** Rewrite glr.cc in C++ (Valentin Tolmer)
 471 As a matter of fact, it would be very interesting to see how much we can
 472 share between lalr1.cc and glr.cc.  Most of the skeletons should be common.
 473 It would be a very nice source of inspiration for the other languages.
 474
 475 Valentin Tolmer is working on this.
 476
 477 * From lalr1.cc to yacc.c
 478 ** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements and also made it faster (probably because memory
management is performed once instead of three times).  I suggest that
we do the same in yacc.c.
 483
 484 (Some time later): it's also very nice to have three stacks: it's more dense
 485 as we don't lose bits to padding.  For instance the typical stack for states
 486 will use 8 bits, while it is likely to consume 32 bits in a struct.
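
The padding argument can be checked with a small sketch.  The types below
(state_t, value_t, location_t, stack_item) are stand-ins chosen for the
example, not the ones generated by the skeletons, and the exact sizes
depend on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char state_t;  /* 8-bit state numbers.  */
typedef union { int ival; double dval; void *ptr; } value_t;
typedef struct
{
  int first_line, first_column, last_line, last_column;
} location_t;

/* Single stack: one struct per stacked symbol.  */
typedef struct
{
  state_t state;
  value_t value;
  location_t location;
} stack_item;

/* Bytes per stacked symbol with three parallel arrays.  */
enum { split_bytes
       = sizeof (state_t) + sizeof (value_t) + sizeof (location_t) };

/* Bytes per stacked symbol with a single array of structs.  */
enum { fused_bytes = sizeof (stack_item) };

/* Bytes lost to padding for each stacked symbol in the fused layout.  */
static size_t
padding_per_item (void)
{
  return fused_bytes - split_bytes;
}
```

On common ABIs where value_t is aligned on more than one byte, the state
member is followed by padding, so padding_per_item () is nonzero.  Whether
that outweighs the single allocation is what the benchmarks should tell.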
 487
 488 We need trustworthy benchmarks for Bison, for all our backends.  Akim has a
 489 few things scattered around; we need to put them in the repo, and make them
 490 more useful.
 491
 492 * Report
 493
 494 ** Figures
 495 Some statistics about the grammar and the parser would be useful,
 496 especially when asking the user to send some information about the
 497 grammars she is working on.  We should probably also include some
 498 information about the variables (I'm not sure for instance we even
 499 specify what LR variant was used).
 500
 501 ** GLR
How would Paul like to display the conflicted actions?  In particular,
what should happen when two reductions are possible on a given lookahead
token, but one is part of $default?  Should we make the two reductions
explicit, or just keep $default?  See the following point.
 506
 507 ** Disabled Reductions
 508 See 'tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
 509 what we want to do.
 510
 511 ** Documentation
 512 Extend with error productions.  The hard part will probably be finding
 513 the right rule so that a single state does not exhibit too many yet
 514 undocumented ''features''.  Maybe an empty action ought to be
 515 presented too.  Shall we try to make a single grammar with all these
 516 features, or should we have several very small grammars?
 517
 518 * Extensions
 519 ** More languages?
 520 Well, only if there is really some demand for it.
 521
 522 *** PHP
 523 https://github.com/scfc/bison-php/blob/master/data/lalr1.php
 524
 525 *** Python
 526 https://lists.gnu.org/r/bison-patches/2013-09/msg00000.html and following
 527
 528 ** Multiple start symbols
 529 Would be very useful when parsing closely related languages.  The idea is to
 530 declare several start symbols, for instance
 531
 532     %start stmt expr
 533     %%
 534     stmt: ...
 535     expr: ...
 536
 537 and to generate parse(), parse_stmt() and parse_expr().  Technically, the
 538 above grammar would be transformed into
 539
 540    %start yy_start
 541    %token YY_START_STMT YY_START_EXPR
 542    %%
 543    yy_start: YY_START_STMT stmt | YY_START_EXPR expr
 544
so that there are no new conflicts in the grammar (as would undoubtedly
happen with yy_start: stmt | expr).  Then adjust the skeletons so that this
initial token (YY_START_STMT, YY_START_EXPR) is shifted first in the
corresponding parse function.
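
A minimal sketch of the "initial token first" idea, done on the scanner
side rather than in a skeleton, which is a simplification of what is
proposed above.  The type start_lexer, the function start_lexer_next and
the token numbers are all hypothetical.

```c
#include <assert.h>

/* Hypothetical token numbers; a real parser would use the generated
   ones.  */
enum token { TOK_EOF = 0, YY_START_STMT = 258, YY_START_EXPR, NUM };

typedef struct
{
  int start_token;   /* Pseudo token to emit first, -1 once emitted.  */
  const int *input;  /* Remaining real tokens, TOK_EOF terminated.  */
} start_lexer;

/* Return the pseudo start token on the first call, then the real
   tokens.  Once TOK_EOF is reached, keep returning it.  */
static int
start_lexer_next (start_lexer *lex)
{
  if (lex->start_token != -1)
    {
      int res = lex->start_token;
      lex->start_token = -1;
      return res;
    }
  int tok = *lex->input;
  if (tok != TOK_EOF)
    ++lex->input;
  return tok;
}
```

A hypothetical parse_expr () would then amount to initializing start_token
to YY_START_EXPR before running the common parser, and parse_stmt () would
do the same with YY_START_STMT.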
 549
 550 *** Number of useless symbols
 551 AT_TEST(
 552 [[%start exp;
 553 exp: exp;]],
 554 [[input.y: warning: 2 nonterminals useless in grammar [-Wother]
 555 input.y: warning: 2 rules useless in grammar [-Wother]
 556 input.y:2.8-10: error: start symbol exp does not derive any sentence]])
 557
 558 We should say "1 nonterminal": the other one is $accept, which should not
 559 participate in the count.
 560
 561 *** Tokens
 562 Do we want to disallow terminal start symbols?  The limitation is not
 563 technical.  Can it be useful to someone to "parse" a token?
 564
 565 ** %include
This is a popular request.  We already made many changes in the parser that
should make this reasonably easy to implement.
 568
 569 Bruce Mardle <marblypup@yahoo.co.uk>
 570 https://lists.gnu.org/r/bison-patches/2015-09/msg00000.html
 571
 572 However, there are many other things to do before having such a feature,
 573 because I don't want a % equivalent to #include (which we all learned to
 574 hate).  I want something that builds "modules" of grammars, and assembles
 575 them together, paying attention to keep separate bits separated, in pseudo
 576 name spaces.
 577
 578 ** Push parsers
 579 There is demand for push parsers in C++.
 580
 581 ** Generate code instead of tables
 582 This is certainly quite a lot of work.  See
 583 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.4539.
 584
 585 ** $-1
We should find a means to provide access to values deep in the
stack.  For instance, instead of
 588
 589         baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
 590
 591 we should be able to have:
 592
 593   foo($foo) bar($bar) baz($bar): qux($qux) { $baz = $foo + $bar + $qux; }
 594
 595 Or something like this.
 596
 597 ** %if and the like
It should be possible to have %if/%else/%endif.  The implementation is
not clear: should it be lexical or syntactic?  Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched off
part of %if.  Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

(Later): I'm not sure there's actually a good case for this.  People who
need that feature can use m4/cpp on top of Bison.  I don't think it is worth
the trouble in Bison itself.
 607
 608 ** XML Output
There are a couple of available extensions of Bison targeting some XML
output.  Some day we should consider including them.  One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction.  As a matter of fact, such hooks could also be
used to generate the yydebug traces.  Some generic scheme probably
exists in there.
 616
 617 XML output for GNU Bison and gcc
 618    http://www.cs.may.ie/~jpower/Research/bisonXML/
 619
 620 XML output for GNU Bison
 621    http://yaxx.sourceforge.net/
 622
 623 * Coding system independence
 624 Paul notes:
 625
 626         Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
 627         255).  It also assumes that the 8-bit character encoding is
 628         the same for the invocation of 'bison' as it is for the
 629         invocation of 'cc', but this is not necessarily true when
 630         people run bison on an ASCII host and then use cc on an EBCDIC
 631         host.  I don't think these topics are worth our time
 632         addressing (unless we find a gung-ho volunteer for EBCDIC or
 633         PDP-10 ports :-) but they should probably be documented
 634         somewhere.
 635
 636         More importantly, Bison does not currently allow NUL bytes in
 637         tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
 638         the source code.  This should get fixed.
 639
 640 * Broken options?
 641 ** %token-table
 642 ** Skeleton strategy
 643 Must we keep %token-table?
 644
 645 * Precedence
 646
 647 ** Partial order
 648 It is unfortunate that there is a total order for precedence.  It
 649 makes it impossible to have modular precedence information.  We should
 650 move to partial orders (sounds like series/parallel orders to me).
 651
 652 This is a prerequisite for modules.
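
As an illustration only, here is a sketch of how precedence comparisons
could work with a partial order.  The representation (a small boolean
matrix closed with Warshall's algorithm) and the names prec_declare,
prec_close and prec_compare are assumptions made for the example; the
series/parallel structure mentioned above is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum { NGROUPS = 4 };  /* Hypothetical number of precedence groups.  */

typedef enum { prec_lower, prec_higher, prec_unrelated } prec_order;

/* binds_less[a][b] is true when group A has lower precedence than
   group B.  */
static bool binds_less[NGROUPS][NGROUPS];

/* Declare that LOW has lower precedence than HIGH.  */
static void
prec_declare (int low, int high)
{
  binds_less[low][high] = true;
}

/* Warshall's algorithm: make the relation transitive.  */
static void
prec_close (void)
{
  for (int k = 0; k < NGROUPS; ++k)
    for (int i = 0; i < NGROUPS; ++i)
      for (int j = 0; j < NGROUPS; ++j)
        if (binds_less[i][k] && binds_less[k][j])
          binds_less[i][j] = true;
}

/* Compare groups A and B.  Unlike with a total order, the answer may
   be that they are unrelated, in which case a conflict between them
   would stay unresolved.  */
static prec_order
prec_compare (int a, int b)
{
  if (binds_less[a][b])
    return prec_lower;
  if (binds_less[b][a])
    return prec_higher;
  return prec_unrelated;
}
```

Two modules could then each declare their own chains without having to
agree on how their operators rank against one another.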
 653
 654 * Pre and post actions.
 655 From: Florian Krohm <florian@edamail.fishkill.ibm.com>
 656 Subject: YYACT_EPILOGUE
 657 To: bug-bison@gnu.org
 658 X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
 659
 660 The other day I had the need for explicitly building the parse tree. I
 661 used %locations for that and defined YYLLOC_DEFAULT to call a function
 662 that returns the tree node for the production. Easy. But I also needed
 663 to assign the S-attribute to the tree node. That cannot be done in
 664 YYLLOC_DEFAULT, because it is invoked before the action is executed.
 665 The way I solved this was to define a macro YYACT_EPILOGUE that would
 666 be invoked after the action. For reasons of symmetry I also added
 667 YYACT_PROLOGUE. Although I had no use for that I can envision how it
 668 might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif
 676
 677 at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
 678
 679 I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
 680 to bison. If you're interested, I'll work on a patch.
 681
 682 * Better graphics
 683 Equip the parser with a means to create the (visual) parse tree.
 684
 685
 686 -----
 687
 688 # LocalWords:  Cex gnulib gl Bistromathic TokenKinds yylex enum YYEOF EOF
 689 # LocalWords:  YYerror gettext af hb YYERRCODE undef calc FIXME dev yyerror
 690 # LocalWords:  Autoconf YYUNDEFTOK lexemes parsers Bistromathic's yyreport
 691 # LocalWords:  const argc yacc yyclearin lookahead destructor Rici incluent
 692 # LocalWords:  yydestruct yydiscardin catégories d'avertissements sr activé
 693 # LocalWords:  conflits défaut rr l'alias chaîne n'est attaché un symbole
 694 # LocalWords:  obsolète règle vide midrule valeurs de intermédiaire ou avec
 695 # LocalWords:  définies inutilisées priorité associativité inutiles POSIX
 696 # LocalWords:  incompatibilités tous les autres avertissements sauf dans rp
 697 # LocalWords:  désactiver CATEGORIE traiter comme des erreurs glr Akim bool
 698 # LocalWords:  Demaille arith lalr goto struct pathlen nullable ntokens lr
 699 # LocalWords:  nterm bitsetv ielr ritem nstates nrules nritems yysymbol EQ
 700 # LocalWords:  SymbolKind YYEMPTY YYUNDEF YYTNAME NUM yyntokens yytname sed
 701 # LocalWords:  nonterminals yykind yycode YYNAMES yynames init getName conv
 702 # LocalWords:  TokenKind ival yychar yylval yylexer Tolmer hoc
 703 # LocalWords:  Sobisch YYPTRDIFF ptrdiff Autotest toknum yytoknum
 704 # LocalWords:  sym Wother stderr FP fixits xgettext fdiagnostics Graphviz
 705 # LocalWords:  graphviz VCG bitset xml bw maint yytoken YYABORT deps
 706 # LocalWords:  YYACCEPT yytranslate nonnegative destructors yyerrlab repo
 707 # LocalWords:  backends stmt expr yy Mardle baz qux Vadim Maslow CPP cpp
 708 # LocalWords:  yydebug gcc UCHAR EBCDIC gung PDP NUL Pre Florian Krohm utf
 709 # LocalWords:  YYACT YYLLOC YYLSP yyval yyvsp yylen yyloc yylsp endif
 710 # LocalWords:  ispell american
 711
 712 Local Variables:
 713 mode: outline
 714 coding: utf-8
 715 fill-column: 76
 716 ispell-dictionary: "american"
 717 End:
 718
 719 Copyright (C) 2001-2004, 2006, 2008-2015, 2018-2021 Free Software
 720 Foundation, Inc.
 721
 722 This file is part of Bison, the GNU Compiler Compiler.
 723
 724 Permission is granted to copy, distribute and/or modify this document
 725 under the terms of the GNU Free Documentation License, Version 1.3 or
 726 any later version published by the Free Software Foundation; with no
 727 Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
 728 Texts.  A copy of the license is included in the "GNU Free
 729 Documentation License" file as part of this distribution.