1 \input texinfo @c -*-texinfo-*-
2 @comment %**start of header
3 @setfilename bison.info
4 @documentencoding UTF-8
6 @settitle Bison @value{VERSION}
7 @xrefautomaticsectiontitle on
9 @c cite a reference in text. Could not find a means to have a single
10 @c definition that looks nice in all the output formats.
22 @c cite a reference in parentheses.
25 (@pxref{\ref\,,\ref\})
35 @c ## ---------------------- ##
36 @c ## Diagnostics in color. ##
37 @c ## ---------------------- ##
40 \gdef\rgbGreen{0 .80 0}
44 \gdef\rgbYellow{1 .5 0}
46 \setcolor{\rgbYellow}%
56 \gdef\rgbPurple{0.50 0 0.50}
58 \setcolor{\rgbPurple}%
61 \setcolor{\maincolor}%
64 \gdef\rgbError{0.80 0 0}
68 \gdef\rgbNotice{0 0 0.80}
70 \setcolor{\rgbNotice}%
72 \gdef\rgbWarning{0.50 0 0.50}
74 \setcolor{\rgbWarning}%
77 \setcolor{\maincolor}%
83 @inlineraw{html, <span style="color:green">}
86 @inlineraw{html, <span style="color:#ff8000">}
89 @inlineraw{html, <span style="color:red">}
92 @inlineraw{html, <span style="color:blue">}
95 @inlineraw{html, <span style="color:darkviolet">}
98 @inlineraw{html, </span>}
102 @inlineraw{html, <b style="color:red">}
105 @inlineraw{html, <b style="color:darkcyan">}
108 @inlineraw{html, <b style="color:darkviolet">}
111 @inlineraw{html, </b>}
116 @colorGreen{}\text\@colorOff{}
120 @colorYellow{}\text\@colorOff{}
124 @colorRed{}\text\@colorOff{}
128 @colorBlue{}\text\@colorOff{}
132 @colorPurple{}\text\@colorOff{}
135 @macro dwarning{text}
136 @diagWarning{}\text\@diagOff{}
140 @diagError{}\text\@diagOff{}
144 @diagNotice{}\text\@diagOff{}
149 @c SMALL BOOK version
150 @c This edition has been formatted so that you can format and print it in
151 @c the smallbook format.
153 @c @setchapternewpage odd
155 @c Set following if you want to document %default-prec and %no-default-prec.
156 @c This feature is experimental and may change in future Bison versions.
169 @comment %**end of header
173 This manual (@value{UPDATED}) is for GNU Bison (version @value{VERSION}),
174 the GNU parser generator.
176 Copyright @copyright{} 1988--1993, 1995, 1998--2015, 2018--2022 Free
177 Software Foundation, Inc.
180 Permission is granted to copy, distribute and/or modify this document under
181 the terms of the GNU Free Documentation License, Version 1.3 or any later
182 version published by the Free Software Foundation; with no Invariant
183 Sections, with the Front-Cover texts being ``A GNU Manual,'' and with the
184 Back-Cover Texts as in (a) below. A copy of the license is included in the
185 section entitled ``GNU Free Documentation License.''
187 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and modify
188 this GNU manual. Buying copies from the FSF supports it in developing GNU
189 and promoting software freedom.''
193 @dircategory Software development
195 * bison: (bison). GNU parser generator (Yacc replacement).
200 @subtitle The Yacc-compatible Parser Generator
201 @subtitle @value{UPDATED}, Bison Version @value{VERSION}
203 @author by Charles Donnelly and Richard Stallman
206 @vskip 0pt plus 1filll
209 Published by the Free Software Foundation @*
210 51 Franklin Street, Fifth Floor @*
211 Boston, MA 02110-1301 USA @*
212 Printed copies are available from the Free Software Foundation.@*
215 Cover art by Etienne Suvasa.
227 * Introduction:: What GNU Bison is.
228 * Conditions:: Conditions for using Bison and its output.
229 * Copying:: The GNU General Public License says
230 how you can copy and share Bison.
233 * Concepts:: Basic concepts for understanding Bison.
234 * Examples:: Simple explained examples of using Bison.
237 * Grammar File:: Writing Bison declarations and rules.
238 * Interface:: C-language interface to the parser function @code{yyparse}.
239 * Algorithm:: How the Bison parser works at run-time.
240 * Error Recovery:: Writing rules for error recovery.
241 * Context Dependency:: What to do if your language syntax is too
242 messy for Bison to handle straightforwardly.
243 * Debugging:: Understanding or debugging Bison parsers.
244 * Invocation:: How to run Bison (to produce the parser implementation).
245 * Other Languages:: Creating C++, D and Java parsers.
246 * History:: How Bison came to be
247 * Versioning:: Dealing with Bison versioning
248 * FAQ:: Frequently Asked Questions
249 * Table of Symbols:: All the keywords of the Bison language are explained.
250 * Glossary:: Basic concepts are explained.
251 * GNU Free Documentation License:: Copying and sharing this manual
252 * Bibliography:: Publications cited in this manual.
253 * Index of Terms:: Cross-references to the text.
256 --- The Detailed Node Listing ---
258 The Concepts of Bison
260 * Language and Grammar:: Languages and context-free grammars,
261 as mathematical ideas.
262 * Grammar in Bison:: How we represent grammars for Bison's sake.
263 * Semantic Values:: Each token or syntactic grouping can have
264 a semantic value (the value of an integer,
265 the name of an identifier, etc.).
266 * Semantic Actions:: Each rule can have an action containing C code.
267 * GLR Parsers:: Writing parsers for general context-free languages.
268 * Locations:: Overview of location tracking.
269 * Bison Parser:: What are Bison's input and output,
270 how is the output used?
271 * Stages:: Stages in writing and running Bison grammars.
272 * Grammar Layout:: Overall structure of a Bison grammar file.
276 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.
277 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities.
278 * GLR Semantic Actions:: Considerations for semantic values and deferred actions.
279 * Semantic Predicates:: Controlling a parse with arbitrary computations.
283 * RPN Calc:: Reverse Polish Notation Calculator;
284 a first example with no operator precedence.
285 * Infix Calc:: Infix (algebraic) notation calculator.
286 Operator precedence is introduced.
287 * Simple Error Recovery:: Continuing after syntax errors.
288 * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
289 * Multi-function Calc:: Calculator with memory and trig functions.
290 It uses multiple data-types for semantic values.
291 * Exercises:: Ideas for improving the multi-function calculator.
293 Reverse Polish Notation Calculator
295 * Rpcalc Declarations:: Prologue (declarations) for rpcalc.
296 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation.
297 * Rpcalc Lexer:: The lexical analyzer.
298 * Rpcalc Main:: The controlling function.
299 * Rpcalc Error:: The error reporting function.
300 * Rpcalc Generate:: Running Bison on the grammar file.
301 * Rpcalc Compile:: Run the C compiler on the output code.
303 Grammar Rules for @code{rpcalc}
305 * Rpcalc Input:: Explanation of the @code{input} nonterminal
306 * Rpcalc Line:: Explanation of the @code{line} nonterminal
307 * Rpcalc Exp:: Explanation of the @code{exp} nonterminal
309 Location Tracking Calculator: @code{ltcalc}
311 * Ltcalc Declarations:: Bison and C declarations for ltcalc.
312 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations.
313 * Ltcalc Lexer:: The lexical analyzer.
315 Multi-Function Calculator: @code{mfcalc}
317 * Mfcalc Declarations:: Bison declarations for multi-function calculator.
318 * Mfcalc Rules:: Grammar rules for the calculator.
319 * Mfcalc Symbol Table:: Symbol table management subroutines.
320 * Mfcalc Lexer:: The lexical analyzer.
321 * Mfcalc Main:: The controlling function.
325 * Grammar Outline:: Overall layout of the grammar file.
326 * Symbols:: Terminal and nonterminal symbols.
327 * Rules:: How to write grammar rules.
328 * Semantics:: Semantic values and actions.
329 * Tracking Locations:: Locations and actions.
330 * Named References:: Using named references in actions.
331 * Declarations:: All kinds of Bison declarations are described here.
332 * Multiple Parsers:: Putting more than one Bison parser in one program.
334 Outline of a Bison Grammar
336 * Prologue:: Syntax and usage of the prologue.
337 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
338 * Bison Declarations:: Syntax and usage of the Bison declarations section.
339 * Grammar Rules:: Syntax and usage of the grammar rules section.
340 * Epilogue:: Syntax and usage of the epilogue.
344 * Rules Syntax:: Syntax of the rules.
345 * Empty Rules:: Symbols that can match the empty string.
346 * Recursion:: Writing recursive rules.
349 Defining Language Semantics
351 * Value Type:: Specifying one data type for all semantic values.
352 * Multiple Types:: Specifying several alternative data types.
353 * Type Generation:: Generating the semantic value type.
354 * Union Decl:: Declaring the set of all semantic value types.
355 * Structured Value Type:: Providing a structured semantic value type.
356 * Actions:: An action is the semantic definition of a grammar rule.
357 * Action Types:: Specifying data types for actions to operate on.
358 * Midrule Actions:: Most actions go at the end of a rule.
359 This says when, why and how to use the exceptional
360 action in the middle of a rule.
364 * Using Midrule Actions:: Putting an action in the middle of a rule.
365 * Typed Midrule Actions:: Specifying the semantic type of their values.
366 * Midrule Action Translation:: How midrule actions are actually processed.
367 * Midrule Conflicts:: Midrule actions can cause conflicts.
371 * Location Type:: Specifying a data type for locations.
372 * Actions and Locations:: Using locations in actions.
373 * Printing Locations:: Defining how locations are printed.
374 * Location Default Action:: Defining a general way to compute locations.
378 * Require Decl:: Requiring a Bison version.
379 * Token Decl:: Declaring terminal symbols.
380 * Precedence Decl:: Declaring terminals with precedence and associativity.
381 * Type Decl:: Declaring the choice of type for a nonterminal symbol.
382 * Symbol Decls:: Summary of the Syntax of Symbol Declarations.
383 * Initial Action Decl:: Code run before parsing starts.
384 * Destructor Decl:: Declaring how symbols are freed.
385 * Printer Decl:: Declaring how symbol values are displayed.
386 * Expect Decl:: Suppressing warnings about parsing conflicts.
387 * Start Decl:: Specifying the start symbol.
388 * Pure Decl:: Requesting a reentrant parser.
389 * Push Decl:: Requesting a push parser.
390 * Decl Summary:: Table of all Bison declarations.
391 * %define Summary:: Defining variables to adjust Bison's behavior.
392 * %code Summary:: Inserting code into the parser source.
394 Parser C-Language Interface
396 * Parser Function:: How to call @code{yyparse} and what it returns.
397 * Push Parser Interface:: How to create, use, and destroy push parsers.
398 * Lexical:: You must supply a function @code{yylex}
400 * Error Reporting:: Passing error messages to the user.
401 * Action Features:: Special features for use in actions.
402 * Internationalization:: How to let the parser speak in the user's
405 The Lexical Analyzer Function @code{yylex}
407 * Calling Convention:: How @code{yyparse} calls @code{yylex}.
408 * Special Tokens:: Signaling end-of-file and errors to the parser.
409 * Tokens from Literals:: Finding token kinds from string aliases.
410 * Token Values:: How @code{yylex} must return the semantic value
411 of the token it has read.
412 * Token Locations:: How @code{yylex} must return the text location
413 (line number, etc.) of the token, if the
415 * Pure Calling:: How the calling convention differs in a pure parser
420 * Error Reporting Function:: You must supply a @code{yyerror} function.
421 * Syntax Error Reporting Function:: You can supply a @code{yyreport_syntax_error} function.
423 Parser Internationalization
425 * Enabling I18n:: Preparing your project to support internationalization.
426 * Token I18n:: Preparing tokens for internationalization in error messages.
428 The Bison Parser Algorithm
430 * Lookahead:: Parser looks one token ahead when deciding what to do.
431 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
432 * Precedence:: Operator precedence works by resolving conflicts.
433 * Contextual Precedence:: When an operator's precedence depends on context.
434 * Parser States:: The parser is a finite-state-machine with stack.
435 * Reduce/Reduce:: When two rules are applicable in the same situation.
436 * Mysterious Conflicts:: Conflicts that look unjustified.
437 * Tuning LR:: How to tune fundamental aspects of LR-based parsing.
438 * Generalized LR Parsing:: Parsing arbitrary context-free grammars.
439 * Memory Management:: What happens when memory is exhausted. How to avoid it.
443 * Why Precedence:: An example showing why precedence is needed.
444 * Using Precedence:: How to specify precedence and associativity.
445 * Precedence Only:: How to specify precedence only.
446 * Precedence Examples:: How these features are used in the previous example.
447 * How Precedence:: How they work.
448 * Non Operators:: Using precedence for general conflicts.
452 * LR Table Construction:: Choose a different construction algorithm.
453 * Default Reductions:: Disable default reductions.
454 * LAC:: Correct lookahead sets in the parser states.
455 * Unreachable States:: Keep unreachable parser states for debugging.
457 Handling Context Dependencies
459 * Semantic Tokens:: Token parsing can depend on the semantic context.
460 * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
461 * Tie-in Recovery:: Lexical tie-ins have implications for how
462 error recovery rules must be written.
464 Debugging Your Parser
466 * Counterexamples:: Understanding conflicts.
467 * Understanding:: Understanding the structure of your parser.
468 * Graphviz:: Getting a visual representation of the parser.
469 * Xml:: Getting a markup representation of the parser.
470 * Tracing:: Tracing the execution of your parser.
474 * Enabling Traces:: Activating run-time trace support
475 * Mfcalc Traces:: Extending @code{mfcalc} to support traces
479 * Bison Options:: All the options described in detail,
480 in alphabetical order by short options.
481 * Option Cross Key:: Alphabetical list of long options.
482 * Yacc Library:: Yacc-compatible @code{yylex} and @code{main}.
486 * Operation Modes:: Options controlling the global behavior of @command{bison}
487 * Diagnostics:: Options controlling the diagnostics
488 * Tuning the Parser:: Options changing the generated parsers
489 * Output Files:: Options controlling the output
491 Parsers Written In Other Languages
493 * C++ Parsers:: The interface to generate C++ parser classes
494 * D Parsers:: The interface to generate D parser classes
495 * Java Parsers:: The interface to generate Java parser classes
499 * A Simple C++ Example:: A short introduction to C++ parsers
500 * C++ Bison Interface:: Asking for C++ parser generation
501 * C++ Parser Interface:: Instantiating and running the parser
502 * C++ Semantic Values:: %union vs. C++
503 * C++ Location Values:: The position and location classes
504 * C++ Parser Context:: You can supply a @code{report_syntax_error} function.
505 * C++ Scanner Interface:: Exchanges between yylex and parse
506 * A Complete C++ Example:: Demonstrating their use
510 * C++ position:: One point in the source file
511 * C++ location:: Two points in the source file
512 * Exposing the Location Classes:: Using the Bison location class in your
514 * User Defined Location Type:: Required interface for locations
516 A Complete C++ Example
518 * Calc++ --- C++ Calculator:: The specifications
519 * Calc++ Parsing Driver:: An active parsing context
520 * Calc++ Parser:: A parser class
521 * Calc++ Scanner:: A pure C++ Flex scanner
522 * Calc++ Top Level:: Conducting the band
526 * D Bison Interface:: Asking for D parser generation
527 * D Semantic Values:: %token and %nterm vs. D
528 * D Location Values:: The position and location classes
529 * D Parser Interface:: Instantiating and running the parser
530 * D Parser Context Interface:: Circumstances of a syntax error
531 * D Scanner Interface:: Specifying the scanner for the parser
532 * D Action Features:: Special features for use in actions
533 * D Push Parser Interface:: Instantiating and running the push parser
534 * D Complete Symbols:: Using token constructors
538 * Java Bison Interface:: Asking for Java parser generation
539 * Java Semantic Values:: %token and %nterm vs. Java
540 * Java Location Values:: The position and location classes
541 * Java Parser Interface:: Instantiating and running the parser
542 * Java Parser Context Interface:: Circumstances of a syntax error
543 * Java Scanner Interface:: Specifying the scanner for the parser
544 * Java Action Features:: Special features for use in actions
545 * Java Push Parser Interface:: Instantiating and running the push parser
546 * Java Differences:: Differences between C/C++ and Java Grammars
547 * Java Declarations Summary:: List of Bison declarations used with Java
549 A Brief History of the Greater Ungulates
551 * Yacc:: The original Yacc
552 * yacchack:: An obscure early implementation of reentrancy
553 * Byacc:: Berkeley Yacc
554 * Bison:: This program
555 * Other Ungulates:: Similar programs
557 Bison Version Compatibility
559 * Versioning:: Dealing with Bison versioning
561 Frequently Asked Questions
563 * Memory Exhausted:: Breaking the Stack Limits
564 * How Can I Reset the Parser:: @code{yyparse} Keeps some State
565 * Strings are Destroyed:: @code{yylval} Loses Track of Strings
566 * Implementing Gotos/Loops:: Control Flow in the Calculator
567 * Multiple start-symbols:: Factoring closely related grammars
568 * Enabling Relocatability:: Moving Bison/using it through network shares
569 * Secure? Conform?:: Is Bison POSIX safe?
570 * I can't build Bison:: Troubleshooting
571 * Where can I find help?:: Troubleshouting
572 * Bug Reports:: Troublereporting
573 * More Languages:: Parsers in C++, Java, and so on
574 * Beta Testing:: Experimenting with development versions
575 * Mailing Lists:: Meeting other Bison users
579 * GNU Free Documentation License:: Copying and sharing this manual
585 @unnumbered Introduction
588 @dfn{Bison} is a general-purpose parser generator that converts an annotated
589 context-free grammar into a deterministic LR or generalized LR (GLR) parser
590 employing LALR(1), IELR(1) or canonical LR(1) parser tables. Once you are
591 proficient with Bison, you can use it to develop a wide range of language
592 parsers, from those used in simple desk calculators to complex programming
595 Bison is upward compatible with Yacc: all properly-written Yacc grammars
596 ought to work with Bison with no change. Anyone familiar with Yacc should
597 be able to use Bison with little trouble. You need to be fluent in C, C++,
598 D or Java programming in order to use Bison or to understand this manual.
600 We begin with tutorial chapters that explain the basic concepts of
601 using Bison and show several explained examples, each building on the
602 last. If you don't know Bison or Yacc, start by reading these
603 chapters. Reference chapters follow, which describe specific aspects
606 Bison was written originally by Robert Corbett. Richard Stallman made
607 it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University
608 added multi-character string literals and other features. Since then,
609 Bison has grown more robust and evolved many other new features thanks
610 to the hard work of a long list of volunteers. For details, see the
611 @file{THANKS} and @file{ChangeLog} files included in the Bison
614 This edition corresponds to version @value{VERSION} of Bison.
617 @unnumbered Conditions for Using Bison
619 The distribution terms for Bison-generated parsers permit using the parsers
620 in nonfree programs. Before Bison version 2.2, these extra permissions
621 applied only when Bison was generating LALR(1) parsers in C@. And before
622 Bison version 1.24, Bison-generated parsers could be used only in programs
623 that were free software.
625 The other GNU programming tools, such as the GNU C compiler, have never had
626 such a requirement. They could always be used for nonfree software. The
627 reason Bison was different was not due to a special policy decision; it
628 resulted from applying the usual General Public License to all of the Bison
631 The main output of the Bison utility---the Bison parser implementation
632 file---contains a verbatim copy of a sizable piece of Bison, which is the
633 code for the parser's implementation. (The actions from your grammar are
634 inserted into this implementation at one point, but most of the rest of the
635 implementation is not changed.) When we applied the GPL terms to the
636 skeleton code for the parser's implementation, the effect was to restrict
637 the use of Bison output to free software.
639 We didn't change the terms because of sympathy for people who want to make
640 software proprietary. @strong{Software should be free.} But we concluded
641 that limiting Bison's use to free software was doing little to encourage
642 people to make other software free. So we decided to make the practical
643 conditions for using Bison match the practical conditions for using the
646 This exception applies when Bison is generating code for a parser. You can
647 tell whether the exception applies to a Bison output file by inspecting the
648 file for text beginning with ``As a special exception@dots{}''. The text
649 spells out the exact terms of the exception.
652 @unnumbered GNU GENERAL PUBLIC LICENSE
653 @include gpl-3.0.texi
656 @chapter The Concepts of Bison
658 This chapter introduces many of the basic concepts without which the details
659 of Bison will not make sense. If you do not already know how to use Bison
660 or Yacc, we suggest you start by reading this chapter carefully.
663 * Language and Grammar:: Languages and context-free grammars,
664 as mathematical ideas.
665 * Grammar in Bison:: How we represent grammars for Bison's sake.
666 * Semantic Values:: Each token or syntactic grouping can have
667 a semantic value (the value of an integer,
668 the name of an identifier, etc.).
669 * Semantic Actions:: Each rule can have an action containing C code.
670 * GLR Parsers:: Writing parsers for general context-free languages.
671 * Locations:: Overview of location tracking.
672 * Bison Parser:: What are Bison's input and output,
673 how is the output used?
674 * Stages:: Stages in writing and running Bison grammars.
675 * Grammar Layout:: Overall structure of a Bison grammar file.
678 @node Language and Grammar
679 @section Languages and Context-Free Grammars
681 @cindex context-free grammar
682 @cindex grammar, context-free
683 In order for Bison to parse a language, it must be described by a
684 @dfn{context-free grammar}. This means that you specify one or more
685 @dfn{syntactic groupings} and give rules for constructing them from their
686 parts. For example, in the C language, one kind of grouping is called an
687 `expression'. One rule for making an expression might be, ``An expression
688 can be made of a minus sign and another expression''. Another would be,
689 ``An expression can be an integer''. As you can see, rules are often
690 recursive, but there must be at least one rule which leads out of the
694 @cindex Backus-Naur form
695 The most common formal system for presenting such rules for humans to read
696 is @dfn{Backus-Naur Form} or ``BNF'', which was developed in
697 order to specify the language Algol 60. Any grammar expressed in
698 BNF is a context-free grammar. The input to Bison is
699 essentially machine-readable BNF.
701 @cindex LALR grammars
702 @cindex IELR grammars
704 There are various important subclasses of context-free grammars. Although
705 it can handle almost all context-free grammars, Bison is optimized for what
706 are called LR(1) grammars. In brief, in these grammars, it must be possible
707 to tell how to parse any portion of an input string with just a single token
708 of lookahead. For historical reasons, Bison by default is limited by the
709 additional restrictions of LALR(1), which is hard to explain simply.
710 @xref{Mysterious Conflicts}, for more information on this. You can escape
711 these additional restrictions by requesting IELR(1) or canonical LR(1)
712 parser tables. @xref{LR Table Construction}, to learn how.
715 @cindex generalized LR (GLR) parsing
716 @cindex ambiguous grammars
717 @cindex nondeterministic parsing
719 Parsers for LR(1) grammars are @dfn{deterministic}, meaning
720 roughly that the next grammar rule to apply at any point in the input is
721 uniquely determined by the preceding input and a fixed, finite portion
722 (called a @dfn{lookahead}) of the remaining input. A context-free
723 grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
724 apply the grammar rules to get the same inputs. Even unambiguous
725 grammars can be @dfn{nondeterministic}, meaning that no fixed
726 lookahead always suffices to determine the next grammar rule to apply.
727 With the proper declarations, Bison is also able to parse these more
728 general context-free grammars, using a technique known as GLR
729 parsing (for Generalized LR). Bison's GLR parsers
730 are able to handle any context-free grammar for which the number of
731 possible parses of any given string is finite.
733 @cindex symbols (abstract)
735 @cindex syntactic grouping
736 @cindex grouping, syntactic
737 In the formal grammatical rules for a language, each kind of syntactic unit
738 or grouping is named by a @dfn{symbol}. Those which are built by grouping
739 smaller constructs according to grammatical rules are called
740 @dfn{nonterminal symbols}; those which can't be subdivided are called
741 @dfn{terminal symbols} or @dfn{token kinds}. We call a piece of input
742 corresponding to a single terminal symbol a @dfn{token}, and a piece
743 corresponding to a single nonterminal symbol a @dfn{grouping}.
745 We can use the C language as an example of what symbols, terminal and
746 nonterminal, mean. The tokens of C are identifiers, constants (numeric
747 and string), and the various keywords, arithmetic operators and
748 punctuation marks. So the terminal symbols of a grammar for C include
749 `identifier', `number', `string', plus one symbol for each keyword,
750 operator or punctuation mark: `if', `return', `const', `static', `int',
751 `char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
752 (These tokens can be subdivided into characters, but that is a matter of
753 lexicography, not grammar.)
755 Here is a simple C function subdivided into tokens:
758 int /* @r{keyword `int'} */
759 square (int x) /* @r{identifier, open-paren, keyword `int',}
760 @r{identifier, close-paren} */
761 @{ /* @r{open-brace} */
762 return x * x; /* @r{keyword `return', identifier, asterisk,}
763 @r{identifier, semicolon} */
764 @} /* @r{close-brace} */
767 The syntactic groupings of C include the expression, the statement, the
768 declaration, and the function definition. These are represented in the
769 grammar of C by nonterminal symbols `expression', `statement',
770 `declaration' and `function definition'. The full grammar uses dozens of
771 additional language constructs, each with its own nonterminal symbol, in
772 order to express the meanings of these four. The example above is a
773 function definition; it contains one declaration and one statement. In
774 the statement, each @samp{x} is an expression and so is @samp{x * x}.
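The division of @code{square} into tokens shown above is the job of a lexical analyzer, which you supply to the Bison parser (@pxref{Lexical}). The C fragment below is a minimal sketch of that division for a tiny subset of C, assuming single-character punctuation only. The kind names, @code{kind_names} and @code{next_token} are hypothetical, and unlike a real C scanner the sketch ignores comments, string constants and multi-character operators such as @samp{==}.

@example
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for a tiny subset of C.  */
enum kind @{ KW_INT, KW_RETURN, IDENTIFIER, NUMBER, PUNCT, END @};

static const char *const kind_names[] =
  @{ "keyword int", "keyword return", "identifier", "number",
    "punctuation", "end of input" @};

/* Skip white space, then read one token starting at *P.  Copy its
   text (truncated if needed) into BUF of size BUFSIZE, which must be
   at least 1, advance *P past the whole token and return its kind.  */
static enum kind
next_token (const char **p, char *buf, size_t bufsize)
@{
  const char *s = *p;
  size_t len = 0, copy;
  enum kind k;

  while (isspace ((unsigned char) *s))
    s++;
  if (*s == '\0')
    k = END;
  else if (isalpha ((unsigned char) *s) || *s == '_')
    @{
      while (isalnum ((unsigned char) s[len]) || s[len] == '_')
        len++;
      k = IDENTIFIER;
    @}
  else if (isdigit ((unsigned char) *s))
    @{
      while (isdigit ((unsigned char) s[len]))
        len++;
      k = NUMBER;
    @}
  else
    @{
      len = 1;                  /* single-character punctuation */
      k = PUNCT;
    @}

  copy = len < bufsize ? len : bufsize - 1;
  memcpy (buf, s, copy);
  buf[copy] = '\0';
  *p = s + len;

  /* Keywords are spelled like identifiers; tell them apart here.  */
  if (k == IDENTIFIER && strcmp (buf, "int") == 0)
    k = KW_INT;
  else if (k == IDENTIFIER && strcmp (buf, "return") == 0)
    k = KW_RETURN;
  return k;
@}

int
main (void)
@{
  const char *input = "return x * x;";
  char text[64];
  enum kind k;

  while ((k = next_token (&input, text, sizeof text)) != END)
    printf ("%-14s %s\n", kind_names[k], text);
  return 0;
@}
@end example

For the input @samp{return x * x;} this sketch reports, in order, @samp{keyword return}, @samp{identifier}, @samp{punctuation} (for @samp{*}), @samp{identifier} and @samp{punctuation} (for @samp{;}), each followed by the token's text, matching the classification given in the comments of @code{square}.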
776 Each nonterminal symbol must have grammatical rules showing how it is made
777 out of simpler constructs. For example, one kind of C statement is the
778 @code{return} statement; this would be described with a grammar rule which
779 reads informally as follows:
782 A `statement' can be made of a `return' keyword, an `expression' and a
787 There would be many other rules for `statement', one for each kind of
791 One nonterminal symbol must be distinguished as the special one which
792 defines a complete utterance in the language. It is called the @dfn{start
793 symbol}. In a compiler, this means a complete input program. In the C
794 language, the nonterminal symbol `sequence of definitions and declarations'
797 For example, @samp{1 + 2} is a valid C expression---a valid part of a C
798 program---but it is not valid as an @emph{entire} C program. In the
799 context-free grammar of C, this follows from the fact that `expression' is
800 not the start symbol.
802 The Bison parser reads a sequence of tokens as its input, and groups the
803 tokens using the grammar rules. If the input is valid, the end result is
804 that the entire token sequence reduces to a single grouping whose symbol is
805 the grammar's start symbol. If we use a grammar for C, the entire input
806 must be a `sequence of definitions and declarations'. If not, the parser
807 reports a syntax error.
809 @node Grammar in Bison
810 @section From Formal Rules to Bison Input
811 @cindex Bison grammar
812 @cindex grammar, Bison
813 @cindex formal grammar
815 A formal grammar is a mathematical construct. To define the language
816 for Bison, you must write a file expressing the grammar in Bison syntax:
817 a @dfn{Bison grammar} file. @xref{Grammar File}.
819 A nonterminal symbol in the formal grammar is represented in Bison input
820 as an identifier, like an identifier in C@. By convention, it should be
821 in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
823 The Bison representation for a terminal symbol is also called a @dfn{token
824 kind}. Token kinds can also be represented as C-like identifiers. By
825 convention, these identifiers should be upper case to distinguish them from
826 nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
827 @code{RETURN}. A terminal symbol that stands for a particular keyword in
828 the language should be named after that keyword converted to upper case.
829 The terminal symbol @code{error} is reserved for error recovery.
832 A terminal symbol can also be represented as a character literal, just like
833 a C character constant. You should do this whenever a token is just a
834 single character (parenthesis, plus-sign, etc.): use that same character in
835 a literal as the terminal symbol for that token.
837 A third way to represent a terminal symbol is with a C string constant
838 containing several characters. @xref{Symbols}, for more information.
840 The grammar rules are also written in Bison syntax. For example,
841 here is the Bison rule for a C @code{return} statement. The semicolon in
842 quotes is a literal character token, representing part of the C syntax for
843 the statement; the naked semicolon, and the colon, are Bison punctuation
847 stmt: RETURN expr ';' ;
853 @node Semantic Values
854 @section Semantic Values
855 @cindex semantic value
856 @cindex value, semantic
858 A formal grammar selects tokens only by their classifications: for example,
859 if a rule mentions the terminal symbol `integer constant', it means that
860 @emph{any} integer constant is grammatically valid in that position. The
861 precise value of the constant is irrelevant to how to parse the input: if
862 @samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
865 But the precise value is very important for what the input means once it is
866 parsed. A compiler is useless if it fails to distinguish between 4, 1 and
867 3989 as constants in the program! Therefore, each token in a Bison grammar
868 has both a token kind and a @dfn{semantic value}. @xref{Semantics}, for
871 The token kind is a terminal symbol defined in the grammar, such as
872 @code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything you
873 need to know to decide where the token may validly appear and how to group
874 it with other tokens. The grammar rules know nothing about tokens except
877 The semantic value has all the rest of the information about the
878 meaning of the token, such as the value of an integer, or the name of an
identifier. (A token such as @code{','}, which is just punctuation, doesn't
need to have any semantic value.)
882 For example, an input token might be classified as token kind @code{INTEGER}
883 and have the semantic value 4. Another input token might have the same
884 token kind @code{INTEGER} but value 3989. When a grammar rule says that
885 @code{INTEGER} is allowed, either of these tokens is acceptable because each
886 is an @code{INTEGER}. When the parser accepts the token, it keeps track of
887 the token's semantic value.
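As an illustration only, the pairing of a token kind with a semantic value
can be pictured as a small C structure. This is a minimal sketch: the names
@code{enum token_kind}, @code{struct token} and @code{same_kind} are
hypothetical and are not what Bison generates. A traditional C Bison parser
obtains the kind as the return value of @code{yylex} and the value through
@code{yylval}.

```c
#include <stdbool.h>

/* Hypothetical token kinds; a real parser takes these from the grammar. */
enum token_kind { TOK_INTEGER, TOK_IDENTIFIER, TOK_COMMA };

/* A token pairs its kind (what the grammar rules see) with a semantic
   value (what the actions see).  Punctuation such as ',' needs no value. */
struct token
{
  enum token_kind kind;
  union
  {
    long integer;        /* meaningful when kind == TOK_INTEGER */
    const char *name;    /* meaningful when kind == TOK_IDENTIFIER */
  } value;
};

/* The grammar compares kinds only: the integers 4 and 3989 are
   interchangeable as far as parsing is concerned. */
bool
same_kind (struct token a, struct token b)
{
  return a.kind == b.kind;
}
```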
Each grouping can have a semantic value as well as its nonterminal
890 symbol. For example, in a calculator, an expression typically has a
891 semantic value that is a number. In a compiler for a programming
892 language, an expression typically has a semantic value that is a tree
893 structure describing the meaning of the expression.
895 @node Semantic Actions
896 @section Semantic Actions
897 @cindex semantic actions
898 @cindex actions, semantic
900 In order to be useful, a program must do more than parse input; it must
901 also produce some output based on the input. In a Bison grammar, a grammar
902 rule can have an @dfn{action} made up of C statements. Each time the
903 parser recognizes a match for that rule, the action is executed.
906 Most of the time, the purpose of an action is to compute the semantic value
907 of the whole construct from the semantic values of its parts. For example,
908 suppose we have a rule which says an expression can be the sum of two
909 expressions. When the parser recognizes such a sum, each of the
910 subexpressions has a semantic value which describes how it was built up.
911 The action for this rule should create a similar sort of value for the
912 newly recognized larger expression.
914 For example, here is a rule that says an expression can be the sum of
918 expr: expr '+' expr @{ $$ = $1 + $3; @} ;
922 The action says how to produce the semantic value of the sum expression
923 from the values of the two subexpressions.
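To see what @code{$$}, @code{$1} and @code{$3} denote, it may help to picture
the parser's stack of semantic values. The following is a simplified model,
not Bison's actual implementation: the hypothetical function
@code{reduce_sum} replaces the three right-hand-side values on top of a stack
of @code{double} with the single value of the left-hand side, as the action
above does.

```c
#include <stddef.h>

/* A toy value stack; Bison's real stack and its layout differ.
   No overflow checking, for brevity. */
struct value_stack
{
  double values[100];
  size_t size;
};

void
push (struct value_stack *s, double v)
{
  s->values[s->size++] = v;
}

/* Model of reducing  expr: expr '+' expr { $$ = $1 + $3; }
   The top three entries are $1, the '+' token (no meaningful value),
   and $3. */
void
reduce_sum (struct value_stack *s)
{
  double *rhs = &s->values[s->size - 3];  /* rhs[0] is $1, rhs[2] is $3 */
  double result = rhs[0] + rhs[2];        /* the action: $$ = $1 + $3 */
  s->size -= 3;                           /* pop the right-hand side */
  push (s, result);                       /* push $$ for the grouping */
}
```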
926 @section Writing GLR Parsers
928 @cindex generalized LR (GLR) parsing
931 @cindex shift/reduce conflicts
932 @cindex reduce/reduce conflicts
934 In some grammars, Bison's deterministic
935 LR(1) parsing algorithm cannot decide whether to apply a
936 certain grammar rule at a given point. That is, it may not be able to
937 decide (on the basis of the input read so far) which of two possible
938 reductions (applications of a grammar rule) applies, or whether to apply
939 a reduction or read more of the input and apply a reduction later in the
940 input. These are known respectively as @dfn{reduce/reduce} conflicts
941 (@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts
942 (@pxref{Shift/Reduce}).
944 To use a grammar that is not easily modified to be LR(1), a more general
945 parsing algorithm is sometimes necessary. If you include @code{%glr-parser}
946 among the Bison declarations in your file (@pxref{Grammar Outline}), the
947 result is a Generalized LR (GLR) parser. These parsers handle Bison
948 grammars that contain no unresolved conflicts (i.e., after applying
949 precedence declarations) identically to deterministic parsers. However,
950 when faced with unresolved shift/reduce and reduce/reduce conflicts, GLR
951 parsers use the simple expedient of doing both, effectively cloning the
952 parser to follow both possibilities. Each of the resulting parsers can
953 again split, so that at any given time, there can be any number of possible
954 parses being explored. The parsers proceed in lockstep; that is, all of
955 them consume (shift) a given input symbol before any of them proceed to the
956 next. Each of the cloned parsers eventually meets one of two possible
957 fates: either it runs into a parsing error, in which case it simply
958 vanishes, or it merges with another parser, because the two of them have
959 reduced the input to an identical set of symbols.
961 During the time that there are multiple parsers, semantic actions are
962 recorded, but not performed. When a parser disappears, its recorded
963 semantic actions disappear as well, and are never performed. When a
964 reduction makes two parsers identical, causing them to merge, Bison records
965 both sets of semantic actions. Whenever the last two parsers merge,
966 reverting to the single-parser case, Bison resolves all the outstanding
967 actions either by precedences given to the grammar rules involved, or by
968 performing both actions, and then calling a designated user-defined function
969 on the resulting values to produce an arbitrary merged result.
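The bookkeeping described above can be modeled in a few lines of C. This
sketch is only a conceptual model (Bison's GLR skeleton is organized quite
differently): each hypothetical @code{struct pending} records actions as
function pointers instead of running them; a parser that dies simply
discards its list, and the survivor replays its list in order once parsing
is deterministic again. The names @code{record}, @code{discard} and
@code{replay} are assumptions of the sketch.

```c
#include <stddef.h>

#define MAX_PENDING 32

/* A recorded, not-yet-performed semantic action. */
typedef void (*action_fn) (int *result);

struct pending
{
  action_fn actions[MAX_PENDING];
  size_t count;
};

/* While the parser is split, actions are only recorded
   (this sketch silently ignores overflow). */
void
record (struct pending *p, action_fn f)
{
  if (p->count < MAX_PENDING)
    p->actions[p->count++] = f;
}

/* A parser that runs into an error vanishes: its actions never run. */
void
discard (struct pending *p)
{
  p->count = 0;
}

/* Back in deterministic mode: perform the recorded actions in order. */
void
replay (struct pending *p, int *result)
{
  for (size_t i = 0; i < p->count; i++)
    p->actions[i] (result);
  p->count = 0;
}

/* Two sample actions whose results depend on their order. */
void add_one (int *r) { *r += 1; }
void times_ten (int *r) { *r *= 10; }
```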
972 * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.
973 * Merging GLR Parses:: Using GLR parsers to resolve ambiguities.
974 * GLR Semantic Actions:: Considerations for semantic values and deferred actions.
975 * Semantic Predicates:: Controlling a parse with arbitrary computations.
978 @node Simple GLR Parsers
979 @subsection Using GLR on Unambiguous Grammars
980 @cindex GLR parsing, unambiguous grammars
981 @cindex generalized LR (GLR) parsing, unambiguous grammars
985 @cindex reduce/reduce conflicts
986 @cindex shift/reduce conflicts
988 In the simplest cases, you can use the GLR algorithm
989 to parse grammars that are unambiguous but fail to be LR(1).
990 Such grammars typically require more than one symbol of lookahead.
992 Consider a problem that
993 arises in the declaration of enumerated and subrange types in the
994 programming language Pascal. Here are some examples:
997 type subrange = lo .. hi;
998 type enum = (a, b, c);
1002 The original language standard allows only numeric literals and constant
1003 identifiers for the subrange bounds (@samp{lo} and @samp{hi}), but Extended
1004 Pascal (ISO/IEC 10206) and many other Pascal implementations allow arbitrary
1005 expressions there. This gives rise to the following situation, containing a
1006 superfluous pair of parentheses:
1009 type subrange = (a) .. b;
1013 Compare this to the following declaration of an enumerated
1014 type with only one value:
1021 (These declarations are contrived, but they are syntactically valid, and
1022 more-complicated cases can come up in practical programs.)
1024 These two declarations look identical until the @samp{..} token. With
1025 normal LR(1) one-token lookahead it is not possible to decide between the
1026 two forms when the identifier @samp{a} is parsed. It is, however, desirable
1027 for a parser to decide this, since in the latter case @samp{a} must become a
1028 new identifier to represent the enumeration value, while in the former case
1029 @samp{a} must be evaluated with its current meaning, which may be a constant
1030 or even a function call.
1032 You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
1033 to be resolved later, but this typically requires substantial contortions in
1034 both semantic actions and large parts of the grammar, where the parentheses
1035 are nested in the recursive rules for expressions.
1037 You might think of using the lexer to distinguish between the two forms by
1038 returning different tokens for currently defined and undefined identifiers.
1039 But if these declarations occur in a local scope, and @samp{a} is defined in
1040 an outer scope, then both forms are possible---either locally redefining
1041 @samp{a}, or using the value of @samp{a} from the outer scope. So this
1042 approach cannot work.
1044 A simple solution to this problem is to declare the parser to use the GLR
1045 algorithm. When the GLR parser reaches the critical state, it merely splits
1046 into two branches and pursues both syntax rules simultaneously. Sooner or
1047 later, one of them runs into a parsing error. If there is a @samp{..} token
1048 before the next @samp{;}, the rule for enumerated types fails since it
1049 cannot accept @samp{..} anywhere; otherwise, the subrange type rule fails
1050 since it requires a @samp{..} token. So one of the branches fails silently,
1051 and the other one continues normally, performing all the intermediate
1052 actions that were postponed during the split.
1054 If the input is syntactically incorrect, both branches fail and the parser
1055 reports a syntax error as usual.
1057 The effect of all this is that the parser seems to ``guess'' the correct
1058 branch to take, or in other words, it seems to use more lookahead than the
underlying LR(1) algorithm actually allows for. In this example, LR(2)
would suffice, but this technique can also handle some cases that are not
LR(@math{k}) for any @math{k}.
1063 In general, a GLR parser can take quadratic or cubic worst-case time, and
1064 the current Bison parser even takes exponential time and space for some
1065 grammars. In practice, this rarely happens, and for many grammars it is
1066 possible to prove that it cannot happen. The present example contains only
1067 one conflict between two rules, and the type-declaration context containing
1068 the conflict cannot be nested. So the number of branches that can exist at
1069 any time is limited by the constant 2, and the parsing time is still linear.
1071 Here is a Bison grammar corresponding to the example above. It
1072 parses a vastly simplified form of Pascal type declarations.
1075 %token TYPE DOTDOT ID
1083 type_decl: TYPE ID '=' type ';' ;
When this is processed as a normal LR(1) grammar, Bison correctly complains
1112 about one reduce/reduce conflict. In the conflicting situation the
1113 parser chooses one of the alternatives, arbitrarily the one
1114 declared first. Therefore the following correct input is not
1121 The parser can be turned into a GLR parser, while also telling Bison
1122 to be silent about the one known reduce/reduce conflict, by adding
1123 these two declarations to the Bison grammar file (before the first
1132 No change in the grammar itself is required. Now the parser recognizes all
1133 valid declarations, according to the limited syntax above, transparently.
1134 In fact, the user does not even notice when the parser splits.
1136 So here we have a case where we can use the benefits of GLR, almost without
1137 disadvantages. Even in simple cases like this, however, there are at least
two potential problems to beware of. First, always analyze the conflicts
1139 reported by Bison to make sure that GLR splitting is only done where it is
1140 intended. A GLR parser splitting inadvertently may cause problems less
1141 obvious than an LR parser statically choosing the wrong alternative in a
1142 conflict. Second, consider interactions with the lexer (@pxref{Semantic
1143 Tokens}) with great care. Since a split parser consumes tokens without
1144 performing any actions during the split, the lexer cannot obtain information
1145 via parser actions. Some cases of lexer interactions can be eliminated by
1146 using GLR to shift the complications from the lexer to the parser. You must
1147 check the remaining cases for correctness.
1149 In our example, it would be safe for the lexer to return tokens based on
1150 their current meanings in some symbol table, because no new symbols are
1151 defined in the middle of a type declaration. Though it is possible for a
1152 parser to define the enumeration constants as they are parsed, before the
1153 type declaration is completed, it actually makes no difference since they
1154 cannot be used within the same enumerated type declaration.
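A lexer that returns tokens based on the current meaning of an identifier
might look like the following sketch. The symbol table, the token codes
@code{ID} and @code{CONST_ID}, and the function @code{classify} are all
hypothetical; the point is only that the decision reads the table without
changing it, so it gives the same answer whether or not the parser has split.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; real ones come from the Bison grammar. */
enum { ID = 258, CONST_ID = 259 };

/* A read-only table of constants that are already defined. */
static const char *const constants[] = { "lo", "hi", "maxint" };

/* Return CONST_ID for a known constant, ID otherwise.  The lookup
   never modifies the table, so it is safe during a GLR split. */
int
classify (const char *name)
{
  for (size_t i = 0; i < sizeof constants / sizeof constants[0]; i++)
    if (strcmp (name, constants[i]) == 0)
      return CONST_ID;
  return ID;
}
```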
1156 @node Merging GLR Parses
1157 @subsection Using GLR to Resolve Ambiguities
1158 @cindex GLR parsing, ambiguous grammars
1159 @cindex generalized LR (GLR) parsing, ambiguous grammars
1163 @cindex reduce/reduce conflicts
1165 Let's consider an example, vastly simplified from a C++
1166 grammar.@footnote{The sources of an extended version of this example are
1167 available in C as @file{examples/c/glr}, and in C++ as
1168 @file{examples/c++/glr}.}
1174 void yyerror (char const *);
1177 %define api.value.type @{char const *@}
1190 | prog stmt @{ printf ("\n"); @}
1199 ID @{ printf ("%s ", $$); @}
1200 | TYPENAME '(' expr ')'
1201 @{ printf ("%s <cast> ", $1); @}
1202 | expr '+' expr @{ printf ("+ "); @}
1203 | expr '=' expr @{ printf ("= "); @}
1207 TYPENAME declarator ';'
1208 @{ printf ("%s <declare> ", $1); @}
1209 | TYPENAME declarator '=' expr ';'
1210 @{ printf ("%s <init-declare> ", $1); @}
1214 ID @{ printf ("\"%s\" ", $1); @}
1215 | '(' declarator ')'
1220 This models a problematic part of the C++ grammar---the ambiguity between
1221 certain declarations and statements. For example,
parses as either an @code{expr} or a @code{decl}
1229 (assuming that @samp{T} is recognized as a @code{TYPENAME} and
1230 @samp{x} as an @code{ID}).
1231 Bison detects this as a reduce/reduce conflict between the rules
1232 @code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
1233 time it encounters @code{x} in the example above. Since this is a
GLR parser, it splits the problem into two parses, one for
1235 each choice of resolving the reduce/reduce conflict.
1236 Unlike the example from the previous section (@pxref{Simple GLR Parsers}),
1237 however, neither of these parses ``dies,'' because the grammar as it stands is
1238 ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
1239 the other reduces @code{stmt : decl}, after which both parsers are in an
1240 identical state: they've seen @samp{prog stmt} and have the same unprocessed
input remaining. We say that these parses have @dfn{merged}.
1243 At this point, the GLR parser requires a specification in the
1244 grammar of how to choose between the competing parses.
1245 In the example above, the two @code{%dprec}
1246 declarations specify that Bison is to give precedence
1247 to the parse that interprets the example as a
1248 @code{decl}, which implies that @code{x} is a declarator.
1249 The parser therefore prints
1252 "x" y z + T <init-declare>
1255 The @code{%dprec} declarations only come into play when more than one
1256 parse survives. Consider a different input string for this parser:
1263 This is another example of using GLR to parse an unambiguous
1264 construct, as shown in the previous section (@pxref{Simple GLR Parsers}).
1265 Here, there is no ambiguity (this cannot be parsed as a declaration).
1266 However, at the time the Bison parser encounters @code{x}, it does not
1267 have enough information to resolve the reduce/reduce conflict (again,
1268 between @code{x} as an @code{expr} or a @code{declarator}). In this
1269 case, no precedence declaration is used. Again, the parser splits
1270 into two, one assuming that @code{x} is an @code{expr}, and the other
1271 assuming @code{x} is a @code{declarator}. The second of these parsers
1272 then vanishes when it sees @code{+}, and the parser prints
1278 Suppose that instead of resolving the ambiguity, you wanted to see all
1279 the possibilities. For this purpose, you must merge the semantic
1280 actions of the two possible parsers, rather than choosing one over the
1281 other. To do so, you could change the declaration of @code{stmt} as
1286 expr ';' %merge <stmt_merge>
1287 | decl %merge <stmt_merge>
1292 and define the @code{stmt_merge} function as:
1296 stmt_merge (YYSTYPE x0, YYSTYPE x1)
1304 with an accompanying forward declaration
1305 in the C declarations at the beginning of the file:
1309 static YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1);
1314 With these declarations, the resulting parser parses the first example
1315 as both an @code{expr} and a @code{decl}, and prints
1318 "x" y z + T <init-declare> x T <cast> y z + = <OR>
1321 Bison requires that all of the
1322 productions that participate in any particular merge have identical
@samp{%merge} clauses. Otherwise, the ambiguity is unresolvable,
and the parser reports an error during any parse that results in
the offending merge.
1329 The signature of the merger depends on the type of the symbol. In the
1330 previous example, the merged-to symbol (@code{stmt}) does not have a
1331 specific type, and the merger is
1334 YYSTYPE stmt_merge (YYSTYPE x0, YYSTYPE x1);
1338 However, if @code{stmt} had a declared type, e.g.,
1341 %type <Node *> stmt;
1358 then the prototype of the merger must be:
1361 Node *stmt_merge (YYSTYPE x0, YYSTYPE x1);
(This signature may have been a mistake, and perhaps it should have been
1366 @samp{Node *stmt_merge (Node *x0, Node *x1)}. If you have an opinion about
1367 it, please let us know.)
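For illustration, a merger whose symbol has type @code{Node *} might keep
both alternatives in a node that records the ambiguity, leaving the choice to
a later pass. The @code{Node} structure, the @code{NODE_AMBIG} kind, and the
helpers @code{new_node} and @code{make_ambiguity} below are hypothetical; a
merger such as @code{stmt_merge} could delegate to a helper like this one.

```c
#include <stdlib.h>

/* Hypothetical AST node; only what the sketch needs. */
typedef struct Node
{
  enum { NODE_EXPR_STMT, NODE_DECL, NODE_AMBIG } kind;
  struct Node *alt[2];   /* the two readings, used for NODE_AMBIG only */
} Node;

Node *
new_node (int kind)
{
  Node *n = calloc (1, sizeof *n);
  if (n)
    n->kind = kind;
  return n;
}

/* Keep both parses instead of choosing one, so that a later pass
   (for instance, one with full type information) can decide. */
Node *
make_ambiguity (Node *x0, Node *x1)
{
  Node *n = new_node (NODE_AMBIG);
  if (n)
    {
      n->alt[0] = x0;
      n->alt[1] = x1;
    }
  return n;
}
```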
1369 @node GLR Semantic Actions
1370 @subsection GLR Semantic Actions
1372 The nature of GLR parsing and the structure of the generated
1373 parsers give rise to certain restrictions on semantic values and actions.
1375 @subsubsection Deferred semantic actions
1376 @cindex deferred semantic actions
1377 By definition, a deferred semantic action is not performed at the same time as
1378 the associated reduction.
1379 This raises caveats for several Bison features you might use in a semantic
1380 action in a GLR parser.
1383 @cindex GLR parsers and @code{yychar}
1385 @cindex GLR parsers and @code{yylval}
1387 @cindex GLR parsers and @code{yylloc}
1388 In any semantic action, you can examine @code{yychar} to determine the kind
1389 of the lookahead token present at the time of the associated reduction.
1390 After checking that @code{yychar} is not set to @code{YYEMPTY} or
1391 @code{YYEOF}, you can then examine @code{yylval} and @code{yylloc} to
1392 determine the lookahead token's semantic value and location, if any. In a
1393 nondeferred semantic action, you can also modify any of these variables to
1394 influence syntax analysis. @xref{Lookahead}.
1397 @cindex GLR parsers and @code{yyclearin}
1398 In a deferred semantic action, it's too late to influence syntax analysis.
1399 In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
1400 shallow copies of the values they had at the time of the associated reduction.
1401 For this reason alone, modifying them is dangerous.
1402 Moreover, the result of modifying them is undefined and subject to change with
1403 future versions of Bison.
1404 For example, if a semantic action might be deferred, you should never write it
1405 to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
1406 memory referenced by @code{yylval}.
1408 @subsubsection YYERROR
1410 @cindex GLR parsers and @code{YYERROR}
1411 Another Bison feature requiring special consideration is @code{YYERROR}
1412 (@pxref{Action Features}), which you can invoke in a semantic action to
1413 initiate error recovery.
1414 During deterministic GLR operation, the effect of @code{YYERROR} is
1415 the same as its effect in a deterministic parser.
1416 The effect in a deferred action is similar, but the precise point of the
1417 error is undefined; instead, the parser reverts to deterministic operation,
1418 selecting an unspecified stack on which to continue with a syntax error.
In a semantic predicate (@pxref{Semantic Predicates}) during nondeterministic
1420 parsing, @code{YYERROR} silently prunes
1421 the parse that invoked the test.
1423 @subsubsection Restrictions on semantic values and locations
1424 GLR parsers require that you use POD (Plain Old Data) types for
1425 semantic values and location types when using the generated parsers as
1428 @node Semantic Predicates
1429 @subsection Controlling a Parse with Arbitrary Predicates
1431 @cindex Semantic predicates in GLR parsers
1433 In addition to the @code{%dprec} and @code{%merge} directives,
1435 allow you to reject parses on the basis of arbitrary computations executed
1436 in user code, without having Bison treat this rejection as an error
1437 if there are alternative parses. For example,
1441 %?@{ new_syntax @} "widget" id new_args @{ $$ = f($3, $4); @}
1442 | %?@{ !new_syntax @} "widget" id old_args @{ $$ = f($3, $4); @}
1447 is one way to allow the same parser to handle two different syntaxes for
1448 widgets. The clause preceded by @code{%?} is treated like an ordinary
1449 midrule action, except that its text is handled as an expression and is always
1450 evaluated immediately (even when in nondeterministic mode). If the
1451 expression yields 0 (false), the clause is treated as a syntax error,
1452 which, in a nondeterministic parser, causes the stack in which it is reduced
1453 to die. In a deterministic parser, it acts like @code{YYERROR}.
1455 As the example shows, predicates otherwise look like semantic actions, and
1456 therefore you must take them into account when determining the numbers
1457 to use for denoting the semantic values of right-hand side symbols.
1458 Predicate actions, however, have no defined value, and may not be given
1461 There is a subtle difference between semantic predicates and ordinary
1462 actions in nondeterministic mode, since the latter are deferred.
1463 For example, we could try to rewrite the previous example as
1467 @{ if (!new_syntax) YYERROR; @}
1468 "widget" id new_args @{ $$ = f($3, $4); @}
1469 | @{ if (new_syntax) YYERROR; @}
1470 "widget" id old_args @{ $$ = f($3, $4); @}
1475 (reversing the sense of the predicate tests to cause an error when they are
1476 false). However, this
1477 does @emph{not} have the same effect if @code{new_args} and @code{old_args}
1478 have overlapping syntax.
1479 Since the midrule actions testing @code{new_syntax} are deferred,
1480 a GLR parser first encounters the unresolved ambiguous reduction
1481 for cases where @code{new_args} and @code{old_args} recognize the same string
1482 @emph{before} performing the tests of @code{new_syntax}. It therefore
Finally, be careful when writing predicates: deferred actions have not been
evaluated yet, so using their results in a predicate has undefined effects.
1491 @cindex textual location
1492 @cindex location, textual
1494 Many applications, like interpreters or compilers, have to produce verbose
1495 and useful error messages. To achieve this, one must be able to keep track of
1496 the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
1497 Bison provides a mechanism for handling these locations.
1499 Each token has a semantic value. In a similar fashion, each token has an
1500 associated location, but the type of locations is the same for all tokens
1501 and groupings. Moreover, the output parser is equipped with a default data
1502 structure for storing locations (@pxref{Tracking Locations}, for more
1505 Like semantic values, locations can be reached in actions using a dedicated
1506 set of constructs. In the example above, the location of the whole grouping
1507 is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
When a rule is matched, a default action is used to compute the semantic value
of its left-hand side (@pxref{Actions}). In the same way, another default
action is used for locations. This default location action is general
enough for most cases, so there is usually no need to describe for each
rule how @code{@@$} should be formed. When building a new location for a given
grouping, the default behavior of the output parser is to take the beginning
of the first symbol and the end of the last symbol.
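The default rule for locations can be sketched as follows. The structure
uses the same four field names as Bison's default location type
(@code{first_line}, @code{first_column}, @code{last_line},
@code{last_column}), but the function @code{default_location} is a
hypothetical stand-in, intended to mirror the behavior of the
@code{YYLLOC_DEFAULT} macro for a rule with @var{n} right-hand-side symbols.
For an empty rule, the sketch places an empty location at the end of the
preceding symbol.

```c
/* Same field names as Bison's default location type. */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* rhs[1]..rhs[n] are the locations of the right-hand-side symbols;
   rhs[0] is the location of the symbol just before them. */
location
default_location (const location *rhs, int n)
{
  location loc;
  if (n > 0)
    {
      /* Beginning of the first symbol, end of the last one. */
      loc.first_line = rhs[1].first_line;
      loc.first_column = rhs[1].first_column;
      loc.last_line = rhs[n].last_line;
      loc.last_column = rhs[n].last_column;
    }
  else
    {
      /* Empty rule: an empty range at the end of the previous symbol. */
      loc.first_line = loc.last_line = rhs[0].last_line;
      loc.first_column = loc.last_column = rhs[0].last_column;
    }
  return loc;
}
```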
1519 @section Bison Output: the Parser Implementation File
1520 @cindex Bison parser
1521 @cindex Bison utility
1522 @cindex lexical analyzer, purpose
1525 When you run Bison, you give it a Bison grammar file as input. The
1526 most important output is a C source file that implements a parser for
1527 the language described by the grammar. This parser is called a
1528 @dfn{Bison parser}, and this file is called a @dfn{Bison parser
1529 implementation file}. Keep in mind that the Bison utility and the
1530 Bison parser are two distinct programs: the Bison utility is a program
1531 whose output is the Bison parser implementation file that becomes part
1534 The job of the Bison parser is to group tokens into groupings according to
1535 the grammar rules---for example, to build identifiers and operators into
1536 expressions. As it does this, it runs the actions for the grammar rules it
1539 The tokens come from a function called the @dfn{lexical analyzer} that
1540 you must supply in some fashion (such as by writing it in C). The Bison
1541 parser calls the lexical analyzer each time it wants a new token. It
1542 doesn't know what is ``inside'' the tokens (though their semantic values
1543 may reflect this). Typically the lexical analyzer makes the tokens by
1544 parsing characters of text, but Bison does not depend on this.
1547 The Bison parser implementation file is C code which defines a
1548 function named @code{yyparse} which implements that grammar. This
1549 function does not make a complete C program: you must supply some
1550 additional functions. One is the lexical analyzer. Another is an
1551 error-reporting function which the parser calls to report an error.
1552 In addition, a complete C program must start with a function called
1553 @code{main}; you have to provide this, and arrange for it to call
1554 @code{yyparse} or the parser will never run. @xref{Interface}.
1556 Aside from the token kind names and the symbols in the actions you
1557 write, all symbols defined in the Bison parser implementation file
1558 itself begin with @samp{yy} or @samp{YY}. This includes interface
1559 functions such as the lexical analyzer function @code{yylex}, the
1560 error reporting function @code{yyerror} and the parser function
1561 @code{yyparse} itself. This also includes numerous identifiers used
1562 for internal purposes. Therefore, you should avoid using C
1563 identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar
1564 file except for the ones defined in this manual. Also, you should
1565 avoid using the C identifiers @samp{malloc} and @samp{free} for
1566 anything other than their usual meanings.
1568 In some cases the Bison parser implementation file includes system
1569 headers, and in those cases your code should respect the identifiers
1570 reserved by those headers. On some non-GNU hosts, @code{<limits.h>},
1571 @code{<stddef.h>}, @code{<stdint.h>} (if available), and @code{<stdlib.h>}
1572 are included to declare memory allocators and integer types and constants.
1573 @code{<libintl.h>} is included if message translation is in use
1574 (@pxref{Internationalization}). Other system headers may be included
1575 if you define @code{YYDEBUG} (@pxref{Tracing}) or
1576 @code{YYSTACK_USE_ALLOCA} (@pxref{Table of Symbols}) to a nonzero value.
1579 @section Stages in Using Bison
1580 @cindex stages in using Bison
1583 The actual language-design process using Bison, from grammar specification
1584 to a working compiler or interpreter, has these parts:
1588 Formally specify the grammar in a form recognized by Bison
1589 (@pxref{Grammar File}). For each grammatical rule
1590 in the language, describe the action that is to be taken when an
1591 instance of that rule is recognized. The action is described by a
1592 sequence of C statements.
1595 Write a lexical analyzer to process input and pass tokens to the parser.
1596 The lexical analyzer may be written by hand in C (@pxref{Lexical}). It
1597 could also be produced using Lex, but the use of Lex is not discussed in
1601 Write a controlling function that calls the Bison-produced parser.
1604 Write error-reporting routines.
1607 To turn this source code as written into a runnable program, you
1608 must follow these steps:
1612 Run Bison on the grammar to produce the parser.
1615 Compile the code output by Bison, as well as any other source files.
1618 Link the object files to produce the finished product.
1621 @node Grammar Layout
1622 @section The Overall Layout of a Bison Grammar
1623 @cindex grammar file
1625 @cindex format of grammar file
1626 @cindex layout of Bison grammar
1628 The input file for the Bison utility is a @dfn{Bison grammar file}. The
1629 general form of a Bison grammar file is as follows:
1636 @var{Bison declarations}
1645 The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
1646 in every Bison grammar file to separate the sections.
1648 The prologue may define types and variables used in the actions. You can
1649 also use preprocessor commands to define macros used there, and use
1650 @code{#include} to include header files that do any of these things.
1651 You need to declare the lexical analyzer @code{yylex} and the error
1652 printer @code{yyerror} here, along with any other global identifiers
1653 used by the actions in the grammar rules.
1655 The Bison declarations declare the names of the terminal and nonterminal
1656 symbols, and may also describe operator precedence and the data types of
1657 semantic values of various symbols.
1659 The grammar rules define how to construct each nonterminal symbol from its
1662 The epilogue can contain any code you want to use. Often the
1663 definitions of functions declared in the prologue go here. In a
1664 simple program, all the rest of the program can go here.
1668 @cindex simple examples
1669 @cindex examples, simple
Now we show and explain several sample programs written using Bison: a
Reverse Polish Notation calculator, an algebraic (infix) notation
calculator (later extended to track ``locations''), and a multi-function
calculator. All produce usable, though limited, interactive desktop
calculators.
1677 These examples are simple, but Bison grammars for real programming
1678 languages are written the same way. You can copy these examples into a
1679 source file to try them.
1683 Bison comes with several examples (including for the different target
languages). If this package is properly installed, you should find them in
1685 @file{@var{prefix}/share/doc/bison/examples}, where @var{prefix} is the root
1686 of the installation, probably something like @file{/usr/local} or
1690 * RPN Calc:: Reverse Polish Notation Calculator;
1691 a first example with no operator precedence.
1692 * Infix Calc:: Infix (algebraic) notation calculator.
1693 Operator precedence is introduced.
1694 * Simple Error Recovery:: Continuing after syntax errors.
1695 * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
1696 * Multi-function Calc:: Calculator with memory and trig functions.
1697 It uses multiple data-types for semantic values.
1698 * Exercises:: Ideas for improving the multi-function calculator.
1702 @section Reverse Polish Notation Calculator
1703 @cindex Reverse Polish Notation
1704 @cindex @code{rpcalc}
1705 @cindex calculator, simple
1707 The first example@footnote{The sources of @command{rpcalc} are available as
1708 @file{examples/c/rpcalc}.} is that of a simple double-precision @dfn{Reverse
1710 Notation} calculator (a calculator using postfix operators). This example
1711 provides a good starting point, since operator precedence is not an issue.
1712 The second example will illustrate how operator precedence is handled.
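Before looking at the grammar, it may help to see what a postfix calculator
computes. The following plain-C sketch (no Bison involved, and not how
@command{rpcalc} itself is written) evaluates an RPN string with an explicit
stack: each number is pushed, and each operator pops two operands and pushes
the result. The function name @code{rpn_eval} and its restrictions
(space-separated input, nonnegative literals, the four binary operators
only, no error checking) are assumptions of the sketch.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate a space-separated postfix expression such as "1 2 + 3 *".
   Sketch only: assumes well-formed input and at most 64 operands. */
double
rpn_eval (const char *s)
{
  double stack[64];
  int top = 0;
  while (*s)
    {
      if (isspace ((unsigned char) *s))
        s++;
      else if (isdigit ((unsigned char) *s) || *s == '.')
        {
          char *end;
          stack[top++] = strtod (s, &end);   /* push an operand */
          s = end;
        }
      else
        {
          double b = stack[--top];           /* right operand */
          double a = stack[--top];           /* left operand */
          switch (*s++)
            {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            }
        }
    }
  return stack[top - 1];
}
```

For instance, @samp{1 2 + 3 *} means @samp{(1 + 2) * 3} in infix notation.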
1714 The source code for this calculator is named @file{rpcalc.y}. The
1715 @samp{.y} extension is a convention used for Bison grammar files.
1718 * Rpcalc Declarations:: Prologue (declarations) for rpcalc.
1719 * Rpcalc Rules:: Grammar Rules for rpcalc, with explanation.
1720 * Rpcalc Lexer:: The lexical analyzer.
1721 * Rpcalc Main:: The controlling function.
1722 * Rpcalc Error:: The error reporting function.
1723 * Rpcalc Generate:: Running Bison on the grammar file.
1724 * Rpcalc Compile:: Run the C compiler on the output code.
1727 @node Rpcalc Declarations
1728 @subsection Declarations for @code{rpcalc}
1730 Here are the C and Bison declarations for the Reverse Polish Notation
1731 calculator. As in C, comments are placed between @samp{/*@dots{}*/} or
1735 @comment file: c/rpcalc/rpcalc.y
1737 /* Parser for rpcalc. -*- C -*-
1739 Copyright (C) 1988-1993, 1995, 1998-2015, 2018-2021 Free Software
1742 This file is part of Bison, the GNU Compiler Compiler.
1744 This program is free software: you can redistribute it and/or modify
1745 it under the terms of the GNU General Public License as published by
1746 the Free Software Foundation, either version 3 of the License, or
1747 (at your option) any later version.
1749 This program is distributed in the hope that it will be useful,
1750 but WITHOUT ANY WARRANTY; without even the implied warranty of
1751 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1752 GNU General Public License for more details.
1754 You should have received a copy of the GNU General Public License
1755 along with this program. If not, see <https://www.gnu.org/licenses/>. */
1759 @comment file: c/rpcalc/rpcalc.y
1761 /* Reverse Polish Notation calculator. */
1768 void yyerror (char const *);
1772 %define api.value.type @{double@}
1775 %% /* Grammar rules and actions follow. */
1778 The declarations section (@pxref{Prologue}) contains two
1779 preprocessor directives and two forward declarations.
1781 The @code{#include} of @file{math.h} declares the exponentiation
1782 function @code{pow}; the one for @file{stdio.h} declares @code{printf}.
1784 The forward declarations for @code{yylex} and @code{yyerror} are
1785 needed because the C language requires that functions be declared
1786 before they are used. These functions will be defined in the
1787 epilogue, but the parser calls them so they must be declared in the
1790 The second section, Bison declarations, provides information to Bison about
1791 the tokens and their types (@pxref{Bison Declarations}).
1793 The @code{%define} directive defines the variable @code{api.value.type},
1794 thus specifying the C data type for semantic values of both tokens and
1795 groupings (@pxref{Value Type}). The Bison
1796 parser will use whatever type @code{api.value.type} is defined as; if you
1797 don't define it, @code{int} is the default. Because we specify
1798 @samp{@{double@}}, each token and each expression has an associated value,
1799 which is a floating-point number. C code can use @code{YYSTYPE} to refer to
1800 the type specified by @code{api.value.type}.
1802 Each terminal symbol that is not a single-character literal must be
1803 declared. (Single-character literals normally don't need to be declared.)
1804 In this example, all the arithmetic operators are designated by
1805 single-character literals, so the only terminal symbol that needs to be
1806 declared is @code{NUM}, the token kind for numeric constants.
1809 @subsection Grammar Rules for @code{rpcalc}
1811 Here are the grammar rules for the Reverse Polish Notation calculator.
1813 @comment file: c/rpcalc/rpcalc.y
1825 | exp '\n' @{ printf ("%.10g\n", $1); @}
1832 | exp exp '+' @{ $$ = $1 + $2; @}
1833 | exp exp '-' @{ $$ = $1 - $2; @}
1834 | exp exp '*' @{ $$ = $1 * $2; @}
1835 | exp exp '/' @{ $$ = $1 / $2; @}
1836 | exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */
1837 | exp 'n' @{ $$ = -$1; @} /* Unary minus */
1843 The groupings of the rpcalc ``language'' defined here are the expression
1844 (given the name @code{exp}), the line of input (@code{line}), and the
1845 complete input transcript (@code{input}). Each of these nonterminal
1846 symbols has several alternate rules, joined by the vertical bar @samp{|}
1847 which is read as ``or''. The following sections explain what these rules
1850 The semantics of the language is determined by the actions taken when a
1851 grouping is recognized. The actions are the C code that appears inside
1852 braces. @xref{Actions}.
1854 You must specify these actions in C, but Bison provides the means for
1855 passing semantic values between the rules. In each action, the
1856 pseudo-variable @code{$$} stands for the semantic value for the grouping
1857 that the rule is going to construct. Assigning a value to @code{$$} is the
1858 main job of most actions. The semantic values of the components of the
1859 rule are referred to as @code{$1}, @code{$2}, and so on.
1862 * Rpcalc Input:: Explanation of the @code{input} nonterminal
1863 * Rpcalc Line:: Explanation of the @code{line} nonterminal
1864 * Rpcalc Exp:: Explanation of the @code{exp} nonterminal
1868 @subsubsection Explanation of @code{input}
1870 Consider the definition of @code{input}:
1879 This definition reads as follows: ``A complete input is either an empty
1880 string, or a complete input followed by an input line''. Notice that
1881 ``complete input'' is defined in terms of itself. This definition is said
1882 to be @dfn{left recursive} since @code{input} always appears as the
1883 leftmost symbol in the sequence. @xref{Recursion}.
1885 The first alternative is empty because there are no symbols between the
1886 colon and the first @samp{|}; this means that @code{input} can match an
1887 empty string of input (no tokens). We write the rules this way because it
1888 is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
1889 It's conventional to put an empty alternative first and to use the
1890 (optional) @code{%empty} directive, or to write the comment @samp{/* empty
1891 */} in it (@pxref{Empty Rules}).
1893 The second alternate rule (@code{input line}) handles all nontrivial input.
1894 It means, ``After reading any number of lines, read one more line if
1895 possible.'' The left recursion makes this rule into a loop. Since the
1896 first alternative matches empty input, the loop can be executed zero or
1899 The parser function @code{yyparse} continues to process input until a
1900 grammatical error is seen or the lexical analyzer says there are no more
1901 input tokens; we will arrange for the latter to happen at end-of-input.
1904 @subsubsection Explanation of @code{line}
1906 Now consider the definition of @code{line}:
1911 | exp '\n' @{ printf ("%.10g\n", $1); @}
1915 The first alternative is a token which is a newline character; this means
1916 that rpcalc accepts a blank line (and ignores it, since there is no
1917 action). The second alternative is an expression followed by a newline.
1918 This is the alternative that makes rpcalc useful. The semantic value of
1919 the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
1920 question is the first symbol in the alternative. The action prints this
1921 value, which is the result of the computation the user asked for.
1923 This action is unusual because it does not assign a value to @code{$$}. As
1924 a consequence, the semantic value associated with the @code{line} is
1925 uninitialized (its value will be unpredictable). This would be a bug if
1926 that value were ever used, but we don't use it: once rpcalc has printed the
1927 value of the user's input line, that value is no longer needed.
1930 @subsubsection Explanation of @code{exp}
1932 The @code{exp} grouping has several rules, one for each kind of expression.
1933 The first rule handles the simplest expressions: those that are just
1934 numbers. The second handles an addition-expression, which looks like two
1935 expressions followed by a plus-sign. The third handles subtraction, and so
1941 | exp exp '+' @{ $$ = $1 + $2; @}
1942 | exp exp '-' @{ $$ = $1 - $2; @}
1947 We have used @samp{|} to join all the rules for @code{exp}, but we could
1948 equally well have written them separately:
1952 exp: exp exp '+' @{ $$ = $1 + $2; @};
1953 exp: exp exp '-' @{ $$ = $1 - $2; @};
1957 Most of the rules have actions that compute the value of the expression in
1958 terms of the value of its parts. For example, in the rule for addition,
1959 @code{$1} refers to the first component @code{exp} and @code{$2} refers to
1960 the second one. The third component, @code{'+'}, has no meaningful
1961 associated semantic value, but if it had one you could refer to it as
1962 @code{$3}. The first rule relies on the implicit default action: @samp{@{
1966 When @code{yyparse} recognizes a sum expression using this rule, the sum of
1967 the two subexpressions' values is produced as the value of the entire
1968 expression. @xref{Actions}.
1970 You don't have to give an action for every rule. When a rule has no action,
1971 Bison by default copies the value of @code{$1} into @code{$$}. This is what
1972 happens in the first rule (the one that uses @code{NUM}).
1974 The formatting shown here is the recommended convention, but Bison does not
1975 require it. You can add or change white space as much as you wish. For
1979 exp: NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
1983 means the same thing as this:
1988 | exp exp '+' @{ $$ = $1 + $2; @}
1994 The latter, however, is much more readable.
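
Bison's parser keeps the semantic values of recognized symbols on a stack, so for postfix input the actions above behave like a classic stack evaluator. As a cross-check on what they compute, here is a hand-written evaluator for one line of rpcalc input. It is a sketch, not code that Bison generates: the name @code{rpn_eval} is hypothetical, it reads numbers with @code{strtod} instead of @code{scanf}, and it leaves out @samp{^} so that it needs no math library.

@verbatim
#include <ctype.h>
#include <stdlib.h>

#define RPN_STACK_MAX 64 /* fixed depth: an assumption of this sketch */

/* Evaluate one line of postfix input such as "3 7 + 3 4 5 * + -".
   On success, store the value in *RESULT and return 0; return -1 if
   the line is malformed (missing operands, leftover operands, stack
   overflow, or an unknown character).  '^' is left out so that the
   sketch does not need pow and -lm. */
int
rpn_eval (char const *s, double *result)
{
  double stack[RPN_STACK_MAX];
  int top = 0; /* number of values currently on the stack */

  while (*s)
    {
      if (*s == ' ' || *s == '\t')
        {
          ++s;
          continue;
        }
      if (*s == '.' || isdigit ((unsigned char) *s))
        {
          /* exp: NUM -- the default action keeps the number as $$. */
          char *end;
          double v = strtod (s, &end);
          if (end == s || top == RPN_STACK_MAX)
            return -1;
          stack[top++] = v;
          s = end;
          continue;
        }
      if (*s == 'n')
        {
          /* exp: exp 'n'  { $$ = -$1; } */
          if (top < 1)
            return -1;
          stack[top - 1] = -stack[top - 1];
          ++s;
          continue;
        }
      /* exp: exp exp OP -- the topmost value is $2, the one below is $1. */
      if (top < 2)
        return -1;
      double b = stack[--top];
      double a = stack[--top];
      switch (*s)
        {
        case '+': stack[top++] = a + b; break;
        case '-': stack[top++] = a - b; break;
        case '*': stack[top++] = a * b; break;
        case '/': stack[top++] = a / b; break;
        default: return -1;
        }
      ++s;
    }
  if (top != 1)
    return -1;
  *result = stack[0];
  return 0;
}
@end verbatim

For example, @code{rpn_eval ("3 7 + 3 4 5 * + -", &r)} returns 0 and stores @code{-13} in @code{r}, while a line such as @samp{1 +} yields @code{-1} because @samp{+} lacks an operand.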
1997 @subsection The @code{rpcalc} Lexical Analyzer
1998 @cindex writing a lexical analyzer
1999 @cindex lexical analyzer, writing
2001 The lexical analyzer's job is low-level parsing: converting characters
2002 or sequences of characters into tokens. The Bison parser gets its
2003 tokens by calling the lexical analyzer. @xref{Lexical}.
2005 Only a simple lexical analyzer is needed for the RPN
2007 lexical analyzer skips blanks and tabs, then reads in numbers as
2008 @code{double} and returns them as @code{NUM} tokens. Any other character
2009 that isn't part of a number is a separate token. Note that the token-code
2010 for such a single-character token is the character itself.
2012 The return value of the lexical analyzer function is a numeric code which
2013 represents a token kind. The same text used in Bison rules to stand for
2014 this token kind is also a C expression for the numeric code of the kind.
2015 This works in two ways. If the token kind is a character literal, then its
2016 numeric code is that of the character; you can use the same character
2017 literal in the lexical analyzer to express the number. If the token kind is
2018 an identifier, Bison defines that identifier as a C enumeration constant
2019 whose value is the appropriate code. In this example, therefore, @code{NUM}
2020 becomes an enumeration constant for @code{yylex} to use.
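
To make the two cases concrete, the following sketch approximates the enumeration that Bison generates for rpcalc. The members other than @code{NUM}, and all the values, are what recent Bison versions typically emit, so treat them as an assumption and check the generated parser. The helper @code{is_char_token} is hypothetical. Because named kinds start above 255, they cannot collide with the code of a single-character token such as @code{'+'}.

@verbatim
/* Rough approximation of the token-kind enumeration Bison generates
   for rpcalc; the exact members vary between Bison versions. */
enum yytokentype
{
  YYEOF = 0,     /* end of input */
  YYerror = 256, /* the error token */
  YYUNDEF = 257, /* invalid token */
  NUM = 258      /* declared with %token NUM */
};

/* Return nonzero if KIND is the code of a single-character token,
   whose code is the character itself. */
int
is_char_token (int kind)
{
  return 0 < kind && kind < 256;
}
@end verbatim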
2022 The semantic value of the token (if it has one) is stored into the global
2023 variable @code{yylval}, which is where the Bison parser will look for it.
2024 (The C data type of @code{yylval} is @code{YYSTYPE}, which was defined
2025 at the beginning of the grammar via @samp{%define api.value.type
2026 @{double@}}; @pxref{Rpcalc Declarations}.)
2028 A token kind code of zero is returned if the end-of-input is encountered.
2029 (Bison recognizes any nonpositive value as indicating end-of-input.)
2031 Here is the code for the lexical analyzer:
2033 @comment file: c/rpcalc/rpcalc.y
2036 /* The lexical analyzer returns a double floating point
2037 number on the stack and the token NUM, or the numeric code
2038 of the character read if not a number. It skips all blanks
2039 and tabs, and returns 0 for end-of-input. */
2050 /* Skip white space. */
2051 while (c == ' ' || c == '\t')
2055 /* Process numbers. */
2056 if (c == '.' || isdigit (c))
2059 if (scanf ("%lf", &yylval) != 1)
2065 /* Return end-of-input. */
2068 /* Return a single char. */
2076 @subsection The Controlling Function
2077 @cindex controlling function
2078 @cindex main function in simple example
2080 In keeping with the spirit of this example, the controlling function is
2081 kept to the bare minimum. The only requirement is that it call
2082 @code{yyparse} to start the process of parsing.
2084 @comment file: c/rpcalc/rpcalc.y
2096 @subsection The Error Reporting Routine
2097 @cindex error reporting routine
2099 When @code{yyparse} detects a syntax error, it calls the error reporting
2100 function @code{yyerror} to print an error message (usually but not
2101 always @code{"syntax error"}). It is up to the programmer to supply
2102 @code{yyerror} (@pxref{Interface}), so
2103 here is the definition we will use:
2105 @comment file: c/rpcalc/rpcalc.y
2110 /* Called by yyparse on error. */
2112 yyerror (char const *s)
2114 fprintf (stderr, "%s\n", s);
2119 After @code{yyerror} returns, the Bison parser may recover from the error
2120 and continue parsing if the grammar contains a suitable error rule
2121 (@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
2122 have not written any error rules in this example, so any invalid input will
2123 cause the calculator program to exit. This is not clean behavior for a
2124 real calculator, but it is adequate for the first example.
2126 @node Rpcalc Generate
2127 @subsection Running Bison to Make the Parser
2128 @cindex running Bison (introduction)
2130 Before running Bison to produce a parser, we need to decide how to
2131 arrange all the source code in one or more source files. For such a
2132 simple example, the easiest thing is to put everything in one file,
2133 the grammar file. The definitions of @code{yylex}, @code{yyerror} and
2134 @code{main} go at the end, in the epilogue of the grammar file
2135 (@pxref{Grammar Layout}).
2137 For a large project, you would probably have several source files, and use
2138 @code{make} to arrange to recompile them.
2140 With all the source in the grammar file, you use the following command
2141 to convert it into a parser implementation file:
2144 $ @kbd{bison @var{file}.y}
2148 In this example, the grammar file is called @file{rpcalc.y} (for
2149 ``Reverse Polish @sc{calc}ulator''). Bison produces a parser
2150 implementation file named @file{@var{file}.tab.c}, removing the
2151 @samp{.y} from the grammar file name. The parser implementation file
2152 contains the source code for @code{yyparse}. The additional functions
2153 in the grammar file (@code{yylex}, @code{yyerror} and @code{main}) are
2154 copied verbatim to the parser implementation file.
2156 @node Rpcalc Compile
2157 @subsection Compiling the Parser Implementation File
2158 @cindex compiling the parser
2160 Here is how to compile and run the parser implementation file:
2164 # @r{List files in current directory.}
2166 rpcalc.tab.c rpcalc.y
2170 # @r{Compile the Bison parser.}
2171 # @r{@option{-lm} links the math library (for @code{pow}); it should follow the source file.}
2172 $ @kbd{cc -o rpcalc rpcalc.tab.c -lm}
2176 # @r{List files again.}
2178 rpcalc rpcalc.tab.c rpcalc.y
2182 The file @file{rpcalc} now contains the executable code. Here is an
2183 example session using @code{rpcalc}.
2189 @kbd{3 7 + 3 4 5 *+-}
2191 @kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
2194 @result{} -3.166666667
2195 @kbd{3 4 ^} @r{Exponentiation}
2197 @kbd{^D} @r{End-of-file indicator}
2202 @section Infix Notation Calculator: @code{calc}
2203 @cindex infix notation calculator
2205 @cindex calculator, infix notation
2207 We now modify rpcalc to handle infix operators instead of
2208 postfix.@footnote{A similar example, but using an unambiguous grammar rather
2209 than precedence and associativity annotations, is available as
2210 @file{examples/c/calc}.} Infix
2211 notation involves the concept of operator precedence and the need for
2212 parentheses nested to arbitrary depth. Here is the Bison code for
2213 @file{calc.y}, an infix desk-top calculator.
2216 /* Infix notation calculator. */
2223 void yyerror (char const *);
2228 /* Bison declarations. */
2229 %define api.value.type @{double@}
2233 %precedence NEG /* negation--unary minus */
2234 %right '^' /* exponentiation */
2237 %% /* The grammar follows. */
2248 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
2255 | exp '+' exp @{ $$ = $1 + $3; @}
2256 | exp '-' exp @{ $$ = $1 - $3; @}
2257 | exp '*' exp @{ $$ = $1 * $3; @}
2258 | exp '/' exp @{ $$ = $1 / $3; @}
2259 | '-' exp %prec NEG @{ $$ = -$2; @}
2260 | exp '^' exp @{ $$ = pow ($1, $3); @}
2261 | '(' exp ')' @{ $$ = $2; @}
2268 The functions @code{yylex}, @code{yyerror} and @code{main} can be the
2271 There are two important new features shown in this code.
2273 In the second section (Bison declarations), @code{%left} declares token
2274 kinds and says they are left-associative operators. The declarations
2275 @code{%left} and @code{%right} (right associativity) take the place of
2276 @code{%token} which is used to declare a token kind name without
2277 associativity/precedence. (These tokens are single-character literals,
2278 which ordinarily don't need to be declared. We declare them here to specify
2279 the associativity/precedence.)
2281 Operator precedence is determined by the line ordering of the
2282 declarations; the higher the line number of the declaration (lower on
2283 the page or screen), the higher the precedence. Hence, exponentiation
2284 has the highest precedence, unary minus (@code{NEG}) is next, followed
2285 by @samp{*} and @samp{/}, and so on. Unary minus is not associative;
2286 only its precedence matters (@code{%precedence}). @xref{Precedence}.
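
Bison enforces these declarations by resolving conflicts in its LR parser. The same table can also be followed by hand with a different technique, precedence climbing, which the sketch below uses to show the effect of each declaration. It is an integer-only illustration with hypothetical names (@code{eval_infix}, @code{ipow}), not how Bison works, and it assumes well-formed input without division by zero.

@verbatim
#include <ctype.h>

/* Cursor into the string being evaluated. */
static char const *infix_cursor;

/* Precedence levels, lowest first, in the order of calc's
   declarations: '+' '-' (%left), '*' '/' (%left), NEG (%precedence),
   '^' (%right). */
enum { PREC_ADD = 1, PREC_MUL = 2, PREC_NEG = 3, PREC_POW = 4 };

static int parse_exp (int min_prec);

/* Integer power by repeated multiplication (no pow, so no -lm). */
static int
ipow (int base, int e)
{
  int r = 1;
  while (e-- > 0)
    r *= base;
  return r;
}

static void
skip_blanks (void)
{
  while (*infix_cursor == ' ' || *infix_cursor == '\t')
    ++infix_cursor;
}

/* Precedence of binary operator OP, or 0 if OP is not one. */
static int
binary_prec (char op)
{
  switch (op)
    {
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^': return PREC_POW;
    default: return 0;
    }
}

/* A number, a parenthesized expression, or a unary minus. */
static int
parse_primary (void)
{
  skip_blanks ();
  if (*infix_cursor == '-')
    {
      /* Like '-' exp %prec NEG: the operand may contain only
         operators that bind tighter than NEG, that is, '^'. */
      ++infix_cursor;
      return -parse_exp (PREC_NEG);
    }
  if (*infix_cursor == '(')
    {
      ++infix_cursor;
      int v = parse_exp (PREC_ADD);
      skip_blanks ();
      if (*infix_cursor == ')')
        ++infix_cursor;
      return v;
    }
  int v = 0;
  while (isdigit ((unsigned char) *infix_cursor))
    v = v * 10 + (*infix_cursor++ - '0');
  return v;
}

/* Parse operators whose precedence is at least MIN_PREC. */
static int
parse_exp (int min_prec)
{
  int lhs = parse_primary ();
  for (;;)
    {
      skip_blanks ();
      char op = *infix_cursor;
      int prec = binary_prec (op);
      if (prec == 0 || prec < min_prec)
        return lhs;
      ++infix_cursor;
      /* %left: the right operand must bind strictly tighter;
         %right: an operator of the same level may nest on the right. */
      int rhs = parse_exp (op == '^' ? prec : prec + 1);
      switch (op)
        {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = ipow (lhs, rhs); break;
        }
    }
}

int
eval_infix (char const *s)
{
  infix_cursor = s;
  return parse_exp (PREC_ADD);
}
@end verbatim

With this table, @code{eval_infix ("2^3^2")} is 512 because @samp{^} is right-associative, @code{eval_infix ("10 - 4 - 3")} is 3 because @samp{-} is left-associative, and @code{eval_infix ("-2^2")} is -4 because @code{NEG} binds less tightly than @samp{^}.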
2288 The other important new feature is the @code{%prec} in the grammar
2289 section for the unary minus operator. The @code{%prec} simply instructs
2290 Bison that the rule @samp{| '-' exp} has the same precedence as
2291 @code{NEG}---in this case the next-to-highest. @xref{Contextual
2294 Here is a sample run of @file{calc.y}:
2299 @kbd{4 + 4.5 - (34/(8*3+-3))}
2307 @node Simple Error Recovery
2308 @section Simple Error Recovery
2309 @cindex error recovery, simple
2311 Up to this point, this manual has not addressed the issue of @dfn{error
2312 recovery}---how to continue parsing after the parser detects a syntax
2313 error. All we have handled is error reporting with @code{yyerror}.
2314 Recall that by default @code{yyparse} returns after calling
2315 @code{yyerror}. This means that an erroneous input line causes the
2316 calculator program to exit. Now we show how to rectify this deficiency.
2318 The Bison language itself includes the reserved word @code{error}, which
2319 may be included in the grammar rules. In the example below it appears in
2320 a new, third alternative for @code{line}:
2326 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
2327 | error '\n' @{ yyerrok; @}
2332 This addition to the grammar allows for simple error recovery in the
2333 event of a syntax error. If an expression that cannot be evaluated is
2334 read, the error will be recognized by the third rule for @code{line},
2335 and parsing will continue. (The @code{yyerror} function is still called
2336 upon to print its message as well.) The action executes the statement
2337 @code{yyerrok}, a macro defined automatically by Bison; its meaning is
2338 that error recovery is complete (@pxref{Error Recovery}). Note the
2339 difference between @code{yyerrok} and @code{yyerror}; neither one is a
2342 This form of error recovery deals with syntax errors. There are other
2343 kinds of errors; for example, division by zero, which raises an exception
2344 signal that is normally fatal. A real calculator program must handle this
2345 signal and use @code{longjmp} to return to @code{main} and resume parsing
2346 input lines; it would also have to discard the rest of the current line of
2347 input. We won't discuss this issue further because it is not specific to
2350 @node Location Tracking Calc
2351 @section Location Tracking Calculator: @code{ltcalc}
2352 @cindex location tracking calculator
2353 @cindex @code{ltcalc}
2354 @cindex calculator, location tracking
2356 This example extends the infix notation calculator with location
2357 tracking. This feature will be used to improve the error messages. For
2358 the sake of clarity, this example is a simple integer calculator, since
2359 most of the work needed to use locations will be done in the lexical
2363 * Ltcalc Declarations:: Bison and C declarations for ltcalc.
2364 * Ltcalc Rules:: Grammar rules for ltcalc, with explanations.
2365 * Ltcalc Lexer:: The lexical analyzer.
2368 See @ref{Tracking Locations} for details about locations.
2370 @node Ltcalc Declarations
2371 @subsection Declarations for @code{ltcalc}
2373 The C and Bison declarations for the location tracking calculator are
2374 the same as the declarations for the infix notation calculator.
2377 /* Location tracking calculator. */
2382 void yyerror (char const *);
2385 /* Bison declarations. */
2386 %define api.value.type @{int@}
2394 %% /* The grammar follows. */
2398 Note there are no declarations specific to locations. Defining a data type
2399 for storing locations is not needed: we will use the type provided by
2400 default (@pxref{Location Type}), which is a four-member structure with the
2401 following integer fields: @code{first_line}, @code{first_column},
2402 @code{last_line} and @code{last_column}. By convention, and in accordance
2403 with the GNU Coding Standards and common practice, the line and column count
2407 @subsection Grammar Rules for @code{ltcalc}
2409 Whether handling locations or not has no effect on the syntax of your
2410 language. Therefore, grammar rules for this example will be very close
2411 to those of the previous example: we will only modify them to benefit
2412 from the new information.
2414 Here, we will use locations to report divisions by zero, and locate the
2415 wrong expressions or subexpressions.
2428 | exp '\n' @{ printf ("%d\n", $1); @}
2435 | exp '+' exp @{ $$ = $1 + $3; @}
2436 | exp '-' exp @{ $$ = $1 - $3; @}
2437 | exp '*' exp @{ $$ = $1 * $3; @}
2447 fprintf (stderr, "%d.%d-%d.%d: division by zero\n",
2448 @@3.first_line, @@3.first_column,
2449 @@3.last_line, @@3.last_column);
2454 | '-' exp %prec NEG @{ $$ = -$2; @}
2455 | exp '^' exp @{ $$ = pow ($1, $3); @}
2456 | '(' exp ')' @{ $$ = $2; @}
2460 This code shows how to reach locations inside of semantic actions, by
2461 using the pseudo-variables @code{@@@var{n}} for rule components, and the
2462 pseudo-variable @code{@@$} for groupings.
2464 We don't need to assign a value to @code{@@$}: the output parser does it
2465 automatically. By default, before executing the C code of each action,
2466 @code{@@$} is set to range from the beginning of @code{@@1} to the end of
2467 @code{@@@var{n}}, for a rule with @var{n} components. This behavior can be
2468 redefined (@pxref{Location Default Action}), and for very specific rules,
2469 @code{@@$} can be computed by hand.
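
The default computation is simple enough to write out. The sketch below is only an illustration with hypothetical names (@code{loc_t}, @code{span_locations}, @code{format_location}); in a real parser the type is @code{YYLTYPE} and Bison sets @code{@@$} itself. @code{loc_t} has the same four fields as the default location type, and @code{format_location} uses the format of the division-by-zero message above. Empty rules (@var{n} = 0) are not handled; for them Bison uses the end of the preceding symbol instead (@pxref{Location Default Action}).

@verbatim
#include <stdio.h>

/* Same four fields as Bison's default location type (YYLTYPE). */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc_t;

/* What the default action gives @$ for a rule whose N >= 1
   components have locations RHS[0] .. RHS[N - 1]: from the
   beginning of @1 to the end of @N. */
loc_t
span_locations (loc_t const *rhs, int n)
{
  loc_t res;
  res.first_line = rhs[0].first_line;
  res.first_column = rhs[0].first_column;
  res.last_line = rhs[n - 1].last_line;
  res.last_column = rhs[n - 1].last_column;
  return res;
}

/* Write LOC into BUF as "FL.FC-LL.LC", like the division-by-zero
   message; returns what snprintf returns. */
int
format_location (char *buf, size_t size, loc_t loc)
{
  return snprintf (buf, size, "%d.%d-%d.%d",
                   loc.first_line, loc.first_column,
                   loc.last_line, loc.last_column);
}
@end verbatim

For three components located at @samp{1.1-1.2}, @samp{1.3-1.4} and @samp{2.1-2.5}, @code{span_locations} returns a location that @code{format_location} renders as @samp{1.1-2.5}.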
2472 @subsection The @code{ltcalc} Lexical Analyzer.
2474 Until now, we relied on Bison's defaults to enable location
2475 tracking. The next step is to rewrite the lexical analyzer, and make it
2476 able to feed the parser with the token locations, as it already does for
2479 To this end, we must take into account every single character of the
2480 input text, to keep the computed locations from being fuzzy or wrong:
2491 /* Skip white space. */
2492 while ((c = getchar ()) == ' ' || c == '\t')
2493 yylloc.last_column += c == '\t' ? 8 - ((yylloc.last_column - 1) & 7) : 1;
2498 yylloc.first_line = yylloc.last_line;
2499 yylloc.first_column = yylloc.last_column;
2503 /* Process numbers. */
2507 ++yylloc.last_column;
2508 while (isdigit (c = getchar ()))
2510 ++yylloc.last_column;
2511 yylval = yylval * 10 + c - '0';
2518 /* Return end-of-input. */
2523 /* Return a single char, and update location. */
2527 yylloc.last_column = 0;
2530 ++yylloc.last_column;
2536 Basically, the lexical analyzer performs the same processing as before: it
2537 skips blanks and tabs, and reads numbers or single-character tokens. In
2538 addition, it updates @code{yylloc}, the global variable (of type
2539 @code{YYLTYPE}) containing the token's location.
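
The tab arithmetic in the white-space loop is easy to misread. The sketch below pulls it out into a hypothetical helper, @code{next_column}, and assumes 1-based column numbers. Under that assumption the expression moves a tab to the next tab stop among columns 1, 9, 17, @dots{}: @code{(col - 1) & 7} is the distance already travelled past the previous tab stop, the same as @code{(col - 1) % 8} for positive columns. Any other character advances the column by one.

@verbatim
/* Column reached after reading character C at 1-based column COL,
   using the same expression as ltcalc's yylex: a tab moves to the
   next tab stop (1, 9, 17, ...), anything else advances by one. */
int
next_column (int col, int c)
{
  return col + (c == '\t' ? 8 - ((col - 1) & 7) : 1);
}
@end verbatim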
2541 Now, each time this function returns a token, the parser has its kind as
2542 well as its semantic value, and its location in the text. The last needed
2543 change is to initialize @code{yylloc}, for example in the controlling
2551 yylloc.first_line = yylloc.last_line = 1;
2552 yylloc.first_column = yylloc.last_column = 0;
2558 Remember that computing locations is not a matter of syntax. Every
2559 character must be associated with a location update, whether it is in
2560 valid input, in comments, in literal strings, and so on.
2562 @node Multi-function Calc
2563 @section Multi-Function Calculator: @code{mfcalc}
2564 @cindex multi-function calculator
2565 @cindex @code{mfcalc}
2566 @cindex calculator, multi-function
2568 Now that the basics of Bison have been discussed, it is time to move on to a
2569 more advanced problem.@footnote{The sources of @command{mfcalc} are
2570 available as @file{examples/c/mfcalc}.} The above calculators provided only
2571 five functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It
2572 would be nice to have a calculator that provides other mathematical
2573 functions such as @code{sin}, @code{cos}, etc.
2575 It is easy to add new operators to the infix calculator as long as they are
2576 only single-character literals. The lexical analyzer @code{yylex} passes
2577 back all nonnumeric characters as tokens, so new grammar rules suffice for
2578 adding a new operator. But we want something more flexible: built-in
2579 functions whose syntax has this form:
2582 @var{function_name} (@var{argument})
2586 At the same time, we will add memory to the calculator, by allowing you
2587 to create named variables, store values in them, and use them later.
2588 Here is a sample session with the multi-function calculator:
2593 @kbd{pi = 3.141592653589}
2594 @result{} 3.1415926536
2598 @result{} 0.0000000000
2600 @kbd{alpha = beta1 = 2.3}
2601 @result{} 2.3000000000
2603 @result{} 2.3000000000
2605 @result{} 0.8329091229
2606 @kbd{exp(ln(beta1))}
2607 @result{} 2.3000000000
2611 Note that multiple assignment and nested function calls are permitted.
2614 * Mfcalc Declarations:: Bison declarations for multi-function calculator.
2615 * Mfcalc Rules:: Grammar rules for the calculator.
2616 * Mfcalc Symbol Table:: Symbol table management subroutines.
2617 * Mfcalc Lexer:: The lexical analyzer.
2618 * Mfcalc Main:: The controlling function.
2621 @node Mfcalc Declarations
2622 @subsection Declarations for @code{mfcalc}
2624 Here are the C and Bison declarations for the multi-function
2628 @comment file: c/mfcalc/mfcalc.y
2630 /* Parser for mfcalc. -*- C -*-
2632 Copyright (C) 1988-1993, 1995, 1998-2015, 2018-2021 Free Software
2635 This file is part of Bison, the GNU Compiler Compiler.
2637 This program is free software: you can redistribute it and/or modify
2638 it under the terms of the GNU General Public License as published by
2639 the Free Software Foundation, either version 3 of the License, or
2640 (at your option) any later version.
2642 This program is distributed in the hope that it will be useful,
2643 but WITHOUT ANY WARRANTY; without even the implied warranty of
2644 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2645 GNU General Public License for more details.
2647 You should have received a copy of the GNU General Public License
2648 along with this program. If not, see <https://www.gnu.org/licenses/>. */
2651 /* Portability issues for strdup. */
2652 #ifndef _XOPEN_SOURCE
2653 # define _XOPEN_SOURCE 600
2659 @comment file: c/mfcalc/mfcalc.y: 1
2663 #include <stdio.h> /* For printf, etc. */
2664 #include <math.h> /* For pow, used in the grammar. */
2665 #include "calc.h" /* Contains definition of 'symrec'. */
2667 void yyerror (char const *);
2671 %define api.value.type union /* Generate YYSTYPE from these types: */
2672 %token <double> NUM /* Double precision number. */
2673 %token <symrec*> VAR FUN /* Symbol table pointer: variable/function. */
2680 %precedence NEG /* negation--unary minus */
2681 %right '^' /* exponentiation */
2685 The above grammar introduces only two new features of the Bison language.
2686 These features allow semantic values to have various data types
2687 (@pxref{Multiple Types}).
2689 The special @code{union} value assigned to the @code{%define} variable
2690 @code{api.value.type} specifies that the symbols are defined with their data
2691 types. Bison will generate an appropriate definition of @code{YYSTYPE} to
2694 Since values can now have various types, it is necessary to associate a type
2695 with each grammar symbol whose semantic value is used. These symbols are
2696 @code{NUM}, @code{VAR}, @code{FUN}, and @code{exp}. Their declarations are
2697 augmented with their data type (placed between angle brackets). For
2698 instance, values of @code{NUM} are stored in @code{double}.
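
To illustrate what @samp{%define api.value.type union} produces, here is a hand-written approximation of the @code{YYSTYPE} that Bison generates for these declarations, with one member per typed symbol, named after that symbol. The exact text Bison emits varies between versions, and you should not write it yourself. The helper @code{num_value} is hypothetical.

@verbatim
/* Rough approximation of the YYSTYPE Bison generates for mfcalc
   from '%define api.value.type union'. */
typedef struct symrec symrec; /* the complete type is in calc.h */

union YYSTYPE
{
  double NUM;  /* %token <double> NUM */
  symrec *VAR; /* %token <symrec*> VAR */
  symrec *FUN; /* %token <symrec*> FUN */
  double exp;  /* <double> exp */
};
typedef union YYSTYPE YYSTYPE;

/* A NUM token's value is read through the member named NUM. */
double
num_value (YYSTYPE v)
{
  return v.NUM;
}
@end verbatim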
2700 The Bison construct @code{%nterm} is used for declaring nonterminal symbols,
2701 just as @code{%token} is used for declaring token kinds. We did
2702 not use @code{%nterm} before because nonterminal symbols are normally
2703 declared implicitly by the rules that define them. But @code{exp} must be
2704 declared explicitly so we can specify its value type. @xref{Type Decl}.
2707 @subsection Grammar Rules for @code{mfcalc}
2709 Here are the grammar rules for the multi-function calculator.
2710 Most of them are copied directly from @code{calc}; three rules,
2711 those which mention @code{VAR} or @code{FUN}, are new.
2713 @comment file: c/mfcalc/mfcalc.y: 3
2715 %% /* The grammar follows. */
2726 | exp '\n' @{ printf ("%.10g\n", $1); @}
2727 | error '\n' @{ yyerrok; @}
2734 | VAR @{ $$ = $1->value.var; @}
2735 | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
2736 | FUN '(' exp ')' @{ $$ = $1->value.fun ($3); @}
2737 | exp '+' exp @{ $$ = $1 + $3; @}
2738 | exp '-' exp @{ $$ = $1 - $3; @}
2739 | exp '*' exp @{ $$ = $1 * $3; @}
2740 | exp '/' exp @{ $$ = $1 / $3; @}
2741 | '-' exp %prec NEG @{ $$ = -$2; @}
2742 | exp '^' exp @{ $$ = pow ($1, $3); @}
2743 | '(' exp ')' @{ $$ = $2; @}
2746 /* End of grammar. */
2750 @node Mfcalc Symbol Table
2751 @subsection The @code{mfcalc} Symbol Table
2752 @cindex symbol table example
2754 The multi-function calculator requires a symbol table to keep track of the
2755 names and meanings of variables and functions. This doesn't affect the
2756 grammar rules (except for the actions) or the Bison declarations, but it
2757 requires some additional C functions for support.
2759 The symbol table itself consists of a linked list of records. Its
2760 definition, which is kept in the header @file{calc.h}, is as follows. It
2761 provides for either functions or variables to be placed in the table.
2764 @comment file: c/mfcalc/calc.h
2766 /* Functions for mfcalc. -*- C -*-
2768 Copyright (C) 1988-1993, 1995, 1998-2015, 2018-2021 Free Software
2771 This file is part of Bison, the GNU Compiler Compiler.
2773 This program is free software: you can redistribute it and/or modify
2774 it under the terms of the GNU General Public License as published by
2775 the Free Software Foundation, either version 3 of the License, or
2776 (at your option) any later version.
2778 This program is distributed in the hope that it will be useful,
2779 but WITHOUT ANY WARRANTY; without even the implied warranty of
2780 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2781 GNU General Public License for more details.
2783 You should have received a copy of the GNU General Public License
2784 along with this program. If not, see <https://www.gnu.org/licenses/>. */
2788 @comment file: c/mfcalc/calc.h
2791 /* Function type. */
2792 typedef double (func_t) (double);
2796 /* Data type for links in the chain of symbols. */
2799 char *name; /* name of symbol */
2800 int type; /* type of symbol: either VAR or FUN */
2803 double var; /* value of a VAR */
2804 func_t *fun; /* value of a FUN */
2806 struct symrec *next; /* link field */
2811 typedef struct symrec symrec;
2813 /* The symbol table: a chain of 'struct symrec'. */
2814 extern symrec *sym_table;
2816 symrec *putsym (char const *name, int sym_type);
2817 symrec *getsym (char const *name);
2821 The new version of @code{main} will call @code{init_table} to initialize
2824 @comment file: c/mfcalc/mfcalc.y: 3
2835 struct init const funs[] =
2848 /* The symbol table: a chain of 'struct symrec'. */
2853 /* Put functions in table. */
2859 for (int i = 0; funs[i].name; i++)
2861 symrec *ptr = putsym (funs[i].name, FUN);
2862 ptr->value.fun = funs[i].fun;
2868 By simply editing the initialization list and adding the necessary include
2869 files, you can add more functions to the calculator.
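
Entries need not come from @file{math.h}: any function of type @code{func_t} will do. The standalone sketch below repeats the @code{func_t} typedef from @file{calc.h} and a @code{struct init} with the @code{name} and @code{fun} members used above. It adds two hypothetical user-defined functions and looks names up with a hypothetical @code{find_fun} instead of installing them with @code{putsym}, so it can run without the rest of mfcalc.

@verbatim
#include <stddef.h>
#include <string.h>

typedef double (func_t) (double);

/* Two hypothetical user-defined functions; any function of type
   func_t can go in the table, not only those from <math.h>. */
static double square (double x) { return x * x; }
static double half (double x) { return x / 2; }

struct init
{
  char const *name;
  func_t *fun;
};

/* Extended initialization list; the null entry marks the end. */
static struct init const funs[] =
{
  { "square", square },
  { "half", half },
  { NULL, NULL }
};

/* Hypothetical lookup helper: return the function named NAME, or a
   null pointer if there is none. */
func_t *
find_fun (char const *name)
{
  for (int i = 0; funs[i].name; i++)
    if (strcmp (funs[i].name, name) == 0)
      return funs[i].fun;
  return NULL;
}
@end verbatim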
2871 Two important functions allow look-up and installation of symbols in the
2872 symbol table. The function @code{putsym} is passed a name and the kind
2873 (@code{VAR} or @code{FUN}) of the object to be installed. The object is
2874 linked to the front of the list, and a pointer to the object is returned.
2875 The function @code{getsym} is passed the name of the symbol to look up. If
2876 found, a pointer to that symbol is returned; otherwise a null pointer is returned.
2878 @comment file: c/mfcalc/mfcalc.y: 3
2881 /* The mfcalc code assumes that malloc and realloc
2882 always succeed, and that integer calculations
2883 never overflow. Production-quality code should
2884 not make these assumptions. */
2886 #include <stdlib.h> /* malloc, realloc. */
2887 #include <string.h> /* strdup, strlen. */
2892 putsym (char const *name, int sym_type)
2894 symrec *res = (symrec *) malloc (sizeof (symrec));
2895 res->name = strdup (name);
2896 res->type = sym_type;
2897 res->value.var = 0; /* Set value to 0 even if fun. */
2898 res->next = sym_table;
2906 getsym (char const *name)
2908 for (symrec *p = sym_table; p; p = p->next)
2909 if (strcmp (p->name, name) == 0)
2917 @subsection The @code{mfcalc} Lexer
2919 The function @code{yylex} must now recognize variables, numeric values, and
2920 the single-character arithmetic operators. Strings of alphanumeric
2921 characters with a leading letter are recognized as either variables or
2922 functions depending on what the symbol table says about them.
2924 The string is passed to @code{getsym} for lookup in the symbol table. If
2925 the name appears in the table, a pointer to its location and its type
2926 (@code{VAR} or @code{FUN}) are returned to @code{yyparse}. If it is not
2927 already in the table, then it is installed as a @code{VAR} using
2928 @code{putsym}. Again, a pointer and its type (which must be @code{VAR}) are
2929 returned to @code{yyparse}.
2931 No change is needed in the handling of numeric values and arithmetic
2932 operators in @code{yylex}.
2934 @comment file: c/mfcalc/mfcalc.y: 3
2945 /* Ignore white space, get first nonwhite character. */
2946 while (c == ' ' || c == '\t')
2954 /* Char starts a number => parse the number. */
2955 if (c == '.' || isdigit (c))
2958 if (scanf ("%lf", &yylval.NUM) != 1)
2966 Bison generates a definition of @code{YYSTYPE} with a member named
2967 @code{NUM} to store the value of @code{NUM} symbols.
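For illustration, the following hand-written C approximates such a value type for @code{mfcalc}: one member per typed token, named after the token. It is a sketch, not the code Bison emits; the generated type may have additional members (for instance for typed nonterminals such as @code{exp}), and @code{demo_value}, @code{yylval_demo}, @code{store_number}, and @code{store_symbol} are hypothetical names.

```c
/* Minimal stand-ins for the declarations of calc.h.  */
typedef double (func_t) (double);
typedef struct symrec symrec;
struct symrec
{
  char *name;
  int type;
  union { double var; func_t *fun; } value;
  symrec *next;
};

/* Approximation (hypothetical) of the generated value type:
   one member per typed token, named after the token.  */
union demo_value
{
  double NUM;     /* semantic value of NUM tokens */
  symrec *VAR;    /* semantic value of VAR tokens */
  symrec *FUN;    /* semantic value of FUN tokens */
};

static union demo_value yylval_demo;   /* hypothetical stand-in for yylval */

/* What the lexer does after scanning a number.  */
static void store_number (double d)
{
  yylval_demo.NUM = d;
}

/* What the lexer does after looking up an identifier.  */
static void store_symbol (symrec *s)
{
  yylval_demo.VAR = s;   /* or yylval_demo.FUN = s: same member type */
}
```

Because @code{VAR} and @code{FUN} share the type @code{symrec *}, the lexer can store the symbol pointer before deciding which of the two token kinds to return.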
2969 @comment file: c/mfcalc/mfcalc.y: 3
2972 /* Char starts an identifier => read the name. */
2975 static ptrdiff_t bufsize = 0;
2976 static char *symbuf = 0;
2982 /* If buffer is full, make it bigger. */
2985 bufsize = 2 * bufsize + 40;
2986 symbuf = realloc (symbuf, (size_t) bufsize);
2988 /* Add this character to the buffer. */
2989 symbuf[i++] = (char) c;
2990 /* Get another character. */
2995 while (isalnum (c));
3002 symrec *s = getsym (symbuf);
3004 s = putsym (symbuf, VAR);
3005 yylval.VAR = s; /* or yylval.FUN = s. */
3009 /* Any other character is a token by itself. */
3016 @subsection The @code{mfcalc} Main
3018 The error reporting function is unchanged, and the new version of
3019 @code{main} includes a call to @code{init_table} and sets @code{yydebug}
3020 on user demand (@pxref{Tracing}, for details):
3022 @comment file: c/mfcalc/mfcalc.y: 3
3025 /* Called by yyparse on error. */
3026 void yyerror (char const *s)
3028 fprintf (stderr, "%s\n", s);
3033 int main (int argc, char const* argv[])
3037 /* Enable parse traces on option -p. */
3038 if (argc == 2 && strcmp(argv[1], "-p") == 0)
3048 This program is both powerful and flexible. You may easily add new
3049 functions, and it is a simple job to modify this code to install
3050 predefined variables such as @code{pi} or @code{e} as well.
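As a sketch of the latter, the self-contained fragment below installs @code{pi} and @code{e} as ordinary @code{VAR} entries in a symbol table shaped like the one declared in @file{calc.h}. The @code{consts} array, @code{struct init_const}, @code{init_constants}, and the numeric codes given to @code{VAR} and @code{FUN} are assumptions made for the example; in @code{mfcalc} the token codes come from Bison, and the loop would naturally live in @code{init_table}. Since @code{strdup} is not part of ISO C before C23, this version copies names with @code{malloc} and @code{strcpy}, and, like @code{mfcalc}, it assumes that @code{malloc} succeeds.

```c
#include <stdlib.h>
#include <string.h>

enum { VAR = 258, FUN = 259 };   /* hypothetical token codes */

typedef double (func_t) (double);
typedef struct symrec symrec;
struct symrec
{
  char *name;
  int type;
  union { double var; func_t *fun; } value;
  symrec *next;
};

static symrec *sym_table = NULL;

/* Link a new symbol to the front of the chain (malloc assumed to succeed). */
static symrec *putsym (char const *name, int sym_type)
{
  symrec *res = malloc (sizeof *res);
  res->name = malloc (strlen (name) + 1);
  strcpy (res->name, name);
  res->type = sym_type;
  res->value.var = 0;
  res->next = sym_table;
  sym_table = res;
  return res;
}

static symrec *getsym (char const *name)
{
  for (symrec *p = sym_table; p; p = p->next)
    if (strcmp (p->name, name) == 0)
      return p;
  return NULL;
}

/* Hypothetical table of predefined constants, terminated by a null name. */
struct init_const
{
  char const *name;
  double value;
};

static struct init_const const consts[] =
{
  { "pi", 3.14159265358979323846 },
  { "e",  2.71828182845904523536 },
  { NULL, 0 },
};

/* Install each constant as an ordinary variable.  */
static void init_constants (void)
{
  for (int i = 0; consts[i].name; i++)
    {
      symrec *ptr = putsym (consts[i].name, VAR);
      ptr->value.var = consts[i].value;
    }
}
```

Giving the constants type @code{VAR} means the grammar needs no new rules, but it also lets the user assign to them.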
3058 Add some new functions from @file{math.h} to the initialization list.
3061 Add another array that contains constants and their values. Then modify
3062 @code{init_table} to add these constants to the symbol table. It will be
3063 easiest to give the constants type @code{VAR}.
3066 Make the program report an error if the user refers to an uninitialized
3067 variable in any way except to store a value in it.
3071 @chapter Bison Grammar Files
3073 Bison takes as input a context-free grammar specification and produces a
3074 C-language function that recognizes correct instances of the grammar.
3076 The Bison grammar file conventionally has a name ending in @samp{.y}.
3080 * Grammar Outline:: Overall layout of the grammar file.
3081 * Symbols:: Terminal and nonterminal symbols.
3082 * Rules:: How to write grammar rules.
3083 * Semantics:: Semantic values and actions.
3084 * Tracking Locations:: Locations and actions.
3085 * Named References:: Using named references in actions.
3086 * Declarations:: All kinds of Bison declarations are described here.
3087 * Multiple Parsers:: Putting more than one Bison parser in one program.
3090 @node Grammar Outline
3091 @section Outline of a Bison Grammar
3094 @findex /* @dots{} */
3096 A Bison grammar file has four main sections, shown here with the
3097 appropriate delimiters:
3104 @var{Bison declarations}
3113 Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
3114 As a GNU extension, @samp{//} introduces a comment that continues until end
3118 * Prologue:: Syntax and usage of the prologue.
3119 * Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
3120 * Bison Declarations:: Syntax and usage of the Bison declarations section.
3121 * Grammar Rules:: Syntax and usage of the grammar rules section.
3122 * Epilogue:: Syntax and usage of the epilogue.
3126 @subsection The prologue
3127 @cindex declarations section
3129 @cindex declarations
3131 The @var{Prologue} section contains macro definitions and declarations of
3132 functions and variables that are used in the actions in the grammar rules.
3133 These are copied to the beginning of the parser implementation file so that
3134 they precede the definition of @code{yyparse}. You can use @samp{#include}
3135 to get the declarations from a header file. If you don't need any C
3136 declarations, you may omit the @samp{%@{} and @samp{%@}} delimiters that
3137 bracket this section.
3139 The @var{Prologue} section is terminated by the first occurrence of
3140 @samp{%@}} that is outside a comment, a string literal, or a character
3143 You may have more than one @var{Prologue} section, intermixed with the
3144 @var{Bison declarations}. This allows you to have C and Bison declarations
3145 that refer to each other. For example, the @code{%union} declaration may
3146 use types defined in a header file, and you may wish to prototype functions
3147 that take arguments of type @code{YYSTYPE}. This can be done with two
3148 @var{Prologue} blocks, one before and one after the @code{%union}
3163 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3169 static void print_token (yytoken_kind_t token, YYSTYPE val);
3176 When in doubt, it is usually safer to put prologue code before all Bison
3177 declarations, rather than after. For example, any definitions of feature
3178 test macros like @code{_GNU_SOURCE} or @code{_POSIX_C_SOURCE} should appear
3179 before all Bison declarations, as feature test macros can affect the
3180 behavior of Bison-generated @code{#include} directives.
3182 @node Prologue Alternatives
3183 @subsection Prologue Alternatives
3184 @cindex Prologue Alternatives
3187 @findex %code requires
3188 @findex %code provides
3191 The functionality of @var{Prologue} sections can often be subtle and
3192 inflexible. As an alternative, Bison provides a @code{%code} directive with
3193 an explicit qualifier field, which identifies the purpose of the code and
3194 thus the location(s) where Bison should generate it. For C/C++, the
3195 qualifier can be omitted for the default location, or it can be one of
3196 @code{requires}, @code{provides}, or @code{top}. @xref{%code Summary}.
3198 Look again at the example of the previous section:
3212 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3218 static void print_token (yytoken_kind_t token, YYSTYPE val);
3226 Notice that there are two @var{Prologue} sections here, but there's a subtle
3227 distinction between their functionality. For example, if you decide to
3228 override Bison's default definition for @code{YYLTYPE}, in which
3229 @var{Prologue} section should you write your new
3230 definition?@footnote{However, defining @code{YYLTYPE} via a C macro is not
3231 the recommended way. @xref{Location Type}.}
3233 write it in the first since Bison will insert that code into the parser
3234 implementation file @emph{before} the default @code{YYLTYPE} definition. In
3235 which @var{Prologue} section should you prototype an internal function,
3236 @code{trace_token}, that accepts @code{YYLTYPE} and @code{yytoken_kind_t} as
3237 arguments? You should prototype it in the second since Bison will insert
3238 that code @emph{after} the @code{YYLTYPE} and @code{yytoken_kind_t}
3241 This distinction in functionality between the two @var{Prologue} sections is
3242 established by the appearance of the @code{%union} between them. This
3243 behavior raises a few questions. First, why should the position of a
3244 @code{%union} affect definitions related to @code{YYLTYPE} and
3245 @code{yytoken_kind_t}? Second, what if there is no @code{%union}? In that
3246 case, the second kind of @var{Prologue} section is not available. This
3247 behavior is not intuitive.
3249 To avoid this subtle @code{%union} dependency, rewrite the example using a
3250 @code{%code top} and an unqualified @code{%code}. Let's go ahead and add
3251 the new @code{YYLTYPE} definition and the @code{trace_token} prototype at
3259 /* WARNING: The following code really belongs
3260 * in a '%code requires'; see below. */
3263 #define YYLTYPE YYLTYPE
3264 typedef struct YYLTYPE
3277 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3283 static void print_token (yytoken_kind_t token, YYSTYPE val);
3284 static void trace_token (yytoken_kind_t token, YYLTYPE loc);
3292 In this way, @code{%code top} and the unqualified @code{%code} achieve the
3293 same functionality as the two kinds of @var{Prologue} sections, but it's
3294 always explicit which kind you intend. Moreover, both kinds are always
3295 available even in the absence of @code{%union}.
3297 The @code{%code top} block above logically contains two parts. The first
3298 two lines before the warning need to appear near the top of the parser
3299 implementation file. The first line after the warning is required by
3300 @code{YYSTYPE} and thus also needs to appear in the parser implementation
3301 file. However, if you've instructed Bison to generate a parser header file
3302 (@pxref{Decl Summary}), you probably want that line to appear
3303 before the @code{YYSTYPE} definition in that header file as well. The
3304 @code{YYLTYPE} definition should also appear in the parser header file to
3305 override the default @code{YYLTYPE} definition there.
3307 In other words, in the @code{%code top} block above, all but the first two
3308 lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE}
3310 Thus, they belong in one or more @code{%code requires}:
3328 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3334 #define YYLTYPE YYLTYPE
3335 typedef struct YYLTYPE
3348 static void print_token (yytoken_kind_t token, YYSTYPE val);
3349 static void trace_token (yytoken_kind_t token, YYLTYPE loc);
3357 Now Bison will insert @code{#include "ptypes.h"} and the new @code{YYLTYPE}
3358 definition before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE}
3359 definitions in both the parser implementation file and the parser header
3360 file. (By the same reasoning, @code{%code requires} would also be the
3361 appropriate place to write your own definition for @code{YYSTYPE}.)
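The ordering matters because the generated files supply a default @code{YYLTYPE} only when none has been defined yet. The plain C below mimics that arrangement; the guarded block is modeled on, not copied from, what Bison generates, and the @code{filename} member and the @code{format_location} function are hypothetical.

```c
#include <stdio.h>

/* User code, as in a '%code requires' block: it comes first.  */
#define YYLTYPE YYLTYPE
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
  char const *filename;   /* hypothetical extra member */
} YYLTYPE;

/* Generated code (simplified): the default is used only if the user
   has not already provided a definition.  */
#if ! defined YYLTYPE && ! defined YYLTYPE_IS_DECLARED
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;
# define YYLTYPE_IS_DECLARED 1
#endif

/* Code that depends on the final YYLTYPE, as in an unqualified '%code'. */
static int format_location (char *buf, size_t size, YYLTYPE const *loc)
{
  return snprintf (buf, size, "%s:%d.%d-%d.%d",
                   loc->filename, loc->first_line, loc->first_column,
                   loc->last_line, loc->last_column);
}
```

If the @code{#define YYLTYPE YYLTYPE} line is removed, the guard no longer sees a definition, the default structure is compiled as well, and the compiler rejects the second definition of @code{struct YYLTYPE}.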
3363 When you are writing dependency code for @code{YYSTYPE} and @code{YYLTYPE},
3364 you should prefer @code{%code requires} over @code{%code top} regardless of
3365 whether you instruct Bison to generate a parser header file. When you are
3366 writing code that you need Bison to insert only into the parser
3367 implementation file and that has no special need to appear at the top of
3368 that file, you should prefer the unqualified @code{%code} over @code{%code
3369 top}. These practices will make the purpose of each block of your code
3370 explicit to Bison and to other developers reading your grammar file.
3371 Following these practices, we expect the unqualified @code{%code} and
3372 @code{%code requires} to be the most important of the four @var{Prologue}
3375 At some point while developing your parser, you might decide to provide
3376 @code{trace_token} to modules that are external to your parser. Thus, you
3377 might wish for Bison to insert the prototype into both the parser header
3378 file and the parser implementation file. Since this function is not a
3379 dependency required by @code{YYSTYPE} or @code{YYLTYPE}, it doesn't make
3380 sense to move its prototype to a @code{%code requires}. More importantly,
3381 since it depends upon @code{YYLTYPE} and @code{yytoken_kind_t}, @code{%code
3382 requires} is not sufficient. Instead, move its prototype from the
3383 unqualified @code{%code} to a @code{%code provides}:
3401 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3407 #define YYLTYPE YYLTYPE
3408 typedef struct YYLTYPE
3421 void trace_token (yytoken_kind_t token, YYLTYPE loc);
3427 static void print_token (yytoken_kind_t token, YYSTYPE val);
3435 Bison will insert the @code{trace_token} prototype into both the parser
3436 header file and the parser implementation file after the definitions for
3437 @code{yytoken_kind_t}, @code{YYLTYPE}, and @code{YYSTYPE}.
3439 The above examples are careful to write directives in an order that reflects
3440 the layout of the generated parser implementation and header files:
3441 @code{%code top}, @code{%code requires}, @code{%code provides}, and then
3442 @code{%code}. While your grammar files may generally be easier to read if
3443 you also follow this order, Bison does not require it. Instead, Bison lets
3444 you choose an organization that makes sense to you.
3446 You may declare any of these directives multiple times in the grammar file.
3447 In that case, Bison concatenates the contained code in declaration order.
3448 This is the only way in which the position of one of these directives within
3449 the grammar file affects its functionality.
3451 The result of the previous two properties is greater flexibility in how you may
3452 organize your grammar file.
3453 For example, you may organize semantic-type-related directives by semantic
3458 %code requires @{ #include "type1.h" @}
3459 %union @{ type1 field1; @}
3460 %destructor @{ type1_free ($$); @} <field1>
3461 %printer @{ type1_print (yyo, $$); @} <field1>
3465 %code requires @{ #include "type2.h" @}
3466 %union @{ type2 field2; @}
3467 %destructor @{ type2_free ($$); @} <field2>
3468 %printer @{ type2_print (yyo, $$); @} <field2>
3473 You could even place each of the above directive groups in the rules section of
3474 the grammar file next to the set of rules that uses the associated semantic
3476 (In the rules section, you must terminate each of those directives with a
3478 And you don't have to worry that some directive (like a @code{%union}) in the
3479 definitions section is going to adversely affect their functionality in some
3480 counter-intuitive manner just because it comes first.
3481 Such an organization is not possible using @var{Prologue} sections.
3483 This section has been concerned with explaining the advantages of the four
3484 @var{Prologue} alternatives over the original Yacc @var{Prologue}.
3485 However, in most cases when using these directives, you shouldn't need to
3486 think about all the low-level ordering issues discussed here.
3487 Instead, you should simply use these directives to label each block of your
3488 code according to its purpose and let Bison handle the ordering.
3489 @code{%code} is the most generic label.
3490 Move code to @code{%code requires}, @code{%code provides}, or @code{%code top}
3493 @node Bison Declarations
3494 @subsection The Bison Declarations Section
3495 @cindex Bison declarations (introduction)
3496 @cindex declarations, Bison (introduction)
3498 The @var{Bison declarations} section contains declarations that define
3499 terminal and nonterminal symbols, specify precedence, and so on.
3500 In some simple grammars you may not need any declarations.
3501 @xref{Declarations}.
3504 @subsection The Grammar Rules Section
3505 @cindex grammar rules section
3506 @cindex rules section for grammar
3508 The @dfn{grammar rules} section contains one or more Bison grammar
3509 rules, and nothing else. @xref{Rules}.
3511 There must always be at least one grammar rule, and the first
3512 @samp{%%} (which precedes the grammar rules) may never be omitted even
3513 if it is the first thing in the file.
3516 @subsection The epilogue
3517 @cindex additional C code section
3519 @cindex C code, section for additional
3521 The @var{Epilogue} is copied verbatim to the end of the parser
3522 implementation file, just as the @var{Prologue} is copied to the
3523 beginning. This is the most convenient place to put anything that you
3524 want to have in the parser implementation file but which need not come
3525 before the definition of @code{yyparse}. For example, the definitions
3526 of @code{yylex} and @code{yyerror} often go here. Because C requires
3527 functions to be declared before being used, you often need to declare
3528 functions like @code{yylex} and @code{yyerror} in the Prologue, even
3529 if you define them in the Epilogue. @xref{Interface}.
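The declaration-before-use constraint is plain C and can be seen without Bison. In this sketch the prototypes at the top play the role of the Prologue, @code{parse_sum} stands in for @code{yyparse}, and the definitions of @code{yylex} and @code{yyerror} at the bottom play the role of the Epilogue. The token code given to @code{NUM}, the file-scope @code{input} pointer, and @code{parse_sum} itself are hypothetical; in a real grammar file the generated parser would be the one calling @code{yylex} and @code{yyerror}.

```c
#include <ctype.h>
#include <stdio.h>

/* "Prologue": declarations needed before the parser uses them.  */
enum { NUM = 258 };              /* hypothetical token code */
static int yylex (void);
static void yyerror (char const *msg);
static char const *input;        /* text being scanned */
static int yylval;               /* value of the last NUM token */

/* Stand-in for yyparse: accepts "NUM + NUM + ..." and returns 0 on
   success, 1 on a syntax error.  */
static int parse_sum (char const *text, int *result)
{
  input = text;
  int tok = yylex ();
  if (tok != NUM)
    {
      yyerror ("number expected");
      return 1;
    }
  int sum = yylval;
  while ((tok = yylex ()) == '+')
    {
      if (yylex () != NUM)
        {
          yyerror ("number expected");
          return 1;
        }
      sum += yylval;
    }
  if (tok != 0)
    {
      yyerror ("unexpected token");
      return 1;
    }
  *result = sum;
  return 0;
}

/* "Epilogue": definitions may follow their first use.  */
static int yylex (void)
{
  while (*input == ' ')
    input++;
  if (*input == '\0')
    return 0;                        /* end of input */
  if (isdigit ((unsigned char) *input))
    {
      int n = 0;
      while (isdigit ((unsigned char) *input))
        n = n * 10 + (*input++ - '0');
      yylval = n;
      return NUM;
    }
  return (unsigned char) *input++;   /* single-character token */
}

static void yyerror (char const *msg)
{
  fprintf (stderr, "%s\n", msg);
}
```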
3531 If the last section is empty, you may omit the @samp{%%} that separates it
3532 from the grammar rules.
3534 The Bison parser itself contains many macros and identifiers whose names
3535 start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
3536 any such names (except those documented in this manual) in the epilogue
3537 of the grammar file.
3540 @section Symbols, Terminal and Nonterminal
3541 @cindex nonterminal symbol
3542 @cindex terminal symbol
3546 @dfn{Symbols} in Bison grammars represent the grammatical classifications
3549 A @dfn{terminal symbol} (also known as a @dfn{token kind}) represents a
3550 class of syntactically equivalent tokens. You use the symbol in grammar
3551 rules to mean that a token in that class is allowed. The symbol is
3552 represented in the Bison parser by a numeric code, and the @code{yylex}
3553 function returns a token kind code to indicate what kind of token has been
3554 read. You don't need to know what the code value is; you can use the symbol
3557 A @dfn{nonterminal symbol} stands for a class of syntactically
3558 equivalent groupings. The symbol name is used in writing grammar rules.
3559 By convention, it should be all lower case.
3561 Symbol names can contain letters, underscores, periods, and non-initial
3562 digits and dashes. Dashes in symbol names are a GNU extension, incompatible
3563 with POSIX Yacc. Periods and dashes make symbol names less convenient to
3564 use with named references, which require brackets around such names
3565 (@pxref{Named References}). Terminal symbols that contain periods or dashes
3566 make little sense: since they are not valid identifiers in most programming
3567 languages, they are not exported as token names.
3569 There are three ways of writing terminal symbols in the grammar:
3573 A @dfn{named token kind} is written with an identifier, like an identifier
3574 in C@. By convention, it should be all upper case. Each such name must be
3575 defined with a Bison declaration such as @code{%token}. @xref{Token Decl}.
3578 @cindex character token
3579 @cindex literal token
3580 @cindex single-character literal
3581 A @dfn{character token kind} (or @dfn{literal character token}) is written
3582 in the grammar using the same syntax used in C for character constants; for
3583 example, @code{'+'} is a character token kind. A character token kind
3584 doesn't need to be declared unless you need to specify its semantic value
3585 data type (@pxref{Value Type}), associativity, or precedence
3586 (@pxref{Precedence}).
3588 By convention, a character token kind is used only to represent a token that
3589 consists of that particular character. Thus, the token kind @code{'+'} is
3590 used to represent the character @samp{+} as a token. Nothing enforces this
3591 convention, but if you depart from it, your program will confuse other
3594 All the usual escape sequences used in character literals in C can be used
3595 in Bison as well, but you must not use the null character as a character
3596 literal because its numeric code, zero, signifies end-of-input
3597 (@pxref{Calling Convention}). Also, unlike Standard C, trigraphs have no
3598 special meaning in Bison character literals, nor is backslash-newline
3602 @cindex string token
3603 @cindex literal string token
3604 @cindex multicharacter literal
3605 A @dfn{literal string token} is written like a C string constant; for
3606 example, @code{"<="} is a literal string token. A literal string token
3607 doesn't need to be declared unless you need to specify its semantic
3608 value data type (@pxref{Value Type}), associativity, or precedence
3609 (@pxref{Precedence}).
3611 You can associate the literal string token with a symbolic name as an alias,
3612 using the @code{%token} declaration (@pxref{Token Decl}). If you don't do
3613 that, the lexical analyzer has to retrieve the token code for the literal
3614 string token from the @code{yytname} table (@pxref{Calling Convention}).
3616 @strong{Warning}: literal string tokens do not work in Yacc.
3618 By convention, a literal string token is used only to represent a token
3619 that consists of that particular string. Thus, you should use the token
3620 kind @code{"<="} to represent the string @samp{<=} as a token. Bison
3621 does not enforce this convention, but if you depart from it, people who
3622 read your program will be confused.
3624 All the escape sequences used in string literals in C can be used in
3625 Bison as well, except that you must not use a null character within a
3626 string literal. Also, unlike Standard C, trigraphs have no special
3627 meaning in Bison string literals, nor is backslash-newline allowed. A
3628 literal string token must contain two or more characters; for a token
3629 containing just one character, use a character token (see above).
3632 How you choose to write a terminal symbol has no effect on its
3633 grammatical meaning. That depends only on where it appears in rules and
3634 on when the lexical analyzer, @code{yylex}, returns that symbol.
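As mentioned above, if a literal string token such as @code{"<="} has no symbolic alias, @code{yylex} has to find it in the @code{yytname} table, where such tokens are stored with their surrounding double quotes. The sketch below performs that search on a hypothetical table, @code{demo_tname}, whose contents only imitate @code{yytname}; @code{find_literal} is likewise a made-up name. The result is a position in the table, which may still have to be converted to the token code that @code{yylex} returns (@pxref{Calling Convention}).

```c
#include <string.h>

/* Hypothetical imitation of a yytname-style table: literal string
   tokens keep their double quotes; the list ends with a null pointer. */
static const char *const demo_tname[] =
{
  "\"end of file\"", "error", "\"invalid token\"", "NUM",
  "\"<=\"", "\">=\"", 0
};

/* Return the index of the quoted entry spelling TEXT, or -1.  */
static int find_literal (const char *const *table, const char *text)
{
  size_t len = strlen (text);
  for (int i = 0; table[i]; i++)
    if (table[i][0] == '"'
        && strncmp (table[i] + 1, text, len) == 0
        && table[i][len + 1] == '"'
        && table[i][len + 2] == '\0')
      return i;
  return -1;
}
```

Only quoted entries can match, so a named token such as @code{error} is never returned for the text @samp{error}.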
3636 The value returned by @code{yylex} is always one of the terminal
3637 symbols, except that a zero or negative value signifies end-of-input.
3638 Whichever way you write the token kind in the grammar rules, you write
3639 it the same way in the definition of @code{yylex}. The numeric code
3640 for a character token kind is simply the positive numeric code of the
3641 character, so @code{yylex} can use the identical value to generate the
3642 requisite code, though you may need to convert it to @code{unsigned
3643 char} to avoid sign-extension on hosts where @code{char} is signed.
3644 Each named token kind becomes a C macro in the parser implementation
3645 file, so @code{yylex} can use the name to stand for the code. (This
3646 is why periods don't make sense in terminal symbols.) @xref{Calling
3649 If @code{yylex} is defined in a separate file, you need to arrange for the
3650 token-kind definitions to be available there. Use the @option{-d} option
3651 when you run Bison, so that it will write these definitions into a separate
3652 header file @file{@var{name}.tab.h} which you can include in the other
3653 source files that need it. @xref{Invocation}.
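Putting these rules together, a hand-written @code{yylex} returns the character itself for a character token kind, a named constant for a named token kind, and zero at end of input. In the sketch below the @code{enum} stands in for the definitions that @option{-d} would write to @file{@var{name}.tab.h}; its values are assumptions, and real code should include the generated header rather than hard-code token codes. The input comes from a string installed by the hypothetical @code{lex_from}, so the function can be tried without standard input.

```c
#include <ctype.h>

/* Stand-in for the token kinds from name.tab.h (values hypothetical). */
enum yytokentype_demo { NUM = 258, LE = 259 };

static char const *cursor;   /* current position in the input */

static void lex_from (char const *text)
{
  cursor = text;
}

static int yylex (void)
{
  while (*cursor == ' ' || *cursor == '\t')
    cursor++;
  if (*cursor == '\0')
    return 0;                        /* end of input */
  if (isdigit ((unsigned char) *cursor))
    {
      while (isdigit ((unsigned char) *cursor))
        cursor++;
      return NUM;                    /* named token kind */
    }
  if (cursor[0] == '<' && cursor[1] == '=')
    {
      cursor += 2;
      return LE;                     /* named token kind for "<=" */
    }
  /* Character token kind: convert to unsigned char so that characters
     above 127 do not become negative, which would mean end of input. */
  return (unsigned char) *cursor++;
}
```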
3655 If you want to write a grammar that is portable to any Standard C
3656 host, you must use only nonnull character tokens taken from the basic
3657 execution character set of Standard C@. This set consists of the ten
3658 digits, the 52 lower- and upper-case English letters, and the
3659 characters in the following C-language string:
3662 "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~"
3665 The @code{yylex} function and Bison must use a consistent character set
3666 and encoding for character tokens. For example, if you run Bison in an
3667 ASCII environment, but then compile and run the resulting
3668 program in an environment that uses an incompatible character set like
3669 EBCDIC, the resulting program may not work because the tables
3670 generated by Bison will assume ASCII numeric values for
3671 character tokens. It is standard practice for software distributions to
3672 contain C source files that were generated by Bison in an
3673 ASCII environment, so installers on platforms that are
3674 incompatible with ASCII must rebuild those files before
3677 The symbol @code{error} is a terminal symbol reserved for error recovery
3678 (@pxref{Error Recovery}); you shouldn't use it for any other purpose.
3679 In particular, @code{yylex} should never return this value. The default
3680 value of the error token is 256, unless you explicitly assigned 256 to
3681 one of your tokens with a @code{%token} declaration.
3684 @section Grammar Rules
3686 A Bison grammar is a list of rules.
3689 * Rules Syntax:: Syntax of the rules.
3690 * Empty Rules:: Symbols that can match the empty string.
3691 * Recursion:: Writing recursive rules.
3695 @subsection Syntax of Grammar Rules
3697 @cindex grammar rule syntax
3698 @cindex syntax of grammar rules
3700 A Bison grammar rule has the following general form:
3703 @var{result}: @var{components}@dots{};
3707 where @var{result} is the nonterminal symbol that this rule describes,
3708 and @var{components} are various terminal and nonterminal symbols that
3709 are put together by this rule (@pxref{Symbols}).
3718 says that two groupings of type @code{exp}, with a @samp{+} token in between,
3719 can be combined into a larger grouping of type @code{exp}.
3721 White space in rules is significant only to separate symbols. You can add
3722 extra white space as you wish.
3724 Scattered among the components can be @var{actions} that determine
3725 the semantics of the rule. An action looks like this:
3728 @{@var{C statements}@}
3733 This is an example of @dfn{braced code}, that is, C code surrounded by
3734 braces, much like a compound statement in C@. Braced code can contain
3735 any sequence of C tokens, so long as its braces are balanced. Bison
3736 does not check the braced code for correctness directly; it merely
3737 copies the code to the parser implementation file, where the C
3738 compiler can check it.
3740 Within braced code, the balanced-brace count is not affected by braces
3741 within comments, string literals, or character constants, but it is
3742 affected by the C digraphs @samp{<%} and @samp{%>} that represent
3743 braces. At the top level braced code must be terminated by @samp{@}}
3744 and not by a digraph. Bison does not look for trigraphs, so if braced
3745 code uses trigraphs you should ensure that they do not affect the
3746 nesting of braces or the boundaries of comments, string literals, or
3747 character constants.
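The counting rules above can be illustrated with a small scanner. This is a simplified sketch, not Bison's implementation: @code{braced_code_length} (a hypothetical name) skips comments, string literals, and character constants, counts @samp{<%} and @samp{%>} as braces, ignores trigraphs, and, as a choice of this sketch, reports failure when @samp{%>} would close the outermost brace.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the braced code at S (which must start with an
   opening brace), including the final closing brace, or -1 if the code
   is unbalanced or ends with a digraph at the top level.  */
static ptrdiff_t braced_code_length (const char *s)
{
  const char *p = s;
  int depth = 0;
  if (*p != '{')
    return -1;
  while (*p)
    {
      if (p[0] == '/' && p[1] == '*')           /* block comment */
        {
          const char *end = strstr (p + 2, "*/");
          if (!end)
            return -1;
          p = end + 2;
        }
      else if (p[0] == '/' && p[1] == '/')      /* line comment */
        while (*p && *p != '\n')
          p++;
      else if (*p == '"' || *p == '\'')         /* string or char literal */
        {
          char quote = *p++;
          while (*p && *p != quote)
            {
              if (*p == '\\' && p[1])
                p++;                            /* skip escaped character */
              p++;
            }
          if (!*p)
            return -1;
          p++;
        }
      else if (*p == '{' || (p[0] == '<' && p[1] == '%'))
        {
          depth++;
          p += (*p == '{') ? 1 : 2;
        }
      else if (*p == '}' || (p[0] == '%' && p[1] == '>'))
        {
          if (depth == 1 && *p != '}')
            return -1;                          /* top level needs a brace */
          p += (*p == '}') ? 1 : 2;
          if (--depth == 0)
            return p - s;
        }
      else
        p++;
    }
  return -1;
}
```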
3749 Usually there is only one action and it follows the components.
3753 Multiple rules for the same @var{result} can be written separately or can
3754 be joined with the vertical-bar character @samp{|} as follows:
3759 @var{rule1-components}@dots{}
3760 | @var{rule2-components}@dots{}
3767 They are still considered distinct rules even when joined in this way.
3770 @subsection Empty Rules
3775 A rule is said to be @dfn{empty} if its right-hand side (@var{components})
3776 is empty. It means that @var{result} in the previous example can match the
3777 empty string. As another example, here is how to define an optional
3781 semicolon.opt: | ";";
3785 It is easy not to see an empty rule, especially when @code{|} is used. The
3786 @code{%empty} directive lets you make it explicit that a rule is empty on
3798 Flagging a non-empty rule with @code{%empty} is an error. If run with
3799 @option{-Wempty-rule}, @command{bison} will report empty rules without
3800 @code{%empty}. Using @code{%empty} enables this warning, unless
3801 @option{-Wno-empty-rule} was specified.
3803 The @code{%empty} directive is a Bison extension; it does not work with
3804 Yacc. To remain compatible with POSIX Yacc, it is customary to write a
3805 comment @samp{/* empty */} in each rule with no components:
3818 @subsection Recursive Rules
3819 @cindex recursive rule
3820 @cindex rule, recursive
3822 A rule is called @dfn{recursive} when its @var{result} nonterminal
3823 also appears on its right-hand side. Nearly all Bison grammars need to
3824 use recursion, because that is the only way to define a sequence of any
3825 number of a particular thing. Consider this recursive definition of a
3826 comma-separated sequence of one or more expressions:
3837 @cindex left recursion
3838 @cindex right recursion
3840 Since the recursive use of @code{expseq1} is the leftmost symbol in the
3841 right-hand side, we call this @dfn{left recursion}. By contrast, here
3842 the same construct is defined using @dfn{right recursion}:
3854 Any kind of sequence can be defined using either left recursion or right
3855 recursion, but you should always use left recursion, because it can
3856 parse a sequence of any number of elements with bounded stack space.
3857 Right recursion uses up space on the Bison stack in proportion to the
3858 number of elements in the sequence, because all the elements must be
3859 shifted onto the stack before the rule can be applied even once.
3860 @xref{Algorithm}, for further explanation
3863 @cindex mutual recursion
3864 @dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
3865 rule does not appear directly on its right-hand side, but does appear
3866 in rules for other nonterminals which do appear on its right-hand
3875 | primary '+' primary
3888 defines two mutually-recursive nonterminals, since each refers to the
3892 @section Defining Language Semantics
3893 @cindex defining language semantics
3894 @cindex language semantics, defining
3896 The grammar rules for a language determine only the syntax. The semantics
3897 are determined by the semantic values associated with various tokens and
3898 groupings, and by the actions taken when various groupings are recognized.
3900 For example, the calculator calculates properly because the value
3901 associated with each expression is the proper number; it adds properly
3902 because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
3903 the numbers associated with @var{x} and @var{y}.
3906 * Value Type:: Specifying one data type for all semantic values.
3907 * Multiple Types:: Specifying several alternative data types.
3908 * Type Generation:: Generating the semantic value type.
3909 * Union Decl:: Declaring the set of all semantic value types.
3910 * Structured Value Type:: Providing a structured semantic value type.
3911 * Actions:: An action is the semantic definition of a grammar rule.
3912 * Action Types:: Specifying data types for actions to operate on.
3913 * Midrule Actions:: Most actions go at the end of a rule.
3914 This says when, why and how to use the exceptional
3915 action in the middle of a rule.
3919 @subsection Data Types of Semantic Values
3920 @cindex semantic value type
3921 @cindex value type, semantic
3922 @cindex data types of semantic values
3923 @cindex default data type
3925 In a simple program it may be sufficient to use the same data type for
3926 the semantic values of all language constructs. This was true in the
3927 RPN and infix calculator examples (@pxref{RPN Calc}).
3929 Bison normally uses the type @code{int} for semantic values if your program
3930 uses the same data type for all language constructs. To specify some other
3931 type, define the @code{%define} variable @code{api.value.type} like this:
3934 %define api.value.type @{double@}
3941 %define api.value.type @{struct semantic_value_type@}
3944 The value of @code{api.value.type} should be a type name that does not
3945 contain parentheses or square brackets.
Alternatively, in C, instead of relying on Bison's @code{%define} support,
3948 you may rely on the C preprocessor and define @code{YYSTYPE} as a macro:
3951 #define YYSTYPE double
3955 This macro definition must go in the prologue of the grammar file
3956 (@pxref{Grammar Outline}). If compatibility with POSIX Yacc matters to you,
use this. Note, however, that Bison cannot know @code{YYSTYPE}'s value, not
even whether it is defined, so there are services it cannot provide.
Besides, this works only for C.
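
As a minimal sketch of what the macro approach implies for hand-written C
code in the prologue or epilogue, assume @code{YYSTYPE} is defined as
@code{double}: every semantic value, hence every @code{$$} and
@code{$@var{n}} in the actions, then has that type. The helpers
@code{add} and @code{print_value} are hypothetical, not part of Bison.

```c
#include <stdio.h>

/* Assumption: this definition sits in the prologue of the grammar file,
   so every semantic value in the generated parser is a double. */
#define YYSTYPE double

/* Hypothetical helper that an action such as { $$ = add ($1, $3); }
   could call: its operands and its result all have type YYSTYPE. */
static YYSTYPE
add (YYSTYPE left, YYSTYPE right)
{
  return left + right;
}

/* Hypothetical helper, e.g., for the epilogue, printing one value. */
static void
print_value (FILE *out, YYSTYPE v)
{
  fprintf (out, "%g\n", v);
}
```

Since Bison never sees this macro, nothing checks that the actions use the
values consistently; that is one of the services it cannot provide.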
3961 @node Multiple Types
3962 @subsection More Than One Value Type
3964 In most programs, you will need different data types for different kinds
3965 of tokens and groupings. For example, a numeric constant may need type
3966 @code{int} or @code{long}, while a string constant needs type
3967 @code{char *}, and an identifier might need a pointer to an entry in the
3970 To use more than one data type for semantic values in one parser, Bison
3971 requires you to do two things:
3975 Specify the entire collection of possible data types. There are several
3979 let Bison compute the union type from the tags you assign to symbols;
3982 use the @code{%union} Bison declaration (@pxref{Union Decl});
3985 define the @code{%define} variable @code{api.value.type} to be a union type
3986 whose members are the type tags (@pxref{Structured Value Type});
3989 use a @code{typedef} or a @code{#define} to define @code{YYSTYPE} to be a
3990 union type whose member names are the type tags.
3994 Choose one of those types for each symbol (terminal or nonterminal) for
3995 which semantic values are used. This is done for tokens with the
3996 @code{%token} Bison declaration (@pxref{Token Decl}) and
3997 for groupings with the @code{%nterm}/@code{%type} Bison declarations
3998 (@pxref{Type Decl}).
4001 @node Type Generation
4002 @subsection Generating the Semantic Value Type
4003 @cindex declaring value types
4004 @cindex value types, declaring
4005 @findex %define api.value.type union
4007 The special value @code{union} of the @code{%define} variable
4008 @code{api.value.type} instructs Bison that the type tags (used with the
4009 @code{%token}, @code{%nterm} and @code{%type} directives) are genuine types,
4010 not names of members of @code{YYSTYPE}.
4015 %define api.value.type union
4016 %token <int> INT "integer"
4019 %token <char const *> ID "identifier"
generates an appropriate definition of @code{YYSTYPE} to support each symbol
type. The name of the member of @code{YYSTYPE} for tokens that have a
declared identifier @var{id} (such as @code{INT} and @code{ID} above, but
not @code{'n'}) is @code{@var{id}}. The other symbols have unspecified
names on which you should not depend; instead, rely on C casts to access
the semantic value with the appropriate type:
4031 /* For an "integer". */
4035 /* For an 'n', also declared as int. */
4036 *((int*)&yylval) = 42;
4039 /* For an "identifier". */
4044 If the @code{%define} variable @code{api.token.prefix} is defined
4045 (@pxref{%define Summary}), then it is also used to prefix
4046 the union member names. For instance, with @samp{%define api.token.prefix
4050 /* For an "integer". */
4051 yylval.TOK_INT = 42;
4055 This Bison extension cannot work if @code{%yacc} (or
4056 @option{-y}/@option{--yacc}) is enabled, as POSIX mandates that Yacc
4057 generate tokens as macros (e.g., @samp{#define INT 258}, or @samp{#define
A similar feature is provided for C++, which in addition overcomes a C++
limitation (non-trivial objects cannot be members of a @code{union}):
@samp{%define api.value.type variant}; see @ref{C++ Variants}.
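
The following C sketch approximates by hand the kind of union that
@samp{%define api.value.type union} leads to for @code{INT}, @code{'n'}
and @code{ID}; it is not Bison's actual output. The members @code{INT}
and @code{ID} follow the naming rule stated above, while the member for
@code{'n'} is given the hypothetical name @code{hypothetical_n} only so
that the union compiles. The scanner helpers are hypothetical as well.

```c
/* Hand-written approximation, NOT the code Bison generates. */
typedef union
{
  int INT;               /* token INT, declared with <int> */
  char const *ID;        /* token ID, declared with <char const *> */
  int hypothetical_n;    /* 'n': its real name is unspecified */
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical scanner helpers, one per kind of token. */
void
scan_integer (int v)
{
  yylval.INT = v;        /* documented member name */
}

void
scan_n (int v)
{
  /* No stable member name for 'n': go through a cast, as documented.
     A pointer to a union, suitably converted, points to its members. */
  *((int *) &yylval) = v;
}

void
scan_identifier (char const *s)
{
  yylval.ID = s;
}
```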
4065 @subsection The Union Declaration
4066 @cindex declaring value types
4067 @cindex value types, declaring
4070 The @code{%union} declaration specifies the entire collection of possible
4071 data types for semantic values. The keyword @code{%union} is followed by
4072 braced code containing the same thing that goes inside a @code{union} in C@.
4086 This says that the two alternative types are @code{double} and @code{symrec
4087 *}. They are given names @code{val} and @code{tptr}; these names are used
4088 in the @code{%token}, @code{%nterm} and @code{%type} declarations to pick
4089 one of the types for a terminal or nonterminal symbol (@pxref{Type Decl}).
4091 As an extension to POSIX, a tag is allowed after the @code{%union}. For
4104 specifies the union tag @code{value}, so the corresponding C type is
4105 @code{union value}. If you do not specify a tag, it defaults to
4106 @code{YYSTYPE} (@pxref{%define Summary}).
4108 As another extension to POSIX, you may specify multiple @code{%union}
4109 declarations; their contents are concatenated. However, only the first
4110 @code{%union} declaration can specify a tag.
4112 Note that, unlike making a @code{union} declaration in C, you need not write
4113 a semicolon after the closing brace.
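
For illustration, a declaration such as @samp{%union value} followed by
braced code declaring @code{double val;} and @code{symrec *tptr;}
corresponds roughly to the C type below. This is a sketch: the layout of
@code{symrec} and the helper @code{value_of} are hypothetical.

```c
/* Hypothetical symbol-table entry standing in for symrec. */
typedef struct symrec
{
  char const *name;
  double number;
} symrec;

/* The braced code of %union becomes the body of a C union, and the
   tag after %union names the union type. */
union value
{
  double val;      /* selected by the type tag <val> */
  symrec *tptr;    /* selected by the type tag <tptr> */
};

/* An action like { $$ = $1->number; }, where $1 has type <tptr>,
   amounts to this member access. */
double
value_of (union value v)
{
  return v.tptr->number;
}
```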
4115 @node Structured Value Type
4116 @subsection Providing a Structured Semantic Value Type
4117 @cindex declaring value types
4118 @cindex value types, declaring
4121 Instead of @code{%union}, you can define and use your own union type
4122 @code{YYSTYPE} if your grammar contains at least one @samp{<@var{type}>}
4123 tag. For example, you can put the following into a header file
4136 and then your grammar can use the following instead of @code{%union}:
4143 %define api.value.type @{union YYSTYPE@}
Actually, you may also provide a @code{struct} rather than a @code{union},
4150 which may be handy if you want to track information for every symbol (such
4151 as preceding comments).
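
As a minimal sketch, a hypothetical @code{struct YYSTYPE} might carry a
comment field alongside the members named by the type tags (here
@code{<num>} and @code{<str>}, also hypothetical), and be selected with
@samp{%define api.value.type @{struct YYSTYPE@}}. The helper
@code{make_number} is an assumed scanner-side function.

```c
#include <stddef.h>

/* Hypothetical structured value type: a struct, not a union, so every
   symbol carries all fields at once. */
struct YYSTYPE
{
  int num;               /* for symbols declared with <num> */
  char const *str;       /* for symbols declared with <str> */
  char const *comment;   /* comment that preceded the symbol, or NULL */
};

/* Hypothetical scanner helper: value of a number token, remembering
   the last comment seen before it. */
struct YYSTYPE
make_number (int n, char const *pending_comment)
{
  struct YYSTYPE v = { n, NULL, pending_comment };
  return v;
}
```

The price of a @code{struct} is size: every stack entry holds all the
fields, whereas a @code{union} holds only the largest one.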
4153 The type you provide may even be structured and include pointers, in which
4154 case the type tags you provide may be composite, with @samp{.} and @samp{->}
4163 @vindex $[@var{name}]
4165 An action accompanies a syntactic rule and contains C code to be executed
4166 each time an instance of that rule is recognized. The task of most actions
4167 is to compute a semantic value for the grouping built by the rule from the
4168 semantic values associated with tokens or smaller groupings.
4170 An action consists of braced code containing C statements, and can be
4171 placed at any position in the rule;
4172 it is executed at that position. Most rules have just one action at the
4173 end of the rule, following all the components. Actions in the middle of
4174 a rule are tricky and used only for special purposes (@pxref{Midrule
4177 The C code in an action can refer to the semantic values of the
4178 components matched by the rule with the construct @code{$@var{n}},
4179 which stands for the value of the @var{n}th component. The semantic
4180 value for the grouping being constructed is @code{$$}. In addition,
4181 the semantic values of symbols can be accessed with the named
4182 references construct @code{$@var{name}} or @code{$[@var{name}]}.
4183 Bison translates both of these constructs into expressions of the
4184 appropriate type when it copies the actions into the parser
4185 implementation file. @code{$$} (or @code{$@var{name}}, when it stands
4186 for the current grouping) is translated to a modifiable lvalue, so it
4189 Here is a typical example:
4195 | exp '+' exp @{ $$ = $1 + $3; @}
4199 Or, in terms of named references:
4205 | exp[left] '+' exp[right] @{ $result = $left + $right; @}
4210 This rule constructs an @code{exp} from two smaller @code{exp} groupings
4211 connected by a plus-sign token. In the action, @code{$1} and @code{$3}
4212 (@code{$left} and @code{$right})
4213 refer to the semantic values of the two component @code{exp} groupings,
4214 which are the first and third symbols on the right hand side of the rule.
4215 The sum is stored into @code{$$} (@code{$result}) so that it becomes the
4217 the addition-expression just recognized by the rule. If there were a
4218 useful semantic value associated with the @samp{+} token, it could be
4219 referred to as @code{$2}.
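
To make the numbering concrete, here is a deliberately simplified model
(not Bison's generated code) of reducing by @samp{exp: exp '+' exp}: the
three right-hand side values sit on top of a value stack, @code{$1} is
the deepest of them, and @code{$$} replaces all three. The names
@code{stack}, @code{top}, @code{DOLLAR} and @code{reduce_plus} are
hypothetical.

```c
#include <stddef.h>

typedef double YYSTYPE;

/* Toy value stack (no overflow check, for brevity). */
enum { STACK_MAX = 100 };
static YYSTYPE stack[STACK_MAX];
static size_t top;                 /* number of values on the stack */

void
push (YYSTYPE v)
{
  stack[top++] = v;
}

/* For a rule with LEN right-hand side symbols, $k (1 <= k <= LEN)
   is stored at stack[top - LEN + k - 1]. */
#define DOLLAR(k, len) (stack[top - (len) + (k) - 1])

/* exp: exp '+' exp  { $$ = $1 + $3; } */
void
reduce_plus (void)
{
  YYSTYPE result = DOLLAR (1, 3) + DOLLAR (3, 3);
  top -= 3;                        /* pop $1, $2 and $3 */
  push (result);                   /* $$ becomes the value of exp */
}
```

For instance, after pushing 2, a placeholder for @samp{+}, and 3,
@code{reduce_plus} leaves a single value, 5, on the stack.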
4221 @xref{Named References}, for more information about using the named
4222 references construct.
4224 Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This differs from
tools like Flex, for which @samp{|} stands for either ``or'', or ``the same
action as that of the next rule''. In the following example, the action is
triggered only when @samp{b} is found:
4231 a-or-b: 'a'|'b' @{ a_or_b_found = 1; @};
4234 @cindex default action
4235 If you don't specify an action for a rule, Bison supplies a default:
4236 @w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule
4237 becomes the value of the whole rule. Of course, the default action is
4238 valid only if the two data types match. There is no meaningful default
4239 action for an empty rule; every empty rule must have an explicit action
4240 unless the rule's value does not matter.
4242 @code{$@var{n}} with @var{n} zero or negative is allowed for reference
4243 to tokens and groupings on the stack @emph{before} those that match the
4244 current rule. This is a very risky practice, and to use it reliably
4245 you must be certain of the context in which the rule is applied. Here
4246 is a case in which you can use this reliably:
4251 expr bar '+' expr @{ @dots{} @}
4252 | expr bar '-' expr @{ @dots{} @}
4258 %empty @{ previous_expr = $0; @}
4263 As long as @code{bar} is used only in the fashion shown here, @code{$0}
4264 always refers to the @code{expr} which precedes @code{bar} in the
4265 definition of @code{foo}.
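
Using the same kind of toy value stack (hypothetical, not Bison's generated
code), the indexing that gives @code{$1}, @dots{}, @code{$@var{len}} also
explains @code{$0}: for the empty rule of @code{bar}, @var{len} is 0, so
@code{$0} is the value just below the empty right-hand side, namely the
preceding @code{expr}. Only @code{previous_expr} comes from the grammar
above; the other names are assumptions.

```c
typedef int YYSTYPE;

/* Toy value stack (no overflow check, for brevity). */
enum { STACK_MAX = 100 };
static YYSTYPE stack[STACK_MAX];
static int top;

static YYSTYPE previous_expr;   /* as in the grammar above */

void
push (YYSTYPE v)
{
  stack[top++] = v;
}

/* $k is stack[top - LEN + k - 1]; with k <= 0 this reaches below the
   symbols of the current rule. */
#define DOLLAR(k, len) (stack[top - (len) + (k) - 1])

/* bar: %empty { previous_expr = $0; } */
void
reduce_bar (void)
{
  /* If bar were reduced with nothing below it, this would read outside
     the stack: that is the risk of $0 described above. */
  previous_expr = DOLLAR (0, 0);
  push (0);   /* placeholder: the action does not set bar's $$ */
}
```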
4268 It is also possible to access the semantic value of the lookahead token, if
4269 any, from a semantic action.
4270 This semantic value is stored in @code{yylval}.
4271 @xref{Action Features}.
4274 @subsection Data Types of Values in Actions
4275 @cindex action data types
4276 @cindex data types in actions
4278 If you have chosen a single data type for semantic values, the @code{$$}
4279 and @code{$@var{n}} constructs always have that data type.
4281 If you have used @code{%union} to specify a variety of data types, then you
4282 must declare a choice among these types for each terminal or nonterminal
4283 symbol that can have a semantic value. Then each time you use @code{$$} or
4284 @code{$@var{n}}, its data type is determined by which symbol it refers to
4285 in the rule. In this example,
4291 | exp '+' exp @{ $$ = $1 + $3; @}
@code{$1} and @code{$3} refer to instances of @code{exp}, so they both
have the data type declared for the nonterminal symbol @code{exp}. If
4298 @code{$2} were used, it would have the data type declared for the
4299 terminal symbol @code{'+'}, whatever that might be.
4301 Alternatively, you can specify the data type when you refer to the value,
4302 by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
4303 reference. For example, if you have defined types as shown here:
4315 then you can write @code{$<itype>1} to refer to the first subunit of the
4316 rule as an integer, or @code{$<dtype>1} to refer to it as a double.
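
In C terms, assuming the union declared above has a member @code{itype} of
type @code{int} and a member @code{dtype} of type @code{double}, the
explicit tag merely selects which member of the same stack slot is read.
The layout (@code{rhs[0]} holding @code{$1}) and the helper names are
hypothetical.

```c
/* Assumed equivalent of the %union with members itype and dtype. */
typedef union
{
  int itype;
  double dtype;
} YYSTYPE;

/* $<itype>1: read the first component through the int member. */
int
first_as_int (YYSTYPE const *rhs)     /* rhs[0] plays the role of $1 */
{
  return rhs[0].itype;
}

/* $<dtype>1: same slot, read through the double member. */
double
first_as_double (YYSTYPE const *rhs)
{
  return rhs[0].dtype;
}
```

Bison does not check that the tag you choose matches the member last
stored; picking the wrong one reinterprets the stored bits.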
4318 @node Midrule Actions
4319 @subsection Actions in Midrule
4320 @cindex actions in midrule
4321 @cindex midrule actions
4323 Occasionally it is useful to put an action in the middle of a rule.
4324 These actions are written just like usual end-of-rule actions, but they
4325 are executed before the parser even recognizes the following components.
4328 * Using Midrule Actions:: Putting an action in the middle of a rule.
4329 * Typed Midrule Actions:: Specifying the semantic type of their values.
4330 * Midrule Action Translation:: How midrule actions are actually processed.
4331 * Midrule Conflicts:: Midrule actions can cause conflicts.
4334 @node Using Midrule Actions
4335 @subsubsection Using Midrule Actions
4337 A midrule action may refer to the components preceding it using
4338 @code{$@var{n}}, but it may not refer to subsequent components because
4339 it is run before they are parsed.
4341 The midrule action itself counts as one of the components of the rule.
4342 This makes a difference when there is another action later in the same rule
4343 (and usually there is another at the end): you have to count the actions
4344 along with the symbols when working out which number @var{n} to use in
4347 The midrule action can also have a semantic value. The action can set
4348 its value with an assignment to @code{$$}, and actions later in the rule
4349 can refer to the value using @code{$@var{n}}. Since there is no symbol
4350 to name the action, there is no way to declare a data type for the value
4351 in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
4352 specify a data type each time you refer to this value.
4354 There is no way to set the value of the entire rule with a midrule
4355 action, because assignments to @code{$$} do not have that effect. The
4356 only way to set the value for the entire rule is with an ordinary action
4357 at the end of the rule.
4359 Here is an example from a hypothetical compiler, handling a @code{let}
4360 statement that looks like @samp{let (@var{variable}) @var{statement}} and
4361 serves to create a variable named @var{variable} temporarily for the
4362 duration of @var{statement}. To parse this construct, we must put
4363 @var{variable} into the symbol table while @var{statement} is parsed, then
4364 remove it afterward. Here is how it is done:
4371 $<context>$ = push_context ();
4372 declare_variable ($3);
4377 pop_context ($<context>5);
4383 As soon as @samp{let (@var{variable})} has been recognized, the first
4384 action is run. It saves a copy of the current semantic context (the
4385 list of accessible variables) as its semantic value, using alternative
4386 @code{context} in the data-type union. Then it calls
4387 @code{declare_variable} to add the new variable to that list. Once the
4388 first action is finished, the embedded statement @code{stmt} can be
4391 Note that the midrule action is component number 5, so the @samp{stmt} is
4392 component number 6. Named references can be used to improve the readability
4393 and maintainability (@pxref{Named References}):
4400 $<context>let = push_context ();
4401 declare_variable ($3);
4406 pop_context ($<context>let);
4411 After the embedded statement is parsed, its semantic value becomes the
4412 value of the entire @code{let}-statement. Then the semantic value from the
4413 earlier action is used to restore the prior list of variables. This
4414 removes the temporary @code{let}-variable from the list so that it won't
4415 appear to exist while the rest of the program is parsed.
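
The functions @code{push_context}, @code{declare_variable} and
@code{pop_context} are left undefined by the example. Here is a minimal
sketch of one possible implementation, assuming a hypothetical symbol
table in which a saved context is simply the number of visible variables;
@code{context_t} and @code{is_visible} are also hypothetical.

```c
#include <string.h>

/* Hypothetical symbol table: the first COUNT entries of VARS are the
   variables currently visible. */
enum { MAX_VARS = 64 };
static char const *vars[MAX_VARS];
static int count;

/* A saved context: how many variables were visible at that point. */
typedef int context_t;

/* First midrule action: $<context>$ = push_context (); */
context_t
push_context (void)
{
  return count;
}

/* Add NAME to the visible variables (ignored when the table is full,
   to keep the sketch short). */
void
declare_variable (char const *name)
{
  if (count < MAX_VARS)
    vars[count++] = name;
}

/* Final action: pop_context ($<context>5); forgets every variable
   declared since SAVED was taken, including the let-variable. */
void
pop_context (context_t saved)
{
  count = saved;
}

/* Whether NAME is currently visible. */
int
is_visible (char const *name)
{
  for (int i = 0; i < count; ++i)
    if (strcmp (vars[i], name) == 0)
      return 1;
  return 0;
}
```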
4417 Because the types of the semantic values of midrule actions are unknown to
4418 Bison, type-based features (e.g., @samp{%printer}, @samp{%destructor}) do
not work, which could result in memory leaks. Untyped midrule actions also
forbid the use of the @code{variant} implementation of @code{api.value.type}
in C++ (@pxref{C++ Variants}).
4423 @xref{Typed Midrule Actions}, for one way to address this issue, and
@ref{Midrule Action Translation}, for another: turning midrule actions
into regular actions.
4428 @node Typed Midrule Actions
4429 @subsubsection Typed Midrule Actions
4432 @cindex discarded symbols, midrule actions
4433 @cindex error recovery, midrule actions
4434 In the above example, if the parser initiates error recovery (@pxref{Error
4435 Recovery}) while parsing the tokens in the embedded statement @code{stmt},
4436 it might discard the previous semantic context @code{$<context>5} without
4437 restoring it. Thus, @code{$<context>5} needs a destructor
4438 (@pxref{Destructor Decl}), and Bison needs the
4439 type of the semantic value (@code{context}) to select the right destructor.
4441 As an extension to Yacc's midrule actions, Bison offers a means to type
4442 their semantic value: specify its type tag (@samp{<...>} before the midrule
4445 Consider the previous example, with an untyped midrule action:
4452 $<context>$ = push_context (); // ***
4453 declare_variable ($3);
4458 pop_context ($<context>5); // ***
4464 If instead you write:
4471 $$ = push_context (); // ***
4472 declare_variable ($3);
4477 pop_context ($5); // ***
4483 then @code{%printer} and @code{%destructor} work properly (no more leaks!),
4484 C++ @code{variant}s can be used, and redundancy is reduced (@code{<context>}
4488 @node Midrule Action Translation
4489 @subsubsection Midrule Action Translation
4493 Midrule actions are actually transformed into regular rules and actions.
4494 The various reports generated by Bison (textual, graphical, etc., see
4495 @ref{Understanding}) reveal this translation,
4496 best explained by means of an example. The following rule:
4499 exp: @{ a(); @} "b" @{ c(); @} @{ d(); @} "e" @{ f(); @};
4506 $@@1: %empty @{ a(); @};
4507 $@@2: %empty @{ c(); @};
4508 $@@3: %empty @{ d(); @};
4509 exp: $@@1 "b" $@@2 $@@3 "e" @{ f(); @};
4513 with new nonterminal symbols @code{$@@@var{n}}, where @var{n} is a number.
A midrule action is expected to generate a value if it uses @code{$$}, or if
the (final) action uses @code{$@var{n}} where @var{n} denotes the midrule
action. In that case its nonterminal is named @code{@@@var{n}} instead:
4520 exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
4527 @@1: %empty @{ a(); @};
4528 @@2: %empty @{ $$ = c(); @};
4529 $@@3: %empty @{ d(); @};
exp: @@1 "b" @@2 $@@3 "e" @{ f = $1; @};
4533 There are probably two errors in the above example: the first midrule action
4534 does not generate a value (it does not use @code{$$} although the final
4535 action uses it), and the value of the second one is not used (the final
4536 action does not use @code{$3}). Bison reports these errors when the
4537 @code{midrule-value} warnings are enabled (@pxref{Invocation}):
4540 $ @kbd{bison -Wmidrule-value mid.y}
4542 mid.y:2.6-13: @dwarning{warning}: unset value: $$
4543 2 | exp: @dwarning{@{ a(); @}} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
4544 | @dwarning{^~~~~~~~}
4547 mid.y:2.19-31: @dwarning{warning}: unused value: $3
4548 2 | exp: @{ a(); @} "b" @dwarning{@{ $$ = c(); @}} @{ d(); @} "e" @{ f = $1; @};
4549 | @dwarning{^~~~~~~~~~~~~}
It is sometimes useful to turn midrule actions into regular actions, e.g.,
to factor them, or to escape from their limitations. For instance, as an
alternative to @emph{typed} midrule actions, you may bury the midrule action
inside a nonterminal symbol and declare a printer and a destructor for
4563 %nterm <context> let
4564 %destructor @{ pop_context ($$); @} let
4565 %printer @{ print_context (yyo, $$); @} let
4583 $let = push_context ();
4584 declare_variable ($var);
4593 @node Midrule Conflicts
4594 @subsubsection Conflicts due to Midrule Actions
4595 Taking action before a rule is completely recognized often leads to
4596 conflicts since the parser must commit to a parse in order to execute the
4597 action. For example, the following two rules, without midrule actions,
4598 can coexist in a working parser because the parser can shift the open-brace
4599 token and look at what follows before deciding whether there is a
4605 '@{' declarations statements '@}'
4606 | '@{' statements '@}'
4612 But when we add a midrule action as follows, the rules become nonfunctional:
4617 @{ prepare_for_local_variables (); @}
4618 '@{' declarations statements '@}'
4621 | '@{' statements '@}'
4627 Now the parser is forced to decide whether to run the midrule action
4628 when it has read no farther than the open-brace. In other words, it
4629 must commit to using one rule or the other, without sufficient
4630 information to do it correctly. (The open-brace token is what is called
4631 the @dfn{lookahead} token at this time, since the parser is still
4632 deciding what to do about it. @xref{Lookahead}.)
4634 You might think that you could correct the problem by putting identical
4635 actions into the two rules, like this:
4640 @{ prepare_for_local_variables (); @}
4641 '@{' declarations statements '@}'
4642 | @{ prepare_for_local_variables (); @}
4643 '@{' statements '@}'
4649 But this does not help, because Bison does not realize that the two actions
4650 are identical. (Bison never tries to understand the C code in an action.)
4652 If the grammar is such that a declaration can be distinguished from a
4653 statement by the first token (which is true in C), then one solution which
4654 does work is to put the action after the open-brace, like this:
4659 '@{' @{ prepare_for_local_variables (); @}
4660 declarations statements '@}'
4661 | '@{' statements '@}'
4667 Now the first token of the following declaration or statement,
4668 which would in any case tell Bison which rule to use, can still do so.
4670 Another solution is to bury the action inside a nonterminal symbol which
4671 serves as a subroutine:
4676 %empty @{ prepare_for_local_variables (); @}
4682 subroutine '@{' declarations statements '@}'
4683 | subroutine '@{' statements '@}'
4689 Now Bison can execute the action in the rule for @code{subroutine} without
4690 deciding which rule for @code{compound} it will eventually use.
4693 @node Tracking Locations
4694 @section Tracking Locations
4696 @cindex textual location
4697 @cindex location, textual
4699 Though grammar rules and semantic actions are enough to write a fully
4700 functional parser, it can be useful to process some additional information,
4701 especially symbol locations.
4703 The way locations are handled is defined by providing a data type, and
4704 actions to take when rules are matched.
4707 * Location Type:: Specifying a data type for locations.
4708 * Actions and Locations:: Using locations in actions.
4709 * Printing Locations:: Defining how locations are printed.
4710 * Location Default Action:: Defining a general way to compute locations.
4714 @subsection Data Type of Locations
4715 @cindex data type of locations
4716 @cindex default location type
4718 Defining a data type for locations is much simpler than for semantic values,
4719 since all tokens and groupings always use the same type. The location type
4720 is specified using @samp{%define api.location.type}:
4723 %define api.location.type @{location_t@}
4726 This defines, in the C generated code, the @code{YYLTYPE} type name. When
4727 @code{YYLTYPE} is not defined, Bison uses a default structure type with four
4731 typedef struct YYLTYPE
4740 In C, you may also specify the type of locations by defining a macro called
4741 @code{YYLTYPE}, just as you can specify the semantic value type by defining
4742 a @code{YYSTYPE} macro (@pxref{Value Type}). However, rather than using
4743 macros, we recommend the @code{api.value.type} and @code{api.location.type}
4744 @code{%define} variables.
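
For instance, a grammar might use @samp{%define api.location.type
@{location_t@}} with a hypothetical @code{location_t} that records only a
line number and a byte range. This is a sketch under that assumption;
@code{location_span} is a hypothetical helper. Since the default
definition of @code{YYLLOC_DEFAULT} relies on the four default fields, such
a type would also call for your own definition of that macro
(@pxref{Location Default Action}).

```c
#include <stddef.h>

/* Hypothetical location type: one line number plus a byte range,
   instead of the default first/last line/column fields. */
typedef struct
{
  int line;            /* line where the range starts */
  size_t begin_byte;   /* offset of the first byte */
  size_t end_byte;     /* offset just past the last byte */
} location_t;

/* Location spanning from the start of FIRST to the end of LAST, as a
   grouping's location is built from its first and last components. */
location_t
location_span (location_t first, location_t last)
{
  location_t res = { first.line, first.begin_byte, last.end_byte };
  return res;
}
```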
4746 Default locations represent a range in the source file(s), but this is not a
4747 requirement. It could be a single point or just a line number, or even more
4750 When the default location type is used, Bison initializes all these fields
4751 to 1 for @code{yylloc} at the beginning of the parsing. To initialize
@code{yylloc} with a custom location type (or to choose a different
4753 initialization), use the @code{%initial-action} directive. @xref{Initial
4759 The meaning of ``column'' is deliberately left vague since there are several
4760 options, depending on the use cases.
4762 With multibyte input (say UTF-8), simply counting the number of bytes does
4763 not match character positions on the screen. One needs advanced functions
4764 mapping multibyte characters to their visual width (see for instance
Gnulib's @code{mbswidth} and @code{mbsnwidth} functions). Tab
characters probably need a dedicated implementation, to match the ``go to
4767 next multiple of 8'' behavior.
However, to quote input in error messages, as @command{bison} does:
4773 1.10-12: @derror{error}: invalid identifier: ‘3.8’
4774 1 | %require @derror{3.8}
then byte positions are handier. So in some cases, tracking both visual
4781 character position @emph{and} byte position is the best option. This is
4782 what @command{bison} does.
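
As a minimal sketch of tracking both positions at once, the hypothetical
@code{position} type below keeps a byte offset next to a visual column.
It assumes, for simplicity, that every UTF-8 character is one column wide
(real code would use a width function such as Gnulib's @code{mbswidth})
and that tab stops are every 8 columns; it is not how @command{bison}
itself is implemented.

```c
#include <stddef.h>

/* Hypothetical position: line, visual column (1-based), and byte
   offset (0-based). */
typedef struct
{
  int line;
  int column;
  size_t byte;
} position;

/* Update P after reading the byte C. */
void
position_advance (position *p, unsigned char c)
{
  p->byte += 1;                    /* every byte advances the offset */
  if (c == '\n')
    {
      p->line += 1;
      p->column = 1;
    }
  else if (c == '\t')
    /* Next tab stop: columns 1, 9, 17, ... */
    p->column = ((p->column - 1) / 8 + 1) * 8 + 1;
  else if ((c & 0xC0) != 0x80)
    /* Not a UTF-8 continuation byte, so it starts a new character
       (assumed to be one column wide). */
    p->column += 1;
}

/* Update P after reading every byte of the string S. */
void
position_advance_string (position *p, char const *s)
{
  for (; *s; ++s)
    position_advance (p, (unsigned char) *s);
}
```

Starting from line 1, column 1, byte 0, reading @samp{ab} then a tab
reaches column 9 at byte 3; a two-byte UTF-8 character then moves the
column by one but the byte offset by two.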
4784 @node Actions and Locations
4785 @subsection Actions and Locations
4786 @cindex location actions
4787 @cindex actions, location
4790 @vindex @@@var{name}
4791 @vindex @@[@var{name}]
4793 Actions are not only useful for defining language semantics, but also for
4794 describing the behavior of the output parser with locations.
The most obvious way to build locations of syntactic groupings is very
4797 similar to the way semantic values are computed. In a given rule, several
4798 constructs can be used to access the locations of the elements being matched.
4799 The location of the @var{n}th component of the right hand side is
4800 @code{@@@var{n}}, while the location of the left hand side grouping is
4803 In addition, the named references construct @code{@@@var{name}} and
4804 @code{@@[@var{name}]} may also be used to address the symbol locations.
4805 @xref{Named References}, for more information about using the named
4806 references construct.
4808 Here is a basic example using the default data type for locations:
4816 @@$.first_column = @@1.first_column;
4817 @@$.first_line = @@1.first_line;
4818 @@$.last_column = @@3.last_column;
4819 @@$.last_line = @@3.last_line;
4825 fprintf (stderr, "%d.%d-%d.%d: division by zero",
4826 @@3.first_line, @@3.first_column,
4827 @@3.last_line, @@3.last_column);
4833 As for semantic values, there is a default action for locations that is
4834 run each time a rule is matched. It sets the beginning of @code{@@$} to the
4835 beginning of the first symbol, and the end of @code{@@$} to the end of the
4838 With this default action, the location tracking can be fully automatic. The
4839 example above simply rewrites this way:
4852 fprintf (stderr, "%d.%d-%d.%d: division by zero",
4853 @@3.first_line, @@3.first_column,
4854 @@3.last_line, @@3.last_column);
4861 It is also possible to access the location of the lookahead token, if any,
4862 from a semantic action.
4863 This location is stored in @code{yylloc}.
4864 @xref{Action Features}.
4866 @node Printing Locations
4867 @subsection Printing Locations
4868 @vindex YYLOCATION_PRINT
4870 When using the default location type, the debug traces report the symbols'
4871 location. The generated parser does so using the @code{YYLOCATION_PRINT}
4874 @deffn {Macro} YYLOCATION_PRINT (@var{file}, @var{loc})@code{;}
4875 When traces are enabled, print @var{loc} (of type @samp{YYLTYPE const *}) on
4876 @var{file} (of type @samp{FILE *}). Do nothing when traces are disabled, or
4877 if the location type is user defined.
4880 To get locations in the debug traces with your user-defined location types,
4881 define the @code{YYLOCATION_PRINT} macro. For instance:
4884 #define YYLOCATION_PRINT location_print
4889 @node Location Default Action
4890 @subsection Default Action for Locations
4891 @vindex YYLLOC_DEFAULT
4892 @cindex GLR parsers and @code{YYLLOC_DEFAULT}
4894 Actually, actions are not the best place to compute locations. Since
4895 locations are much more general than semantic values, there is room in
4896 the output parser to redefine the default action to take for each
4897 rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
4898 matched, before the associated action is run. It is also invoked
4899 while processing a syntax error, to compute the error's location.
4900 Before reporting an unresolvable syntactic ambiguity, a GLR
4901 parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location
Most of the time, this macro is general enough to remove
location-dedicated code from semantic actions.
4907 The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
4908 the location of the grouping (the result of the computation). When a
4909 rule is matched, the second parameter identifies locations of
4910 all right hand side elements of the rule being matched, and the third
4911 parameter is the size of the rule's right hand side.
4912 When a GLR parser reports an ambiguity, which of multiple candidate
4913 right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined.
4914 When processing a syntax error, the second parameter identifies locations
4915 of the symbols that were discarded during error processing, and the third
4916 parameter is the number of discarded symbols.
4918 By default, @code{YYLLOC_DEFAULT} is defined this way:
4922 # define YYLLOC_DEFAULT(Cur, Rhs, N) \
4926 (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \
4927 (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \
4928 (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \
4929 (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \
4933 (Cur).first_line = (Cur).last_line = \
4934 YYRHSLOC(Rhs, 0).last_line; \
4935 (Cur).first_column = (Cur).last_column = \
4936 YYRHSLOC(Rhs, 0).last_column; \
4943 where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
4944 in @var{rhs} when @var{k} is positive, and the location of the symbol
4945 just before the reduction when @var{k} and @var{n} are both zero.
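
To see the default behavior on concrete data, the macro can be exercised
outside a parser. This sketch assumes the four-field default
@code{YYLTYPE} and defines @code{YYRHSLOC} itself as plain array indexing,
with element 0 holding the location of the symbol just before the
reduction; in a generated parser, Bison supplies its own definitions. The
helper @code{reduce_location} is hypothetical.

```c
/* Four-field location type, as in the default. */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Assumption: Rhs[0] is the symbol before the reduction, and
   Rhs[1..N] are the right-hand side symbols. */
#define YYRHSLOC(Rhs, K) ((Rhs)[K])

/* Same logic as the default definition of YYLLOC_DEFAULT. */
#define YYLLOC_DEFAULT(Cur, Rhs, N)                            \
  do                                                           \
    if (N)                                                     \
      {                                                        \
        (Cur).first_line   = YYRHSLOC (Rhs, 1).first_line;     \
        (Cur).first_column = YYRHSLOC (Rhs, 1).first_column;   \
        (Cur).last_line    = YYRHSLOC (Rhs, N).last_line;      \
        (Cur).last_column  = YYRHSLOC (Rhs, N).last_column;    \
      }                                                        \
    else                                                       \
      {                                                        \
        (Cur).first_line   = (Cur).last_line   =               \
          YYRHSLOC (Rhs, 0).last_line;                         \
        (Cur).first_column = (Cur).last_column =               \
          YYRHSLOC (Rhs, 0).last_column;                       \
      }                                                        \
  while (0)

/* Location of a grouping whose right-hand side is RHS[1..N]. */
YYLTYPE
reduce_location (YYLTYPE const *rhs, int n)
{
  YYLTYPE cur;
  YYLLOC_DEFAULT (cur, rhs, n);
  return cur;
}
```

With three symbols, the result runs from the start of the first to the end
of the third; with @var{n} equal to zero, the result is an empty location
placed at the end of the preceding symbol.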
4947 When defining @code{YYLLOC_DEFAULT}, you should consider that:
4951 All arguments are free of side-effects. However, only the first one (the
4952 result) should be modified by @code{YYLLOC_DEFAULT}.
4955 For consistency with semantic actions, valid indexes within the
4956 right hand side range from 1 to @var{n}. When @var{n} is zero, only 0 is a
4957 valid index, and it refers to the symbol just before the reduction.
4958 During error processing @var{n} is always positive.
4961 Your macro should parenthesize its arguments, if need be, since the
4962 actual arguments may not be surrounded by parentheses. Also, your
4963 macro should expand to something that can be used as a single
4964 statement when it is followed by a semicolon.
4967 @node Named References
4968 @section Named References
4969 @cindex named references
4971 As described in the preceding sections, the traditional way to refer to any
4972 semantic value or location is a @dfn{positional reference}, which takes the
4973 form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However,
4974 such a reference is not very descriptive. Moreover, if you later decide to
4975 insert or remove symbols in the right-hand side of a grammar rule, the need
4976 to renumber such references can be tedious and error-prone.
4978 To avoid these issues, you can also refer to a semantic value or location
4979 using a @dfn{named reference}. First of all, original symbol names may be
4980 used as named references. For example:
4984 invocation: op '(' args ')'
4985 @{ $invocation = new_invocation ($op, $args, @@invocation); @}
4990 Positional and named references can be mixed arbitrarily. For example:
4994 invocation: op '(' args ')'
4995 @{ $$ = new_invocation ($op, $args, @@$); @}
5000 However, sometimes regular symbol names are not sufficient due to
5006 @{ $exp = $exp / $exp; @} // $exp is ambiguous.
5009 @{ $$ = $1 / $exp; @} // One usage is ambiguous.
5012 @{ $$ = $1 / $3; @} // No error.
5017 When ambiguity occurs, explicitly declared names may be used for values and
5018 locations. Explicit names are declared as a bracketed name after a symbol
5019 appearance in rule definitions. For example:
5022 exp[result]: exp[left] '/' exp[right]
5023 @{ $result = $left / $right; @}
5027 Like symbol names (@pxref{Symbols}), reference names can contain letters,
5028 underscores, periods, and non-initial digits and dashes. In bracketed
5029 reference names, leading and trailing blanks and comments are ignored:
5030 @samp{[ name ]} and @samp{[/* A */ name /* for references. */]} are
5031 equivalent to @samp{[name]}.
5033 In order to access a semantic value generated by a midrule action, an
5034 explicit name may also be declared by putting a bracketed name after the
5035 closing brace of the midrule action code:
5038 exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
5039 @{ $res = $left + $right; @}
5045 To refer to a name that contains dots or dashes, you must use the explicit
5046 bracketed syntax @code{$[name]} or @code{@@[name]}:
5049 if-stmt: "if" '(' expr ')' "then" then.stmt ';'
5050 @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
5054 Named references are often followed by a dot, a dash, or some other C
5055 punctuation mark or operator. By default, Bison reads
5056 @samp{$name.suffix} as a reference to symbol value @code{$name} followed by
5057 @samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic
5058 value. In order to force Bison to recognize @samp{name.suffix} in its
5059 entirety as the name of a semantic value, the bracketed syntax
5060 @samp{$[name.suffix]} must be used.
5063 @section Bison Declarations
5064 @cindex declarations, Bison
5065 @cindex Bison declarations
5067 The @dfn{Bison declarations} section of a Bison grammar defines the symbols
5068 used in formulating the grammar and the data types of semantic values.
5071 All token kind names (but not single-character literal tokens such as
5072 @code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
5073 declared if you need to specify which data type to use for the semantic
5074 value (@pxref{Multiple Types}).
5076 By default, the first rule in the grammar file also specifies the start
5077 symbol. If you want some other symbol to be the start symbol, you
5078 must declare it explicitly (@pxref{Language and Grammar}).
5081 * Require Decl:: Requiring a Bison version.
5082 * Token Decl:: Declaring terminal symbols.
5083 * Precedence Decl:: Declaring terminals with precedence and associativity.
5084 * Type Decl:: Declaring the choice of type for a nonterminal symbol.
5085 * Symbol Decls:: Summary of the Syntax of Symbol Declarations.
5086 * Initial Action Decl:: Code run before parsing starts.
5087 * Destructor Decl:: Declaring how symbols are freed.
5088 * Printer Decl:: Declaring how symbol values are displayed.
5089 * Expect Decl:: Suppressing warnings about parsing conflicts.
5090 * Start Decl:: Specifying the start symbol.
5091 * Pure Decl:: Requesting a reentrant parser.
5092 * Push Decl:: Requesting a push parser.
5093 * Decl Summary:: Table of all Bison declarations.
5094 * %define Summary:: Defining variables to adjust Bison's behavior.
5095 * %code Summary:: Inserting code into the parser source.
5099 @subsection Require a Version of Bison
5100 @cindex version requirement
5101 @cindex requiring a version of Bison
5104 You may require a minimum version of Bison to process the grammar. If
5105 the requirement is not met, @command{bison} exits with an error (exit
5109 %require "@var{version}"
5112 Some deprecated behaviors are disabled when a sufficiently recent @var{version} is required:
5114 @item @code{"3.2"} (or better)
5115 The C++ deprecated files @file{position.hh} and @file{stack.hh} are no
5121 @subsection Token Kind Names
5122 @cindex declaring token kind names
5123 @cindex token kind names, declaring
5124 @cindex declaring literal string tokens
5127 The basic way to declare a token kind name (terminal symbol) is as follows:
5133 Bison will convert this into a definition in the parser, so that the
5134 function @code{yylex} (if it is in this file) can use the name @var{name} to
5135 stand for this token kind's code.
5137 Alternatively, you can use @code{%left}, @code{%right}, @code{%precedence},
5138 or @code{%nonassoc} instead of @code{%token}, if you wish to specify
5139 associativity and precedence. @xref{Precedence Decl}. However, for
5140 clarity, we recommend using these directives only to declare associativity
5141 and precedence, and not to add string aliases, semantic types, etc.
5143 You can explicitly specify the numeric code for a token kind by appending a
5144 nonnegative decimal or hexadecimal integer value in the field immediately
5145 following the token name:
5149 %token XNUM 0x12d // a GNU extension
5153 It is generally best, however, to let Bison choose the numeric codes for all
5154 token kinds. Bison will automatically select codes that don't conflict with
5155 each other or with normal characters.
5157 In the event that the stack type is a union, you must augment the
5158 @code{%token} or other token declaration to include the data type
5159 alternative delimited by angle-brackets (@pxref{Multiple Types}).
5165 %union @{ /* define stack type */
5169 %token <val> NUM /* define token NUM and its type */
5173 You can associate a literal string token with a token kind name by writing
5174 the literal string at the end of a @code{%token} declaration which declares
5175 the name. For example:
5182 For example, a grammar for the C language might specify these names with
5183 equivalent literal string tokens:
5186 %token <operator> OR "||"
5187 %token <operator> LE 134 "<="
5192 Once you equate the literal string and the token kind name, you can use them
5193 interchangeably in further declarations or the grammar rules. The
5194 @code{yylex} function can use the token name or the literal string to obtain
5195 the token kind code (@pxref{Calling Convention}).
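To show how a hand-written @code{yylex} uses these token kind codes, here is a minimal scanner sketch in C. It assumes the hypothetical declarations @samp{%union @{ double val; @}}, @samp{%token <val> NUM} and @samp{%token LE 134 "<="}. The @code{semantic_value} union and the @code{token_kind} enumeration only stand in for what Bison would generate. The code 258 for @code{NUM} is an assumption, because Bison chooses that code itself.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for what Bison would generate from
     %union { double val; }
     %token <val> NUM
     %token LE 134 "<="
   LE keeps its explicit code 134.  NUM's code is chosen by Bison;
   258 is only an assumption here.  Character literals such as '+'
   use their own character code.  */
typedef union { double val; } semantic_value;
enum token_kind { TOK_EOF = 0, LE = 134, NUM = 258 };

/* Scan one token starting at *P, advance *P past it, store the value
   of a NUM in *LVAL, and return the token kind code.  */
int scan (const char **p, semantic_value *lval)
{
  while (isspace ((unsigned char) **p))
    ++*p;
  if (**p == '\0')
    return TOK_EOF;
  if (strncmp (*p, "<=", 2) == 0)
    {
      *p += 2;
      return LE;                      /* "<=" and LE are the same kind */
    }
  if (isdigit ((unsigned char) **p))
    {
      char *end;
      lval->val = strtod (*p, &end);  /* semantic value goes in the union */
      *p = end;
      return NUM;
    }
  return (unsigned char) *(*p)++;     /* single-character literal token */
}
```

Scanning @samp{12 <= 3.5+} with this sketch yields @code{NUM}, @code{LE}, @code{NUM}, @code{'+'}, and then end of input.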
5197 String aliases allow for better error messages using the literal strings
5198 instead of the token names, such as @samp{syntax error, unexpected ||,
5199 expecting number or (} rather than @samp{syntax error, unexpected OR,
5200 expecting NUM or LPAREN}.
5202 String aliases may also be marked for internationalization (@pxref{Token
5210 '\n' _("end of line")
5216 would produce in French @samp{erreur de syntaxe, || inattendu, attendait
5217 nombre ou (} rather than @samp{erreur de syntaxe, || inattendu, attendait
5220 @node Precedence Decl
5221 @subsection Operator Precedence
5222 @cindex precedence declarations
5223 @cindex declaring operator precedence
5224 @cindex operator precedence, declaring
5226 Use the @code{%left}, @code{%right}, @code{%nonassoc}, or @code{%precedence}
5227 declaration to declare a token and specify its precedence and associativity,
5228 all at once. These are called @dfn{precedence declarations}.
5229 @xref{Precedence}, for general information on operator
5232 The syntax of a precedence declaration is nearly the same as that of
5233 @code{%token}: either
5236 %left @var{symbols}@dots{}
5243 %left <@var{type}> @var{symbols}@dots{}
5246 And indeed any of these declarations serves the purposes of @code{%token}.
5247 But in addition, they specify the associativity and relative precedence for
5248 all the @var{symbols}:
5252 The associativity of an operator @var{op} determines how repeated uses of
5253 the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op} @var{z}}
5254 is parsed by grouping @var{x} with @var{y} first or by grouping @var{y} with
5255 @var{z} first. @code{%left} specifies left-associativity (grouping @var{x}
5256 with @var{y} first) and @code{%right} specifies right-associativity
5257 (grouping @var{y} with @var{z} first). @code{%nonassoc} specifies no
5258 associativity, which means that @samp{@var{x} @var{op} @var{y} @var{op}
5259 @var{z}} is considered a syntax error.
5261 @code{%precedence} gives only precedence to the @var{symbols}, and defines
5262 no associativity at all. Use this to define precedence only, so that any
5263 conflict that associativity would otherwise resolve is left unresolved.
5266 The precedence of an operator determines how it nests with other operators.
5267 All the tokens declared in a single precedence declaration have equal
5268 precedence and nest together according to their associativity. When two
5269 tokens declared in different precedence declarations associate, the one
5270 declared later has the higher precedence and is grouped first.
5273 For backward compatibility, there is a confusing difference between the
5274 argument lists of @code{%token} and precedence declarations. Only a
5275 @code{%token} can associate a literal string with a token kind name. A
5276 precedence declaration always interprets a literal string as a reference to
5277 a separate token. For example:
5280 %left OR "<=" // Does not declare an alias.
5281 %left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
5285 @subsection Nonterminal Symbols
5286 @cindex declaring value types, nonterminals
5287 @cindex value types, nonterminals, declaring
5292 When you use @code{%union} to specify multiple value types, you must
5293 declare the value type of each nonterminal symbol for which values are
5294 used. This is done with a @code{%type} declaration, like this:
5297 %type <@var{type}> @var{nonterminal}@dots{}
5301 Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type}
5302 is the name given in the @code{%union} to the alternative that you want
5303 (@pxref{Union Decl}). You can give any number of nonterminal symbols in the
5304 same @code{%type} declaration, if they have the same value type. Use spaces
5305 to separate the symbol names.
5307 While POSIX Yacc allows @code{%type} only for nonterminals, Bison also
5308 accepts this directive for terminal symbols. To declare exclusively
5309 nonterminal symbols, use the safer @code{%nterm}:
5312 %nterm <@var{type}> @var{nonterminal}@dots{}
5317 @subsection Syntax of Symbol Declarations
5323 The syntax of the various directives to declare symbols is as follows.
5327 %token @var{tag}? ( (@var{id}|@var{char}) @var{number}? @var{string}? )+ \
5328 ( @var{tag} ( (@var{id}|@var{char}) @var{number}? @var{string}? )+ )*
5331 %left @var{tag}? ( (@var{id}|@var{char}|@var{string}) @var{number}? )+ \
5332 ( @var{tag} ( (@var{id}|@var{char}|@var{string}) @var{number}? )+ )*
5335 %type @var{tag}? (@var{id}|@var{char}|@var{string})+ \
5336 ( @var{tag} (@var{id}|@var{char}|@var{string})+ )*
5339 %nterm @var{tag}? @var{id}+ \
5340 ( @var{tag} @var{id}+ )*
5345 where @var{tag} denotes a type tag such as @samp{<ival>}, @var{id} denotes
5346 an identifier such as @samp{NUM} or @samp{exp}, @var{number} a decimal or hexadecimal
5347 integer such as @samp{300} or @samp{0x12d}, @var{char} a character literal
5348 such as @samp{'+'}, and @var{string} a string literal such as
5349 @samp{"number"}. The postfix quantifiers are @samp{?} (zero or one),
5350 @samp{*} (zero or more) and @samp{+} (one or more).
5352 The directives @code{%precedence}, @code{%right} and @code{%nonassoc} behave
5355 @node Initial Action Decl
5356 @subsection Performing Actions before Parsing
5357 @findex %initial-action
5359 Sometimes your parser needs to perform some initializations before parsing.
5360 The @code{%initial-action} directive allows for such arbitrary code.
5362 @deffn {Directive} %initial-action @{ @var{code} @}
5363 @findex %initial-action
5364 Declare that the braced @var{code} must be invoked before parsing each time
5365 @code{yyparse} is called. The @var{code} may use @code{$$} (or
5366 @code{$<@var{tag}>$}) and @code{@@$}, the initial value and location of the
5367 lookahead, as well as the parameters declared with @code{%parse-param}.
5370 For instance, if your locations use a file name, you may use
5373 %parse-param @{ char const *file_name @};
5376 @@$.initialize (file_name);
5381 @node Destructor Decl
5382 @subsection Freeing Discarded Symbols
5383 @cindex freeing discarded symbols
5387 During error recovery (@pxref{Error Recovery}), symbols already pushed on
5388 the stack and tokens coming from the rest of the file are discarded until
5389 the parser falls on its feet. If the parser runs out of memory, or if it
5390 returns via @code{YYABORT}, @code{YYACCEPT} or @code{YYNOMEM}, all the
5391 symbols on the stack must be discarded. Even if the parser succeeds, it
5392 must discard the start symbol.
5394 When discarded symbols convey heap-based information, this memory is
5395 leaked. While this behavior can be tolerable for batch parsers, such as
5396 traditional compilers, it is unacceptable for programs like shells or
5397 protocol implementations that may parse and execute indefinitely.
5399 The @code{%destructor} directive defines code that is called when a
5400 symbol is automatically discarded.
5402 @deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
5404 Invoke the braced @var{code} whenever the parser discards one of the
5405 @var{symbols}. Within @var{code}, @code{$$} (or @code{$<@var{tag}>$})
5406 designates the semantic value associated with the discarded symbol, and
5407 @code{@@$} designates its location. The additional parser parameters are
5408 also available (@pxref{Parser Function}).
5410 When a symbol is listed among @var{symbols}, its @code{%destructor} is called a
5411 per-symbol @code{%destructor}.
5412 You may also define a per-type @code{%destructor} by listing a semantic type
5413 tag among @var{symbols}.
5414 In that case, the parser will invoke this @var{code} whenever it discards any
5415 grammar symbol that has that semantic type tag unless that symbol has its own
5416 per-symbol @code{%destructor}.
5418 Finally, you can define two different kinds of default @code{%destructor}s.
5419 You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of
5420 exactly one @code{%destructor} declaration in your grammar file.
5421 The parser will invoke the @var{code} associated with one of these whenever it
5422 discards any user-defined grammar symbol that has no per-symbol and no per-type
5424 The parser uses the @var{code} for @code{<*>} in the case of such a grammar
5425 symbol for which you have formally declared a semantic type tag (@code{%token},
5426 @code{%nterm}, and @code{%type}
5427 count as such a declaration, but @code{$<tag>$} does not).
5428 The parser uses the @var{code} for @code{<>} in the case of such a grammar
5429 symbol that has no declared semantic type tag.
5436 %union @{ char *string; @}
5437 %token <string> STRING1 STRING2
5438 %nterm <string> string1 string2
5439 %union @{ char character; @}
5440 %token <character> CHR
5441 %nterm <character> chr
5444 %destructor @{ @} <character>
5445 %destructor @{ free ($$); @} <*>
5446 %destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
5447 %destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
5451 guarantees that, when the parser discards any user-defined symbol that has a
5452 semantic type tag other than @code{<character>}, it passes its semantic value
5453 to @code{free} by default.
5454 However, when the parser discards a @code{STRING1} or a @code{string1},
5455 it uses the third @code{%destructor}, which frees it and
5456 prints its line number to @code{stdout} (@code{free} is invoked only once).
5457 Finally, the parser merely prints a message whenever it discards any symbol,
5458 such as @code{TAGLESS}, that has no semantic type tag.
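The rules for choosing a @code{%destructor} can be read as a small decision procedure. The following C sketch is a hypothetical model, not Bison's implementation. @code{choose_destructor} and its structures are invented for illustration. Applied to the declarations above, it selects the per-symbol destructor for @code{STRING1}, the per-type one for @code{CHR}, @code{<*>} for @code{STRING2}, @code{<>} for @code{TAGLESS}, and none for @code{$end}.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not Bison's implementation, of which %destructor
   runs when a symbol is discarded.  */
enum destructor_kind
{
  NO_DESTRUCTOR, PER_SYMBOL, PER_TYPE, DEFAULT_TYPED, DEFAULT_UNTYPED
};

struct grammar_symbol
{
  const char *tag;     /* declared semantic type tag, or NULL if none */
  int has_own;         /* listed by name in some %destructor */
  int user_defined;    /* 0 for Bison-defined symbols ($end, error...) */
};

struct destructor_decls
{
  const char *const *type_tags;  /* tags listed in per-type %destructors */
  size_t ntags;
  int has_star;                  /* some %destructor lists <*> */
  int has_empty;                 /* some %destructor lists <> */
};

enum destructor_kind
choose_destructor (const struct grammar_symbol *sym,
                   const struct destructor_decls *d)
{
  if (sym->has_own)
    return PER_SYMBOL;                 /* per-symbol comes first */
  if (sym->tag)
    {
      for (size_t i = 0; i < d->ntags; i++)
        if (strcmp (sym->tag, d->type_tags[i]) == 0)
          return PER_TYPE;             /* then per-type */
    }
  if (!sym->user_defined)
    return NO_DESTRUCTOR;              /* defaults skip Bison's own symbols */
  if (sym->tag)
    return d->has_star ? DEFAULT_TYPED : NO_DESTRUCTOR;    /* <*> */
  return d->has_empty ? DEFAULT_UNTYPED : NO_DESTRUCTOR;   /* <> */
}
```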
5460 A Bison-generated parser invokes the default @code{%destructor}s only for
5461 user-defined as opposed to Bison-defined symbols.
5462 For example, the parser will not invoke either kind of default
5463 @code{%destructor} for the special Bison-defined symbols @code{$accept},
5464 @code{$undefined}, or @code{$end} (@pxref{Table of Symbols}),
5465 none of which you can reference in your grammar.
5466 It also will not invoke either for the @code{error} token (@pxref{Table of
5467 Symbols}), which is always defined by Bison regardless of whether you
5468 reference it in your grammar.
5469 However, it may invoke one of them for the end token (token 0) if you
5470 redefine it from @code{$end} to, for example, @code{END}:
5476 @cindex actions in midrule
5477 @cindex midrule actions
5478 Finally, Bison will never invoke a @code{%destructor} for an unreferenced
5479 midrule semantic value (@pxref{Midrule Actions}).
5480 That is, Bison does not consider a midrule to have a semantic value if you
5481 do not reference @code{$$} in the midrule's action or @code{$@var{n}}
5482 (where @var{n} is the right-hand side symbol position of the midrule) in
5483 any later action in that rule. However, if you do reference either, the
5484 Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever
5485 it discards the midrule symbol.
5489 In the future, it may be possible to redefine the @code{error} token as a
5490 nonterminal that captures the discarded symbols.
5491 In that case, the parser will invoke the default destructor for it as well.
5496 @cindex discarded symbols
5497 @dfn{Discarded symbols} are the following:
5501 stacked symbols popped during the first phase of error recovery,
5503 incoming terminals during the second phase of error recovery,
5505 the current lookahead and the entire stack (except the current
5506 right-hand side symbols) when the parser returns immediately, and
5508 the current lookahead and the entire stack (including the current right-hand
5509 side symbols) when the C++ parser (@file{lalr1.cc}) catches an exception in
5512 the start symbol, when the parser succeeds.
5515 The parser can @dfn{return immediately} because of an explicit call to
5516 @code{YYABORT}, @code{YYACCEPT} or @code{YYNOMEM}, or failed error recovery,
5517 or memory exhaustion.
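To see why destructors matter when the parser returns immediately, consider the simplified C sketch below. It models a value stack holding heap-allocated strings. The stack, @code{push_copy}, @code{discard_all} and @code{destroy} are all hypothetical. @code{destroy} plays the role of a @samp{%destructor @{ free ($$); @}} body. The sketch ignores the special treatment of the current right-hand side symbols.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical value stack (not Bison's) holding heap-allocated strings,
   as if every symbol had a destructor that frees its value.  */
#define STACK_MAX 16
struct value_stack { char *values[STACK_MAX]; int size; };

static int destructor_calls;   /* how many values were reclaimed */

/* Stand-in for the braced %destructor code.  */
static void destroy (char *value)
{
  free (value);
  destructor_calls++;
}

/* Push a heap copy of S; return 0 if the stack is full or malloc fails.  */
int push_copy (struct value_stack *st, const char *s)
{
  if (st->size == STACK_MAX)
    return 0;
  char *copy = malloc (strlen (s) + 1);
  if (!copy)
    return 0;
  strcpy (copy, s);
  st->values[st->size++] = copy;
  return 1;
}

/* Returning immediately (e.g., after YYABORT): pop every remaining value
   and run the destructor on it, so that none of the memory leaks.  */
void discard_all (struct value_stack *st)
{
  while (st->size > 0)
    destroy (st->values[--st->size]);
}
```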
5519 Right-hand side symbols of a rule that explicitly triggers a syntax
5520 error via @code{YYERROR} are not discarded automatically. As a rule
5521 of thumb, destructors are invoked only when user actions cannot manage
5525 @subsection Printing Semantic Values
5526 @cindex printing semantic values
5530 When run-time traces are enabled (@pxref{Tracing}),
5531 the parser reports its actions, such as reductions. When a symbol involved
5532 in an action is reported, only its kind is displayed, as the parser cannot
5533 know how semantic values should be formatted.
5535 The @code{%printer} directive defines code that is called when a symbol is
5536 reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor
5539 @deffn {Directive} %printer @{ @var{code} @} @var{symbols}
5542 @c This is the same text as for %destructor.
5543 Invoke the braced @var{code} whenever the parser displays one of the
5544 @var{symbols}. Within @var{code}, @code{yyo} denotes the output stream (a
5545 @code{FILE*} in C, an @code{std::ostream&} in C++, and @code{stdout} in D), @code{$$} (or
5546 @code{$<@var{tag}>$}) designates the semantic value associated with the
5547 symbol, and @code{@@$} its location. The additional parser parameters are
5548 also available (@pxref{Parser Function}).
5550 The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor
5551 Decl}): they can be per-type (e.g.,
5552 @samp{<ival>}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}),
5553 typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e.,
5561 %union @{ char *string; @}
5562 %token <string> STRING1 STRING2
5563 %nterm <string> string1 string2
5564 %union @{ char character; @}
5565 %token <character> CHR
5566 %nterm <character> chr
5569 %printer @{ fprintf (yyo, "'%c'", $$); @} <character>
5570 %printer @{ fprintf (yyo, "&%p", $$); @} <*>
5571 %printer @{ fprintf (yyo, "\"%s\"", $$); @} STRING1 string1
5572 %printer @{ fprintf (yyo, "<>"); @} <>
5576 guarantees that, when the parser prints any symbol that has a semantic type
5577 tag other than @code{<character>}, it displays the address of the semantic
5578 value by default. However, when the parser displays a @code{STRING1} or a
5579 @code{string1}, it formats it as a string in double quotes. It performs
5580 only the third @code{%printer} in this case, so it prints only once.
5581 Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS},
5582 that has no semantic type tag. @xref{Mfcalc Traces}, for a complete example.
5587 @subsection Suppressing Conflict Warnings
5588 @cindex suppressing conflict warnings
5589 @cindex preventing warnings about conflicts
5590 @cindex warnings, preventing
5591 @cindex conflicts, suppressing warnings of
5595 Bison normally warns if there are any conflicts in the grammar
5596 (@pxref{Shift/Reduce}), but most real grammars
5597 have harmless shift/reduce conflicts which are resolved in a predictable
5598 way and would be difficult to eliminate. It is desirable to suppress
5599 the warning about these conflicts unless the number of conflicts
5600 changes. You can do this with the @code{%expect} declaration.
5602 The declaration looks like this:
5608 Here @var{n} is a decimal integer. The declaration says there should
5609 be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
5610 Bison reports an error if the number of shift/reduce conflicts differs
5611 from @var{n}, or if there are any reduce/reduce conflicts.
5613 For deterministic parsers, reduce/reduce conflicts are more
5614 serious, and should be eliminated entirely. Bison will always report
5615 reduce/reduce conflicts for these parsers. With GLR
5616 parsers, however, both kinds of conflicts are routine; otherwise,
5617 there would be no need to use GLR parsing. Therefore, it is
5618 also possible to specify an expected number of reduce/reduce conflicts
5619 in GLR parsers, using the declaration:
5625 You may wish to be more specific about which conflicts you
5626 expect. To this end, you can also attach
5627 @code{%expect} and @code{%expect-rr} modifiers to individual rules.
5628 The interpretation of these modifiers differs from their use as
5629 declarations. When attached to rules, they indicate the number of states
5630 in which the rule is involved in a conflict. You will need to consult the
5631 output resulting from @option{-v} to determine appropriate numbers to use.
5632 For example, for the following grammar fragment, the first rule for
5633 @code{empty_dims} appears in two states in which the @samp{[} token is a
5634 lookahead. Having determined that, you can document this fact with an
5635 @code{%expect} modifier as follows:
5645 | empty_dims '[' ']'
5649 Midrule actions generate implicit rules that are also subject to conflicts
5650 (@pxref{Midrule Conflicts}). To attach
5651 an @code{%expect} or @code{%expect-rr} annotation to an implicit
5652 midrule action's rule, put it before the action. For example,
5661 "condition" %expect-rr 1 @{ value_mode(); @} '(' exprs ')'
5662 | "condition" %expect-rr 1 @{ class_mode(); @} '(' types ')'
5667 Here, the appropriate midrule action will not be determined until after
5668 the @samp{(} token is shifted. Thus,
5669 the two actions will clash with each other, and we should expect one
5670 reduce/reduce conflict for each.
5672 In general, using @code{%expect} involves these steps:
5676 Compile your grammar without @code{%expect}. Use the @option{-v} option
5677 to get a verbose list of where the conflicts occur. Bison will also
5678 print the number of conflicts.
5681 Check each of the conflicts to make sure that Bison's default
5682 resolution is what you really want. If not, rewrite the grammar and
5683 go back to the beginning.
5686 Add an @code{%expect} declaration, copying the number @var{n} from the
5687 number that Bison printed. With GLR parsers, add an
5688 @code{%expect-rr} declaration as well.
5691 Optionally, count up the number of states in which one or more
5692 conflicted reductions for particular rules appear and add these numbers
5693 to the affected rules as @code{%expect-rr} or @code{%expect} modifiers
5694 as appropriate. Rules that are in conflict appear in the output listing
5695 surrounded by square brackets or, in the case of reduce/reduce conflicts,
5696 as reductions having the same lookahead symbol as a square-bracketed
5697 reduction in the same state.
5700 Now Bison will report an error if you introduce an unexpected conflict,
5701 but will keep silent otherwise.
5704 @subsection The Start-Symbol
5705 @cindex declaring the start symbol
5706 @cindex start symbol, declaring
5707 @cindex default start symbol
5710 Bison assumes by default that the start symbol for the grammar is the first
5711 nonterminal specified in the grammar specification section. The programmer
5712 may override this default with the @code{%start} declaration as follows:
5719 @subsection A Pure (Reentrant) Parser
5720 @cindex reentrant parser
5722 @findex %define api.pure
5724 A @dfn{reentrant} program is one which does not alter in the course of
5725 execution; in other words, it consists entirely of @dfn{pure} (read-only)
5726 code. Reentrancy is important whenever asynchronous execution is possible;
5727 for example, a nonreentrant program may not be safe to call from a signal
5728 handler. In systems with multiple threads of control, a nonreentrant
5729 program must be called only within interlocks.
5731 Normally, Bison generates a parser which is not reentrant. This is
5732 suitable for most uses, and it permits compatibility with Yacc. (The
5733 standard Yacc interfaces are inherently nonreentrant, because they use
5734 statically allocated variables for communication with @code{yylex},
5735 including @code{yylval} and @code{yylloc}.)
5737 Alternatively, you can generate a pure, reentrant parser. The Bison
5738 declaration @samp{%define api.pure} says that you want the parser to be
5739 reentrant. It looks like this:
5742 %define api.pure full
5745 The result is that the communication variables @code{yylval} and
5746 @code{yylloc} become local variables in @code{yyparse}, and a different
5747 calling convention is used for the lexical analyzer function @code{yylex}.
5748 @xref{Pure Calling}, for the details of this. The variable @code{yynerrs}
5749 becomes local in @code{yyparse} in pull mode, but it becomes a member of
5750 @code{yypstate} in push mode (@pxref{Error Reporting Function}). The
5751 convention for calling @code{yyparse} itself is unchanged.
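An impure parser communicates through the global @code{yylval}. The C sketch below shows the pure alternative: the lexer receives a pointer to the semantic value and keeps its scanning state in a structure owned by the caller, so two independent parses cannot interfere. The @code{YYSTYPE} typedef, the token codes, @code{struct scanner} and @code{lex_pure} are assumptions made for illustration. The real pure @code{yylex} signature is described elsewhere (@pxref{Pure Calling}). An extra argument such as the scanner would typically be passed with @code{%lex-param}.

```c
#include <ctype.h>

/* Hypothetical stand-ins for what Bison would declare.  */
typedef int YYSTYPE;
enum { TOK_END = 0, NUM = 258 };

/* Per-instance scanner state; nothing here is static or global.  */
struct scanner { const char *cursor; };

/* Pure-style lexer sketch: the value is returned through LVALP rather
   than a global yylval, so each parser instance has its own.  */
int lex_pure (YYSTYPE *lvalp, struct scanner *sc)
{
  while (*sc->cursor == ' ')
    sc->cursor++;
  if (isdigit ((unsigned char) *sc->cursor))
    {
      int v = 0;
      while (isdigit ((unsigned char) *sc->cursor))
        v = 10 * v + (*sc->cursor++ - '0');
      *lvalp = v;
      return NUM;
    }
  if (*sc->cursor == '\0')
    return TOK_END;
  return (unsigned char) *sc->cursor++;   /* single-character token */
}
```

Two scanners can be interleaved freely, because each call touches only the state and the value slot it is given.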
5753 Whether the parser is pure has nothing to do with the grammar rules.
5754 You can generate either a pure parser or a nonreentrant parser from any
5758 @subsection A Push Parser
5761 @findex %define api.push-pull
5763 A pull parser is called once and it takes control until all its input
5764 is completely parsed. A push parser, on the other hand, is called
5765 each time a new token is made available.
5767 A push parser is typically useful when the parser is part of a
5768 main event loop in the client's application. This is often
5769 a requirement of a GUI, where the main event loop needs to be triggered
5770 within a certain time period.
5772 Normally, Bison generates a pull parser.
5773 The following Bison declaration says that you want the parser to be a push
5774 parser (@pxref{%define Summary}):
5777 %define api.push-pull push
5780 In almost all cases, you want to ensure that your push parser is also
5781 a pure parser (@pxref{Pure Decl}). The only
5782 reason to create an impure push parser is backward
5783 compatibility with the impure Yacc pull-mode interface. Unless you know
5784 what you are doing, your declarations should look like this:
5787 %define api.pure full
5788 %define api.push-pull push
5791 There is a major functional difference between the pure push parser
5792 and the impure push parser. A program may have many instances of the
5793 same pure push parser in memory at the same time, whereas it should use
5794 only one instance of an impure push parser at a time.
5796 When a push parser is selected, Bison generates some additional symbols in
5797 the parser. @code{yypstate} is a structure that the generated
5798 parser uses to store the parser's state. @code{yypstate_new} is the
5799 function that will create a new parser instance. @code{yypstate_delete}
5800 will free the resources associated with the corresponding parser instance.
5801 Finally, @code{yypush_parse} is the function to call whenever a
5802 token is available, to feed it to the parser. A trivial example
5803 of using a pure push parser would look like this:
5807 yypstate *ps = yypstate_new ();
5809 status = yypush_parse (ps, yylex (), NULL);
5810 @} while (status == YYPUSH_MORE);
5811 yypstate_delete (ps);
5814 If you decide to use an impure push parser, a few things about the
5815 generated parser will change. The @code{yychar} variable becomes a global
5816 variable instead of a local one in the @code{yypush_parse} function. For
5817 this reason, the signature of the @code{yypush_parse} function is changed to
5818 remove the token as a parameter. A nonreentrant push parser example would
5819 thus look like this:
5824 yypstate *ps = yypstate_new ();
5827 status = yypush_parse (ps);
5828 @} while (status == YYPUSH_MORE);
5829 yypstate_delete (ps);
5832 Notice that the next token is stored in the global variable @code{yychar}
5833 for use by the next invocation of the @code{yypush_parse} function.
5835 Bison can also provide both the push parser interface and the pull parser
5836 interface in the same generated parser. In order to get this functionality,
5837 you should replace the @samp{%define api.push-pull push} declaration with the
5838 @samp{%define api.push-pull both} declaration. Doing this will create all of
5839 the symbols mentioned earlier along with the two extra symbols, @code{yyparse}
5840 and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally
5841 would be used. However, the user should note that it is implemented in the
5842 generated parser by calling @code{yypull_parse}.
5843 This makes the @code{yyparse} function that is generated with the
5844 @samp{%define api.push-pull both} declaration slower than the normal
5845 @code{yyparse} function. If the user
5846 calls the @code{yypull_parse} function it will parse the rest of the input
5847 stream. It is possible to @code{yypush_parse} tokens to select a subgrammar
5848 and then @code{yypull_parse} the rest of the input stream. If you would like
5849 to switch back and forth between parsing styles, you would have to
5850 write your own @code{yypull_parse} function that knows when to quit looking
5851 for input. An example of using the @code{yypull_parse} function would look
5855 yypstate *ps = yypstate_new ();
5856 yypull_parse (ps); /* Will call the lexer */
5857 yypstate_delete (ps);
5860 Adding the @samp{%define api.pure} declaration does exactly the same thing to
5861 the generated parser with @samp{%define api.push-pull both} as it did for
5862 @samp{%define api.push-pull push}.
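The following self-contained C toy illustrates the push protocol without running Bison. It imitates the calling pattern of @code{yypstate_new}, @code{yypush_parse} and @code{yypstate_delete}. Everything in it is hypothetical, including @code{toy_pstate}, @code{toy_push_parse}, @code{toy_run}, and the status codes and their values. The toy accepts @samp{NUM ('+' NUM)*} followed by an end token and sums the numbers. Like a pure push parser, it keeps all of its state in the parser instance.

```c
#include <stdlib.h>

/* Hypothetical toy, not a Bison-generated parser: it accepts
   NUM ('+' NUM)* END and sums the numbers, while imitating the
   yypstate_new / yypush_parse / yypstate_delete calling pattern.
   The status values are arbitrary.  */
enum { END = 0, NUM = 258 };
enum { TOY_ACCEPT = 0, TOY_ABORT = 1, TOY_PUSH_MORE = 2 };

typedef struct
{
  int expect_num;   /* 1 if the next token must be a NUM */
  long sum;         /* running total: all state lives here, not in globals */
} toy_pstate;

toy_pstate *toy_pstate_new (void)
{
  toy_pstate *ps = malloc (sizeof *ps);
  if (ps)
    {
      ps->expect_num = 1;
      ps->sum = 0;
    }
  return ps;
}

void toy_pstate_delete (toy_pstate *ps)
{
  free (ps);
}

/* Feed one token (and its value, if any) to the parser instance PS.  */
int toy_push_parse (toy_pstate *ps, int token, long value)
{
  if (ps->expect_num)
    {
      if (token != NUM)
        return TOY_ABORT;
      ps->sum += value;
      ps->expect_num = 0;
      return TOY_PUSH_MORE;
    }
  if (token == '+')
    {
      ps->expect_num = 1;
      return TOY_PUSH_MORE;
    }
  return token == END ? TOY_ACCEPT : TOY_ABORT;
}

/* Driver in the style of the push loop shown earlier: the caller obtains
   each token however it likes and pushes it while the parser wants more.  */
int toy_run (const int *tokens, const long *values, int n, long *sum)
{
  toy_pstate *ps = toy_pstate_new ();
  int status = TOY_PUSH_MORE;
  if (!ps)
    return TOY_ABORT;
  for (int i = 0; i < n && status == TOY_PUSH_MORE; i++)
    status = toy_push_parse (ps, tokens[i], values[i]);
  *sum = ps->sum;
  toy_pstate_delete (ps);
  return status;
}
```

The caller decides where each token comes from. The same loop body could therefore run inside an event handler, pushing one token per event.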
5865 @subsection Bison Declaration Summary
5866 @cindex Bison declaration summary
5867 @cindex declaration summary
5868 @cindex summary, Bison declaration
5870 Here is a summary of the declarations used to define a grammar:
5872 @deffn {Directive} %union
5873 Declare the collection of data types that semantic values may have
5874 (@pxref{Union Decl}).
5877 @deffn {Directive} %token
5878 Declare a terminal symbol (token kind name) with no precedence
5879 or associativity specified (@pxref{Token Decl}).
5882 @deffn {Directive} %right
5883 Declare a terminal symbol (token kind name) that is right-associative
5884 (@pxref{Precedence Decl}).
5887 @deffn {Directive} %left
5888 Declare a terminal symbol (token kind name) that is left-associative
5889 (@pxref{Precedence Decl}).
5892 @deffn {Directive} %nonassoc
5893 Declare a terminal symbol (token kind name) that is nonassociative
5894 (@pxref{Precedence Decl}).
5895 Using it in a way that would be associative is a syntax error.
5899 @deffn {Directive} %default-prec
5900 Assign a precedence to rules lacking an explicit @code{%prec} modifier
5901 (@pxref{Contextual Precedence}).
5905 @deffn {Directive} %nterm
5906 Declare the type of semantic values for a nonterminal symbol (@pxref{Type
5910 @deffn {Directive} %type
5911 Declare the type of semantic values for a symbol (@pxref{Type Decl}).
5914 @deffn {Directive} %start
5915 Specify the grammar's start symbol (@pxref{Start Decl}).
5918 @deffn {Directive} %expect
5919 Declare the expected number of shift/reduce conflicts, either overall or
5921 (@pxref{Expect Decl}).
5924 @deffn {Directive} %expect-rr
5925 Declare the expected number of reduce/reduce conflicts, either overall or
5927 (@pxref{Expect Decl}).
5933 In order to change the behavior of @command{bison}, use the following
5936 @deffn {Directive} %code @{@var{code}@}
5937 @deffnx {Directive} %code @var{qualifier} @{@var{code}@}
5939 Insert @var{code} verbatim into the output parser source at the
5940 default location or at the location specified by @var{qualifier}.
5941 @xref{%code Summary}.
5944 @deffn {Directive} %debug
5945 Instrument the parser for traces. Obsoleted by @samp{%define
5950 @deffn {Directive} %define @var{variable}
5951 @deffnx {Directive} %define @var{variable} @var{value}
5952 @deffnx {Directive} %define @var{variable} @{@var{value}@}
5953 @deffnx {Directive} %define @var{variable} "@var{value}"
5954 Define a variable to adjust Bison's behavior. @xref{%define Summary}.
5957 @deffn {Directive} %defines
5958 @deffnx {Directive} %defines @var{defines-file}
5959 Historical name for @code{%header}. @xref{%header,,@code{%header}}.
5962 @deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated with
5964 discarded symbols. @xref{Destructor Decl}.
5967 @deffn {Directive} %file-prefix "@var{prefix}"
5968 Specify a prefix to use for all Bison output file names. The names
5969 are chosen as if the grammar file were named @file{@var{prefix}.y}.
5973 @deffn {Directive} %header
5974 Write a parser header file containing definitions for the token kind names
5975 defined in the grammar as well as a few other declarations. If the parser
5976 implementation file is named @file{@var{name}.c} then the parser header file
5977 is named @file{@var{name}.h}.
5979 For C parsers, the parser header file declares @code{YYSTYPE} unless
5980 @code{YYSTYPE} is already defined as a macro or you have used a
5981 @code{<@var{type}>} tag without using @code{%union}. Therefore, if you are
5982 using a @code{%union} (@pxref{Multiple Types}) with components that require
5983 other definitions, or if you have defined a @code{YYSTYPE} macro or type
5984 definition (@pxref{Value Type}), you need to arrange for these definitions
5985 to be propagated to all modules, e.g., by putting them in a prerequisite
5986 header that is included both by your parser and by any other module that
5987 needs @code{YYSTYPE}.
5989 Unless your parser is pure, the parser header file declares
5990 @code{yylval} as an external variable. @xref{Pure Decl}.
5992 If you have also used locations, the parser header file declares
5993 @code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the
5994 @code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}.
5996 This parser header file is normally essential if you wish to put the
5997 definition of @code{yylex} in a separate source file, because
5998 @code{yylex} typically needs to be able to refer to the
5999 above-mentioned declarations and to the token kind codes. @xref{Token
6002 @findex %code requires
6003 @findex %code provides
6004 If you have declared @code{%code requires} or @code{%code provides}, the output
6005 header also contains their code.
6006 @xref{%code Summary}.
6008 @cindex Header guard
6009 The generated header is protected against multiple inclusions with a C
6010 preprocessor guard: @samp{YY_@var{PREFIX}_@var{FILE}_INCLUDED}, where
@var{PREFIX} and @var{FILE} are the prefix (@pxref{Multiple Parsers}) and the
generated file name, converted to uppercase, with each run of non-alphanumeric
characters replaced by a single underscore.
6015 For instance with @samp{%define api.prefix @{calc@}} and @samp{%header
6016 "lib/parse.h"}, the header will be guarded as follows.
6018 #ifndef YY_CALC_LIB_PARSE_H_INCLUDED
6019 # define YY_CALC_LIB_PARSE_H_INCLUDED
6021 #endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */
6024 Introduced in Bison 3.8.
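
The guard naming rule can be checked with a few lines of C. The following is only a sketch of the rule as stated above, not Bison's implementation. The function @code{make_guard} is hypothetical, the intermediate buffer assumes reasonably short names, and the result is silently truncated if @var{out} is too small (it must hold at least one character).

@example
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Bison): store in OUT the guard
   YY_<PREFIX>_<FILE>_INCLUDED, uppercased, with each run of
   non-alphanumeric characters replaced by one underscore.  */
char *
make_guard (char *out, size_t size, const char *prefix, const char *file)
@{
  char raw[512];                /* Assumes short names.  */
  snprintf (raw, sizeof raw, "YY_%s_%s_INCLUDED", prefix, file);

  size_t len = 0;
  int in_run = 0;               /* Inside a non-alphanumeric run?  */
  for (const char *p = raw; *p && len + 1 < size; ++p)
    @{
      unsigned char c = (unsigned char) *p;
      if (isalnum (c))
        @{
          out[len++] = (char) toupper (c);
          in_run = 0;
        @}
      else if (!in_run)
        @{
          out[len++] = '_';     /* One underscore per run.  */
          in_run = 1;
        @}
    @}
  out[len] = '\0';
  return out;
@}
@end example

With the values of the example above, @code{make_guard (buf, sizeof buf, "calc", "lib/parse.h")} returns @samp{YY_CALC_LIB_PARSE_H_INCLUDED}. With the prefix @samp{yy} and the file @file{y.tab.h}, the sketch returns @samp{YY_YY_Y_TAB_H_INCLUDED}.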
6027 @deffn {Directive} %header @var{header-file}
6028 Same as above, but save in the file @file{@var{header-file}}.
6031 @deffn {Directive} %language "@var{language}"
6032 Specify the programming language for the generated parser. Currently
6033 supported languages include C, C++, D and Java. @var{language} is
6037 @deffn {Directive} %locations
6038 Generate the code processing the locations (@pxref{Action Features}). This
mode is enabled as soon as the grammar uses the special @samp{@@@var{n}}
location references. Even if your grammar does not use them, @samp{%locations}
allows for more accurate syntax error messages.
6044 @deffn {Directive} %name-prefix "@var{prefix}"
6045 Obsoleted by @samp{%define api.prefix @{@var{prefix}@}}. @xref{Multiple
6046 Parsers}. For C++ parsers, see the
6047 @samp{%define api.namespace} documentation in this section.
6049 Rename the external symbols used in the parser so that they start with
6050 @var{prefix} instead of @samp{yy}. The precise list of symbols renamed in C
6051 parsers is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
6052 @code{yylval}, @code{yychar}, @code{yydebug}, and (if locations are used)
6053 @code{yylloc}. If you use a push parser, @code{yypush_parse},
6054 @code{yypull_parse}, @code{yypstate}, @code{yypstate_new} and
6055 @code{yypstate_delete} will also be renamed. For example, if you use
6056 @samp{%name-prefix "c_"}, the names become @code{c_parse}, @code{c_lex}, and
Unlike @code{api.prefix}, @code{%name-prefix} does @emph{not} rename some
symbols, for instance @code{YYDEBUG}, @code{YYTOKENTYPE},
@code{yytoken_kind_t}, @code{YYSTYPE}, and @code{YYLTYPE}.
6065 @deffn {Directive} %no-default-prec
6066 Do not assign a precedence to rules lacking an explicit @code{%prec}
6067 modifier (@pxref{Contextual Precedence}).
6071 @deffn {Directive} %no-lines
6072 Don't generate any @code{#line} preprocessor commands in the parser
6073 implementation file. Ordinarily Bison writes these commands in the parser
6074 implementation file so that the C compiler and debuggers will associate
6075 errors and object code with your source file (the grammar file). This
6076 directive causes them to associate errors with the parser implementation
6077 file, treating it as an independent source file in its own right.
6080 @deffn {Directive} %output "@var{file}"
6081 Generate the parser implementation in @file{@var{file}}.
6084 @deffn {Directive} %pure-parser
6085 Deprecated version of @samp{%define api.pure} (@pxref{%define
6086 Summary}), for which Bison is more careful to warn about
6090 @deffn {Directive} %require "@var{version}"
6091 Require version @var{version} or higher of Bison. @xref{Require Decl}.
6094 @deffn {Directive} %skeleton "@var{file}"
6095 Specify the skeleton to use.
6097 @c You probably don't need this option unless you are developing Bison.
6098 @c You should use @code{%language} if you want to specify the skeleton for a
6099 @c different language, because it is clearer and because it will always choose the
6100 @c correct skeleton for non-deterministic or push parsers.
6102 If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
6103 file in the Bison installation directory.
6104 If it does, @var{file} is an absolute file name or a file name relative to the
6105 directory of the grammar file.
6106 This is similar to how most shells resolve commands.
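
The three cases above can be restated as a short C sketch. The function @code{resolve_skeleton} and its @var{install_dir} and @var{grammar_dir} parameters are hypothetical placeholders, not part of Bison. The sketch only handles POSIX-style file names.

@example
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the lookup rule for %skeleton "FILE".
   INSTALL_DIR stands for Bison's skeleton installation directory,
   GRAMMAR_DIR for the directory of the grammar file.  */
const char *
resolve_skeleton (char *out, size_t size, const char *file,
                  const char *install_dir, const char *grammar_dir)
@{
  if (!strchr (file, '/'))
    /* No slash: a skeleton shipped with Bison.  */
    snprintf (out, size, "%s/%s", install_dir, file);
  else if (file[0] == '/')
    /* Absolute file name: used as is.  */
    snprintf (out, size, "%s", file);
  else
    /* Relative file name: relative to the grammar file's directory.  */
    snprintf (out, size, "%s/%s", grammar_dir, file);
  return out;
@}
@end example

For instance, with the hypothetical installation directory @file{/usr/share/bison/skeletons} and a grammar in @file{src}, the sketch maps @samp{lalr1.cc} to @file{/usr/share/bison/skeletons/lalr1.cc} and @samp{skels/my.c} to @file{src/skels/my.c}.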
6109 @deffn {Directive} %token-table
This feature is obsolescent; avoid it in new projects.
6112 Generate an array of token names in the parser implementation file. The
6113 name of the array is @code{yytname}; @code{yytname[@var{i}]} is the name of
6114 the token whose internal Bison token code is @var{i}. The first three
6115 elements of @code{yytname} correspond to the predefined tokens
6116 @code{"$end"}, @code{"error"}, and @code{"$undefined"}; after these come the
6117 symbols defined in the grammar file.
6119 The name in the table includes all the characters needed to represent the
6120 token in Bison. For single-character literals and literal strings, this
6121 includes the surrounding quoting characters and any escape sequences. For
6122 example, the Bison single-character literal @code{'+'} corresponds to a
6123 three-character name, represented in C as @code{"'+'"}; and the Bison
6124 two-character literal string @code{"\\/"} corresponds to a five-character
6125 name, represented in C as @code{"\"\\\\/\""}.
6127 When you specify @code{%token-table}, Bison also generates macro definitions
6128 for macros @code{YYNTOKENS}, @code{YYNNTS}, and @code{YYNRULES}, and
6133 The number of terminal symbols, i.e., the highest token code, plus one.
6135 The number of nonterminal symbols.
The number of grammar rules.
6139 The number of parser states (@pxref{Parser States}).
6142 Here's code for looking up a multicharacter token in @code{yytname},
6143 assuming that the characters of the token are stored in @code{token_buffer},
6144 and assuming that the token does not contain any characters like @samp{"}
6145 that require escaping.
6148 for (int i = 0; i < YYNTOKENS; i++)
6150 && yytname[i][0] == '"'
6151 && ! strncmp (yytname[i] + 1, token_buffer,
6152 strlen (token_buffer))
6153 && yytname[i][strlen (token_buffer) + 1] == '"'
6154 && yytname[i][strlen (token_buffer) + 2] == 0)
6158 This method is discouraged: the primary purpose of string aliases is forging
6159 good error messages, not describing the spelling of keywords. In addition,
6160 looking for the token kind at runtime incurs a (small but noticeable) cost.
6162 Finally, @code{%token-table} is incompatible with the @code{custom} and
6163 @code{detailed} values of the @code{parse.error} @code{%define} variable.
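
For reference, here is the lookup loop above packaged as a self-contained C function. The contents of @code{yytname} and the value of @code{YYNTOKENS} are hypothetical stand-ins for what Bison generates. They follow the layout described above: predefined tokens first, then the grammar's tokens, then nonterminals. @code{lookup_token} is not a Bison function.

@example
#include <string.h>

/* Hypothetical stand-ins for the generated table and macro.  */
static const char *const yytname[] =
@{
  "$end", "error", "$undefined",   /* Predefined tokens.  */
  "'+'", "\"while\"", "NUM",       /* Tokens from the grammar.  */
  "$accept", "exp", 0              /* Nonterminals.  */
@};
#define YYNTOKENS 6

/* Return the internal token code of the string alias "TOKEN", or -1.  */
int
lookup_token (const char *token)
@{
  size_t len = strlen (token);
  for (int i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token, len)
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == 0)
      return i;
  return -1;
@}
@end example

Here @code{lookup_token ("while")} returns 4. Single-character literals such as @samp{'+'} are stored with their single quotes, so a lookup for @samp{+} fails.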
6166 @deffn {Directive} %verbose
6167 Write an extra output file containing verbose descriptions of the parser
6168 states and what is done for each type of lookahead token in that state.
6169 @xref{Understanding}, for more information.
6172 @deffn {Directive} %yacc
6173 Pretend the option @option{--yacc} was given
6174 (@pxref{option-yacc,,@option{--yacc}}), i.e., imitate Yacc, including its
naming conventions. This only makes sense with the @file{yacc.c}
6176 skeleton. @xref{Tuning the Parser}, for more.
6178 Of course, being a Bison extension, @code{%yacc} is somewhat
6179 self-contradictory@dots{}
6183 @node %define Summary
6184 @subsection %define Summary
6186 There are many features of Bison's behavior that can be controlled by
6187 assigning the feature a single value. For historical reasons, some such
6188 features are assigned values by dedicated directives, such as @code{%start},
6189 which assigns the start symbol. However, newer such features are associated
6190 with variables, which are assigned by the @code{%define} directive:
6192 @deffn {Directive} %define @var{variable}
6193 @deffnx {Directive} %define @var{variable} @var{value}
6194 @deffnx {Directive} %define @var{variable} @{@var{value}@}
6195 @deffnx {Directive} %define @var{variable} "@var{value}"
6196 Define @var{variable} to @var{value}.
The type of the value depends on the syntax. Braces denote a value in the
target language (e.g., a namespace or a type). Keyword values (no
delimiters) denote a choice among a fixed set (e.g., a variation of a
feature). String values denote the remaining cases (e.g., a file name).
6203 It is an error if a @var{variable} is defined by @code{%define} multiple
6204 times, but see @ref{Tuning the Parser,,@option{-D @var{name}[=@var{value}]}}.
6207 The rest of this section summarizes variables and values that @code{%define}
6210 Some @var{variable}s take Boolean values. In this case, Bison will complain
6211 if the variable definition does not meet one of the following four
6215 @item @code{@var{value}} is @code{true}
6217 @item @code{@var{value}} is omitted (or @code{""} is specified).
6218 This is equivalent to @code{true}.
6220 @item @code{@var{value}} is @code{false}.
6222 @item @var{variable} is never defined.
6223 In this case, Bison selects a default value.
6226 What @var{variable}s are accepted, as well as their meanings and default
6227 values, depend on the selected target language and/or the parser skeleton
(@pxref{Decl Summary}, for @code{%language} and @code{%skeleton}).
6229 Unaccepted @var{variable}s produce an error. Some of the accepted
6230 @var{variable}s are described below.
@c ================================================== api.filename.type
6234 @anchor{api-filename-type}
6235 @deffn {Directive} {%define api.filename.type} @{@var{type}@}
6238 @item Language(s): C++
6241 Define the type of file names in Bison's default location and position
6242 types. @xref{Exposing the Location Classes}.
6244 @item Accepted Values:
6245 Any type that is printable (via streams) and comparable (with @code{==} and
6248 @item Default Value: @code{const std::string}.
6251 Introduced in Bison 2.0 as @code{filename_type} (with @code{std::string} as
6252 default), renamed as @code{api.filename.type} in Bison 3.7 (with @code{const
6253 std::string} as default).
6258 @c ================================================== api.header.include
6259 @deffn Directive {%define api.header.include} @{"header.h"@}
6260 @deffnx Directive {%define api.header.include} @{<header.h>@}
@item Language(s): C (@file{yacc.c})
6264 @item Purpose: Specify how the generated parser should include the generated header.
6266 Historically, when option @option{-d} or @option{--header} was used,
6267 @command{bison} generated a header and pasted an exact copy of it into the
6268 generated parser implementation file. Since Bison 3.6, it is
@code{#include}d as @samp{"@var{basename}.h"} instead of being duplicated,
unless @var{file} is @samp{y.tab} (see below).
The @code{api.header.include} variable controls how the generated
parser @code{#include}s the generated header. For instance:
6276 %define api.header.include @{"parse.h"@}
6283 %define api.header.include @{<parser/parse.h>@}
6286 Using @code{api.header.include} does not change the name of the generated
6287 header, only how it is included.
6289 To work around limitations of Automake's @command{ylwrap} (which runs
6290 @command{bison} with @option{--yacc}), @code{api.header.include} is
6291 @emph{not} predefined when the output file is @file{y.tab.c}. Define it to
6292 avoid the duplication.
6294 @item Accepted Values:
6295 An argument for @code{#include}.
6297 @item Default Value:
6298 @samp{"@var{header-basename}"}, unless the header file is @file{y.tab.h},
6299 where @var{header-basename} is the name of the generated header, without
6300 directory part. For instance with @samp{bison -d calc/parse.y},
6301 @code{api.header.include} defaults to @samp{"parse.h"}, not
6302 @samp{"calc/parse.h"}.
6305 Introduced in Bison 3.4. Defaults to @samp{"@var{basename}.h"} since Bison
6306 3.7, unless the header file is @file{y.tab.h}.
6309 @c api.header.include
6312 @c ================================================== api.location.file
6313 @deffn {Directive} {%define api.location.file} "@var{file}"
6314 @deffnx {Directive} {%define api.location.file} @code{none}
6317 @item Language(s): C++
6320 Define the name of the file in which Bison's default location and position
6321 types are generated. @xref{Exposing the Location Classes}.
6323 @item Accepted Values:
6326 If locations are enabled, generate the definition of the @code{position} and
@code{location} classes in the header file if @code{%header} is used, otherwise in
6328 the parser implementation.
6331 Generate the definition of the @code{position} and @code{location} classes
6332 in @var{file}. This file name can be relative (to where the parser file is
6333 output) or absolute.
6336 @item Default Value:
6337 Not applicable if locations are not enabled, or if a user location type is
6338 specified (see @code{api.location.type}). Otherwise, Bison's
6339 @code{location} is generated in @file{location.hh} (@pxref{C++ location}).
6342 Introduced in Bison 3.2.
@c ================================================== api.location.include
6348 @deffn {Directive} {%define api.location.include} @{"@var{file}"@}
6349 @deffnx {Directive} {%define api.location.include} @{<@var{file}>@}
6352 @item Language(s): C++
6355 Specify how the generated file that defines the @code{position} and
6356 @code{location} classes is included. This makes sense when the
6357 @code{location} class is exposed to the rest of your application/library in
6358 another directory. @xref{Exposing the Location Classes}.
6360 @item Accepted Values: Argument for @code{#include}.
6362 @item Default Value:
6363 @samp{"@var{dir}/location.hh"} where @var{dir} is the directory part of the
6364 output. For instance @file{src/parse} if
6365 @option{--output=src/parse/parser.cc} was given.
6368 Introduced in Bison 3.2.
6374 @c ================================================== api.location.type
6375 @deffn {Directive} {%define api.location.type} @{@var{type}@}
6378 @item Language(s): C, C++, Java
6380 @item Purpose: Define the location type.
6381 @xref{Location Type}, and @ref{User Defined Location Type}.
6383 @item Accepted Values: String
6385 @item Default Value: none
6388 Introduced in Bison 2.7 for C++ and Java, in Bison 3.4 for C. Was
6389 originally named @code{location_type} in Bison 2.5 and 2.6.
6394 @c ================================================== api.namespace
6395 @deffn Directive {%define api.namespace} @{@var{namespace}@}
@item Language(s): C++
6399 @item Purpose: Specify the namespace for the parser class.
6400 For example, if you specify:
6403 %define api.namespace @{foo::bar@}
6406 Bison uses @code{foo::bar} verbatim in references such as:
6409 foo::bar::parser::value_type
6412 However, to open a namespace, Bison removes any leading @code{::} and then
6413 splits on any remaining occurrences:
6416 namespace foo @{ namespace bar @{
6422 @item Accepted Values:
6423 Any absolute or relative C++ namespace reference without a trailing
6424 @code{"::"}. For example, @code{"foo"} or @code{"::foo::bar"}.
6426 @item Default Value:
6427 @code{yy}, unless you used the obsolete @samp{%name-prefix "@var{prefix}"}
6434 @c ================================================== api.parser.class
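
The way the @code{api.namespace} value is opened can be mimicked in C. The sketch below follows the rule given for @code{api.namespace}: strip one leading @samp{::}, then split on the remaining occurrences. The function @code{open_namespaces} is hypothetical and only builds the opening text.

@example
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: turn "::foo::bar" or "foo::bar" into
   "namespace foo @{ namespace bar @{".  Writes into OUT and returns it.  */
char *
open_namespaces (char *out, size_t size, const char *ns)
@{
  size_t len = 0;
  out[0] = '\0';
  if (strncmp (ns, "::", 2) == 0)
    ns += 2;                    /* Drop a leading "::".  */
  while (*ns)
    @{
      const char *sep = strstr (ns, "::");
      size_t n = sep ? (size_t) (sep - ns) : strlen (ns);
      int w = snprintf (out + len, size - len, "%snamespace %.*s @{",
                        len ? " " : "", (int) n, ns);
      if (w < 0 || (size_t) w >= size - len)
        break;                  /* Out of room: stop.  */
      len += (size_t) w;
      ns += n;
      if (sep)
        ns += 2;                /* Skip the "::" separator.  */
    @}
  return out;
@}
@end example

Both @samp{foo::bar} and @samp{::foo::bar} produce @samp{namespace foo @{ namespace bar @{}, as in the @code{api.namespace} example.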
6435 @deffn Directive {%define api.parser.class} @{@var{name}@}
6441 The name of the parser class.
6443 @item Accepted Values:
6444 Any valid identifier.
6446 @item Default Value:
6447 In C++, @code{parser}. In D and Java, @code{YYParser} or
6448 @code{@var{api.prefix}Parser} (@pxref{Java Bison Interface}).
6451 Introduced in Bison 3.3 to replace @code{parser_class_name}.
6457 @c ================================================== api.prefix
6458 @deffn {Directive} {%define api.prefix} @{@var{prefix}@}
6461 @item Language(s): C, C++, Java
6463 @item Purpose: Rename exported symbols.
6464 @xref{Multiple Parsers}.
6466 @item Accepted Values: String
6468 @item Default Value: @code{YY} for Java, @code{yy} otherwise.
6471 introduced in Bison 2.6, with its argument in double quotes. Uses braces
6472 since Bison 3.0 (double quotes are still supported for backward
6478 @c ================================================== api.pure
6479 @deffn Directive {%define api.pure} @var{purity}
6482 @item Language(s): C
6484 @item Purpose: Request a pure (reentrant) parser program.
6487 @item Accepted Values: @code{true}, @code{false}, @code{full}
6489 The value may be omitted: this is equivalent to specifying @code{true}, as is
6490 the case for Boolean values.
6492 When @code{%define api.pure full} is used, the parser is made reentrant. This
6493 changes the signature for @code{yylex} (@pxref{Pure Calling}), and also that of
6494 @code{yyerror} when the tracking of locations has been activated, as shown
The @code{true} value is very similar to the @code{full} value; the only
difference, kept for historical reasons, is in the signature of
@code{yyerror} in Yacc parsers without @code{%parse-param}.
6501 I.e., if @samp{%locations %define api.pure} is passed then the prototypes for
6505 void yyerror (char const *msg); // Yacc parsers.
6506 void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers.
6509 But if @samp{%locations %define api.pure %parse-param @{int *nastiness@}} is
6510 used, then both parsers have the same signature:
6513 void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
6516 (@pxref{Error Reporting Function})
6518 @item Default Value: @code{false}
6521 the @code{full} value was introduced in Bison 2.7
6528 @c ================================================== api.push-pull
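
As an illustration of the @code{yyerror} signature shown for @code{api.pure}, here is a possible definition for a parser declared with @samp{%locations %define api.pure %parse-param @{int *nastiness@}}. The @code{YYLTYPE} below is a stand-in that mirrors the default location type (@pxref{Location Type}) so that the sketch compiles on its own. In a real parser, use the definition Bison generates. Counting errors through @var{nastiness} is only an example use of the extra parameter.

@example
#include <stdio.h>

/* Stand-in for the generated default location type.  */
typedef struct YYLTYPE
@{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
@} YYLTYPE;

/* Location first, then the %parse-param arguments, then the message.  */
void
yyerror (YYLTYPE *llocp, int *nastiness, char const *msg)
@{
  ++*nastiness;                 /* Example use: count the errors.  */
  fprintf (stderr, "%d.%d-%d.%d: %s\n",
           llocp->first_line, llocp->first_column,
           llocp->last_line, llocp->last_column, msg);
@}
@end example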
6529 @deffn Directive {%define api.push-pull} @var{kind}
6532 @item Language(s): C (deterministic parsers only), D, Java
6534 @item Purpose: Request a pull parser, a push parser, or both.
6537 @item Accepted Values: @code{pull}, @code{push}, @code{both}
6539 @item Default Value: @code{pull}
6546 @c ================================================== api.symbol.prefix
6547 @deffn Directive {%define api.symbol.prefix} @{@var{prefix}@}
@item Language(s): all
6553 Add a prefix to the name of the symbol kinds. For instance
6556 %define api.symbol.prefix @{S_@}
6557 %token FILE for ERROR
6559 start: FILE for ERROR;
6563 generates this definition in C:
6567 enum yysymbol_kind_t
6569 S_YYEMPTY = -2, /* No symbol. */
6570 S_YYEOF = 0, /* $end */
6571 S_YYERROR = 1, /* error */
6572 S_YYUNDEF = 2, /* $undefined */
6573 S_FILE = 3, /* FILE */
6574 S_for = 4, /* for */
6575 S_ERROR = 5, /* ERROR */
6576 S_YYACCEPT = 6, /* $accept */
6577 S_start = 7 /* start */
6581 @item Accepted Values:
Any non-empty string. Must be a valid identifier in the target language
(typically a non-empty sequence of letters, underscores, and digits, not
starting with a digit).
6586 The empty prefix is (generally) invalid:
in C it would collide with the @code{YYERROR} macro, and
6590 potentially token kind definitions and symbol kind definitions would
6593 unnamed symbols (such as @samp{'+'}) have a name which starts with a digit;
6595 even in languages with scoped enumerations such as Java, an empty prefix is
6596 dangerous: symbol names may collide with the target language keywords, or
6597 with other members of the @code{SymbolKind} class.
6601 @item Default Value:
6602 @code{YYSYMBOL_} in C, @code{S_} in C++ and Java, empty in D.
6604 introduced in Bison 3.6.
6607 @c api.symbol.prefix
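
Because every enumerator carries the @code{api.symbol.prefix}, C code can name the kind of the token @code{for} (@code{S_for}) even though @code{for} is a C keyword. Likewise, @code{S_YYERROR} does not clash with the @code{YYERROR} macro. The sketch below reuses the enumeration from the @code{api.symbol.prefix} example. The helper @code{is_terminal} is hypothetical. It relies on the layout of that listing, where the terminals come first and @code{$accept} is the first nonterminal.

@example
#include <stdbool.h>

/* Enumeration from the example, with %define api.symbol.prefix @{S_@}.  */
enum yysymbol_kind_t
@{
  S_YYEMPTY = -2,               /* No symbol.  */
  S_YYEOF = 0,                  /* $end */
  S_YYERROR = 1,                /* error */
  S_YYUNDEF = 2,                /* $undefined */
  S_FILE = 3,                   /* FILE */
  S_for = 4,                    /* for */
  S_ERROR = 5,                  /* ERROR */
  S_YYACCEPT = 6,               /* $accept */
  S_start = 7                   /* start */
@};

/* Hypothetical helper: true for the terminals of this grammar.  */
bool
is_terminal (enum yysymbol_kind_t kind)
@{
  return S_YYEOF <= kind && kind < S_YYACCEPT;
@}
@end example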
6610 @c ================================================== api.token.constructor
6611 @deffn Directive {%define api.token.constructor}
6618 Request that symbols be handled as a whole (type, value, and possibly
6619 location) in the scanner. In the case of C++, it works only when
variant-based semantic values are enabled (@pxref{C++ Variants}); see
@ref{Complete Symbols} for details. In D, token constructors work with both
6622 @samp{%union} and @samp{%define api.value.type union}.
6624 @item Accepted Values:
6627 @item Default Value:
6630 introduced in Bison 3.0.
6633 @c api.token.constructor
6636 @c ================================================== api.token.prefix
6637 @anchor{api-token-prefix}
6638 @deffn Directive {%define api.token.prefix} @{@var{prefix}@}
@item Language(s): all
6643 Add a prefix to the token names when generating their definition in the
6644 target language. For instance
6647 %define api.token.prefix @{TOK_@}
6648 %token FILE for ERROR
6650 start: FILE for ERROR;
6654 generates the definition of the symbols @code{TOK_FILE}, @code{TOK_for}, and
6655 @code{TOK_ERROR} in the generated source files. In particular, the scanner
6656 must use these prefixed token names, while the grammar itself may still use
6657 the short names (as in the sample rule given above). The generated
6658 informational files (@file{*.output}, @file{*.xml}, @file{*.gv}) are not
6659 modified by this prefix.
6661 Bison also prefixes the generated member names of the semantic value union.
6662 @xref{Type Generation}, for more
6665 See @ref{Calc++ Parser} and @ref{Calc++ Scanner}, for a complete example.
6667 @item Accepted Values:
Any string. Must be a valid identifier prefix in the target language
(typically, a possibly empty sequence of letters, underscores, and digits,
not starting with a digit).
6672 @item Default Value:
6675 introduced in Bison 3.0.
6681 @c ================================================== api.token.raw
6682 @deffn Directive {%define api.token.raw}
6689 The output files normally define the enumeration of the @emph{token kinds}
6690 with Yacc-compatible token codes: sequential numbers starting at 257 except
for single-character tokens, which stand for themselves (e.g., in ASCII,
@samp{'a'} is numbered 97). The parser, however, uses @emph{symbol kinds}
6693 which are assigned numbers sequentially starting at 0. Therefore each time
6694 the scanner returns an (external) token kind, it must be mapped to the
6695 (internal) symbol kind.
When @code{api.token.raw} is set, the codes of the token kinds are forced to
coincide with their symbol kinds. This saves one table lookup per token to map
6699 them from the token kind to the symbol kind, and also saves the generation
6700 of the mapping table. The gain is typically moderate, but in extreme cases
6701 (very simple user actions), a 10% improvement can be observed.
6703 When @code{api.token.raw} is set, the grammar cannot use character literals
6704 (such as @samp{'a'}).
6706 @item Accepted Values: Boolean.
6708 @item Default Value:
6709 @code{true} in D, @code{false} otherwise
6711 introduced in Bison 3.5. Was initially introduced in Bison 1.25 as
6712 @samp{%raw}, but never worked and was removed in Bison 1.29.
6718 @c ================================================== api.value.automove
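
The token-kind to symbol-kind mapping that @code{api.token.raw} makes unnecessary can be pictured with the following C sketch. It uses a hypothetical grammar whose only tokens are @code{NUM} and @samp{'+'}. Bison implements the mapping with a generated table. The switch statement, the enumerators, and the token code 258 for @code{NUM} are illustrative assumptions only.

@example
/* Hypothetical symbol kinds, numbered sequentially from 0.  */
enum symbol_kind
@{
  SK_EOF = 0, SK_error = 1, SK_UNDEF = 2, SK_NUM = 3, SK_PLUS = 4
@};

/* Default: Yacc-compatible token codes ('+' stands for itself, NUM has
   an assumed code of 258) must be translated into symbol kinds.  */
int
translate (int token_code)
@{
  switch (token_code)
    @{
    case 0:   return SK_EOF;
    case '+': return SK_PLUS;
    case 258: return SK_NUM;
    default:  return SK_UNDEF;  /* Unknown code.  */
    @}
@}

/* With api.token.raw: the scanner already returns symbol kinds, so
   no translation (and no table) is needed.  */
int
translate_raw (int token_code)
@{
  return token_code;
@}
@end example

With @code{api.token.raw}, this grammar could not use @samp{'+'} and would need a named token instead.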
6719 @deffn Directive {%define api.value.automove}
6726 Let occurrences of semantic values of the right-hand sides of a rule be
implicitly turned into rvalues. When enabled, a grammar such as:
6731 "number" @{ $$ = make_number ($1); @}
6732 | exp "+" exp @{ $$ = make_binary (add, $1, $3); @}
6733 | "(" exp ")" @{ $$ = $2; @}
6737 is actually compiled as if you had written:
6741 "number" @{ $$ = make_number (std::move ($1)); @}
6742 | exp "+" exp @{ $$ = make_binary (add,
6745 | "(" exp ")" @{ $$ = std::move ($2); @}
6748 Using a value several times with automove enabled is typically an error.
6749 For instance, instead of:
6752 exp: "twice" exp @{ $$ = make_binary (add, $2, $2); @}
6759 exp: "twice" exp @{ auto v = $2; $$ = make_binary (add, v, v); @}
It is tempting to use @code{std::move} on one of the occurrences of
@code{v}, but the argument evaluation order in C++ is unspecified, so the
other occurrence might see a moved-from value.
6766 @item Accepted Values:
6769 @item Default Value:
6772 introduced in Bison 3.2
6775 @c api.value.automove
6778 @c ================================================== api.value.type
6779 @deffn Directive {%define api.value.type} @var{support}
6780 @deffnx Directive {%define api.value.type} @{@var{type}@}
6786 The type for semantic values.
6788 @item Accepted Values:
6791 This grammar has no semantic value at all. This is not properly supported
6793 @item @samp{union-directive} (C, C++, D)
6794 The type is defined thanks to the @code{%union} directive. You don't have
6795 to define @code{api.value.type} in that case, using @code{%union} suffices.
6799 %define api.value.type union-directive
6805 %token <ival> INT "integer"
6806 %token <sval> STR "string"
6809 @item @samp{union} (C, C++)
6810 The symbols are defined with type names, from which Bison will generate a
6811 @code{union}. For instance:
6813 %define api.value.type union
6814 %token <int> INT "integer"
6815 %token <char *> STR "string"
Most C++ objects cannot be stored in a @code{union}; use @samp{variant}
6820 @item @samp{variant} (C++)
6821 This is similar to @code{union}, but special storage techniques are used to
6822 allow any kind of C++ object to be used. For instance:
6824 %define api.value.type variant
6825 %token <int> INT "integer"
6826 %token <std::string> STR "string"
6828 @xref{C++ Variants}.
6830 @item @samp{@{@var{type}@}}
Use @var{type} as the type of semantic values.
6848 %define api.value.type @{struct my_value@}
6849 %token <u.ival> INT "integer"
6850 %token <u.sval> STR "string"
6854 @item Default Value:
6857 @code{union-directive} if @code{%union} is used, otherwise @dots{}
6859 @code{int} if type tags are used (i.e., @samp{%token <@var{type}>@dots{}} or
6860 @samp{%nterm <@var{type}>@dots{}} is used), otherwise @dots{}
6866 introduced in Bison 3.0. Was introduced for Java only in 2.3b as
6873 @c ================================================== api.value.union.name
6874 @deffn Directive {%define api.value.union.name} @var{name}
6880 The tag of the generated @code{union} (@emph{not} the name of the
6881 @code{typedef}). This variable is set to @code{@var{id}} when @samp{%union
6882 @var{id}} is used. There is no clear reason to give this union a name.
6884 @item Accepted Values:
6885 Any valid identifier.
6887 @item Default Value:
6891 Introduced in Bison 3.0.3.
6894 @c api.value.union.name
6897 @c ================================================== cex.timeout
6899 @deffn Directive {%define cex.timeout} @var{duration}
6902 @item Language(s): all
6905 Define the time limit for finding unifying counterexamples.
6907 @item Accepted Values: duration in seconds, e.g., @samp{1}, @samp{0.5}.
6909 @item Default Value: 5
6914 @c ================================================== lr.default-reduction
6916 @deffn Directive {%define lr.default-reduction} @var{when}
6919 @item Language(s): all
6921 @item Purpose: Specify the kind of states that are permitted to
6922 contain default reductions. @xref{Default Reductions}.
6924 @item Accepted Values: @code{most}, @code{consistent}, @code{accepting}
6925 @item Default Value:
6927 @item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
6928 @item @code{most} otherwise.
6931 introduced as @code{lr.default-reductions} in 2.5, renamed as
6932 @code{lr.default-reduction} in 3.0.
6937 @c ============================================ lr.keep-unreachable-state
6939 @deffn Directive {%define lr.keep-unreachable-state}
6942 @item Language(s): all
6943 @item Purpose: Request that Bison allow unreachable parser states to
6944 remain in the parser tables. @xref{Unreachable States}.
6945 @item Accepted Values: Boolean
6946 @item Default Value: @code{false}
6948 introduced as @code{lr.keep_unreachable_states} in 2.3b, renamed as
6949 @code{lr.keep-unreachable-states} in 2.5, and as
6950 @code{lr.keep-unreachable-state} in 3.0.
6953 @c lr.keep-unreachable-state
6956 @c ================================================== lr.type
6958 @deffn Directive {%define lr.type} @var{type}
6961 @item Language(s): all
6963 @item Purpose: Specify the type of parser tables within the
6964 LR(1) family. @xref{LR Table Construction}.
6966 @item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}
6968 @item Default Value: @code{lalr}
6973 @c ================================================== namespace
6974 @deffn Directive %define namespace @{@var{namespace}@}
Obsoleted by @code{api.namespace}.
6980 @c ================================================== parse.assert
6981 @deffn Directive {%define parse.assert}
@item Language(s): C, C++
6986 @item Purpose: Issue runtime assertions to catch invalid uses.
6987 In C, some important invariants in the implementation of the parser are
6988 checked when this option is enabled.
6990 In C++, when variants are used (@pxref{C++ Variants}), symbols must be
6991 constructed and destroyed properly. This option checks these constraints
6992 using runtime type information (RTTI). Therefore the generated code cannot
6993 be compiled with RTTI disabled (via compiler options such as
6994 @option{-fno-rtti}).
6996 @item Accepted Values: Boolean
6998 @item Default Value: @code{false}
7004 @c ================================================== parse.error
7005 @deffn Directive {%define parse.error} @var{verbosity}
7010 Control the generation of syntax error messages. @xref{Error Reporting}.
7011 @item Accepted Values:
7014 Error messages passed to @code{yyerror} are simply @w{@code{"syntax
7017 @item @code{detailed}
7018 Error messages report the unexpected token, and possibly the expected ones.
7019 However, this report can often be incorrect when LAC is not enabled
7020 (@pxref{LAC}). Token name internationalization is supported.
7022 @item @code{verbose}
7023 Similar (but inferior) to @code{detailed}. The D parser does not support this value.
7025 Error messages report the unexpected token, and possibly the expected ones.
7026 However, this report can often be incorrect when LAC is not enabled
7029 Does not support token internationalization. Using non-ASCII characters in
7030 token aliases is not portable.
7033 The user is in charge of generating the syntax error message by defining the
7034 @code{yyreport_syntax_error} function. @xref{Syntax Error Reporting
7038 @item Default Value:
7042 introduced in 3.0 with support for @code{simple} and @code{verbose}. Values
7043 @code{custom} and @code{detailed} were introduced in 3.6.
7049 @c ================================================== parse.lac
7050 @deffn Directive {%define parse.lac} @var{when}
@item Language(s): C/C++ (deterministic parsers only), D and Java.
7055 @item Purpose: Enable LAC (lookahead correction) to improve
7056 syntax error handling. @xref{LAC}.
7057 @item Accepted Values: @code{none}, @code{full}
7058 @item Default Value: @code{none}
7064 @c ================================================== parse.trace
7065 @deffn Directive {%define parse.trace}
@item Language(s): C, C++, D, Java
7070 @item Purpose: Require parser instrumentation for tracing.
In C/C++, define the macro @code{YYDEBUG} (or @code{@var{prefix}DEBUG} with
@samp{%define api.prefix @{@var{prefix}@}}, see @ref{Multiple Parsers}) to
1 (if it is not already defined) so that the debugging facilities are
7078 @item Accepted Values: Boolean
7080 @item Default Value: @code{false}
7086 @c ================================================== parser_class_name
7087 @deffn Directive %define parser_class_name @{@var{name}@}
Obsoleted by @code{api.parser.class}.
7090 @c parser_class_name
7098 @subsection %code Summary
7102 The @code{%code} directive inserts code verbatim into the output
7103 parser source at any of a predefined set of locations. It thus serves
7104 as a flexible and user-friendly alternative to the traditional Yacc
7105 prologue, @code{%@{@var{code}%@}}. This section summarizes the
7106 functionality of @code{%code} for the various target languages
7107 supported by Bison. For a detailed discussion of how to use
7108 @code{%code} in place of @code{%@{@var{code}%@}} for C/C++ and why it
7109 is advantageous to do so, @pxref{Prologue Alternatives}.
7111 @deffn {Directive} %code @{@var{code}@}
7112 This is the unqualified form of the @code{%code} directive. It
7113 inserts @var{code} verbatim at a language-dependent default location
7114 in the parser implementation.
7116 For C/C++, the default location is the parser implementation file
7117 after the usual contents of the parser header file. Thus, the
7118 unqualified form replaces @code{%@{@var{code}%@}} for most purposes.
7120 For D and Java, the default location is inside the parser class.
7123 @deffn {Directive} %code @var{qualifier} @{@var{code}@}
7124 This is the qualified form of the @code{%code} directive.
7125 @var{qualifier} identifies the purpose of @var{code} and thus the
7126 location(s) where Bison should insert it. That is, if you need to
7127 specify location-sensitive @var{code} that does not belong at the
7128 default location selected by the unqualified @code{%code} form, use
7132 For any particular qualifier or for the unqualified form, if there are
7133 multiple occurrences of the @code{%code} directive, Bison concatenates
7134 the specified code in the order in which it appears in the grammar
7137 Not all qualifiers are accepted for all target languages. Unaccepted
7138 qualifiers produce an error. Some of the accepted qualifiers are:
7142 @findex %code requires
7145 @item Language(s): C, C++
7148 This is the best place to write dependency code required for the value and
7149 location types (@code{YYSTYPE} and @code{YYLTYPE} in C). In other words,
7150 it's the best place to define types referenced in @code{%union} directives.
In C, if you use @code{#define} to override Bison's default @code{YYSTYPE}
and @code{YYLTYPE} definitions, this is also the place to do it.  However,
defining @code{api.value.type} and @code{api.location.type} with
@code{%define} is preferable.
7157 The parser header file and the parser implementation file before the
7158 Bison-generated definitions of the value and location types (@code{YYSTYPE}
7159 and @code{YYLTYPE} in C).
7163 @findex %code provides
7166 @item Language(s): C, C++
7168 @item Purpose: This is the best place to write additional definitions and
7169 declarations that should be provided to other modules.
7172 The parser header file and the parser implementation file after the
7173 Bison-generated value and location types (@code{YYSTYPE} and @code{YYLTYPE}
7174 in C), and token definitions.
7181 @item Language(s): C, C++
7183 @item Purpose: The unqualified @code{%code} or @code{%code requires}
7184 should usually be more appropriate than @code{%code top}. However,
7185 occasionally it is necessary to insert code much nearer the top of the
7186 parser implementation file. For example:
7195 @item Location(s): Near the top of the parser implementation file.
7199 @findex %code imports
7202 @item Language(s): D, Java
@item Purpose: This is the best place to write Java import directives.  In D,
import statements may appear anywhere in the code.
7207 @item Location(s): The parser Java file after any Java package directive and
7208 before any class definitions. The parser D file before any class definitions.
Though we say the insertion locations are language-dependent, they are
technically skeleton-dependent.  Writers of non-standard skeletons should
nevertheless choose their locations consistently with the behavior of the
standard Bison skeletons.
7218 @node Multiple Parsers
7219 @section Multiple Parsers in the Same Program
7221 Most programs that use Bison parse only one language and therefore contain
7222 only one Bison parser. But what if you want to parse more than one language
7223 with the same program? Then you need to avoid name conflicts between
7224 different definitions of functions and variables such as @code{yyparse},
7225 @code{yylval}. To use different parsers from the same compilation unit, you
7226 also need to avoid conflicts on types and macros (e.g., @code{YYSTYPE})
7227 exported in the generated header.
The easy way to do this is to define the @code{%define} variable
@code{api.prefix}.  Parsers with different @code{api.prefix} values are
guaranteed to have headers that do not conflict when included together, and
compiled objects that can be linked together.  Specifying @samp{%define api.prefix
@{@var{prefix}@}} (or passing the option @option{-Dapi.prefix=@{@var{prefix}@}}, see
@ref{Invocation}) renames the interface functions and
variables of the Bison parser to start with @var{prefix} instead of
@samp{yy}, and all the macros to start with @var{PREFIX} (i.e., @var{prefix}
upper-cased) instead of @samp{YY}.
7239 The renamed symbols include @code{yyparse}, @code{yylex}, @code{yyerror},
7240 @code{yynerrs}, @code{yylval}, @code{yylloc}, @code{yychar} and
7241 @code{yydebug}. If you use a push parser, @code{yypush_parse},
7242 @code{yypull_parse}, @code{yypstate}, @code{yypstate_new} and
@code{yypstate_delete} will also be renamed.  The renamed macros include
@code{YYSTYPE}, @code{YYLTYPE}, and @code{YYDEBUG}, which is treated
specially (more about this below).
7247 For example, if you use @samp{%define api.prefix @{c@}}, the names become
7248 @code{cparse}, @code{clex}, @dots{}, @code{CSTYPE}, @code{CLTYPE}, and so
7251 Users of Flex must update the signature of the generated @code{yylex}
7252 function. Since the Flex scanner usually includes the generated header of
7253 the parser (to get the definitions of the tokens, etc.), the most convenient
7254 way is to insert the declaration of @code{yylex} in the @code{provides}
7258 %define api.prefix @{c@}
7259 // Emitted in the header file, after the definition of YYSTYPE.
7262 // Tell Flex the expected prototype of yylex.
7264 int clex (CSTYPE *yylval, CLTYPE *yylloc)
7266 // Declare the scanner.
7273 The @code{%define} variable @code{api.prefix} works in two different ways.
7274 In the implementation file, it works by adding macro definitions to the
7275 beginning of the parser implementation file, defining @code{yyparse} as
7276 @code{@var{prefix}parse}, and so on:
#define YYSTYPE CSTYPE
7280 #define yyparse cparse
7281 #define yylval clval
This effectively substitutes one name for the other in the entire parser
implementation file, so the ``original'' names (@code{yylex},
@code{YYSTYPE}, @dots{}) are also usable in the parser implementation file.
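The following self-contained C sketch imitates this mechanism outside of
Bison.  The two @code{#define} lines stand in for the ones Bison would emit
for @samp{%define api.prefix @{c@}} (only two are shown), and the body of
@code{yyparse} is a placeholder, not real parser code.  Code written with the
original names ends up defining the prefixed ones:

@example
/* Stand-ins for two of the renaming macros emitted by api.prefix.  */
#define yyparse cparse
#define yylval  clval

int yylval;      /* After preprocessing, this defines clval.  */

int
yyparse (void)   /* After preprocessing, this defines cparse.  */
@{
  yylval = 42;   /* Placeholder body, not a real parser.  */
  return 0;
@}
@end example

Within this file, calls to @code{yyparse ()} and @code{cparse ()} reach the
same function, while the linker only ever sees @code{cparse} and
@code{clval}.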
However, in the parser header file, the symbols are declared with their new names, for
7295 extern CSTYPE clval;
7299 The macro @code{YYDEBUG} is commonly used to enable the tracing support in
7300 parsers. To comply with this tradition, when @code{api.prefix} is used,
7301 @code{YYDEBUG} (not renamed) is used as a default value:
7306 # if defined YYDEBUG
7323 Prior to Bison 2.6, a feature similar to @code{api.prefix} was provided by
7324 the obsolete directive @code{%name-prefix} (@pxref{Table of Symbols}) and
7325 the option @option{--name-prefix} (@pxref{Output Files}).
7328 @chapter Parser C-Language Interface
7329 @cindex C-language interface
7332 The Bison parser is actually a C function named @code{yyparse}. Here we
7333 describe the interface conventions of @code{yyparse} and the other
7334 functions that it needs to use.
Keep in mind that the parser uses many C identifiers starting with
@samp{yy} and @samp{YY} for internal purposes.  If you use such an
identifier (aside from those in this manual) in an action or in the epilogue
of the grammar file, you are likely to run into trouble.
7342 * Parser Function:: How to call @code{yyparse} and what it returns.
7343 * Push Parser Interface:: How to create, use, and destroy push parsers.
7344 * Lexical:: You must supply a function @code{yylex}
7346 * Error Reporting:: Passing error messages to the user.
7347 * Action Features:: Special features for use in actions.
7348 * Internationalization:: How to let the parser speak in the user's
7352 @node Parser Function
7353 @section The Parser Function @code{yyparse}
7356 You call the function @code{yyparse} to cause parsing to occur. This
7357 function reads tokens, executes actions, and ultimately returns when it
7358 encounters end-of-input or an unrecoverable syntax error. You can also
7359 write an action which directs @code{yyparse} to return immediately
7360 without reading further.
7363 @deftypefun int yyparse (@code{void})
7364 The value returned by @code{yyparse} is 0 if parsing was successful (return
7365 is due to end-of-input).
7367 The value is 1 if parsing failed because of invalid input, i.e., input
7368 that contains a syntax error or that causes @code{YYABORT} to be
7371 The value is 2 if parsing failed due to memory exhaustion.
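A caller usually distinguishes these three results.  The following C sketch
maps them to messages; the helper @code{parse_status_message} is
hypothetical, and a real program would pass it the value returned by
@code{yyparse}:

@example
/* Hypothetical helper: describe a value returned by yyparse.  */
static const char *
parse_status_message (int status)
@{
  switch (status)
    @{
    case 0:  return "success";           /* End of input reached.  */
    case 1:  return "invalid input";     /* Syntax error or YYABORT.  */
    case 2:  return "memory exhausted";  /* Memory exhaustion.  */
    default: return "unexpected status";
    @}
@}
@end example

For instance, a driver could write @code{puts (parse_status_message (yyparse ()))}.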
7374 In an action, you can cause immediate return from @code{yyparse} by using
7379 Return immediately with value 0 (to report success).
7384 Return immediately with value 1 (to report failure).
7389 Return immediately with value 2 (to report memory exhaustion).
7392 If you use a reentrant parser, you can optionally pass additional
7393 parameter information to it in a reentrant way. To do so, use the
7394 declaration @code{%parse-param}:
7396 @deffn {Directive} %parse-param @{@var{argument-declaration}@} @dots{}
7397 @findex %parse-param
Declare that each @var{argument-declaration} (there may be one or more) is
an additional @code{yyparse} argument.  Each @var{argument-declaration} is
used when declaring functions or prototypes, and its last identifier
must be the argument name.
7405 Here's an example. Write this in the parser:
7408 %parse-param @{int *nastiness@} @{int *randomness@}
7412 Then call the parser like this:
7416 int nastiness, randomness;
7417 @dots{} /* @r{Store proper data in @code{nastiness} and @code{randomness}.} */
7418 value = yyparse (&nastiness, &randomness);
7424 In the grammar actions, use expressions like this to refer to the data:
7427 exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
7431 Using the following:
7433 %parse-param @{int *randomness@}
results in these signatures:
7438 void yyerror (int *randomness, const char *msg);
7439 int yyparse (int *randomness);
7443 Or, if both @code{%define api.pure full} (or just @code{%define api.pure})
7444 and @code{%locations} are used:
7447 void yyerror (YYLTYPE *llocp, int *randomness, const char *msg);
7448 int yyparse (int *randomness);
7451 @node Push Parser Interface
7452 @section Push Parser Interface
7454 @findex yypstate_new
7455 You call the function @code{yypstate_new} to create a new parser instance.
7456 This function is available if either the @samp{%define api.push-pull push}
7457 or @samp{%define api.push-pull both} declaration is used. @xref{Push Decl}.
7459 @anchor{yypstate_new}
7460 @deftypefun {yypstate*} yypstate_new (@code{void})
7461 Return a valid parser instance if there is memory available, 0 otherwise.
7462 In impure mode, it will also return 0 if a parser instance is currently
7466 @findex yypstate_delete
7467 You call the function @code{yypstate_delete} to delete a parser instance.
This function is available if either the @samp{%define api.push-pull push} or
7469 @samp{%define api.push-pull both} declaration is used.
7472 @anchor{yypstate_delete}
7473 @deftypefun void yypstate_delete (@code{yypstate *}@var{yyps})
7474 Reclaim the memory associated with a parser instance. After this call, you
7475 should no longer attempt to use the parser instance.
7478 @findex yypush_parse
7479 You call the function @code{yypush_parse} to parse a single token. This
7480 function is available if either the @samp{%define api.push-pull push} or
7481 @samp{%define api.push-pull both} declaration is used. @xref{Push Decl}.
7483 @anchor{yypush_parse}
7484 @deftypefun int yypush_parse (@code{yypstate *}@var{yyps})
7485 The value returned by @code{yypush_parse} is the same as for @code{yyparse}
7486 with the following exception: it returns @code{YYPUSH_MORE} if more input is
7487 required to finish parsing the grammar.
After @code{yypush_parse} has returned, the instance may be consulted.  For
instance, check @code{yynerrs} to see whether there were (possibly recovered)
7493 After @code{yypush_parse} returns a status other than @code{YYPUSH_MORE},
7494 the parser instance @code{yyps} may be reused for a new parse.
7497 The fact that the parser state is reusable even after an error simplifies
7498 reuse. For example, a calculator application which parses each input line
7499 as an expression can just keep reusing the same @code{yyps} even if an input
7502 You call the function @code{yypull_parse} to parse the rest of the input
7503 stream. This function is available if the @samp{%define api.push-pull both}
7504 declaration is used. @xref{Push Decl}.
7506 @anchor{yypull_parse}
7507 @deftypefun int yypull_parse (@code{yypstate *}@var{yyps})
7508 The value returned by @code{yypull_parse} is the same as for @code{yyparse}.
7510 The parser instance @code{yyps} may be reused for new parses.
@deftypefun int yypstate_expected_tokens (@code{const yypstate *}@var{yyps}, @code{yysymbol_kind_t} @var{argv}@code{[]}, @code{int} @var{argc})
7514 Fill @var{argv} with the expected tokens, which never includes
7515 @code{YYSYMBOL_YYEMPTY}, @code{YYSYMBOL_YYerror}, or
7516 @code{YYSYMBOL_YYUNDEF}.
7518 Never put more than @var{argc} elements into @var{argv}, and on success
7519 return the number of tokens stored in @var{argv}. If there are more
7520 expected tokens than @var{argc}, fill @var{argv} up to @var{argc} and return
7521 0. If there are no expected tokens, also return 0, but set @code{argv[0]}
7522 to @code{YYSYMBOL_YYEMPTY}.
When LAC is enabled, it may return a negative number on errors,
such as @code{YYENOMEM} on memory exhaustion.
7527 If @var{argv} is null, return the size needed to store all the possible
7528 values, which is always less than @code{YYNTOKENS}.
7533 @section The Lexical Analyzer Function @code{yylex}
7535 @cindex lexical analyzer
7537 The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
7538 the input stream and returns them to the parser. Bison does not create
7539 this function automatically; you must write it so that @code{yyparse} can
7540 call it. The function is sometimes referred to as a lexical scanner.
7542 In simple programs, @code{yylex} is often defined at the end of the Bison
7543 grammar file. If @code{yylex} is defined in a separate source file, you
7544 need to arrange for the token-kind definitions to be available there. To do
7545 this, use the @option{-d} option when you run Bison, so that it will write
7546 these definitions into the separate parser header file,
7547 @file{@var{name}.tab.h}, which you can include in the other source files
7548 that need it. @xref{Invocation}.
7551 * Calling Convention:: How @code{yyparse} calls @code{yylex}.
7552 * Special Tokens:: Signaling end-of-file and errors to the parser.
7553 * Tokens from Literals:: Finding token kinds from string aliases.
7554 * Token Values:: How @code{yylex} must return the semantic value
7555 of the token it has read.
7556 * Token Locations:: How @code{yylex} must return the text location
7557 (line number, etc.) of the token, if the
7559 * Pure Calling:: How the calling convention differs in a pure parser
7560 (@pxref{Pure Decl}).
7563 @node Calling Convention
7564 @subsection Calling Convention for @code{yylex}
7566 The value that @code{yylex} returns must be the positive numeric code for
7567 the kind of token it has just found; a zero or negative value signifies
When a token kind is referred to in the grammar rules by a name, that name
in the parser implementation file becomes an enumerator of the enum
@code{yytoken_kind_t} whose value is the proper numeric code for that
token kind.  So @code{yylex} should use the name to indicate that kind.
7576 When a token is referred to in the grammar rules by a character literal, the
7577 numeric code for that character is also the code for the token kind. So
7578 @code{yylex} can simply return that character code, possibly converted to
7579 @code{unsigned char} to avoid sign-extension. The null character must not
7580 be used this way, because its code is zero and that signifies end-of-input.
7582 A simple program might use the following declaration:
7591 and the following definition, either in the grammar file itself or in some
7592 other module that has @code{#include "y.tab.h"}:
7604 return YYEOF; /* Report end-of-input. */
7605 if (c == '+' || c == '-')
7606 return c; /* Assume token kind for '+' is '+'. */
7607 if ('0' <= c && c <= '9')
7610 while ('0' <= (c = getchar ()) && c <= '9')
7611 yylval = yylval * 10 + (c - '0');
7613 return INT; /* Return the kind of the token. */
7621 This interface has been designed so that the output from the @code{lex}
7622 utility can be used without change as the definition of @code{yylex}.
7625 @node Special Tokens
7626 @subsection Special Tokens
In addition to the user-defined tokens, Bison generates a few special tokens
that @code{yylex} may return.
7632 The @code{YYEOF} token denotes the end of file, and signals to the parser
7633 that there is nothing left afterwards. @xref{Calling Convention}, for an
7637 Returning @code{YYUNDEF} tells the parser that some lexical error was found.
7638 It will emit an error message about an ``invalid token'', and enter
7639 error-recovery (@pxref{Error Recovery}). Returning an unknown token kind
7640 results in the exact same behavior.
Returning @code{YYerror} requires the parser to enter error-recovery
@emph{without} emitting an error message.  This way the lexical analyzer can
produce accurate error messages about the invalid input (something the
parser cannot do), and yet benefit from the error-recovery features of the
7657 case '0': case '1': case '2': case '3': case '4':
7658 case '5': case '6': case '7': case '8': case '9':
7665 yyerror ("syntax error: invalid character: %c", c);
7671 @node Tokens from Literals
7672 @subsection Finding Tokens by String Literals
7674 If the grammar uses literal string tokens, there are two ways that
7675 @code{yylex} can determine the token kind codes for them:
7679 If the grammar defines symbolic token names as aliases for the literal
7680 string tokens, @code{yylex} can use these symbolic names like all others.
7681 In this case, the use of the literal string tokens in the grammar file has
7682 no effect on @code{yylex}.
7684 This is the preferred approach.
7687 @code{yylex} can search for the multicharacter token in the @code{yytname}
7688 table. This method is discouraged: the primary purpose of string aliases is
7689 forging good error messages, not describing the spelling of keywords. In
7690 addition, looking for the token kind at runtime incurs a (small but
7693 The @code{yytname} table is generated only if you use the
7694 @code{%token-table} declaration. @xref{Decl Summary}.
7699 @subsection Semantic Values of Tokens
7702 In an ordinary (nonreentrant) parser, the semantic value of the token must
7703 be stored into the global variable @code{yylval}. When you are using just
7704 one data type for semantic values, @code{yylval} has that type. Thus, if
7705 the type is @code{int} (the default), you might write this in @code{yylex}:
7710 yylval = value; /* Put value onto Bison stack. */
7711 return INT; /* Return the kind of the token. */
7716 When you are using multiple data types, @code{yylval}'s type is a union made
7717 from the @code{%union} declaration (@pxref{Union Decl}). So when you store
7718 a token's value, you must use the proper member of the union. If the
7719 @code{%union} declaration looks like this:
7732 then the code in @code{yylex} might look like this:
7737 yylval.intval = value; /* Put value onto Bison stack. */
7738 return INT; /* Return the kind of the token. */
7743 @node Token Locations
7744 @subsection Textual Locations of Tokens
7747 If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations})
7748 in actions to keep track of the textual locations of tokens and groupings,
7749 then you must provide this information in @code{yylex}. The function
7750 @code{yyparse} expects to find the textual location of a token just parsed
7751 in the global variable @code{yylloc}. So @code{yylex} must store the proper
7752 data in that variable.
7754 By default, the value of @code{yylloc} is a structure and you need only
7755 initialize the members that are going to be used by the actions. The
7756 four members are called @code{first_line}, @code{first_column},
7757 @code{last_line} and @code{last_column}. Note that the use of this
7758 feature makes the parser noticeably slower.
7761 The data type of @code{yylloc} has the name @code{YYLTYPE}.
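As an illustration, here is a C sketch of the bookkeeping a scanner might do
for each token before returning it.  It declares its own structure with the
same four members as the default @code{YYLTYPE} so that it compiles on its
own; the names @code{location} and @code{location_step} are hypothetical.  It
assumes that lines and columns are numbered from 1 and that
@code{last_column} designates the column just past the token; adapt it if
your actions expect another convention.

@example
/* Same members as Bison's default YYLTYPE, declared here only so
   that this sketch is self-contained.  */
typedef struct
@{
  int first_line, first_column;
  int last_line, last_column;
@} location;

/* Hypothetical helper: make *LOC cover TEXT, the token just read,
   which starts where the previous token ended.  */
static void
location_step (location *loc, const char *text)
@{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (; *text; ++text)
    if (*text == '\n')
      @{
        loc->last_line += 1;    /* A newline starts a new line...  */
        loc->last_column = 1;   /* ...at column 1.  */
      @}
    else
      loc->last_column += 1;
@}
@end example

Starting from @code{@{1, 1, 1, 1@}}, stepping over @samp{abc} leaves
@code{first_column} at 1 and @code{last_column} at 4 on line 1; stepping
over a newline then sets @code{last_line} to 2 and @code{last_column} to 1.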
7764 @subsection Calling Conventions for Pure Parsers
7766 When you use the Bison declaration @code{%define api.pure full} to request a
7767 pure, reentrant parser, the global communication variables @code{yylval} and
7768 @code{yylloc} cannot be used. (@xref{Pure Decl}.) In such parsers the two
7769 global variables are replaced by pointers passed as arguments to
7770 @code{yylex}. You must declare them as shown here, and pass the information
7771 back by storing it through those pointers.
7775 yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
7778 *lvalp = value; /* Put value onto Bison stack. */
7779 return INT; /* Return the kind of the token. */
7784 If the grammar file does not use the @samp{@@} constructs to refer to
7785 textual locations, then the type @code{YYLTYPE} will not be defined. In
7786 this case, omit the second argument; @code{yylex} will be called with
7789 If you wish to pass additional arguments to @code{yylex}, use
7790 @code{%lex-param} just like @code{%parse-param} (@pxref{Parser
7791 Function}). To pass additional arguments to both @code{yylex} and
7792 @code{yyparse}, use @code{%param}.
7794 @deffn {Directive} %lex-param @{@var{argument-declaration}@} @dots{}
Specify that each @var{argument-declaration} is an additional @code{yylex}
argument declaration.  You may pass one or more such declarations, which is
equivalent to repeating @code{%lex-param}.
7801 @deffn {Directive} %param @{@var{argument-declaration}@} @dots{}
Specify that each @var{argument-declaration} is an additional
@code{yylex}/@code{yyparse} argument declaration.  This is equivalent to
@samp{%lex-param @{@var{argument-declaration}@} @dots{} %parse-param
@{@var{argument-declaration}@} @dots{}}.  You may pass one or more
declarations, which is equivalent to repeating @code{%param}.
7814 %lex-param @{scanner_mode *mode@}
7815 %parse-param @{parser_mode *mode@}
7816 %param @{environment_type *env@}
7820 results in the following signatures:
7823 int yylex (scanner_mode *mode, environment_type *env);
7824 int yyparse (parser_mode *mode, environment_type *env);
7827 If @samp{%define api.pure full} is added:
7830 int yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
7831 int yyparse (parser_mode *mode, environment_type *env);
7835 and finally, if both @samp{%define api.pure full} and @code{%locations} are
7839 int yylex (YYSTYPE *lvalp, YYLTYPE *llocp,
7840 scanner_mode *mode, environment_type *env);
7841 int yyparse (parser_mode *mode, environment_type *env);
7845 @node Error Reporting
7846 @section Error Reporting
During its execution the parser may have error messages to pass to the user,
such as syntax errors or memory exhaustion.  How these messages are delivered
to the user must be specified by the developer.
7853 * Error Reporting Function:: You must supply a @code{yyerror} function.
7854 * Syntax Error Reporting Function:: You can supply a @code{yyreport_syntax_error} function.
7857 @node Error Reporting Function
7858 @subsection The Error Reporting Function @code{yyerror}
7859 @cindex error reporting function
7862 @cindex syntax error
7864 The Bison parser detects a @dfn{syntax error} (or @dfn{parse error})
7865 whenever it reads a token which cannot satisfy any syntax rule. An
7866 action in the grammar can also explicitly proclaim an error, using the
7867 macro @code{YYERROR} (@pxref{Action Features}).
The Bison parser expects to report the error by calling an error
reporting function named @code{yyerror}, which you must supply.  It is
called by @code{yyparse} whenever a syntax error is found, and it
receives one argument, the message string.  For a syntax error, the string
is normally @w{@code{"syntax error"}}.
7875 @findex %define parse.error detailed
7876 @findex %define parse.error verbose
7877 If you invoke @samp{%define parse.error detailed} (or @samp{custom}) in the
7878 Bison declarations section (@pxref{Bison Declarations}), then Bison provides
7879 a more verbose and specific error message string instead of just plain
7880 @w{@code{"syntax error"}}. However, that message sometimes contains
7881 incorrect information if LAC is not enabled (@pxref{LAC}).
7883 The parser can detect one other kind of error: memory exhaustion. This
7884 can happen when the input contains constructions that are very deeply
7885 nested. It isn't likely you will encounter this, since the Bison
7886 parser normally extends its stack automatically up to a very large limit. But
7887 if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual
7888 fashion, except that the argument string is @w{@code{"memory exhausted"}}.
7890 In some cases diagnostics like @w{@code{"syntax error"}} are
7891 translated automatically from English to some other language before
7892 they are passed to @code{yyerror}. @xref{Internationalization}.
7894 A simple program might use the following declaration:
7898 void yyerror (char const *);
7903 and the following definition, either in the grammar file itself or in some
7904 other module that has @code{#include "y.tab.h"}:
7911 yyerror (char const *s)
7915 fprintf (stderr, "%s\n", s);
7920 After @code{yyerror} returns to @code{yyparse}, the latter will attempt
7921 error recovery if you have written suitable error recovery grammar rules
7922 (@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
7923 immediately return 1.
Obviously, in location tracking pure parsers, @code{yyerror} should have
access to the current location.  With @code{%define api.pure}, this is
indeed the case for the GLR parsers, but not for the Yacc parser, for
historical reasons, and this is why @code{%define api.pure full} should be
preferred over @code{%define api.pure}.
7931 When @code{%locations %define api.pure full} is used, @code{yyerror} has the
7932 following signature:
7935 void yyerror (YYLTYPE *locp, char const *msg);
7939 The prototypes are only indications of how the code produced by Bison
7940 uses @code{yyerror}. Bison-generated code always ignores the returned
7941 value, so @code{yyerror} can return any type, including @code{void}.
7942 Also, @code{yyerror} can be a variadic function; that is why the
7943 message is always passed last.
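For instance, a variadic @code{yyerror} lets actions and the scanner format
their own messages, as in @samp{yyerror ("invalid character: %c", c)}; it
must then be declared accordingly, e.g., @code{void yyerror (char const *, ...);}.
The C sketch below also keeps the most recent message in a buffer; that
buffer, @code{last_error}, is a hypothetical addition, not something Bison
requires.

@example
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: the most recent message, kept for later inspection.  */
static char last_error[256];

void
yyerror (char const *format, ...)
@{
  va_list args;
  va_start (args, format);
  /* Format into the buffer (truncating if needed), then print it.  */
  vsnprintf (last_error, sizeof last_error, format, args);
  va_end (args);
  fprintf (stderr, "%s\n", last_error);
@}
@end example

Because messages produced by the parser itself are passed as the
@var{format} argument, this sketch assumes they contain no @samp{%}; with
@samp{%define parse.error detailed}, a token alias such as @samp{"%"} would
break that assumption.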
7945 Traditionally @code{yyerror} returns an @code{int} that is always
7946 ignored, but this is purely for historical reasons, and @code{void} is
7947 preferable since it more accurately describes the return type for
The variable @code{yynerrs} contains the number of syntax errors
reported so far.  Normally this variable is global, but if you
request a pure parser (@pxref{Pure Decl}),
it is a local variable that only the actions can access.
7957 @node Syntax Error Reporting Function
7958 @subsection The Syntax Error Reporting Function @code{yyreport_syntax_error}
7960 @findex %define parse.error custom
If you invoke @samp{%define parse.error custom} (@pxref{Bison
Declarations}), then the parser no longer passes syntax error messages to
@code{yyerror}; instead, it delegates that task to the user by calling the
@code{yyreport_syntax_error} function.
7966 The following functions and types are ``@code{static}'': they are defined in
7967 the implementation file (@file{*.c}) and available only from there. They
7968 are meant to be used from the grammar's epilogue.
7970 @deftypefun {static int} yyreport_syntax_error (@code{const yypcontext_t *}@var{ctx})
7971 Report a syntax error to the user. Return 0 on success, @code{YYENOMEM} on
7972 memory exhaustion. Whether it uses @code{yyerror} is up to the user.
7975 Use the following types and functions to build the error message.
7977 @deffn {Type} yypcontext_t
7978 An opaque type that captures the circumstances of the syntax error.
7981 @deffn {Type} yysymbol_kind_t
7982 An enum of all the grammar symbols, tokens and nonterminals. Its
7983 enumerators are forged from the symbol names:
7986 enum yysymbol_kind_t
7988 YYSYMBOL_YYEMPTY = -2, /* No symbol. */
7989 YYSYMBOL_YYEOF = 0, /* "end of file" */
7990 YYSYMBOL_YYerror = 1, /* error */
7991 YYSYMBOL_YYUNDEF = 2, /* "invalid token" */
7992 YYSYMBOL_PLUS = 3, /* "+" */
7993 YYSYMBOL_MINUS = 4, /* "-" */
7995 YYSYMBOL_VAR = 14, /* "variable" */
7996 YYSYMBOL_NEG = 15, /* NEG */
7997 YYSYMBOL_YYACCEPT = 16, /* $accept */
7998 YYSYMBOL_exp = 17, /* exp */
7999 YYSYMBOL_input = 18 /* input */
8001 typedef enum yysymbol_kind_t yysymbol_kind_t;
8005 @deftypefun {static yysymbol_kind_t} yypcontext_token (@code{const yypcontext_t *}@var{ctx})
8006 The ``unexpected'' token: the symbol kind of the lookahead token that caused
8007 the syntax error. Returns @code{YYSYMBOL_YYEMPTY} if there is no lookahead.
8010 @deftypefun {static YYLTYPE *} yypcontext_location (@code{const yypcontext_t *}@var{ctx})
8011 The location of the syntax error (that of the unexpected token).
@deftypefun {static int} yypcontext_expected_tokens (@code{const yypcontext_t *}@var{ctx}, @code{yysymbol_kind_t} @var{argv}@code{[]}, @code{int} @var{argc})
Fill @var{argv} with the expected tokens, which never include
@code{YYSYMBOL_YYEMPTY}, @code{YYSYMBOL_YYerror}, or
@code{YYSYMBOL_YYUNDEF}.
8019 Never put more than @var{argc} elements into @var{argv}, and on success
8020 return the number of tokens stored in @var{argv}. If there are more
8021 expected tokens than @var{argc}, fill @var{argv} up to @var{argc} and return
8022 0. If there are no expected tokens, also return 0, but set @code{argv[0]}
8023 to @code{YYSYMBOL_YYEMPTY}.
When LAC is enabled, it may return a negative number on errors,
such as @code{YYENOMEM} on memory exhaustion.
8028 If @var{argv} is null, return the size needed to store all the possible
8029 values, which is always less than @code{YYNTOKENS}.
@deftypefun {static const char *} yysymbol_name (@code{yysymbol_kind_t} @var{symbol})
8033 The name of the symbol whose kind is @var{symbol}, possibly translated.
A custom syntax error function looks as follows. This implementation is
not suitable for internationalization; see the @file{c/bistromathic}
example for a better alternative.
8042 yyreport_syntax_error (const yypcontext_t *ctx)
8045 YYLOCATION_PRINT (stderr, *yypcontext_location (ctx));
8046 fprintf (stderr, ": syntax error");
8047 // Report the tokens expected at this point.
8049 enum @{ TOKENMAX = 5 @};
8050 yysymbol_kind_t expected[TOKENMAX];
8051 int n = yypcontext_expected_tokens (ctx, expected, TOKENMAX);
8053 // Forward errors to yyparse.
8056 for (int i = 0; i < n; ++i)
8057 fprintf (stderr, "%s %s",
8058 i == 0 ? ": expected" : " or", yysymbol_name (expected[i]));
8060 // Report the unexpected token.
8062 yysymbol_kind_t lookahead = yypcontext_token (ctx);
8063 if (lookahead != YYSYMBOL_YYEMPTY)
8064 fprintf (stderr, " before %s", yysymbol_name (lookahead));
8066 fprintf (stderr, "\n");
You must still provide a @code{yyerror} function; it is used, for instance,
to report memory exhaustion.
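The contract of @code{yypcontext_expected_tokens} has a few corner cases: truncation, no expected tokens, and the size query with a null @var{argv}. The sketch below models them with @code{mock_expected_tokens}, a hypothetical stand-in that is @emph{not} part of the generated parser: its @var{expected} and @var{count} parameters play the role of the parser's internal state, plain @code{int} replaces @code{yysymbol_kind_t}, and @code{EMPTY} stands for @code{YYSYMBOL_YYEMPTY}. The hypothetical helper @code{all_expected} shows the two-call idiom: first ask for the size, then allocate and fill.

```c
#include <stdlib.h>

/* Hypothetical stand-in for YYSYMBOL_YYEMPTY.  */
enum { EMPTY = -2 };

/* Hypothetical model of the documented yypcontext_expected_tokens contract.
   EXPECTED/COUNT stand for the tokens the parser would accept here.  */
int
mock_expected_tokens (const int *expected, int count, int *argv, int argc)
{
  if (!argv)
    return count;                     /* Size query.  */
  if (count == 0)
    {
      if (0 < argc)
        argv[0] = EMPTY;              /* No expected tokens.  */
      return 0;
    }
  int n = count < argc ? count : argc;
  for (int i = 0; i < n; ++i)
    argv[i] = expected[i];
  return count <= argc ? count : 0;   /* 0 also means "truncated".  */
}

/* Hypothetical helper: return a malloc'd array of all the expected tokens
   (the caller frees it) and store their number in *LEN.  */
int *
all_expected (const int *expected, int count, int *len)
{
  int size = mock_expected_tokens (expected, count, NULL, 0);
  int cap = 0 < size ? size : 1;      /* Room for EMPTY when size is 0.  */
  int *argv = malloc (cap * sizeof *argv);
  if (argv)
    *len = mock_expected_tokens (expected, count, argv, cap);
  return argv;
}
```

Note that a return value of 0 is ambiguous on its own; checking whether @code{argv[0]} is @code{EMPTY} tells ``no expected tokens'' apart from ``more than @var{argc} tokens''.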
8074 @node Action Features
8075 @section Special Features for Use in Actions
8076 @cindex summary, action features
8077 @cindex action features summary
8079 Here is a table of Bison constructs, variables and macros that are useful in
8082 @deffn {Variable} $$
8083 Acts like a variable that contains the semantic value for the
8084 grouping made by the current rule. @xref{Actions}.
8087 @deffn {Variable} $@var{n}
8088 Acts like a variable that contains the semantic value for the
8089 @var{n}th component of the current rule. @xref{Actions}.
8092 @deffn {Variable} $<@var{typealt}>$
8093 Like @code{$$} but specifies alternative @var{typealt} in the union
8094 specified by the @code{%union} declaration. @xref{Action Types}.
8097 @deffn {Variable} $<@var{typealt}>@var{n}
8098 Like @code{$@var{n}} but specifies alternative @var{typealt} in the
8099 union specified by the @code{%union} declaration.
8100 @xref{Action Types}.
8103 @deffn {Macro} YYABORT @code{;}
8104 Return immediately from @code{yyparse}, indicating failure.
8105 @xref{Parser Function}.
8108 @deffn {Macro} YYACCEPT @code{;}
8109 Return immediately from @code{yyparse}, indicating success.
8110 @xref{Parser Function}.
8113 @deffn {Macro} YYBACKUP (@var{token}, @var{value})@code{;}
8115 Unshift a token. This macro is allowed only for rules that reduce
8116 a single value, and only when there is no lookahead token.
8117 It is also disallowed in GLR parsers.
8118 It installs a lookahead token with token kind @var{token} and
8119 semantic value @var{value}; then it discards the value that was
8120 going to be reduced by this rule.
8122 If the macro is used when it is not valid, such as when there is
8123 a lookahead token already, then it reports a syntax error with
8124 a message @samp{cannot back up} and performs ordinary error
8127 In either case, the rest of the action is not executed.
8130 @deffn {Value} YYEMPTY
8131 Value stored in @code{yychar} when there is no lookahead token.
8134 @deffn {Value} YYEOF
8135 Value stored in @code{yychar} when the lookahead is the end of the input
8139 @deffn {Macro} YYERROR @code{;}
8140 Cause an immediate syntax error. This statement initiates error
8141 recovery just as if the parser itself had detected an error; however, it
8142 does not call @code{yyerror}, and does not print any message. If you
8143 want to print an error message, call @code{yyerror} explicitly before
8144 the @samp{YYERROR;} statement. @xref{Error Recovery}.
8147 @deffn {Macro} YYNOMEM @code{;}
8148 Return immediately from @code{yyparse}, indicating memory exhaustion.
8149 @xref{Parser Function}.
8152 @deffn {Macro} YYRECOVERING
8153 @findex YYRECOVERING
8154 The expression @code{YYRECOVERING ()} yields 1 when the parser
8155 is recovering from a syntax error, and 0 otherwise.
8156 @xref{Error Recovery}.
8159 @deffn {Variable} yychar
8160 Variable containing either the lookahead token, or @code{YYEOF} when the
8161 lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead
8162 has been performed so the next token is not yet known.
8163 Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
8168 @deffn {Macro} yyclearin @code{;}
8169 Discard the current lookahead token. This is useful primarily in
8171 Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
8173 @xref{Error Recovery}.
8176 @deffn {Macro} yyerrok @code{;}
8177 Resume generating error messages immediately for subsequent syntax
8178 errors. This is useful primarily in error rules.
8179 @xref{Error Recovery}.
8182 @deffn {Variable} yylloc
8183 Variable containing the lookahead token location when @code{yychar} is not set
8184 to @code{YYEMPTY} or @code{YYEOF}.
8185 Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
8187 @xref{Actions and Locations}.
8190 @deffn {Variable} yylval
8191 Variable containing the lookahead token semantic value when @code{yychar} is
8192 not set to @code{YYEMPTY} or @code{YYEOF}.
8193 Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
8199 Acts like a structure variable containing information on the textual
8200 location of the grouping made by the current rule. @xref{Tracking
8203 @c Check if those paragraphs are still useful or not.
8207 @c int first_line, last_line;
8208 @c int first_column, last_column;
8212 @c Thus, to get the starting line number of the third component, you would
8213 @c use @samp{@@3.first_line}.
8215 @c In order for the members of this structure to contain valid information,
8216 @c you must make @code{yylex} supply this information about each token.
8217 @c If you need only certain members, then @code{yylex} need only fill in
8220 @c The use of this feature makes the parser noticeably slower.
8223 @deffn {Value} @@@var{n}
8225 Acts like a structure variable containing information on the textual
8226 location of the @var{n}th component of the current rule. @xref{Tracking
8230 @node Internationalization
8231 @section Parser Internationalization
8232 @cindex internationalization
8238 A Bison-generated parser can print diagnostics, including error and
8239 tracing messages. By default, they appear in English. However, Bison
8240 also supports outputting diagnostics in the user's native language. To
8241 make this work, the user should set the usual environment variables.
8242 @xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
8243 For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
8244 set the user's locale to French Canadian using the UTF-8
8245 encoding. The exact set of available locales depends on the user's
8249 * Enabling I18n:: Preparing your project to support internationalization.
8250 * Token I18n:: Preparing tokens for internationalization in error messages.
8254 @subsection Enabling Internationalization
8256 The maintainer of a package that uses a Bison-generated parser enables
8257 the internationalization of the parser's output through the following
8258 steps. Here we assume a package that uses GNU Autoconf and
8263 @cindex bison-i18n.m4
Copy the @file{bison-i18n.m4} file, which Bison installs as
@samp{share/aclocal/bison-i18n.m4} in its installation directory, into the
directory containing the GNU Autoconf macros used by the package (often
called @file{m4}).
8271 cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
8276 @vindex BISON_LOCALEDIR
8277 @vindex YYENABLE_NLS
8278 In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT}
8279 invocation, add an invocation of @code{BISON_I18N}. This macro is
8280 defined in the file @file{bison-i18n.m4} that you copied earlier. It
8281 causes @code{configure} to find the value of the
8282 @code{BISON_LOCALEDIR} variable, and it defines the source-language
8283 symbol @code{YYENABLE_NLS} to enable translations in the
8284 Bison-generated parser.
8287 In the @code{main} function of your program, designate the directory
8288 containing Bison's runtime message catalog, through a call to
8289 @samp{bindtextdomain} with domain name @samp{bison-runtime}.
8293 bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
8296 Typically this appears after any other call @code{bindtextdomain
8297 (PACKAGE, LOCALEDIR)} that your package already has. Here we rely on
8298 @samp{BISON_LOCALEDIR} to be defined as a string through the
8302 In the @file{Makefile.am} that controls the compilation of the @code{main}
8303 function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro,
8304 either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example:
8307 DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
8313 AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
8317 Finally, invoke the command @command{autoreconf} to generate the build
8322 @subsection Token Internationalization
8324 When the @code{%define} variable @code{parse.error} is set to @code{custom}
8325 or @code{detailed}, token aliases can be internationalized:
8329 '\n' _("end of line")
The remainder of the grammar may freely use either the token symbol
(@code{FUN}) or its alias (@code{"function"}), but not the alias with the
internationalization marker (@code{_("function")}).
If at least one token alias is internationalized, then the generated parser
uses both @code{N_} and @code{_}, which must be defined
(@pxref{Programmers, , The Programmer’s View, gettext, GNU @code{gettext}
utilities}). They are used only on string aliases marked for translation.
8345 In other words, even if your catalog features a translation for
8346 ``function'', then with
8356 ``function'' will appear untranslated in debug traces and error messages.
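A common way to define @code{_} and @code{N_} is sketched below, assuming your build defines @code{ENABLE_NLS} when translations are enabled (as @code{AM_GNU_GETTEXT} typically does). @code{_} translates at the point of use via @code{gettext}, while @code{N_} is a no-op that only marks the string for extraction by @command{xgettext}. The @code{token_aliases} table and the @code{token_alias} function are hypothetical illustrations, not parser internals.

```c
#if defined ENABLE_NLS && ENABLE_NLS
# include <libintl.h>
# define _(Msgid)  gettext (Msgid)
#else
# define _(Msgid)  (Msgid)
#endif
/* N_ only marks a string for xgettext; the translation happens later, via _.  */
#define N_(Msgid)  (Msgid)

/* Hypothetical table of token aliases, marked but not yet translated.  */
static const char *const token_aliases[] = { N_("end of line"), N_("function") };

/* Translate alias I at the point of use.  */
const char *
token_alias (int i)
{
  return _(token_aliases[i]);
}
```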
Unless defined by the user, the end-of-file token, @code{YYEOF}, is given
the alias ``end of file''. This alias is also internationalized if the user
internationalizes tokens. To map it to another string, use:
8363 %token END 0 _("end of input")
8368 @chapter The Bison Parser Algorithm
8369 @cindex Bison parser algorithm
8370 @cindex algorithm of parser
8373 @cindex parser stack
8374 @cindex stack, parser
8376 As Bison reads tokens, it pushes them onto a stack along with their
8377 semantic values. The stack is called the @dfn{parser stack}. Pushing a
8378 token is traditionally called @dfn{shifting}.
8380 For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
8381 @samp{3} to come. The stack will have four elements, one for each token
8384 But the stack does not always have an element for each token read. When
8385 the last @var{n} tokens and groupings shifted match the components of a
8386 grammar rule, they can be combined according to that rule. This is called
8387 @dfn{reduction}. Those tokens and groupings are replaced on the stack by a
single grouping whose symbol is the result (left-hand side) of that rule.
8389 Running the rule's action is part of the process of reduction, because this
8390 is what computes the semantic value of the resulting grouping.
8392 For example, if the infix calculator's parser stack contains this:
8399 and the next input token is a newline character, then the last three
8400 elements can be reduced to 15 via the rule:
8403 expr: expr '*' expr;
8407 Then the stack contains just these three elements:
8414 At this point, another reduction can be made, resulting in the single value
8415 16. Then the newline token can be shifted.
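To make this vocabulary concrete, here is a hypothetical model of the calculator's parser stack, in which each entry is either an operator token or a number standing for a grouping and its semantic value. Reducing @samp{expr op expr} pops three entries and pushes one, whose value is computed as the rule's action would. It models only the stack manipulation, not how Bison decides when to reduce.

```c
#include <stddef.h>

/* Hypothetical stack entry: OP is an operator token, or 0 for a number
   (a token or grouping carrying the semantic value VALUE).  */
struct entry { char op; int value; };
/* No overflow checking: 32 entries are plenty for this illustration.  */
struct stack { struct entry e[32]; size_t size; };

void push_num (struct stack *s, int v) { s->e[s->size++] = (struct entry) { 0, v }; }
void push_op (struct stack *s, char op) { s->e[s->size++] = (struct entry) { op, 0 }; }

/* Reduce the top three entries, "expr op expr", to a single expr.  Only
   '+' and '*' are modeled; the computation plays the role of the action.  */
void
reduce_binary (struct stack *s)
{
  int rhs = s->e[s->size - 1].value;
  char op = s->e[s->size - 2].op;
  int lhs = s->e[s->size - 3].value;
  s->size -= 3;
  push_num (s, op == '*' ? lhs * rhs : lhs + rhs);
}
```

Pushing @samp{1 + 5 * 3} and reducing once leaves @samp{1 + 15}; reducing again leaves the single value 16, as described above.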
8417 The parser tries, by shifts and reductions, to reduce the entire input down
8418 to a single grouping whose symbol is the grammar's start-symbol
8419 (@pxref{Language and Grammar}).
8421 This kind of parser is known in the literature as a bottom-up parser.
8424 * Lookahead:: Parser looks one token ahead when deciding what to do.
8425 * Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
8426 * Precedence:: Operator precedence works by resolving conflicts.
8427 * Contextual Precedence:: When an operator's precedence depends on context.
8428 * Parser States:: The parser is a finite-state-machine with stack.
8429 * Reduce/Reduce:: When two rules are applicable in the same situation.
8430 * Mysterious Conflicts:: Conflicts that look unjustified.
8431 * Tuning LR:: How to tune fundamental aspects of LR-based parsing.
8432 * Generalized LR Parsing:: Parsing arbitrary context-free grammars.
8433 * Memory Management:: What happens when memory is exhausted. How to avoid it.
8437 @section Lookahead Tokens
8438 @cindex lookahead token
8440 The Bison parser does @emph{not} always reduce immediately as soon as the
8441 last @var{n} tokens and groupings match a rule. This is because such a
8442 simple strategy is inadequate to handle most languages. Instead, when a
8443 reduction is possible, the parser sometimes ``looks ahead'' at the next
8444 token in order to decide what to do.
8446 When a token is read, it is not immediately shifted; first it becomes the
8447 @dfn{lookahead token}, which is not on the stack. Now the parser can
8448 perform one or more reductions of tokens and groupings on the stack, while
8449 the lookahead token remains off to the side. When no more reductions
8450 should take place, the lookahead token is shifted onto the stack. This
8451 does not mean that all possible reductions have been done; depending on the
8452 token kind of the lookahead token, some rules may choose to delay their
8455 Here is a simple case where lookahead is needed. These three rules define
8456 expressions which contain binary addition operators and postfix unary
8457 factorial operators (@samp{!}), and allow parentheses for grouping.
8476 Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
8477 should be done? If the following token is @samp{)}, then the first three
8478 tokens must be reduced to form an @code{expr}. This is the only valid
8479 course, because shifting the @samp{)} would produce a sequence of symbols
8480 @w{@code{term ')'}}, and no rule allows this.
8482 If the following token is @samp{!}, then it must be shifted immediately so
8483 that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
8484 parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
8485 @code{expr}. It would then be impossible to shift the @samp{!} because
8486 doing so would produce on the stack the sequence of symbols @code{expr
8487 '!'}. No rule allows that sequence.
8492 The lookahead token is stored in the variable @code{yychar}. Its semantic
8493 value and location, if any, are stored in the variables @code{yylval} and
8494 @code{yylloc}. @xref{Action Features}.
8497 @section Shift/Reduce Conflicts
8499 @cindex shift/reduce conflicts
8500 @cindex dangling @code{else}
8501 @cindex @code{else}, dangling
8503 Suppose we are parsing a language which has if-then and if-then-else
8504 statements, with a pair of rules like this:
8509 "if" expr "then" stmt
8510 | "if" expr "then" stmt "else" stmt
8516 Here @code{"if"}, @code{"then"} and @code{"else"} are terminal symbols for
8517 specific keyword tokens.
8519 When the @code{"else"} token is read and becomes the lookahead token, the
8520 contents of the stack (assuming the input is valid) are just right for
8521 reduction by the first rule. But it is also legitimate to shift the
8522 @code{"else"}, because that would lead to eventual reduction by the second
8525 This situation, where either a shift or a reduction would be valid, is
8526 called a @dfn{shift/reduce conflict}. Bison is designed to resolve
8527 these conflicts by choosing to shift, unless otherwise directed by
8528 operator precedence declarations. To see the reason for this, let's
8529 contrast it with the other alternative.
8531 Since the parser prefers to shift the @code{"else"}, the result is to attach
8532 the else-clause to the innermost if-statement, making these two inputs
8536 if x then if y then win; else lose;
8538 if x then do; if y then win; else lose; end;
8541 But if the parser chose to reduce when possible rather than shift, the
8542 result would be to attach the else-clause to the outermost if-statement,
8543 making these two inputs equivalent:
8546 if x then if y then win; else lose;
8548 if x then do; if y then win; end; else lose;
8551 The conflict exists because the grammar as written is ambiguous: either
8552 parsing of the simple nested if-statement is legitimate. The established
8553 convention is that these ambiguities are resolved by attaching the
8554 else-clause to the innermost if-statement; this is what Bison accomplishes
8555 by choosing to shift rather than reduce. (It would ideally be cleaner to
8556 write an unambiguous grammar, but that is very hard to do in this case.)
8557 This particular ambiguity was first encountered in the specifications of
8558 Algol 60 and is called the ``dangling @code{else}'' ambiguity.
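C has the same ambiguity and resolves it by the same convention. In the hypothetical function below, the @code{else} belongs to @code{if (y)}, so when @code{x} is 0 neither assignment runs. Compilers such as GCC may warn about this construct and suggest braces.

```c
/* The else attaches to the innermost if, whatever the indentation says.  */
const char *
dangling (int x, int y)
{
  const char *result = "nothing";
  if (x)
    if (y)
      result = "win";
    else
      result = "lose";
  return result;
}
```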
To help the grammar author understand the nature of each conflict,
Bison can be asked to generate ``counterexamples''. In the present case, it
even proves that the grammar is ambiguous by exhibiting a string
with two different parses:
8565 @macro danglingElseCex
8568 Example: @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
8571 @yellow{↳ 3: "if" expr "then"} @green{stmt}
8572 @green{↳ 2:} @blue{if_stmt}
8573 @blue{↳ 4: "if" expr "then" stmt} @red{•} @blue{"else" stmt}
8574 Example: @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
8577 @yellow{↳ 4: "if" expr "then"} @green{stmt} @yellow{"else" stmt}
8578 @green{↳ 2:} @blue{if_stmt}
8579 @blue{↳ 3: "if" expr "then" stmt} @red{•}
8582 Example: @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @blue{"else" stmt}
8585 @yellow{@arrow{} 3: "if" expr "then"} @green{stmt}
8586 @green{@arrow{} 2:} @blue{if_stmt}
8587 @blue{@arrow{} 4: "if" expr "then" stmt} @red{•} @blue{"else" stmt}
8588 Example: @yellow{"if" expr "then"} @blue{"if" expr "then" stmt} @red{•} @yellow{"else" stmt}
8591 @yellow{@arrow{} 4: "if" expr "then"} @green{stmt} @yellow{"else" stmt}
8592 @green{@arrow{} 2:} @blue{if_stmt}
8593 @blue{@arrow{} 3: "if" expr "then" stmt} @red{•}
8602 @xref{Counterexamples}, for more details.
8606 To avoid warnings from Bison about predictable, @emph{legitimate} shift/reduce
8607 conflicts, you can use the @code{%expect @var{n}} declaration.
8608 There will be no warning as long as the number of shift/reduce conflicts
8609 is exactly @var{n}, and Bison will report an error if there is a
@xref{Expect Decl}. However, we don't
recommend the use of @code{%expect} (except @samp{%expect 0}!), as an equal
number of conflicts does not mean that they are the @emph{same}. When
possible, use precedence directives instead to @emph{fix} the
conflicts explicitly (@pxref{Non Operators}).
8617 The definition of @code{if_stmt} above is solely to blame for the
8618 conflict, but the conflict does not actually appear without additional
8619 rules. Here is a complete Bison grammar file that actually manifests
8633 "if" expr "then" stmt
8634 | "if" expr "then" stmt "else" stmt
8644 @section Operator Precedence
8645 @cindex operator precedence
8646 @cindex precedence of operators
8648 Another situation where shift/reduce conflicts appear is in arithmetic
8649 expressions. Here shifting is not always the preferred resolution; the
8650 Bison declarations for operator precedence allow you to specify when to
8651 shift and when to reduce.
8654 * Why Precedence:: An example showing why precedence is needed.
8655 * Using Precedence:: How to specify precedence and associativity.
8656 * Precedence Only:: How to specify precedence only.
8657 * Precedence Examples:: How these features are used in the previous example.
8658 * How Precedence:: How they work.
8659 * Non Operators:: Using precedence for general conflicts.
8662 @node Why Precedence
8663 @subsection When Precedence is Needed
8665 Consider the following ambiguous grammar fragment (ambiguous because the
8666 input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
8681 Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
8682 should it reduce them via the rule for the subtraction operator? It
8683 depends on the next token. Of course, if the next token is @samp{)}, we
8684 must reduce; shifting is invalid because no single rule can reduce the
8685 token sequence @w{@samp{- 2 )}} or anything starting with that. But if
8686 the next token is @samp{*} or @samp{<}, we have a choice: either
8687 shifting or reduction would allow the parse to complete, but with
8690 To decide which one Bison should do, we must consider the results. If
8691 the next operator token @var{op} is shifted, then it must be reduced
8692 first in order to permit another opportunity to reduce the difference.
8693 The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other
8694 hand, if the subtraction is reduced before shifting @var{op}, the result
8695 is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or
8696 reduce should depend on the relative precedence of the operators
8697 @samp{-} and @var{op}: @samp{*} should be shifted first, but not
8700 @cindex associativity
8701 What about input such as @w{@samp{1 - 2 - 5}}; should this be
8702 @w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
8703 operators we prefer the former, which is called @dfn{left association}.
8704 The latter alternative, @dfn{right association}, is desirable for
8705 assignment operators. The choice of left or right association is a
8706 matter of whether the parser chooses to shift or reduce when the stack
8707 contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: shifting
8708 makes right-associativity.
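The two associativities amount to folding the operands from the left or from the right. The hypothetical functions below evaluate a chain of subtractions both ways; for @w{@samp{1 - 2 - 5}}, left association gives -6 and right association gives 4.

```c
#include <stddef.h>

/* Left association, ((x0 - x1) - x2) - ...: reduce before shifting '-'.
   Requires N >= 1.  */
int
fold_left_sub (const int *x, size_t n)
{
  int acc = x[0];
  for (size_t i = 1; i < n; ++i)
    acc = acc - x[i];
  return acc;
}

/* Right association, x0 - (x1 - (x2 - ...)): shift before reducing.
   Requires N >= 1.  */
int
fold_right_sub (const int *x, size_t n)
{
  int acc = x[n - 1];
  for (size_t i = n - 1; i-- > 0; )
    acc = x[i] - acc;
  return acc;
}
```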
8710 @node Using Precedence
8711 @subsection Specifying Operator Precedence
8717 Bison allows you to specify these choices with the operator precedence
8718 declarations @code{%left} and @code{%right}. Each such declaration
8719 contains a list of tokens, which are operators whose precedence and
associativity are being declared. The @code{%left} declaration makes all
8721 those operators left-associative and the @code{%right} declaration makes
8722 them right-associative. A third alternative is @code{%nonassoc}, which
8723 declares that it is a syntax error to find the same operator twice ``in a
The last alternative, @code{%precedence}, defines only
precedence and no associativity at all. As a result, any
associativity-related conflict that remains is reported as a
compile-time error. The directive @code{%nonassoc} creates run-time
errors: using the operator in an associative way is a syntax error. The
directive @code{%precedence} creates compile-time errors: an operator
@emph{can} be involved in an associativity-related conflict, contrary to
what the grammar author expected.
8734 The relative precedence of different operators is controlled by the
8735 order in which they are declared. The first precedence/associativity
8736 declaration in the file declares the operators whose
8737 precedence is lowest, the next such declaration declares the operators
8738 whose precedence is a little higher, and so on.
8740 @node Precedence Only
8741 @subsection Specifying Precedence Only
Since POSIX Yacc defines only @code{%left}, @code{%right}, and
@code{%nonassoc}, which all define both precedence and associativity, little
attention is paid to the fact that precedence cannot be defined without
defining associativity. Yet, sometimes, when trying to solve a
conflict, precedence suffices. In such a case, using @code{%left},
@code{%right}, or @code{%nonassoc} might hide future (associativity-related)
conflicts that would then go unnoticed.
The dangling @code{else} ambiguity (@pxref{Shift/Reduce}) can be solved
explicitly. This shift/reduce conflict occurs in the following situation,
where the bullet (•) denotes the current parsing state:
8757 if @var{e1} then if @var{e2} then @var{s1} • else @var{s2}
The conflict involves the reduction of the rule @samp{IF expr THEN
stmt}, whose precedence is by default that of its last token
(@code{THEN}), and the shifting of the token @code{ELSE}. For the usual
disambiguation (attach the @code{else} to the closest @code{if}),
shifting must be preferred, i.e., the precedence of @code{ELSE} must be
higher than that of @code{THEN}. But neither token is expected to be involved
in an associativity-related conflict, which can be specified as follows.
Unary minus is another typical example where associativity is usually
over-specified (@pxref{Infix Calc}). The @code{%left} directive is
8775 traditionally used to declare the precedence of @code{NEG}, which is more
8776 than needed since it also defines its associativity. While this is harmless
8777 in the traditional example, who knows how @code{NEG} might be used in future
8778 evolutions of the grammar@dots{}
8780 @node Precedence Examples
8781 @subsection Precedence Examples
8783 In our example, we would want the following declarations:
8791 In a more complete example, which supports other operators as well, we
8792 would declare them in groups of equal precedence. For example, @code{'+'} is
8793 declared with @code{'-'}:
8796 %left '<' '>' '=' "!=" "<=" ">="
8801 @node How Precedence
8802 @subsection How Precedence Works
8804 The first effect of the precedence declarations is to assign precedence
8805 levels to the terminal symbols declared. The second effect is to assign
8806 precedence levels to certain rules: each rule gets its precedence from
8807 the last terminal symbol mentioned in the components. (You can also
8808 specify explicitly the precedence of a rule. @xref{Contextual
8811 Finally, the resolution of conflicts works by comparing the precedence
8812 of the rule being considered with that of the lookahead token. If the
8813 token's precedence is higher, the choice is to shift. If the rule's
8814 precedence is higher, the choice is to reduce. If they have equal
8815 precedence, the choice is made based on the associativity of that
8816 precedence level. The verbose output file made by @option{-v}
8817 (@pxref{Invocation}) says how each conflict was
8820 Not all rules and not all tokens have precedence. If either the rule or
8821 the lookahead token has no precedence, then the default is to shift.
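The decision procedure above can be sketched as a function. This is a simplified model, not Bison's implementation: precedence levels are hypothetical integers (0 meaning ``no precedence'', higher meaning declared later), and @code{RES_UNRESOLVED} stands for a conflict that Bison reports and then resolves by shifting. At equal precedence, @code{%left} reduces, @code{%right} shifts, @code{%nonassoc} turns the lookahead into a syntax error, and @code{%precedence} leaves the conflict unresolved.

```c
/* Hypothetical encoding of the four precedence directives.  */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC, ASSOC_PRECEDENCE };

enum resolution
{
  RES_SHIFT,
  RES_REDUCE,
  RES_ERROR,        /* %nonassoc: the lookahead becomes a syntax error.  */
  RES_UNRESOLVED    /* Conflict reported; the parser shifts by default.  */
};

/* Resolve a shift/reduce conflict between a rule of precedence RULE_PREC
   and a lookahead token of precedence TOKEN_PREC and associativity ASSOC.
   A precedence of 0 means "none declared".  */
enum resolution
resolve_sr_conflict (int rule_prec, int token_prec, enum assoc assoc)
{
  if (rule_prec == 0 || token_prec == 0)
    return RES_UNRESOLVED;
  if (token_prec > rule_prec)
    return RES_SHIFT;
  if (rule_prec > token_prec)
    return RES_REDUCE;
  switch (assoc)              /* Same level: associativity decides.  */
    {
    case ASSOC_LEFT:     return RES_REDUCE;
    case ASSOC_RIGHT:    return RES_SHIFT;
    case ASSOC_NONASSOC: return RES_ERROR;
    default:             return RES_UNRESOLVED;
    }
}
```

For example, with @samp{-} at level 1 and @samp{*} at level 2, the stack @w{@samp{1 - 2}} facing the lookahead @samp{*} yields @code{RES_SHIFT}, hence @w{@samp{1 - (2 * 3)}}.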
8824 @subsection Using Precedence For Non Operators
Proper use of precedence and associativity directives can help fix
shift/reduce conflicts that do not involve arithmetic-like operators. For
instance, the ``dangling @code{else}'' problem (@pxref{Shift/Reduce}) can be
solved elegantly in two different ways.
In the present case, the conflict is between the token @code{"else"}, which
is ready to be shifted, and the rule @samp{if_stmt: "if" expr "then" stmt},
which calls for a reduction. By default, the precedence of a rule is that of
its last token, here @code{"then"}, so the conflict will be solved appropriately
by giving @code{"else"} a precedence higher than that of @code{"then"}, for
instance as follows:
8845 Alternatively, you may give both tokens the same precedence, in which case
8846 associativity is used to solve the conflict. To preserve the shift action,
8847 use right associativity:
8850 %right "then" "else"
Neither solution is perfect, however. Since Bison does not, so far, provide
``scoped'' precedence, both force you to declare the precedence
of these keywords with respect to the other operators of your grammar.
Therefore, instead of being warned about new conflicts you would otherwise be
unaware of (e.g., a shift/reduce conflict due to @samp{if test then 1 else 2 + 3}
being ambiguous: @samp{if test then 1 else (2 + 3)} or @samp{(if test then 1
else 2) + 3}?), you will find them already silently ``fixed''.
8861 @node Contextual Precedence
8862 @section Context-Dependent Precedence
8863 @cindex context-dependent precedence
8864 @cindex unary operator precedence
8865 @cindex precedence, context-dependent
8866 @cindex precedence, unary operator
8869 Often the precedence of an operator depends on the context. This sounds
8870 outlandish at first, but it is really very common. For example, a minus
8871 sign typically has a very high precedence as a unary operator, and a
8872 somewhat lower precedence (lower than multiplication) as a binary operator.
8874 The Bison precedence declarations
8875 can only be used once for a given token; so a token has
8876 only one precedence declared in this way. For context-dependent
8877 precedence, you need to use an additional mechanism: the @code{%prec}
8880 The @code{%prec} modifier declares the precedence of a particular rule by
8881 specifying a terminal symbol whose precedence should be used for that rule.
8882 It's not necessary for that symbol to appear otherwise in the rule. The
8883 modifier's syntax is:
8886 %prec @var{terminal-symbol}
8890 and it is written after the components of the rule. Its effect is to
8891 assign the rule the precedence of @var{terminal-symbol}, overriding
8892 the precedence that would be deduced for it in the ordinary way. The
8893 altered rule precedence then affects how conflicts involving that rule
8894 are resolved (@pxref{Precedence}).
8896 Here is how @code{%prec} solves the problem of unary minus. First, declare
8897 a precedence for a fictitious terminal symbol named @code{UMINUS}. There
8898 are no tokens of this type, but the symbol serves to stand for its
8908 Now the precedence of @code{UMINUS} can be used in specific rules:
8916 | '-' exp %prec UMINUS
8921 If you forget to append @code{%prec UMINUS} to the rule for unary
8922 minus, Bison silently assumes that minus has its usual precedence.
8923 This kind of problem can be tricky to debug, since one typically
8924 discovers the mistake only by testing the code.
8926 The @code{%no-default-prec;} declaration makes it easier to discover
8927 this kind of problem systematically. It causes rules that lack a
8928 @code{%prec} modifier to have no precedence, even if the last terminal
8929 symbol mentioned in their components has a declared precedence.
8931 If @code{%no-default-prec;} is in effect, you must specify @code{%prec}
8932 for all rules that participate in precedence conflict resolution.
8933 Then you will see any shift/reduce conflict until you tell Bison how
8934 to resolve it, either by changing your grammar or by adding an
8935 explicit precedence. This will probably add declarations to the
8936 grammar, but it helps to protect against incorrect rule precedences.
8938 The effect of @code{%no-default-prec;} can be reversed by giving
8939 @code{%default-prec;}, which is the default.
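How a rule obtains its precedence can be summarized as a small function. In this hypothetical model, a rule is an array of @code{struct symbol} right-hand-side entries plus an optional @code{%prec} symbol, a precedence of 0 means ``none'', and @var{default_prec} is false under @code{%no-default-prec}.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol description: PREC is 0 when none is declared.  */
struct symbol { bool is_terminal; int prec; };

/* A rule's precedence: that of its %prec symbol if it has one; otherwise,
   unless %no-default-prec is in effect, that of its last terminal.  */
int
rule_precedence (const struct symbol *rhs, size_t n,
                 const struct symbol *prec_sym, bool default_prec)
{
  if (prec_sym)
    return prec_sym->prec;
  if (!default_prec)
    return 0;
  for (size_t i = n; i-- > 0; )
    if (rhs[i].is_terminal)
      return rhs[i].prec;
  return 0;
}
```

For @samp{'-' exp %prec UMINUS}, the rule takes the level of @code{UMINUS} rather than that of @samp{'-'}, with or without @code{%no-default-prec}.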
8943 @section Parser States
8944 @cindex finite-state machine
8945 @cindex parser state
8946 @cindex state (of parser)
8948 The function @code{yyparse} is implemented using a finite-state machine.
8949 The values pushed on the parser stack are not simply token kind codes; they
8950 represent the entire sequence of terminal and nonterminal symbols at or
8951 near the top of the stack. The current state collects all the information
8952 about previous input which is relevant to deciding what to do next.
8954 Each time a lookahead token is read, the current parser state, together with
8955 the kind of the lookahead token, is looked up in a table. This table entry can
8956 say, ``Shift the lookahead token.'' In this case, it also specifies the new
8957 parser state, which is pushed onto the top of the parser stack. Or it can
8958 say, ``Reduce using rule number @var{n}.'' This means that a certain number
8959 of tokens or groupings are taken off the top of the stack, and replaced by
8960 one grouping. In other words, that number of states are popped from the
8961 stack, and one new state is pushed.
8963 There is one other alternative: the table can say that the lookahead token
8964 is erroneous in the current state. This causes error processing to begin
8965 (@pxref{Error Recovery}).
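
As an illustration, consider the hypothetical grammar below and the input @samp{1 + 2}, assuming the scanner returns @code{NUM} for each number. The trace is schematic. Each symbol stands for the state pushed when that symbol was shifted or produced by a reduction, and @samp{$end} marks the end of the input. Default reductions (@pxref{Default Reductions}), which can let the parser reduce without reading a lookahead at all, are ignored here.

@example
@group
exp: exp '+' NUM | NUM;

Stack          Lookahead  Action
(empty)        NUM        shift
NUM            '+'        reduce using exp: NUM
exp            '+'        shift
exp '+'        NUM        shift
exp '+' NUM    $end       reduce using exp: exp '+' NUM
exp            $end       accept
@end group
@end example

Each reduction pops as many states as the rule has components (one for @code{exp: NUM}, three for @code{exp: exp '+' NUM}) and pushes a single state for @code{exp}.
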
8968 @section Reduce/Reduce Conflicts
8969 @cindex reduce/reduce conflict
8970 @cindex conflicts, reduce/reduce
8972 A reduce/reduce conflict occurs if there are two or more rules that apply
8973 to the same sequence of input. This usually indicates a serious error
8976 For example, here is an erroneous attempt to define a sequence
8977 of zero or more @code{word} groupings.
8982 %empty @{ printf ("empty sequence\n"); @}
8984 | sequence word @{ printf ("added word %s\n", $2); @}
8990 %empty @{ printf ("empty maybeword\n"); @}
8991 | word @{ printf ("single word %s\n", $1); @}
8997 The error is an ambiguity: as counterexample generation would demonstrate
8998 (@pxref{Counterexamples}), there is more than one way to parse a single
8999 @code{word} into a @code{sequence}. It could be reduced to a
9000 @code{maybeword} and then into a @code{sequence} via the second rule.
9001 Alternatively, nothing-at-all could be reduced into a @code{sequence}
9002 via the first rule, and this could be combined with the @code{word}
9003 using the third rule for @code{sequence}.
9005 There is also more than one way to reduce nothing-at-all into a
9006 @code{sequence}. This can be done directly via the first rule,
9007 or indirectly via @code{maybeword} and then the second rule.
9009 You might think that this is a distinction without a difference, because it
9010 does not change whether any particular input is valid or not. But it does
9011 affect which actions are run. One parsing order runs the second rule's
9012 action; the other runs the first rule's action and the third rule's action.
9013 In this example, the output of the program changes.
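
For instance, suppose the input is a single @code{word} whose semantic value is the string @samp{foo} (a hypothetical input). Following the actions in the grammar above, the two parses would print different output:

@example
@group
@r{Via @code{maybeword} and the second rule:}
single word foo

@r{Via the first and third rules:}
empty sequence
added word foo
@end group
@end example
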
9015 Bison resolves a reduce/reduce conflict by choosing to use the rule that
9016 appears first in the grammar, but it is very risky to rely on this. Every
9017 reduce/reduce conflict must be studied and usually eliminated. Here is the
9018 proper way to define @code{sequence}:
9023 %empty @{ printf ("empty sequence\n"); @}
9024 | sequence word @{ printf ("added word %s\n", $2); @}
9029 Here is another common error that yields a reduce/reduce conflict:
9036 | sequence redirects
9050 | redirects redirect
9056 The intention here is to define a sequence which can contain either
9057 @code{word} or @code{redirect} groupings. The individual definitions of
9058 @code{sequence}, @code{words} and @code{redirects} are error-free, but the
9059 three together make a subtle ambiguity: even an empty input can be parsed
9060 in infinitely many ways!
9062 Consider: nothing-at-all could be a @code{words}. Or it could be two
9063 @code{words} in a row, or three, or any number. It could equally well be a
9064 @code{redirects}, or two, or any number. Or it could be a @code{words}
9065 followed by three @code{redirects} and another @code{words}. And so on.
9067 Here are two ways to correct these rules. First, to make it a single level
9078 Second, to prevent either a @code{words} or a @code{redirects}
9086 | sequence redirects
9100 | redirects redirect
9105 Yet this proposal introduces another kind of ambiguity! The input
9106 @samp{word word} can be parsed as a single @code{words} composed of two
9107 @samp{word}s, or as two one-@code{word} @code{words} (and likewise for
9108 @code{redirect}/@code{redirects}). However, this ambiguity is now a
9109 shift/reduce conflict, and therefore it can now be addressed with precedence
9112 To simplify the matter, we will proceed with @code{word} and @code{redirect}
9113 being tokens: @code{"word"} and @code{"redirect"}.
9115 To prefer the longest @code{words}, the conflict between the token
9116 @code{"word"} and the rule @samp{sequence: sequence words} must be resolved
9117 as a shift. To this end, we use the same techniques as described above
9118 (@pxref{Non Operators}). One solution
9119 relies on precedences: use @code{%prec} to give a lower precedence to the
9124 %precedence "sequence"
9129 | sequence word %prec "sequence"
9130 | sequence redirect %prec "sequence"
9142 Another solution relies on associativity: provide both the token and the
9143 rule with the same precedence, but make them right-associative:
9146 %right "word" "redirect"
9151 | sequence word %prec "word"
9152 | sequence redirect %prec "redirect"
9157 @node Mysterious Conflicts
9158 @section Mysterious Conflicts
9159 @cindex Mysterious Conflicts
9161 Sometimes reduce/reduce conflicts can occur that don't look warranted.
9167 def: param_spec return_spec ',';
9170 | name_list ':' type
9187 | name ',' name_list
9192 It would seem that this grammar can be parsed with only a single token of
9193 lookahead: when a @code{param_spec} is being read, an @code{"id"} is a
9194 @code{name} if a comma or colon follows, or a @code{type} if another
9195 @code{"id"} follows. In other words, this grammar is LR(1). Yet Bison
9196 finds one reduce/reduce conflict, for which counterexample generation
9197 (@pxref{Counterexamples}) would find a @emph{nonunifying} example.
9201 This is because Bison does not handle all LR(1) grammars @emph{by default},
9202 for historical reasons.
9203 In this grammar, two contexts, that after an @code{"id"} at the beginning
9204 of a @code{param_spec} and likewise at the beginning of a
9205 @code{return_spec}, are similar enough that Bison assumes they are the
9207 They appear similar because the same set of rules would be
9208 active---the rule for reducing to a @code{name} and that for reducing to
9209 a @code{type}. Bison is unable to determine at that stage of processing
9210 that the rules would require different lookahead tokens in the two
9211 contexts, so it makes a single parser state for them both. Combining
9212 the two contexts causes a conflict later. In parser terminology, this
9213 occurrence means that the grammar is not LALR(1).
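
To look at the merged state directly, one option is to ask Bison for a description of the automaton. Assuming the grammar is saved in a file named @file{def.y} (a hypothetical name), the following command writes @file{def.output}:

@example
$ bison --report=state def.y
@end example

Near its top, the report lists the states that contain conflicts. The state reached after the @code{"id"} should show both the reduction to @code{name} and the reduction to @code{type}.
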
9216 @cindex canonical LR
9217 For many practical grammars (specifically those that fall into the non-LR(1)
9218 class), the limitations of LALR(1) result in difficulties beyond just
9219 mysterious reduce/reduce conflicts. The best way to fix all these problems
9220 is to select a different parser table construction algorithm. Either
9221 IELR(1) or canonical LR(1) would suffice, but the former is more efficient
9222 and easier to debug during development. @xref{LR Table Construction}, for
9225 If you instead wish to work around LALR(1)'s limitations, you
9226 can often fix a mysterious conflict by identifying the two parser states
9227 that are being confused, and adding something to make them look
9228 distinct. In the above example, adding one rule to
9229 @code{return_spec} as follows makes the problem go away:
9237 | "id" "bogus" /* This rule is never used. */
9242 This corrects the problem because it introduces the possibility of an
9243 additional active rule in the context after the @code{"id"} at the beginning of
9244 @code{return_spec}. This rule is not active in the corresponding context
9245 in a @code{param_spec}, so the two contexts receive distinct parser states.
9246 As long as the token @code{"bogus"} is never generated by @code{yylex},
9247 the added rule cannot alter the way actual input is parsed.
9249 In this particular example, there is another way to solve the problem:
9250 rewrite the rule for @code{return_spec} to use @code{"id"} directly
9251 instead of via @code{name}. This also causes the two confusing
9252 contexts to have different sets of active rules, because the one for
9253 @code{return_spec} activates the altered rule for @code{return_spec}
9254 rather than the one for @code{name}.
9260 | name_list ':' type
9272 For a more detailed exposition of LALR(1) parsers and parser generators, see
9273 @tcite{DeRemer 1982}.
9278 The default behavior of Bison's LR-based parsers is chosen mostly for
9279 historical reasons, but that behavior is often not robust. For example, in
9280 the previous section, we discussed the mysterious conflicts that can be
9281 produced by LALR(1), Bison's default parser table construction algorithm.
9282 Another example is Bison's @code{%define parse.error verbose} directive,
9283 which instructs the generated parser to produce verbose syntax error
9284 messages that can sometimes contain incorrect information.
9286 In this section, we explore several modern features of Bison that allow you
9287 to tune fundamental aspects of the generated LR-based parsers. Some of
9288 these features easily eliminate shortcomings like those mentioned above.
9289 Others can be helpful purely for understanding your parser.
9292 * LR Table Construction:: Choose a different construction algorithm.
9293 * Default Reductions:: Disable default reductions.
9294 * LAC:: Correct lookahead sets in the parser states.
9295 * Unreachable States:: Keep unreachable parser states for debugging.
9298 @node LR Table Construction
9299 @subsection LR Table Construction
9300 @cindex Mysterious Conflict
9303 @cindex canonical LR
9304 @findex %define lr.type
9306 For historical reasons, Bison constructs LALR(1) parser tables by default.
9307 However, LALR does not possess the full language-recognition power of LR.
9308 As a result, the behavior of parsers employing LALR parser tables is often
9309 mysterious. We presented a simple example of this effect in @ref{Mysterious
9312 As we also demonstrated in that example, the traditional approach to
9313 eliminating such mysterious behavior is to restructure the grammar.
9314 Unfortunately, doing so correctly is often difficult. Moreover, merely
9315 discovering that LALR causes mysterious behavior in your parser can be
9318 Fortunately, Bison provides an easy way to eliminate the possibility of such
9319 mysterious behavior altogether. You simply need to activate a more powerful
9320 parser table construction algorithm by using the @code{%define lr.type}
9323 @deffn {Directive} {%define lr.type} @var{type}
9324 Specify the type of parser tables within the LR(1) family. The accepted
9325 values for @var{type} are:
9328 @item @code{lalr} (default)
9330 @item @code{canonical-lr}
9334 For example, to activate IELR, you might add the following directive to your
9338 %define lr.type ielr
9341 @noindent For the example in @ref{Mysterious Conflicts}, the mysterious
9342 conflict is then eliminated, so there is no need to invest time in
9343 comprehending the conflict or restructuring the grammar to fix it. If,
9344 during future development, the grammar evolves such that all mysterious
9345 behavior would have disappeared using just LALR, you need not fear that
9346 continuing to use IELR will result in unnecessarily large parser tables.
9347 That is, IELR generates LALR tables when LALR (using a deterministic parsing
9348 algorithm) is sufficient to support the full language-recognition power of
9349 LR. Thus, by enabling IELR at the start of grammar development, you can
9350 safely and completely eliminate the need to consider LALR's shortcomings.
9352 While IELR is almost always preferable, there are circumstances where LALR
9353 or the canonical LR parser tables described by Knuth @pcite{Knuth 1965} can
9354 be useful. Here we summarize the relative advantages of each parser table
9355 construction algorithm within Bison:
9360 There are at least two scenarios where LALR can be worthwhile:
9363 @item GLR without static conflict resolution.
9365 @cindex GLR with LALR
9366 When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
9367 conflicts statically (for example, with @code{%left} or @code{%precedence}),
9369 the parser explores all potential parses of any given input. In this case,
9370 the choice of parser table construction algorithm is guaranteed not to alter
9371 the language accepted by the parser. LALR parser tables are the smallest
9372 parser tables Bison can currently construct, so they may then be preferable.
9373 Nevertheless, once you begin to resolve conflicts statically, GLR behaves
9374 more like a deterministic parser in the syntactic contexts where those
9375 conflicts appear, and so either IELR or canonical LR can then be helpful to
9376 avoid LALR's mysterious behavior.
9378 @item Malformed grammars.
9380 Occasionally during development, an especially malformed grammar with a
9381 major recurring flaw may severely impede the IELR or canonical LR parser
9382 table construction algorithm. LALR can be a quick way to construct parser
9383 tables in order to investigate such problems while ignoring the more subtle
9384 differences from IELR and canonical LR.
9389 IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
9390 any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
9391 always accept exactly the same set of sentences. However, like LALR, IELR
9392 merges parser states during parser table construction so that the number of
9393 parser states is often an order of magnitude smaller than for canonical LR.
9394 More importantly, because canonical LR's extra parser states may contain
9395 duplicate conflicts in the case of non-LR grammars, the number of conflicts
9396 for IELR is often an order of magnitude smaller as well. This effect can
9397 significantly reduce the complexity of developing a grammar.
9401 @cindex delayed syntax error detection
9404 While inefficient, canonical LR parser tables can be an interesting means to
9405 explore a grammar because they possess a property that IELR and LALR tables
9406 do not. That is, if @code{%nonassoc} is not used and default reductions are
9407 left disabled (@pxref{Default Reductions}), then, for every left context of
9408 every canonical LR state, the set of tokens accepted by that state is
9409 guaranteed to be the exact set of tokens that is syntactically acceptable in
9410 that left context. It might then seem that an advantage of canonical LR
9411 parsers in production is that, under the above constraints, they are
9412 guaranteed to detect a syntax error as soon as possible without performing
9413 any unnecessary reductions. However, IELR parsers that use LAC are also
9414 able to achieve this behavior without sacrificing @code{%nonassoc} or
9415 default reductions. For details on LAC and a few caveats, see @ref{LAC}.
9418 For a more detailed exposition of the mysterious behavior in LALR parsers
9419 and the benefits of IELR, see @tcite{Denny 2008} and @tcite{Denny 2010
9422 @node Default Reductions
9423 @subsection Default Reductions
9424 @cindex default reductions
9425 @findex %define lr.default-reduction
9428 After parser table construction, Bison identifies the reduction with the
9429 largest lookahead set in each parser state. To reduce the size of the
9430 parser state, Bison traditionally removes that lookahead set and
9431 assigns that reduction to be the default parser action. Such a reduction
9432 is known as a @dfn{default reduction}.
9434 Default reductions affect more than the size of the parser tables. They
9435 also affect the behavior of the parser:
9438 @item Delayed @code{yylex} invocations.
9440 @cindex delayed yylex invocations
9441 @cindex consistent states
9442 @cindex defaulted states
9443 A @dfn{consistent state} is a state that has only one possible parser
9444 action. If that action is a reduction and is encoded as a default
9445 reduction, then that consistent state is called a @dfn{defaulted state}.
9446 Upon reaching a defaulted state, a Bison-generated parser does not bother to
9447 invoke @code{yylex} to fetch the next token before performing the reduction.
9448 In other words, whether default reductions are enabled in consistent states
9449 determines how soon a Bison-generated parser invokes @code{yylex} for a
9450 token: immediately when it @emph{reaches} that token in the input or when it
9451 eventually @emph{needs} that token as a lookahead to determine the next
9452 parser action. Traditionally, default reductions are enabled, and so the
9453 parser exhibits the latter behavior.
9455 The presence of defaulted states is an important consideration when
9456 designing @code{yylex} and the grammar file. That is, if the behavior of
9457 @code{yylex} can influence or be influenced by the semantic actions
9458 associated with the reductions in defaulted states, then the delay of the
9459 next @code{yylex} invocation until after those reductions is significant.
9460 For example, the semantic actions might pop a scope stack that @code{yylex}
9461 uses to determine what token to return. Thus, the delay might be necessary
9462 to ensure that @code{yylex} does not look up the next token in a scope that
9463 should already be considered closed.
9465 @item Delayed syntax error detection.
9467 @cindex delayed syntax error detection
9468 When the parser fetches a new token by invoking @code{yylex}, it checks
9469 whether there is an action for that token in the current parser state. The
9470 parser detects a syntax error if and only if either (1) there is no action
9471 for that token or (2) the action for that token is the error action (due to
9472 the use of @code{%nonassoc}). However, if there is a default reduction in
9473 that state (which might or might not be a defaulted state), then it is
9474 impossible for condition 1 to exist. That is, all tokens have an action.
9475 Thus, the parser sometimes fails to detect the syntax error until it reaches
9479 @c If there's an infinite loop, default reductions can prevent an incorrect
9480 @c sentence from being rejected.
9481 While default reductions never cause the parser to accept syntactically
9482 incorrect sentences, the delay of syntax error detection can have unexpected
9483 effects on the behavior of the parser. However, the delay can be caused
9484 anyway by parser state merging and the use of @code{%nonassoc}, and it can
9485 be fixed by another Bison feature, LAC. We discuss the effects of delayed
9486 syntax error detection and LAC more in the next section (@pxref{LAC}).
9489 For canonical LR, the only default reduction that Bison enables by default
9490 is the accept action, which appears only in the accepting state, which has
9491 no other action and is thus a defaulted state. However, the default accept
9492 action does not delay any @code{yylex} invocation or syntax error detection
9493 because the accept action ends the parse.
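
For instance, a grammar file used only to explore a grammar in the way described for canonical LR (@pxref{LR Table Construction}) might request canonical LR tables explicitly. The second directive merely restates the canonical LR default and is shown for clarity. Remember that the exact-token-set property also requires avoiding @code{%nonassoc}.

@example
@group
%define lr.type canonical-lr
%define lr.default-reduction accepting
@end group
@end example
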
9495 For LALR and IELR, Bison enables default reductions in nearly all states by
9496 default. There are only two exceptions. First, states that have a shift
9497 action on the @code{error} token do not have default reductions because
9498 delayed syntax error detection could then prevent the @code{error} token
9499 from ever being shifted in that state. However, parser state merging can
9500 cause the same effect anyway, and LAC fixes it in both cases, so future
9501 versions of Bison might drop this exception when LAC is activated. Second,
9502 GLR parsers do not record the default reduction as the action on a lookahead
9503 token for which there is a conflict. The correct action in this case is to
9504 split the parse instead.
9506 To adjust which states have default reductions enabled, use the
9507 @code{%define lr.default-reduction} directive.
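
As a sketch of the scope-stack situation described above, assume a hypothetical language whose blocks are delimited by @code{"begin"} and @code{"end"}, a scanner that consults a scope stack, and hypothetical functions @code{push_scope} and @code{pop_scope} that maintain it. @code{stmts} is assumed to be defined elsewhere. After @code{"end"} is shifted, the only possible action is to reduce @code{block}, so that state is consistent. With @code{most} or @code{consistent}, it is a defaulted state, and @code{pop_scope} runs before @code{yylex} scans the token that follows @code{"end"}. With @code{accepting}, @code{yylex} scans that token first, while the inner scope is still open.

@example
@group
%define lr.default-reduction consistent
%token KW_BEGIN "begin" KW_END "end"
%%
block:
  "begin" @{ push_scope (); @} stmts "end" @{ pop_scope (); @}
;
@end group
@end example
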
9509 @deffn {Directive} {%define lr.default-reduction} @var{where}
9510 Specify the kind of states that are permitted to contain default reductions.
9511 The accepted values of @var{where} are:
9513 @item @code{most} (default for LALR and IELR)
9514 @item @code{consistent}
9515 @item @code{accepting} (default for canonical LR)
9521 @findex %define parse.lac
9523 @cindex lookahead correction
9525 Canonical LR, IELR, and LALR can suffer from a couple of problems upon
9526 encountering a syntax error. First, the parser might perform additional
9527 parser stack reductions before discovering the syntax error. Such
9528 reductions can perform user semantic actions that are unexpected because
9529 they are based on an invalid token, and they cause error recovery to begin
9530 in a different syntactic context than the one in which the invalid token was
9531 encountered. Second, when verbose error messages are enabled (@pxref{Error
9532 Reporting}), the expected token list in the syntax error message can both
9533 contain invalid tokens and omit valid tokens.
9535 The culprits for the above problems are @code{%nonassoc}, default reductions
9536 in inconsistent states (@pxref{Default Reductions}), and parser state
9537 merging. Because IELR and LALR merge parser states, they suffer the most.
9538 Canonical LR can suffer only if @code{%nonassoc} is used or if default
9539 reductions are enabled for inconsistent states.
9541 LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
9542 that solves these problems for canonical LR, IELR, and LALR without
9543 sacrificing @code{%nonassoc}, default reductions, or state merging. You can
9544 enable LAC with the @code{%define parse.lac} directive.
9546 @deffn {Directive} {%define parse.lac} @var{value}
9547 Enable LAC to improve syntax error handling.
9549 @item @code{none} (default)
9552 This feature is currently only available for deterministic parsers in C and C++.
9555 Conceptually, the LAC mechanism is straightforward. Whenever the parser
9556 fetches a new token from the scanner so that it can determine the next
9557 parser action, it immediately suspends normal parsing and performs an
9558 exploratory parse using a temporary copy of the normal parser state stack.
9559 During this exploratory parse, the parser does not perform user semantic
9560 actions. If the exploratory parse reaches a shift action, normal parsing
9561 then resumes on the normal parser stacks. If the exploratory parse reaches
9562 an error instead, the parser reports a syntax error. If verbose syntax
9563 error messages are enabled, the parser must then discover the list of
9564 expected tokens, so it performs a separate exploratory parse for each token
9567 There is one subtlety about the use of LAC: when in a consistent
9568 parser state with a default reduction, the parser will not attempt to fetch
9569 a token from the scanner because no lookahead is needed to determine the
9570 next parser action. Thus, whether default reductions are enabled in
9571 consistent states (@pxref{Default Reductions}) affects how soon the parser
9572 detects a syntax error: immediately when it @emph{reaches} an erroneous
9573 token or when it eventually @emph{needs} that token as a lookahead to
9574 determine the next parser action. The latter behavior is probably more
9575 intuitive, so Bison currently provides no way to achieve the former behavior
9576 while default reductions are enabled in consistent states.
9578 Thus, when LAC is in use, for some fixed decision of whether to enable
9579 default reductions in consistent states, canonical LR and IELR behave almost
9580 exactly the same for both syntactically acceptable and syntactically
9581 unacceptable input. While LALR still does not support the full
9582 language-recognition power of canonical LR and IELR, LAC at least enables
9583 LALR's syntax error handling to correctly reflect LALR's
9584 language-recognition power.
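
For example, a grammar file might combine IELR tables with LAC and verbose error messages:

@example
@group
%define lr.type ielr
%define parse.lac full
%define parse.error verbose
@end group
@end example

Following the discussion of consistent states above, also adding @code{%define lr.default-reduction accepting} would make the parser fetch each token, and LAC check it, as soon as the parser reaches that token. The cost is larger parser tables.
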
9586 There are a few caveats to consider when using LAC:
9589 @item Infinite parsing loops.
9591 IELR plus LAC does have one shortcoming relative to canonical LR. Some
9592 parsers generated by Bison can loop infinitely. LAC does not fix infinite
9593 parsing loops that occur between encountering a syntax error and detecting
9594 it, but enabling canonical LR or disabling default reductions sometimes
9597 @item Verbose error message limitations.
9599 Because of internationalization considerations, Bison-generated parsers
9600 limit the size of the expected token list they are willing to report in a
9601 verbose syntax error message. If the number of expected tokens exceeds that
9602 limit, the list is simply dropped from the message. Enabling LAC can
9603 increase the size of the list and thus cause the parser to drop it. Of
9604 course, dropping the list is better than reporting an incorrect list.
9608 Because LAC requires many parse actions to be performed twice, it can have a
9609 performance penalty. However, not all parse actions must be performed
9610 twice. Specifically, during a series of default reductions in consistent
9611 states and shift actions, the parser never has to initiate an exploratory
9612 parse. Moreover, the most time-consuming tasks in a parse are often the
9613 file I/O, the lexical analysis performed by the scanner, and the user's
9614 semantic actions, but none of these are performed during the exploratory
9615 parse. Finally, the base of the temporary stack used during an exploratory
9616 parse is a pointer into the normal parser state stack so that the stack is
9617 never physically copied. In our experience, the performance penalty of LAC
9618 has proved insignificant for practical grammars.
9621 While the LAC algorithm shares techniques that have been recognized in the
9622 parser community for years, for the publication that introduces LAC, see
9623 @tcite{Denny 2010 May}.
9625 @node Unreachable States
9626 @subsection Unreachable States
9627 @findex %define lr.keep-unreachable-state
9628 @cindex unreachable states
9630 If there exists no sequence of transitions from the parser's start state to
9631 some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
9632 state}. A state can become unreachable during conflict resolution if Bison
9633 disables a shift action leading to it from a predecessor state.
9635 By default, Bison removes unreachable states from the parser after conflict
9636 resolution because they are useless in the generated parser. However,
9637 keeping unreachable states is sometimes useful when trying to understand the
9638 relationship between the parser and the grammar.
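
For instance, to keep unreachable states while studying a parser, a grammar file might contain:

@example
%define lr.keep-unreachable-state true
@end example
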
9640 @deffn {Directive} {%define lr.keep-unreachable-state} @var{value}
9641 Request that Bison allow unreachable states to remain in the parser tables.
9642 @var{value} must be a Boolean. The default is @code{false}.
9645 There are a few caveats to consider:
9648 @item Missing or extraneous warnings.
9650 Unreachable states may contain conflicts and may use rules not used in any
9651 other state. Thus, keeping unreachable states may induce warnings that are
9652 irrelevant to your parser's behavior, and it may eliminate warnings that are
9653 relevant. Of course, the change in warnings may actually be relevant to a
9654 parser table analysis that wants to keep unreachable states, so this
9655 behavior will likely remain in future Bison releases.
9657 @item Other useless states.
9659 While Bison is able to remove unreachable states, it is not guaranteed to
9660 remove other kinds of useless states. Specifically, when Bison disables
9661 reduce actions during conflict resolution, some goto actions may become
9662 useless, and thus some additional states may become useless. If Bison were
9663 to compute which goto actions were useless and then disable those actions,
9664 it could identify such states as unreachable and then remove those states.
9665 However, Bison does not compute which goto actions are useless.
9668 @node Generalized LR Parsing
9669 @section Generalized LR (GLR) Parsing
9671 @cindex generalized LR (GLR) parsing
9672 @cindex ambiguous grammars
9673 @cindex nondeterministic parsing
9675 Bison produces @emph{deterministic} parsers that choose uniquely
9676 when to reduce and which reduction to apply
9677 based on a summary of the preceding input and on one extra token of lookahead.
9678 As a result, normal Bison handles a proper subset of the family of
9679 context-free languages.
9680 Ambiguous grammars, since they have strings with more than one possible
9681 sequence of reductions, cannot have deterministic parsers in this sense.
9682 The same is true of languages that require more than one symbol of
9683 lookahead, since the parser lacks the information necessary to make a
9684 decision at the point it must be made in a shift/reduce parser.
9685 Finally, as previously mentioned (@pxref{Mysterious Conflicts}),
9686 there are languages where Bison's default choice of how to
9687 summarize the input seen so far loses necessary information.
9689 When you use the @samp{%glr-parser} declaration in your grammar file,
9690 Bison generates a parser that uses a different algorithm, called
9691 Generalized LR (or GLR). A Bison GLR
9692 parser uses the same basic
9693 algorithm for parsing as an ordinary Bison parser, but behaves
9694 differently in cases where there is a shift/reduce conflict that has not
9695 been resolved by precedence rules (@pxref{Precedence}) or a
9696 reduce/reduce conflict. When a GLR parser encounters such a
9698 effectively @emph{splits} into several parsers, one for each possible
9699 shift or reduction. These parsers then proceed as usual, consuming
9700 tokens in lock-step. Some of the stacks may encounter other conflicts
9701 and split further, with the result that instead of a sequence of states,
9702 a Bison GLR parsing stack is, in effect, a tree of states.
9704 In effect, each stack represents a guess as to what the proper parse
9705 is. Additional input may indicate that a guess was wrong, in which case
9706 the appropriate stack silently disappears. Otherwise, the semantic
9707 actions generated in each stack are saved, rather than being executed
9708 immediately. When a stack disappears, its saved semantic actions never
9709 get executed. When a reduction causes two stacks to become equivalent,
9710 their sets of semantic actions are both saved with the state that
9711 results from the reduction. We say that two stacks are equivalent
9712 when they both represent the same sequence of states,
9713 and each pair of corresponding states represents a
9714 grammar symbol that produces the same segment of the input token
9717 Whenever the parser makes a transition from having multiple
9718 states to having one, it reverts to the normal deterministic parsing
9719 algorithm, after resolving and executing the saved-up actions.
9720 At this transition, some of the states on the stack will have semantic
9721 values that are sets (actually multisets) of possible actions. The
9722 parser tries to pick one of the actions by first finding one whose rule
9723 has the highest dynamic precedence, as set by the @samp{%dprec}
9724 declaration. Otherwise, if the alternative actions are not ordered by
9725 precedence, but there the same merging function is declared for both
9726 rules by the @samp{%merge} declaration,
9727 Bison resolves and evaluates both and then calls the merge function on
9728 the result. Otherwise, it reports an ambiguity.
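
As a sketch, assume hypothetical nonterminals @code{expr} and @code{decl}, defined elsewhere, such that some statements can be parsed either way. With @code{%dprec}, the parser prefers the alternative whose rule has the higher dynamic precedence, here @code{decl}:

@example
@group
%glr-parser
%%
stmt:
  expr ';'  %dprec 1
| decl      %dprec 2
;
@end group
@end example

Alternatively, both rules can name the same merge function. Here @code{stmt_merge} is a hypothetical user-supplied function, declared in the prologue, that receives the two semantic values and returns the merged one:

@example
@group
stmt:
  expr ';'  %merge <stmt_merge>
| decl      %merge <stmt_merge>
;
@end group
@end example
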
9730 It is possible to use a data structure for the GLR parsing tree that
9731 permits the processing of any LR(1) grammar in linear time (in the
9732 size of the input), any unambiguous (not necessarily
9734 quadratic worst-case time, and any general (possibly ambiguous)
9735 context-free grammar in cubic worst-case time. However, Bison currently
9736 uses a simpler data structure that requires time proportional to the
9737 length of the input times the maximum number of stacks required for any
9738 prefix of the input. Thus, really ambiguous or nondeterministic
9739 grammars can require exponential time and space to process. Such badly
9740 behaving examples, however, are not generally of practical interest.
9741 Usually, nondeterminism in a grammar is local---the parser is ``in
9742 doubt'' only for a few tokens at a time. Therefore, the current data
9743 structure should generally be adequate. On LR(1) portions of a
9744 grammar, in particular, it is only slightly slower than with the
9745 deterministic LR(1) Bison parser.
9747 For a more detailed exposition of GLR parsers, see @tcite{Scott 2000}.
9749 @node Memory Management
9750 @section Memory Management, and How to Avoid Memory Exhaustion
9751 @cindex memory exhaustion
9752 @cindex memory management
9753 @cindex stack overflow
9754 @cindex parser stack overflow
9755 @cindex overflow of parser stack
9757 The Bison parser stack can run out of memory if too many tokens are shifted and
9758 not reduced. When this happens, the parser function @code{yyparse}
9759 calls @code{yyerror} and then returns 2.
9761 Because Bison parsers have growing stacks, hitting the upper limit
9762 usually results from using a right recursion instead of a left
9763 recursion, see @ref{Recursion}.
9766 By defining the macro @code{YYMAXDEPTH}, you can control how deep the
9767 parser stack can become before memory is exhausted. Define the
9768 macro with a value that is an integer. This value is the maximum number
9769 of tokens that can be shifted (and not reduced) before overflow.
9771 The stack space allowed is not necessarily allocated. If you specify a
9772 large value for @code{YYMAXDEPTH}, the parser normally allocates a small
9773 stack at first, and then makes it bigger by stages as needed. This
9774 increasing allocation happens automatically and silently. Therefore,
9775 you do not need to make @code{YYMAXDEPTH} painfully small merely to save
9776 space for ordinary inputs that do not need much stack.
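The growth by stages can be sketched as follows. This assumes the policy of Bison's C skeleton, which doubles the stack each time it fills up, never beyond @code{YYMAXDEPTH}, and reports memory exhaustion once that limit has been reached. The function @code{next_stack_size} is a hypothetical name, and a default of 10000 for @code{YYMAXDEPTH} is assumed here.

```c
#ifndef YYINITDEPTH
# define YYINITDEPTH 200    /* default initial depth */
#endif
#ifndef YYMAXDEPTH
# define YYMAXDEPTH 10000   /* assumed default maximum depth */
#endif

/* Return the new size of a stack of SIZE entries that has just filled
   up, or 0 if it may not grow any further (memory exhaustion).  */
long
next_stack_size (long size)
{
  if (YYMAXDEPTH <= size)
    return 0;               /* the parser would report exhaustion */
  size *= 2;                /* grow by doubling */
  if (YYMAXDEPTH < size)
    size = YYMAXDEPTH;      /* but never beyond the limit */
  return size;
}
```

Starting from 200, successive calls give 400, 800, 1600, 3200, 6400 and finally 10000; one more call returns 0.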
9778 However, do not allow @code{YYMAXDEPTH} to be a value so large that
9779 arithmetic overflow could occur when calculating the size of the stack
9780 space. Also, do not allow @code{YYMAXDEPTH} to be less than
9783 @cindex default stack limit
9784 The default value of @code{YYMAXDEPTH}, if you do not define it, is
9788 You can control how much stack is allocated initially by defining the
9789 macro @code{YYINITDEPTH} to a positive integer. For the deterministic
9790 parser in C, this value must be a compile-time constant
9791 unless you are assuming C99 or some other target language or compiler
9792 that allows variable-length arrays. The default is 200.
9794 Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
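If you define these macros yourself, the constraints above can be checked when the parser is compiled. A minimal sketch, assuming a C11 compiler for @code{_Static_assert}: @code{struct stack_entry} is a hypothetical stand-in for the data held by one stack slot (the generated parser actually keeps several parallel stacks), and the values 500 and 100000 are arbitrary examples.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Example values for this sketch.  */
#define YYINITDEPTH 500
#define YYMAXDEPTH 100000

/* Hypothetical stand-in for what one stack slot holds.  */
struct stack_entry
{
  int state;
  double value;
};

/* YYINITDEPTH must be positive and no greater than YYMAXDEPTH.  */
_Static_assert (0 < YYINITDEPTH && YYINITDEPTH <= YYMAXDEPTH,
                "YYINITDEPTH must be between 1 and YYMAXDEPTH");

/* Computing the stack size in bytes must not overflow size_t.  */
_Static_assert (YYMAXDEPTH <= SIZE_MAX / sizeof (struct stack_entry),
                "YYMAXDEPTH is too large");

/* Largest size, in bytes, that the stack may reach.  */
size_t
max_stack_bytes (void)
{
  return (size_t) YYMAXDEPTH * sizeof (struct stack_entry);
}
```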
9796 You can generate a deterministic parser containing C++ user code from the
9797 default (C) skeleton, as well as from the C++ skeleton (@pxref{C++
9798 Parsers}). However, if you do use the default skeleton and want to allow
9799 the parsing stack to grow, be careful not to use semantic types or location
9800 types that require non-trivial copy constructors. The C skeleton bypasses
9801 these constructors when copying data to new, larger stacks.
9803 @node Error Recovery
9804 @chapter Error Recovery
9805 @cindex error recovery
9806 @cindex recovery from errors
9808 It is not usually acceptable to have a program terminate on a syntax
9809 error. For example, a compiler should recover sufficiently to parse the
9810 rest of the input file and check it for errors; a calculator should accept
9813 In a simple interactive command parser where each input is one line, it may
9814 be sufficient to allow @code{yyparse} to return 1 on error and have the
9815 caller ignore the rest of the input line when that happens (and then call
9816 @code{yyparse} again). But this is inadequate for a compiler, because it
9817 forgets all the syntactic context leading up to the error. A syntax error
9818 deep within a function in the compiler input should not cause the compiler
9819 to treat the following line like the beginning of a source file.
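For the simple line-oriented case, the caller's loop might look like the sketch below. To keep it self-contained, @code{parse_line} is a hypothetical stand-in for @code{yyparse} that reads from a string and accepts lines made only of digits; like @code{yyparse}, it returns 0 on success and 1 on a syntax error. After an error, the caller skips to the next newline before parsing again.

```c
#include <ctype.h>

/* Hypothetical stand-in for yyparse: accepts a line made only of digits.
   Advances *P past what it consumed; returns 0 on success, 1 on error.  */
static int
parse_line (const char **p)
{
  while (isdigit ((unsigned char) **p))
    ++*p;
  if (**p == '\n')
    {
      ++*p;                 /* consume the end of line */
      return 0;
    }
  if (**p == '\0')
    return 0;
  return 1;                 /* syntax error: *P is on the offending char */
}

/* Parse every line of INPUT, ignoring the rest of a line after an error.
   Return the number of lines that had a syntax error.  */
int
parse_all_lines (const char *input)
{
  int errors = 0;
  const char *p = input;
  while (*p != '\0')
    if (parse_line (&p) != 0)
      {
        ++errors;
        /* Discard the rest of the current line.  */
        while (*p != '\0' && *p != '\n')
          ++p;
        if (*p == '\n')
          ++p;
      }
  return errors;
}
```

With the three lines @samp{12}, @samp{x3} and @samp{45}, @code{parse_all_lines} counts one erroneous line and still parses the third. A real caller would call @code{yyparse} and discard input characters up to the next newline instead.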
9822 You can define how to recover from a syntax error by writing rules to
9823 recognize the special token @code{error}. This is a terminal symbol that
9824 is always defined (you need not declare it) and reserved for error
9825 handling. The Bison parser generates an @code{error} token whenever a
9826 syntax error happens; if you have provided a rule to recognize this token
9827 in the current context, the parse can continue.
9839 The fourth rule in this example says that an error followed by a newline
9840 makes a valid addition to any @code{stmts}.
9842 What happens if a syntax error occurs in the middle of an @code{exp}? The
9843 error recovery rule, interpreted strictly, applies to the precise sequence
9844 of a @code{stmts}, an @code{error} and a newline. If an error occurs in
9845 the middle of an @code{exp}, there will probably be some additional tokens
9846 and subexpressions on the stack after the last @code{stmts}, and there
9847 will be tokens to read before the next newline. So the rule is not
9848 applicable in the ordinary way.
9850 But Bison can force the situation to fit the rule, by discarding part of the
9851 semantic context and part of the input. First it discards states and
9852 objects from the stack until it gets back to a state in which the
9853 @code{error} token is acceptable. (This means that the subexpressions
9854 already parsed are discarded, back to the last complete @code{stmts}.) At
9855 this point the @code{error} token can be shifted. Then, if the old
9856 lookahead token is not acceptable to be shifted next, the parser reads
9857 tokens and discards them until it finds a token which is acceptable. In
9858 this example, Bison reads and discards input until the next newline so that
9859 the fourth rule can apply. Note that discarded symbols are possible sources
9860 of memory leaks, see @ref{Destructor Decl}, for a means to reclaim this
9863 The choice of error rules in the grammar is a choice of strategies for
9864 error recovery. A simple and useful strategy is to skip the rest of
9865 the current input line or current statement if an error is detected:
9868 stmt: error ';' /* On error, skip until ';' is read. */
9871 It is also useful to recover to the matching close-delimiter of an
9872 opening-delimiter that has already been parsed. Otherwise the
9873 close-delimiter will probably appear to be unmatched, and generate another,
9874 spurious error message:
9884 Error recovery strategies are necessarily guesses. When they guess wrong,
9885 one syntax error often leads to another. In the above example, the error
9886 recovery rule guesses that an error is due to bad input within one
9887 @code{stmt}. Suppose that instead a spurious semicolon is inserted in the
9888 middle of a valid @code{stmt}. After the error recovery rule recovers from
9889 the first error, another syntax error will be found straight away, since the
9890 text following the spurious semicolon is also an invalid @code{stmt}.
9892 To prevent an outpouring of error messages, the parser will output no error
9893 message for another syntax error that happens shortly after the first; only
9894 after three consecutive input tokens have been successfully shifted will
9895 error messages resume.
9897 Note that rules which accept the @code{error} token may have actions, just
9898 as any other rules can.
9901 You can make error messages resume immediately by using the macro
9902 @code{yyerrok} in an action. If you do this in the error rule's action, no
9903 error messages will be suppressed. This macro requires no arguments;
9904 @samp{yyerrok;} is a valid C statement.
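The suppression of messages and the effect of @code{yyerrok} can be modelled with a single counter. The sketch below uses hypothetical names (@code{struct recovery}, @code{on_syntax_error}, @code{on_shift}, @code{errok}, @code{recovering}); it mirrors the behavior described here rather than reproducing the generated code, although Bison's C skeleton keeps a comparable counter. An error is reported only when the counter is zero; the counter is then set to 3, each token shifted afterwards decrements it, and @code{yyerrok} resets it to zero. In this model, a nonzero counter corresponds to @code{YYRECOVERING ()} yielding 1.

```c
#include <stdbool.h>

/* Hypothetical model of the parser's error-recovery status.  */
struct recovery
{
  int status;    /* tokens still to shift before messages resume */
  int reported;  /* number of syntax errors actually reported */
};

/* A syntax error is detected: report it only if not recovering.  */
void
on_syntax_error (struct recovery *r)
{
  if (r->status == 0)
    r->reported++;          /* here the parser would call yyerror */
  r->status = 3;
}

/* An input token was shifted (the error token itself does not count).  */
void
on_shift (struct recovery *r)
{
  if (r->status > 0)
    r->status--;
}

/* Counterpart of yyerrok: resume error messages immediately.  */
void
errok (struct recovery *r)
{
  r->status = 0;
}

/* Counterpart of YYRECOVERING ().  */
bool
recovering (const struct recovery *r)
{
  return r->status != 0;
}
```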
9907 The previous lookahead token is reanalyzed immediately after an error. If
9908 this is unacceptable, then the macro @code{yyclearin} may be used to clear
9909 this token. Write the statement @samp{yyclearin;} in the error rule's
9911 @xref{Action Features}.
9913 For example, suppose that on a syntax error, an error handling routine is
9914 called that advances the input stream to some point where parsing should
9915 once again commence. The next symbol returned by the lexical scanner is
9916 probably correct. The previous lookahead token ought to be discarded
9917 with @samp{yyclearin;}.
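In the C skeleton, @code{yyclearin} marks the lookahead as empty, so that the next token is read from the scanner instead of reusing the stale one. The sketch below models that single-token buffer; the names @code{EMPTY}, @code{struct lexer}, @code{next_token} and @code{clear_lookahead} are hypothetical, and an array of token codes stands in for @code{yylex}.

```c
#define EMPTY (-2)   /* marker for "no lookahead", like YYEMPTY */

/* Hypothetical scanner returning tokens from a fixed array.  */
struct lexer
{
  const int *tokens;
  int pos;
};

static int
lex (struct lexer *lx)
{
  return lx->tokens[lx->pos++];
}

/* The parser's single-token lookahead buffer.  */
struct lookahead
{
  int token;
};

/* Return the lookahead, reading a new token only if the buffer is empty.  */
int
next_token (struct lookahead *la, struct lexer *lx)
{
  if (la->token == EMPTY)
    la->token = lex (lx);
  return la->token;
}

/* Counterpart of yyclearin: forget the current lookahead.  */
void
clear_lookahead (struct lookahead *la)
{
  la->token = EMPTY;
}
```

If an error routine advances the scanner without clearing the buffer, @code{next_token} keeps returning the stale token; after @code{clear_lookahead}, it returns the token at the new input position.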
9919 @vindex YYRECOVERING
9920 The expression @code{YYRECOVERING ()} yields 1 when the parser
9921 is recovering from a syntax error, and 0 otherwise.
9922 Syntax error diagnostics are suppressed while recovering from a syntax
9925 @node Context Dependency
9926 @chapter Handling Context Dependencies
9928 The Bison paradigm is to parse tokens first, then group them into larger
9929 syntactic units. In many languages, the meaning of a token is affected by
9930 its context. Although this violates the Bison paradigm, certain techniques
9931 (known as @dfn{kludges}) may enable you to write Bison parsers for such
9935 * Semantic Tokens:: Token parsing can depend on the semantic context.
9936 * Lexical Tie-ins:: Token parsing can depend on the syntactic context.
9937 * Tie-in Recovery:: Lexical tie-ins have implications for how
9938 error recovery rules must be written.
9941 (Actually, ``kludge'' means any technique that gets its job done but is
9942 neither clean nor robust.)
9944 @node Semantic Tokens
9945 @section Semantic Info in Token Kinds
9947 The C language has a context dependency: the way an identifier is used
9948 depends on what its current meaning is. For example, consider this:
9954 This looks like a function call statement, but if @code{foo} is a typedef
9955 name, then this is actually a declaration of @code{x}. How can a Bison
9956 parser for C decide how to parse this input?
9958 The method used in GNU C is to have two different token kinds,
9959 @code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
9960 identifier, it looks up the current declaration of the identifier in order
9961 to decide which token kind to return: @code{TYPENAME} if the identifier is
9962 declared as a typedef, @code{IDENTIFIER} otherwise.
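A minimal sketch of that decision, assuming a toy table of typedef names: the token codes, @code{typedef_names}, @code{is_typedef_name} and @code{classify_identifier} are hypothetical. A real C front end would consult its scoped declaration tables instead, so that a local redeclaration can hide a typedef name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical token codes; in a real parser they come from Bison.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Toy table of the names currently declared with typedef.  */
static const char *typedef_names[] = { "foo", "size_t" };

static bool
is_typedef_name (const char *name)
{
  for (size_t i = 0; i < sizeof typedef_names / sizeof *typedef_names; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return true;
  return false;
}

/* What yylex would return once it has scanned the identifier NAME.  */
int
classify_identifier (const char *name)
{
  return is_typedef_name (name) ? TYPENAME : IDENTIFIER;
}
```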
9964 The grammar rules can then express the context dependency by the choice of
9965 token kind to recognize. @code{IDENTIFIER} is accepted as an expression,
9966 but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
9967 @code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
9968 is @emph{not} significant, such as in declarations that can shadow a
9969 typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
9970 accepted---there is one rule for each of the two token kinds.
9972 This technique is simple to use if the decision of which kinds of
9973 identifiers to allow is made at a place close to where the identifier is
9974 parsed. But in C this is not always so: C allows a declaration to
9975 redeclare a typedef name provided an explicit type has been specified
9979 typedef int foo, bar;
9983 static bar (bar); /* @r{redeclare @code{bar} as static variable} */
9984 extern foo foo (foo); /* @r{redeclare @code{foo} as function} */
9990 Unfortunately, the name being declared is separated from the declaration
9991 construct itself by a complicated syntactic structure---the ``declarator''.
9993 As a result, part of the Bison parser for C needs to be duplicated, with
9994 all the nonterminal names changed: once for parsing a declaration in
9995 which a typedef name can be redefined, and once for parsing a
9996 declaration in which that can't be done. Here is a part of the
9997 duplication, with actions omitted for brevity:
10002 declarator maybeasm '=' init
10003 | declarator maybeasm
10009 notype_declarator maybeasm '=' init
10010 | notype_declarator maybeasm
10016 Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
10017 cannot. The distinction between @code{declarator} and
10018 @code{notype_declarator} is the same sort of thing.
10020 There is some similarity between this technique and a lexical tie-in
10021 (described next), in that information which alters the lexical analysis is
10022 changed during parsing by other parts of the program. The difference is
10023 that here the information is global, and is used for other purposes in the
10024 program. A true lexical tie-in has a special-purpose flag controlled by
10025 the syntactic context.
10027 @node Lexical Tie-ins
10028 @section Lexical Tie-ins
10029 @cindex lexical tie-in
10031 One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
10032 which is set by Bison actions, whose purpose is to alter the way tokens are
10035 For example, suppose we have a language vaguely like C, but with a special
10036 construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
10037 an expression in parentheses in which all integers are hexadecimal. In
10038 particular, the token @samp{a1b} must be treated as an integer rather than
10039 as an identifier if it appears in that context. Here is how you can do it:
10046 void yyerror (char const *);
10055 | HEX '(' @{ hexflag = 1; @}
10056 expr ')' @{ hexflag = 0; $$ = $4; @}
10057 | expr '+' expr @{ $$ = make_sum ($1, $3); @}
10071 Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
10072 it is nonzero, all integers are parsed in hexadecimal, and tokens starting
10073 with letters are parsed as integers if possible.
10075 The declaration of @code{hexflag} shown in the prologue of the grammar file
10076 is needed to make it accessible to the actions (@pxref{Prologue}). You must
10077 also write the code in @code{yylex} to obey the flag.
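The part of @code{yylex} that obeys the flag might look like the sketch below, which classifies one word (a run of letters and digits) already isolated by the scanner. The token codes and @code{scan_word} are hypothetical, and the sketch defines its own @code{hexflag} so that it stands alone. When the flag is set, a word is an integer whenever it reads entirely as a hexadecimal number, which is how @samp{a1b} becomes the integer 2587.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

enum { INTEGER = 258, IDENTIFIER = 259 };  /* hypothetical token codes */

int hexflag;   /* set and cleared by the grammar actions */

/* Classify WORD, a maximal run of letters and digits.  If it is an
   integer, store its value in *VALUE.  */
int
scan_word (const char *word, long *value)
{
  if (hexflag || isdigit ((unsigned char) word[0]))
    {
      char *end;
      errno = 0;
      long v = strtol (word, &end, hexflag ? 16 : 10);
      /* Accept only if the whole word was consumed; out-of-range values
         are not treated as integers in this sketch.  */
      if (end != word && *end == '\0' && errno == 0)
        {
          *value = v;
          return INTEGER;
        }
    }
  return IDENTIFIER;
}
```

Note that @code{strtol} with base 16 also accepts a leading @samp{0x}.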
10079 @node Tie-in Recovery
10080 @section Lexical Tie-ins and Error Recovery
10082 Lexical tie-ins make strict demands on any error recovery rules you have.
10083 @xref{Error Recovery}.
10085 The reason for this is that the purpose of an error recovery rule is to
10086 abort the parsing of one construct and resume in some larger construct.
10087 For example, in C-like languages, a typical error recovery rule is to skip
10088 tokens until the next semicolon, and then start a new statement, like this:
10093 | IF '(' expr ')' stmt @{ @dots{} @}
10095 | error ';' @{ hexflag = 0; @}
10099 If there is a syntax error in the middle of a @samp{hex (@var{expr})}
10100 construct, this error rule will apply, and then the action for the
10101 completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
10102 remain set for the entire rest of the input, or until the next @code{hex}
10103 keyword, causing identifiers to be misinterpreted as integers.
10105 To avoid this problem, the error recovery rule itself clears @code{hexflag}.
10107 There may also be an error recovery rule that works within expressions.
10108 For example, there could be a rule which applies within parentheses
10109 and skips to the close-parenthesis:
10115 | '(' expr ')' @{ $$ = $2; @}
10121 If this rule acts within the @code{hex} construct, it is not going to abort
10122 that construct (since it applies to an inner level of parentheses within
10123 the construct). Therefore, it should not clear the flag: the rest of
10124 the @code{hex} construct should be parsed with the flag still in effect.
10126 What if there is an error recovery rule which might abort out of the
10127 @code{hex} construct or might not, depending on circumstances? There is no
10128 way you can write the action to determine whether a @code{hex} construct is
10129 being aborted or not. So if you are using a lexical tie-in, you had better
10130 make sure your error recovery rules are not of this kind. Each rule must
10131 be such that you can be sure that it always will, or always won't, have to
10134 @c ================================================== Debugging Your Parser
10137 @chapter Debugging Your Parser
10139 Developing a parser can be a challenge, especially if you don't understand
10140 the algorithm (@pxref{Algorithm}). This chapter explains how to understand
10141 and debug a parser.
10143 The most frequent issue users face is solving their conflicts. To fix them,
10144 the first step is understanding how they arise in a given grammar. This is
10145 made much easier by automated generation of counterexamples, covered in the
10146 first section (@pxref{Counterexamples}).
10148 In most cases though, looking at the structure of the automaton is still
10149 needed. The following sections explain how to generate and read the
10150 detailed structural description of the automaton. There are several formats
10154 as text, see @ref{Understanding};
10157 as a graph, see @ref{Graphviz};
10160 or as a markup report that can be turned, for instance, into HTML, see
10164 The last section focuses on the dynamic part of the parser: how to enable
10165 and understand the parser run-time traces (@pxref{Tracing}).
10168 * Counterexamples:: Understanding conflicts.
10169 * Understanding:: Understanding the structure of your parser.
10170 * Graphviz:: Getting a visual representation of the parser.
10171 * Xml:: Getting a markup representation of the parser.
10172 * Tracing:: Tracing the execution of your parser.
10175 @node Counterexamples
10176 @section Generation of Counterexamples
10178 @cindex counterexamples
10179 @cindex conflict counterexamples
10181 Solving conflicts is probably the most delicate part of the design of an LR
10182 parser, as demonstrated by the number of sections devoted to them in this
10183 very documentation. To solve a conflict, one must understand it: when does
10184 it occur? Is it because of a flaw in the grammar? Is it rather because
10185 LR(1) cannot cope with this grammar?
10187 One difficulty is that conflicts occur in the @emph{automaton}, and it can
10188 be tricky to relate them to issues in the @emph{grammar} itself. With
10189 experience and patience, analysis of the detailed description of the
10190 automaton (@pxref{Understanding}) allows one to find example strings that
10191 reach these conflicts.
10193 That task is made much easier thanks to the generation of counterexamples,
10194 initially developed by Chinawat Isradisaikul and Andrew Myers
10195 @pcite{Isradisaikul 2015}.
10197 As a first example, see the grammar of @ref{Shift/Reduce}, which features
10198 one shift/reduce conflict:
10202 $ @kbd{bison else.y}
10203 else.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
10204 else.y: @dnotice{note}: rerun with option '-Wcounterexamples' to generate conflict counterexamples
10208 Let's rerun @command{bison} with the option
10209 @option{-Wcex}/@option{-Wcounterexamples}@inlinefmt{info, (the following
10210 output is actually in color)}:
10213 else.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
10214 else.y: @dwarning{warning}: shift/reduce conflict on token "else" [@dwarning{-Wcounterexamples}]
10218 This shows two different derivations for one single expression, which proves
10219 that the grammar is ambiguous.
10223 As a more delicate example, consider the example grammar of
10224 @ref{Reduce/Reduce}, which features a reduce/reduce conflict:
10240 Bison generates the following counterexamples:
10244 $ @kbd{bison -Wcex sequence.y}
10245 sequence.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
10246 sequence.y: @dwarning{warning}: 2 reduce/reduce conflicts [@dwarning{-Wconflicts-rr}]
10250 sequence.y: @dwarning{warning}: shift/reduce conflict on token "word" [@dwarning{-Wcounterexamples}]
10251 Example: @red{•} @green{"word"}
10254 @yellow{↳ 2:} @green{maybeword}
10255 @green{↳ 5:} @red{•} @green{"word"}
10256 Example: @red{•} @yellow{"word"}
10259 @yellow{↳ 3:} @green{sequence} @yellow{"word"}
10260 @green{↳ 1:} @red{•}
10263 sequence.y: @dwarning{warning}: reduce/reduce conflict on tokens $end, "word" [@dwarning{-Wcounterexamples}]
10265 First reduce derivation
10267 @yellow{↳ 1:} @red{•}
10269 Second reduce derivation
10271 @yellow{↳ 2:} @green{maybeword}
10272 @green{↳ 4:} @red{•}
10275 sequence.y: @dwarning{warning}: shift/reduce conflict on token "word" [@dwarning{-Wcounterexamples}]
10276 Example: @red{•} @green{"word"}
10279 @yellow{↳ 2:} @green{maybeword}
10280 @green{↳ 5:} @red{•} @green{"word"}
10281 Example: @red{•} @yellow{"word"}
10284 @yellow{↳ 3:} @green{sequence} @yellow{"word"}
10285 @green{↳ 2:} @blue{maybeword}
10286 @blue{↳ 4:} @red{•}
10289 sequence.y:8.3-45: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
10290 8 | @dwarning{%empty @{ printf ("empty maybeword\n"); @}}
10291 | @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
10296 sequence.y: @dwarning{warning}: shift/reduce conflict on token "word" [@dwarning{-Wcounterexamples}]
10297 Example: @red{•} @green{"word"}
10300 @yellow{@arrow{} 2:} @green{maybeword}
10301 @green{@arrow{} 5:} @red{•} @green{"word"}
10302 Example: @red{•} @yellow{"word"}
10305 @yellow{@arrow{} 3:} @green{sequence} @yellow{"word"}
10306 @green{@arrow{} 1:} @red{•}
10309 sequence.y: @dwarning{warning}: reduce/reduce conflict on tokens $end, "word" [@dwarning{-Wcounterexamples}]
10311 First reduce derivation
10313 @yellow{@arrow{} 1:} @red{•}
10315 Second reduce derivation
10317 @yellow{@arrow{} 2:} @green{maybeword}
10318 @green{@arrow{} 4:} @red{•}
10321 sequence.y: @dwarning{warning}: shift/reduce conflict on token "word" [@dwarning{-Wcounterexamples}]
10322 Example: @red{•} @green{"word"}
10325 @yellow{@arrow{} 2:} @green{maybeword}
10326 @green{@arrow{} 5:} @red{•} @green{"word"}
10327 Example: @red{•} @yellow{"word"}
10330 @yellow{@arrow{} 3:} @green{sequence} @yellow{"word"}
10331 @green{@arrow{} 2:} @blue{maybeword}
10332 @blue{@arrow{} 4:} @red{•}
10335 sequence.y:8.3-45: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
10336 8 | @dwarning{%empty @{ printf ("empty maybeword\n"); @}}
10337 | @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}
10342 Each of these three conflicts, again, proves that the grammar is ambiguous.
10343 For instance, the second conflict (the reduce/reduce one) shows that the
10344 grammar accepts the empty input in two different ways.
10348 Sometimes, the search will not find an example that can be derived in two
10349 ways. In these cases, counterexample generation will provide two examples
10350 that are the same up until the dot. Most notably, this will happen when
10351 your grammar requires a stronger parser (more lookahead, LR instead of
10352 LALR). The following example isn't LR(1):
10360 expr: %empty | expr ID ','
10363 @command{bison} reports:
10366 ids.y: @dwarning{warning}: 1 shift/reduce conflict [@dwarning{-Wconflicts-sr}]
10367 ids.y: @dwarning{warning}: shift/reduce conflict on token ID [@dwarning{-Wcounterexamples}]
10370 First example: @purple{expr} @red{•} @purple{ID ','} @green{ID} @yellow{$end}
10373 @yellow{↳ 0:} @green{s} @yellow{$end}
10374 @green{↳ 1:} @blue{a} @green{ID}
10375 @blue{↳ 2:} @purple{expr}
10376 @purple{↳ 4: expr} @red{•} @purple{ID ','}
10377 Second example: @blue{expr} @red{•} @green{ID} @yellow{$end}
10380 @yellow{↳ 0:} @green{s} @yellow{$end}
10381 @green{↳ 1:} @blue{a} @green{ID}
10382 @blue{↳ 2: expr} @red{•}
10385 ids.y:4.4-7: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
10392 First example: @purple{expr} @red{•} @purple{ID ','} @green{ID} @yellow{$end}
10395 @yellow{@arrow{} 0:} @green{s} @yellow{$end}
10396 @green{@arrow{} 1:} @blue{a} @green{ID}
10397 @blue{@arrow{} 2:} @purple{expr}
10398 @purple{@arrow{} 4: expr} @red{•} @purple{ID ','}
10399 Second example: @blue{expr} @red{•} @green{ID} @yellow{$end}
10402 @yellow{@arrow{} 0:} @green{s} @yellow{$end}
10403 @green{@arrow{} 1:} @blue{a} @green{ID}
10404 @blue{@arrow{} 2: expr} @red{•}
10407 ids.y:4.4-7: @dwarning{warning}: rule useless in parser due to conflicts [@dwarning{-Wother}]
10414 This conflict is caused by the parser not having enough information to know
10415 the difference between these two examples. The parser would need an
10416 additional lookahead token to know whether or not a comma follows the
10417 @code{ID} after @code{expr}. These types of conflicts tend to be more
10418 difficult to fix, and usually need a rework of the grammar. In this case,
10419 it can be fixed by changing around the recursion: @code{expr: ID | ',' expr
10422 Alternatively, you might also want to consider using a GLR parser
10423 (@pxref{GLR Parsers}).
10427 On occasion, it is useful to look at counterexamples @emph{in situ}: with
10428 the automaton report (@xref{Understanding}, in particular @ref{state-8,,
10431 @node Understanding
10432 @section Understanding Your Parser
10434 Bison parsers are @dfn{shift/reduce automata} (@pxref{Algorithm}). In some
10435 cases (much more frequent than one would hope), looking at this automaton is
10436 required to tune or simply fix a parser.
10438 The textual file is generated when the options @option{--report} or
10439 @option{--verbose} are specified, see @ref{Invocation}. Its name is made by
10440 removing @samp{.tab.c} or @samp{.c} from the parser implementation file
10441 name, and adding @samp{.output} instead. Therefore, if the grammar file is
10442 @file{foo.y}, then the parser implementation file is called @file{foo.tab.c}
10443 by default. As a consequence, the verbose output file is called
10446 The following grammar file, @file{calc.y}, will be used in the sequel:
10463 %nterm <sval> useless
10483 @command{bison} reports:
10486 calc.y: @dwarning{warning}: 1 nonterminal useless in grammar [@dwarning{-Wother}]
10487 calc.y: @dwarning{warning}: 1 rule useless in grammar [@dwarning{-Wother}]
10488 calc.y:19.1-7: @dwarning{warning}: nonterminal useless in grammar: useless [@dwarning{-Wother}]
10489 19 | @dwarning{useless: STR;}
10490 | @dwarning{^~~~~~~}
10491 calc.y: @dwarning{warning}: 7 shift/reduce conflicts [@dwarning{-Wconflicts-sr}]
10492 calc.y: @dnotice{note}: rerun with option '-Wcounterexamples' to generate conflict counterexamples
10495 Going back to the calc example, when given @option{--report=state},
10496 in addition to @file{calc.tab.c}, Bison creates a file @file{calc.output}
10497 with contents detailed below. The order of the output and the exact
10498 presentation might vary, but the interpretation is the same.
10501 @cindex token, useless
10502 @cindex useless token
10503 @cindex nonterminal, useless
10504 @cindex useless nonterminal
10505 @cindex rule, useless
10506 @cindex useless rule
10507 The first section reports useless tokens, nonterminals and rules. Useless
10508 nonterminals and rules are removed in order to produce a smaller parser, but
10509 useless tokens are preserved, since they might be used by the scanner (note
10510 the difference between ``useless'' and ``unused'' below):
10513 Nonterminals useless in grammar
10516 Terminals unused in grammar
10519 Rules useless in grammar
10524 The next section lists states that still have conflicts.
10527 State 8 conflicts: 1 shift/reduce
10528 State 9 conflicts: 1 shift/reduce
10529 State 10 conflicts: 1 shift/reduce
10530 State 11 conflicts: 4 shift/reduce
10534 Then Bison reproduces the exact grammar it used:
10539 0 $accept: exp $end
10549 and reports the uses of the symbols:
10553 Terminals, with rules where they appear
10566 Nonterminals, with rules where they appear
10572 on right: 0 1 2 3 4
10578 @cindex dotted rule
10579 @cindex rule, dotted
10580 Bison then proceeds onto the automaton itself, describing each state with
10581 its set of @dfn{items}, also known as @dfn{dotted rules}. Each item is a
10582 production rule together with a point (@samp{.}) marking the location of the
10588 0 $accept: • exp $end
10590 NUM shift, and go to state 1
10595 This reads as follows: ``state 0 corresponds to being at the very
10596 beginning of the parsing, in the initial rule, right before the start
10597 symbol (here, @code{exp}). When the parser returns to this state right
10598 after having reduced a rule that produced an @code{exp}, the control
10599 flow jumps to state 2. If there is no such transition on a nonterminal
10600 symbol, and the lookahead is a @code{NUM}, then this token is shifted onto
10601 the parse stack, and the control flow jumps to state 1. Any other
10602 lookahead triggers a syntax error.''
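The derived items of a state come from a closure computation: whenever the dot stands before a nonterminal, every rule for that nonterminal is added with the dot at its start. Below is a minimal sketch for the rules of @file{calc.y} listed in this report (rule 0 is @samp{$accept: exp $end}, rules 1 to 4 are the binary operations and rule 5 is @samp{exp: NUM}); the symbol encoding and the function @code{closure} are hypothetical, and lookaheads are not modelled. Closing the kernel item of state 0 yields items for rules 1 through 5, which is why @code{NUM} is a possible lookahead there.

```c
#include <stdbool.h>

enum symbol { END, NUM, PLUS, MINUS, STAR, SLASH, ACCEPT, EXP };

static bool
is_nonterminal (enum symbol s)
{
  return s == ACCEPT || s == EXP;
}

/* The rules of calc.y, numbered as in the report.  */
struct rule
{
  enum symbol lhs;
  enum symbol rhs[3];
  int len;
};

static const struct rule rules[] = {
  { ACCEPT, { EXP, END }, 2 },         /* 0 $accept: exp $end */
  { EXP, { EXP, PLUS, EXP }, 3 },      /* 1 exp: exp '+' exp */
  { EXP, { EXP, MINUS, EXP }, 3 },     /* 2 exp: exp '-' exp */
  { EXP, { EXP, STAR, EXP }, 3 },      /* 3 exp: exp '*' exp */
  { EXP, { EXP, SLASH, EXP }, 3 },     /* 4 exp: exp '/' exp */
  { EXP, { NUM }, 1 },                 /* 5 exp: NUM */
};
#define NRULES ((int) (sizeof rules / sizeof rules[0]))

/* An item: a rule and the position of the dot in its right-hand side.  */
struct item
{
  int rule;
  int dot;
};

static bool
contains (const struct item *set, int n, struct item it)
{
  for (int i = 0; i < n; i++)
    if (set[i].rule == it.rule && set[i].dot == it.dot)
      return true;
  return false;
}

/* Close SET (N items, room for MAX) in place; return its new size.  */
int
closure (struct item *set, int n, int max)
{
  for (int i = 0; i < n; i++)          /* N grows as items are added */
    {
      const struct rule *r = &rules[set[i].rule];
      if (set[i].dot >= r->len || !is_nonterminal (r->rhs[set[i].dot]))
        continue;
      enum symbol next = r->rhs[set[i].dot];
      for (int k = 0; k < NRULES; k++)
        {
          struct item it = { k, 0 };
          if (rules[k].lhs == next && !contains (set, n, it) && n < max)
            set[n++] = it;
        }
    }
  return n;
}
```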
10604 @cindex core, item set
10605 @cindex item set core
10606 @cindex kernel, item set
10607 @cindex item set kernel
10608 Even though the only active rule in state 0 seems to be rule 0, the
10609 report lists @code{NUM} as a lookahead token because @code{NUM} can be
10610 at the beginning of any rule deriving an @code{exp}. By default Bison
10611 reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
10612 you want to see more detail you can invoke @command{bison} with
10613 @option{--report=itemset} to list the derived items as well:
10618 0 $accept: • exp $end
10619 1 exp: • exp '+' exp
10620 2 | • exp '-' exp
10621 3 | • exp '*' exp
10622 4 | • exp '/' exp
10625 NUM shift, and go to state 1
10631 In state 1@dots{}
10638 $default reduce using rule 5 (exp)
10642 rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
10643 (@samp{$default}), the parser reduces it. If it came from state
10644 0, then after this reduction it returns to state 0 and jumps to
10645 state 2 (@samp{exp: go to state 2}).
10650 0 $accept: exp • $end
10651 1 exp: exp • '+' exp
10652 2 | exp • '-' exp
10653 3 | exp • '*' exp
10654 4 | exp • '/' exp
10656 $end shift, and go to state 3
10657 '+' shift, and go to state 4
10658 '-' shift, and go to state 5
10659 '*' shift, and go to state 6
10660 '/' shift, and go to state 7
10664 In state 2, the automaton can only shift a symbol. For instance, because of
10665 the item @samp{exp: exp • '+' exp}, if the lookahead is @samp{+} it is
10666 shifted onto the parse stack, and the automaton jumps to state 4,
10667 corresponding to the item @samp{exp: exp '+' • exp}. Since there is no
10668 default action, any lookahead not listed triggers a syntax error.
10670 @cindex accepting state
10671 State 3 is named the @dfn{final state}, or the @dfn{accepting
10677 0 $accept: exp $end •
10683 the initial rule is completed (the start symbol and the end-of-input were
10684 read), the parsing exits successfully.
10686 The interpretation of states 4 to 7 is straightforward, and is left to
10692 1 exp: exp '+' • exp
10694 NUM shift, and go to state 1
10701 2 exp: exp '-' • exp
10703 NUM shift, and go to state 1
10710 3 exp: exp '*' • exp
10712 NUM shift, and go to state 1
10719 4 exp: exp '/' • exp
10721 NUM shift, and go to state 1
10727 As announced at the beginning of the report, @samp{State 8 conflicts:
10733 1 exp: exp • '+' exp
10734 1 | exp '+' exp •
10735 2 | exp • '-' exp
10736 3 | exp • '*' exp
10737 4 | exp • '/' exp
10739 '*' shift, and go to state 6
10740 '/' shift, and go to state 7
10742 '/' [reduce using rule 1 (exp)]
10743 $default reduce using rule 1 (exp)
10746 There are two actions associated with the lookahead @samp{/}:
10747 either shifting (and going to state 7), or reducing rule 1. The
10748 conflict means that either the grammar is ambiguous, or the parser lacks
10749 information to make the right decision. Indeed, the grammar is
10750 ambiguous: since we did not specify the precedence of @samp{/}, the
10751 sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM /
10752 NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) /
10753 NUM}, which corresponds to reducing rule 1.
10755 Because a deterministic parser can make only one decision, Bison
10756 arbitrarily chose to disable the reduction, see @ref{Shift/Reduce}.
10757 Discarded actions are reported between square brackets.
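How Bison settles such a conflict can be sketched as a comparison between the precedence of the rule (by default that of its last terminal, @samp{'+'} for rule 1) and that of the lookahead token. The levels below are assumptions reconstructed from the @option{--report=solved} output shown later in this section: @samp{'+'} and @samp{'-'} share a left-associative level, @samp{'*'} is higher, and @samp{'/'} has none. All names are hypothetical. When either side has no precedence, the conflict stays unresolved and Bison shifts, as in state 8.

```c
/* Outcome for one lookahead token in a shift/reduce conflict.  */
enum action { SHIFT, REDUCE, ERROR_ACTION, SHIFT_UNRESOLVED };

enum assoc { LEFT, RIGHT, NONASSOC };

struct prec
{
  int level;          /* 0 means "no precedence declared" */
  enum assoc assoc;
};

/* Resolve a conflict between reducing a rule of precedence RULE and
   shifting a token of precedence TOKEN.  */
enum action
resolve_sr (struct prec rule, struct prec token)
{
  if (rule.level == 0 || token.level == 0)
    return SHIFT_UNRESOLVED;     /* reported as a conflict; shift wins */
  if (token.level > rule.level)
    return SHIFT;
  if (token.level < rule.level)
    return REDUCE;
  switch (token.assoc)           /* same level: use associativity */
    {
    case LEFT:  return REDUCE;
    case RIGHT: return SHIFT;
    default:    return ERROR_ACTION;   /* %nonassoc */
    }
}

/* Precedences assumed for calc.y (see the lead-in).  */
struct prec
token_prec (char t)
{
  switch (t)
    {
    case '+': case '-': return (struct prec) { 1, LEFT };
    case '*':           return (struct prec) { 2, LEFT };
    default:            return (struct prec) { 0, LEFT };  /* '/': none */
    }
}
```

With rule 1 taking the precedence of @samp{'+'}, this reproduces state 8: reduce on @samp{'+'} and @samp{'-'}, shift on @samp{'*'}, and an unresolved shift on @samp{'/'}.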
10759 Note that all the previous states had a single possible action: either
10760 shifting the next token and going to the corresponding state, or
10761 reducing a single rule. In the other cases, i.e., when shifting
10762 @emph{and} reducing are possible or when @emph{several} reductions are
10763 possible, the lookahead is required to select the action. State 8 is
10764 one such state: if the lookahead is @samp{*} or @samp{/} then the action
10765 is shifting, otherwise the action is reducing rule 1. In other words,
10766 the first two items, corresponding to rule 1, are not eligible when the
10767 lookahead token is @samp{*}, since we specified that @samp{*} has higher
10768 precedence than @samp{+}. More generally, some items are eligible only
10769 with some set of possible lookahead tokens. When run with
10770 @option{--report=lookahead}, Bison specifies these lookahead tokens:
10775 1 exp: exp • '+' exp
10776 1 | exp '+' exp • [$end, '+', '-', '/']
10777 2 | exp • '-' exp
10778 3 | exp • '*' exp
10779 4 | exp • '/' exp
10781 '*' shift, and go to state 6
10782 '/' shift, and go to state 7
10784 '/' [reduce using rule 1 (exp)]
10785 $default reduce using rule 1 (exp)
10788 Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in
10789 the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was
10790 solved thanks to associativity and precedence directives. If invoked with
10791 @option{--report=solved}, Bison includes information about the solved
10792 conflicts in the report:
10795 Conflict between rule 1 and token '+' resolved as reduce (%left '+').
10796 Conflict between rule 1 and token '-' resolved as reduce (%left '-').
10797 Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
10800 When given @option{--report=counterexamples}, @command{bison} will generate
10801 counterexamples within the report, augmented with the corresponding items
10802 (@pxref{Counterexamples}).
10806 shift/reduce conflict on token '/':
10807 1 exp: exp '+' exp •
10808 4 exp: exp • '/' exp
10810 Example: exp '+' exp • '/' exp
10814 ↳ 4: exp • '/' exp
10815 Example: exp '+' exp • '/' exp
10819 ↳ 1: exp '+' exp •
10825 shift/reduce conflict on token '/':
10826 1 exp: exp '+' exp •
10827 4 exp: exp • '/' exp
10829 Example: exp '+' exp • '/' exp
10832 @arrow{} 1: exp '+' exp
10833 @arrow{} 4: exp • '/' exp
10834 Example: exp '+' exp • '/' exp
10837 @arrow{} 4: exp '/' exp
10838 @arrow{} 1: exp '+' exp •
10843 This shows two separate derivations in the grammar for the same @code{exp}:
10844 @samp{e1 + e2 / e3}. The derivations show how your rules would parse the
10845 given example. Here, the first derivation completes a reduction when seeing
10846 @samp{/}, causing @samp{e1 + e2} to be grouped as an @code{exp}. The second
10847 derivation shifts on @samp{/}, resulting in @samp{e2 / e3} being grouped as
10848 an @code{exp}. Therefore, it is easy to see that adding
10849 precedence/associativity directives would fix this conflict.
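Such directives resolve a shift/reduce conflict by comparing the precedence of the rule (by default, that of its last terminal symbol) with the precedence of the lookahead token, and by falling back on the token's associativity when both are equal. The following C function is a simplified sketch of that decision, consistent with the @option{--report=solved} output shown earlier. The integer encoding of precedence levels (0 for ``none declared'', higher for later declarations) and all names are assumptions, and tokens declared with @code{%precedence} are left out.

```c
/* Associativity declared with %left, %right or %nonassoc.  */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };

enum resolution
{
  RES_SHIFT,        /* keep the shift, discard the reduction */
  RES_REDUCE,       /* keep the reduction, discard the shift */
  RES_ERROR,        /* %nonassoc: the lookahead is a syntax error */
  RES_UNRESOLVED    /* no precedence: reported as a conflict, shift kept */
};

/* RULE_PREC and TOKEN_PREC are precedence levels, 0 meaning that none
   was declared; a higher level binds more tightly.  TOKEN_ASSOC is only
   consulted when both levels are equal.  */
static enum resolution
resolve_sr (int rule_prec, int token_prec, enum assoc token_assoc)
{
  if (rule_prec == 0 || token_prec == 0)
    return RES_UNRESOLVED;
  if (token_prec > rule_prec)
    return RES_SHIFT;             /* e.g., '+' < '*': resolved as shift */
  if (token_prec < rule_prec)
    return RES_REDUCE;
  switch (token_assoc)
    {
    case ASSOC_LEFT:
      return RES_REDUCE;          /* e.g., %left '+': resolved as reduce */
    case ASSOC_RIGHT:
      return RES_SHIFT;
    default:
      return RES_ERROR;           /* %nonassoc */
    }
}
```

Assuming @samp{+} and @samp{-} at level 1, @samp{*} at level 2, and @samp{/} undeclared, rule 1 has level 1: @code{resolve_sr (1, 2, ASSOC_LEFT)} gives @code{RES_SHIFT}, as in @samp{resolved as shift ('+' < '*')}, while @code{resolve_sr (1, 0, ASSOC_LEFT)} gives @code{RES_UNRESOLVED}, the conflict reported on @samp{/}.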
10851 The remaining states are similar:
10857 1 exp: exp • '+' exp
10858 2 | exp • '-' exp
10859 2 | exp '-' exp •
10860 3 | exp • '*' exp
10861 4 | exp • '/' exp
10863 '*' shift, and go to state 6
10864 '/' shift, and go to state 7
10866 '/' [reduce using rule 2 (exp)]
10867 $default reduce using rule 2 (exp)
10873 1 exp: exp • '+' exp
10874 2 | exp • '-' exp
10875 3 | exp • '*' exp
10876 3 | exp '*' exp •
10877 4 | exp • '/' exp
10879 '/' shift, and go to state 7
10881 '/' [reduce using rule 3 (exp)]
10882 $default reduce using rule 3 (exp)
10888 1 exp: exp • '+' exp
10889 2 | exp • '-' exp
10890 3 | exp • '*' exp
10891 4 | exp • '/' exp
10892 4 | exp '/' exp •
10894 '+' shift, and go to state 4
10895 '-' shift, and go to state 5
10896 '*' shift, and go to state 6
10897 '/' shift, and go to state 7
10899 '+' [reduce using rule 4 (exp)]
10900 '-' [reduce using rule 4 (exp)]
10901 '*' [reduce using rule 4 (exp)]
10902 '/' [reduce using rule 4 (exp)]
10903 $default reduce using rule 4 (exp)
10908 Observe that state 11 contains conflicts not only due to the lack of
10909 precedence of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but
10910 also because the associativity of @samp{/} is not specified.
10912 Bison may also produce an HTML version of this output, via an XML file and
10913 XSLT processing (@pxref{Xml}).
10915 @c ================================================= Graphical Representation
10918 @section Visualizing Your Parser
10921 As another means to gain a better understanding of the shift/reduce
10922 automaton corresponding to the Bison parser, a DOT file can be generated. Note
10923 that debugging a real grammar with this is tedious at best, and impractical
10924 most of the time, because the generated files are huge (generating
10925 a PDF or PNG file from them takes very long, and more often than not
10926 fails due to memory exhaustion). This option was designed primarily for beginners,
10927 to help them understand LR parsers.
10929 This file is generated when the @option{--graph} option is specified
10930 (@pxref{Invocation}). Its name is made by removing
10931 @samp{.tab.c} or @samp{.c} from the parser implementation file name, and
10932 adding @samp{.gv} instead. If the grammar file is @file{foo.y}, the
10933 Graphviz output file is called @file{foo.gv}. A DOT file may also be
10934 produced via an XML file and XSLT processing (@pxref{Xml}).
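The naming rule can be sketched in C as below. The function names and the caller-supplied buffer are assumptions, and only the two suffixes mentioned above are handled. The same rule yields the default XML output name (@pxref{Xml}), with @samp{.xml} in place of @samp{.gv}.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if S ends with SUFFIX.  */
static int
ends_with (const char *s, const char *suffix)
{
  size_t ls = strlen (s);
  size_t lx = strlen (suffix);
  return ls >= lx && strcmp (s + ls - lx, suffix) == 0;
}

/* Build the name of a report file from the parser implementation file
   name IMPL: remove a trailing ".tab.c" (or else ".c"), then append EXT,
   e.g., ".gv" or ".xml".  The result is written into BUF, of SIZE bytes,
   and truncated if it does not fit.  */
static char *
report_file_name (char *buf, size_t size, const char *impl, const char *ext)
{
  size_t len = strlen (impl);
  if (ends_with (impl, ".tab.c"))
    len -= strlen (".tab.c");
  else if (ends_with (impl, ".c"))
    len -= strlen (".c");
  snprintf (buf, size, "%.*s%s", (int) len, impl, ext);
  return buf;
}
```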
10937 The following grammar file, @file{rr.y}, is used in the rest of this section:
10942 exp: a ";" | b ".";
10948 The graphical output
10950 (see @ref{fig:graph})
10952 is very similar to the textual one, and as such it is most easily understood by
10953 comparing the two directly. @xref{Debugging}, for a detailed
10954 analysis of the textual report.
10957 @float Figure,fig:graph
10958 @center @image{figs/example, 430pt,,,.svg}
10959 @caption{A graphical rendering of the parser.}
10963 @subheading Graphical Representation of States
10965 The items (dotted rules) for each state are grouped together in graph nodes.
10966 Their numbering is the same as in the verbose file. See the following
10967 subsections, about transitions, for examples.
10969 When invoked with @option{--report=lookaheads}, the lookahead tokens, when
10970 needed, are shown next to the relevant rule between square brackets as a
10971 comma-separated list. This is the case in the figure for the representation of
10976 The transitions are represented as directed edges between the current and
10979 @subheading Graphical Representation of Shifts
10981 Shifts are shown as solid arrows, labeled with the lookahead token for that
10982 shift. The following describes a shift in the @file{rr.output} file:
10990 ";" shift, and go to state 6
10994 A Graphviz rendering of this portion of the graph could be:
10996 @center @image{figs/example-shift, 100pt,,,.svg}
10998 @subheading Graphical Representation of Reductions
11000 Reductions are shown as solid arrows, leading to a diamond-shaped node
11001 bearing the number of the reduction rule. The arrow is labeled with the
11002 appropriate comma-separated lookahead tokens. If the reduction is the default
11003 action for the given state, there is no such label.
11005 This is how reductions are represented in the verbose file @file{rr.output}:
11012 "." reduce using rule 4 (b)
11013 $default reduce using rule 3 (a)
11016 A Graphviz rendering of this portion of the graph could be:
11018 @center @image{figs/example-reduce, 120pt,,,.svg}
11020 When unresolved conflicts are present, Bison disables a reduction,
11021 since deterministic parsing allows only a single decision
11022 (@pxref{Shift/Reduce}). Discarded actions
11023 are distinguished by a red fill color on these nodes, just as they are
11024 reported between square brackets in the verbose file.
11026 The reduction corresponding to rule number 0 marks the acceptance of the
11027 input. It is shown as a blue diamond, labeled ``Acc''.
11029 @subheading Graphical Representation of Gotos
11031 The @samp{go to} jump transitions are represented as dotted lines bearing
11032 the name of the nonterminal symbol that triggers the jump.
11034 @c ================================================= XML
11037 @section Visualizing your parser in multiple formats
11040 Bison supports two major report formats: textual output
11041 (@pxref{Understanding}) when invoked
11042 with option @option{--verbose}, and DOT
11043 (@pxref{Graphviz}) when invoked with
11044 option @option{--graph}. A third alternative is to output an XML
11045 file that @command{xsltproc} can then render as raw text equivalent to
11046 the verbose file, as an HTML version of the same file with clickable
11047 transitions, or even as a DOT file. The @file{.output} and DOT files
11048 obtained via XSLT are identical to those obtained by invoking
11049 @command{bison} with options @option{--verbose} or
11050 @option{--graph}.
11052 The XML file is generated when option @option{-x} or
11053 @option{--xml[=FILE]} is specified (@pxref{Invocation}).
11054 If not specified, its name is made by removing @samp{.tab.c} or @samp{.c}
11055 from the parser implementation file name, and adding @samp{.xml} instead.
11056 For instance, if the grammar file is @file{foo.y}, the default XML output
11057 file is @file{foo.xml}.
11059 Bison ships with a @file{data/xslt} directory, containing XSL Transformation
11060 files to apply to the XML file. Their names are self-explanatory:
11064 Used to output a copy of the DOT visualization of the automaton.
11066 Used to output a copy of the @samp{.output} file.
11067 @item xml2xhtml.xsl
11068 Used to output an XHTML enhancement of the @samp{.output} file.
11071 Sample usage (requires @command{xsltproc}):
11073 $ @kbd{bison -x gr.y}
11075 $ @kbd{bison --print-datadir}
11076 /usr/local/share/bison
11078 $ @kbd{xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl gr.xml >gr.html}
11081 @c ================================================= Tracing
11084 @section Tracing Your Parser
11087 @cindex tracing the parser
11089 When a Bison grammar compiles properly but parses ``incorrectly'', the
11090 @code{yydebug} parser-trace feature helps you figure out why.
11093 * Enabling Traces:: Activating run-time trace support
11094 * Mfcalc Traces:: Extending @code{mfcalc} to support traces
11097 @node Enabling Traces
11098 @subsection Enabling Traces
11099 There are several means to enable compilation of trace facilities, in
11100 decreasing order of preference:
11103 @item the variable @samp{parse.trace}
11104 @findex %define parse.trace
11105 Add the @samp{%define parse.trace} directive (@pxref{%define
11106 Summary}), or pass the @option{-Dparse.trace} option
11107 (@pxref{Tuning the Parser}). This is a Bison extension. Unless POSIX and
11108 Yacc portability matter to you, this is the preferred solution.
11110 @item the option @option{-t} (POSIX Yacc compliant)
11111 @itemx the option @option{--debug} (Bison extension)
11112 Use the @option{-t} option when you run Bison (@pxref{Invocation}). With
11113 @samp{%define api.prefix @{c@}}, it defines @code{CDEBUG} to 1, otherwise it
11114 defines @code{YYDEBUG} to 1.
11116 @item the directive @samp{%debug} (deprecated)
11118 Add the @code{%debug} directive (@pxref{Decl Summary}). This Bison
11119 extension is maintained for backward compatibility; use @code{%define
11120 parse.trace} instead.
11122 @item the macro @code{YYDEBUG} (C/C++ only)
11124 Define the macro @code{YYDEBUG} to a nonzero value when you compile the
11125 parser. This is compliant with POSIX Yacc. You could use
11126 @option{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
11127 YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue}).
11129 If the @code{%define} variable @code{api.prefix} is used (@pxref{Multiple
11130 Parsers}), for instance @samp{%define
11131 api.prefix @{c@}}, then if @code{CDEBUG} is defined, its value controls the
11132 tracing feature (enabled if and only if nonzero); otherwise tracing is
11133 enabled if and only if @code{YYDEBUG} is nonzero.
11135 In C++, where POSIX compliance makes no sense, avoid this option, and prefer
11136 @samp{%define parse.trace}. If you @code{#define} the @code{YYDEBUG} macro
11137 at the wrong place (e.g., in @samp{%code top} instead of @samp{%code
11138 requires}), the parser class will have two different definitions, thus
11139 leading to ODR violations and happy debugging times.
11142 We suggest that you always enable the trace option so that debugging is
11146 In C the trace facility outputs messages with macro calls of the form
11147 @code{YYFPRINTF (stderr, @var{format}, @var{args})} where @var{format} and
11148 @var{args} are the usual @code{printf} format and variadic arguments. If
11149 you define @code{YYDEBUG} to a nonzero value but do not define
11150 @code{YYFPRINTF}, @code{<stdio.h>} is automatically included and
11151 @code{YYFPRINTF} is defined to @code{fprintf}.
11153 Once you have compiled the program with trace facilities, the way to request
11154 a trace is to store a nonzero value in the variable @code{yydebug}. You can
11155 do this by making the C code do it (in @code{main}, perhaps), or you can
11156 alter the value with a C debugger.
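The following standalone C sketch imitates this plumbing: the @code{YYFPRINTF} fallback described above, and a trace call site guarded by @code{yydebug}. It is not the code Bison generates; the function @code{trace_next_token} is hypothetical, and @code{YYDEBUG} is forced to 1 as if the program were compiled with @option{-DYYDEBUG=1}.

```c
/* Standalone imitation of the trace plumbing; not Bison's generated code.  */
#ifndef YYDEBUG
# define YYDEBUG 1              /* as if compiled with -DYYDEBUG=1 */
#endif

#if YYDEBUG
# ifndef YYFPRINTF              /* no user definition: fall back on fprintf */
#  include <stdio.h>
#  define YYFPRINTF fprintf
# endif
int yydebug;                    /* zero-initialized: traces are off */
#endif

/* Hypothetical trace call site.  Returns 1 if a message was written
   to stderr, 0 otherwise.  */
static int
trace_next_token (const char *name)
{
#if YYDEBUG
  if (yydebug)
    {
      YYFPRINTF (stderr, "Next token is token %s\n", name);
      return 1;
    }
#endif
  (void) name;
  return 0;
}
```

In an actual parser, @code{yydebug} is defined in the parser implementation file, so requesting traces only takes an assignment such as @samp{yydebug = 1;} in @code{main}, before calling @code{yyparse}.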
11158 Each step taken by the parser when @code{yydebug} is nonzero produces a line
11159 or two of trace information, written on @code{stderr}. The trace messages
11160 tell you these things:
11164 Each time the parser calls @code{yylex}, what kind of token was read.
11167 Each time a token is shifted, the depth and complete contents of the state
11168 stack (@pxref{Parser States}).
11171 Each time a rule is reduced, which rule it is, and the complete contents of
11172 the state stack afterward.
11175 To make sense of this information, it helps to refer to the automaton
11176 description file (@pxref{Understanding}). This
11177 file shows the meaning of each state in terms of positions in various rules,
11178 and also what each state will do with each possible input token. As you
11179 read the successive trace messages, you can see that the parser is
11180 functioning according to its specification in the listing file. Eventually
11181 you will arrive at the place where something undesirable happens, and you
11182 will see which parts of the grammar are to blame.
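As an aid for reading the @samp{Stack now} lines of a trace, here is a small C sketch that renders a stack of state numbers in the same shape, e.g., @samp{Stack now 0 1 6 14 24 17}. The function @code{format_stack} and its truncation policy are assumptions for illustration; the generated parser prints its stack with its own code.

```c
#include <stdio.h>
#include <string.h>

/* Write "Stack now S0 S1 ..." for the DEPTH states of STATES into BUF,
   of SIZE bytes.  The output is truncated if BUF is too small.
   Returns BUF.  */
static char *
format_stack (char *buf, size_t size, const int *states, size_t depth)
{
  if (size == 0)
    return buf;
  snprintf (buf, size, "Stack now");
  for (size_t i = 0; i < depth; i++)
    {
      size_t used = strlen (buf);
      if (used + 1 >= size)
        break;                  /* no room left */
      snprintf (buf + used, size - used, " %d", states[i]);
    }
  return buf;
}
```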
11184 The parser implementation file is a C/C++/D/Java program and you can use
11185 debuggers on it, but it's not easy to interpret what it is doing. The
11186 parser function is a finite-state machine interpreter, and aside from the
11187 actions it executes the same code over and over. Only the values of
11188 variables show where in the grammar it is working.
11190 @node Mfcalc Traces
11191 @subsection Enabling Debug Traces for @code{mfcalc}
11193 The debugging information normally gives the token kind of each token read,
11194 but not its semantic value. The @code{%printer} directive allows you to specify
11195 how semantic values are reported (@pxref{Printer Decl}).
11197 As a demonstration of @code{%printer}, consider the multi-function
11198 calculator, @code{mfcalc} (@pxref{Multi-function Calc}). To enable run-time
11199 traces, and semantic value reports, insert the following directives in its
11202 @comment file: c/mfcalc/mfcalc.y: 2
11204 /* Generate the parser description file. */
11206 /* Enable run-time traces (yydebug). */
11207 %define parse.trace
11209 /* Formatting semantic values. */
11210 %printer @{ fprintf (yyo, "%s", $$->name); @} VAR;
11211 %printer @{ fprintf (yyo, "%s()", $$->name); @} FUN;
11212 %printer @{ fprintf (yyo, "%g", $$); @} <double>;
11215 The @code{%define} directive instructs Bison to generate run-time trace
11216 support. Then, activation of these traces is controlled at run-time by the
11217 @code{yydebug} variable, which is disabled by default. Because these traces
11218 will refer to the ``states'' of the parser, it is helpful to ask for the
11219 creation of a description of that parser; this is the purpose of the (admittedly
11220 ill-named) @code{%verbose} directive.
11222 The set of @code{%printer} directives demonstrates how to format the
11223 semantic value in the traces. Note that the specification can be done
11224 either on the symbol (e.g., @code{VAR} or @code{FUN}), or on the type
11225 tag: since @code{<double>} is the type for both @code{NUM} and @code{exp},
11226 this printer will be used for them.
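To illustrate what a printer contributes to a trace line, the C sketch below prints a symbol as its kind, its name, and its value between parentheses, delegating the value to a printer function in the role of a @code{%printer} body. The type @code{value_printer} and the functions @code{print_double} and @code{print_symbol} are hypothetical, not Bison's internals.

```c
#include <stdio.h>

/* Hypothetical signature for a value printer: it formats the semantic
   value VALUE onto the stream YYO, like the body of a %printer.  */
typedef void (*value_printer) (FILE *yyo, const void *value);

/* Counterpart of: %printer { fprintf (yyo, "%g", $$); } <double>;  */
static void
print_double (FILE *yyo, const void *value)
{
  fprintf (yyo, "%g", *(const double *) value);
}

/* Print a symbol as the traces do: its kind ("token" or "nterm"), its
   name, then its value between parentheses.  Without a printer, the
   parentheses stay empty, as for the '(' token above.  */
static void
print_symbol (FILE *yyo, const char *kind, const char *name,
              value_printer printer, const void *value)
{
  fprintf (yyo, "%s %s (", kind, name);
  if (printer)
    printer (yyo, value);
  fputc (')', yyo);
}
```

With @code{print_double}, the value 1 of an @code{exp} prints as @samp{nterm exp (1)}, since @samp{%g} drops trailing zeros; a symbol without a printer, such as @samp{'('}, prints as @samp{token '(' ()}.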
11228 Here is a sample of the information provided by run-time traces. The traces
11229 are sent to standard error.
11232 $ @kbd{echo 'sin(1-1)' | ./mfcalc -p}
11235 Reducing stack by rule 1 (line 34):
11236 -> $$ = nterm input ()
11242 This first batch shows a specific feature of this grammar: the first rule
11243 (which is in line 34 of @file{mfcalc.y}) can be reduced without even having
11244 to look for the first token. The resulting left-hand symbol (@code{$$}) is
11245 a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}).
11247 Then the parser calls the scanner.
11250 Next token is token FUN (sin())
11251 Shifting token FUN (sin())
11256 That token (@code{token}) is a function (@code{FUN}) whose value is
11257 @samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}.
11258 The parser stores (@code{Shifting}) that token, and others, until it can do
11259 something about it.
11263 Next token is token '(' ()
11264 Shifting token '(' ()
11267 Next token is token NUM (1.000000)
11268 Shifting token NUM (1.000000)
11270 Reducing stack by rule 6 (line 44):
11271 $1 = token NUM (1.000000)
11272 -> $$ = nterm exp (1.000000)
11278 The previous reduction demonstrates the @code{%printer} directive for
11279 @code{<double>}: both the token @code{NUM} and the resulting nonterminal
11280 @code{exp} have @samp{1} as value.
11284 Next token is token '-' ()
11285 Shifting token '-' ()
11288 Next token is token NUM (1.000000)
11289 Shifting token NUM (1.000000)
11291 Reducing stack by rule 6 (line 44):
11292 $1 = token NUM (1.000000)
11293 -> $$ = nterm exp (1.000000)
11294 Stack now 0 1 6 14 24 17
11297 Next token is token ')' ()
11298 Reducing stack by rule 11 (line 49):
11299 $1 = nterm exp (1.000000)
11301 $3 = nterm exp (1.000000)
11302 -> $$ = nterm exp (0.000000)
11308 The rule for the subtraction was just reduced. The parser is about to
11309 discover the end of the call to @code{sin}.
11312 Next token is token ')' ()
11313 Shifting token ')' ()
11315 Reducing stack by rule 9 (line 47):
11316 $1 = token FUN (sin())
11318 $3 = nterm exp (0.000000)
11320 -> $$ = nterm exp (0.000000)
11326 Finally, the end-of-line allows the parser to complete the computation, and
11327 display its result.
11331 Next token is token '\n' ()
11332 Shifting token '\n' ()
11334 Reducing stack by rule 4 (line 40):
11335 $1 = nterm exp (0.000000)
11338 -> $$ = nterm line ()
11341 Reducing stack by rule 2 (line 35):
11342 $1 = nterm input ()
11344 -> $$ = nterm input ()
11349 The parser has returned into state 1, in which it is waiting for the next
11350 expression to evaluate, or for the end-of-file token, which causes the
11351 completion of the parsing.
11355 Now at end of input.
11356 Shifting token $end ()
11359 Cleanup: popping token $end ()
11360 Cleanup: popping nterm input ()
11364 @c ================================================= Invoking Bison
11367 @chapter Invoking Bison
11368 @cindex invoking Bison
11369 @cindex Bison invocation
11370 @cindex options for invoking Bison
11372 The usual way to invoke Bison is as follows:
11375 $ @kbd{bison @var{file}}
11378 Here @var{file} is the grammar file name, which usually ends in @samp{.y}.
11379 The parser implementation file's name is made by replacing the @samp{.y}
11380 with @samp{.tab.c} and removing any leading directory. Thus, both
11381 @samp{bison foo.y} and @samp{bison hack/foo.y} yield @file{foo.tab.c}.
11382 If you are writing C++ code instead of C in your grammar file, you can
11383 also name it @file{foo.ypp} or @file{foo.y++}. The output files then
11384 take an extension modeled on the input one (respectively
11385 @file{foo.tab.cpp} and @file{foo.tab.c++}). This feature takes effect
11386 with all options that manipulate file names like @option{-o} or
11387 @option{-d}.
11392 $ @kbd{bison -d @var{file.yxx}}
11395 will produce @file{file.tab.cxx} and @file{file.tab.hxx}, and
11398 $ @kbd{bison -d -o @var{output.c++} @var{file.y}}
11401 will produce @file{output.c++} and @file{output.h++}.
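The naming rules above can be summarized by a small C sketch: drop the directory, insert @samp{.tab} before the extension, and turn each @samp{y} of the extension into @samp{c} for the parser implementation file, then each @samp{c} of that extension into @samp{h} for the header. The function names are hypothetical, and for a plain @samp{.y} file the sketch covers only the C default, @samp{.tab.c}.

```c
#include <stdio.h>
#include <string.h>

/* Replace every occurrence of FROM by TO in S.  */
static void
tr (char *s, char from, char to)
{
  for (; *s; s++)
    if (*s == from)
      *s = to;
}

/* Parser implementation file name for GRAMMAR: drop any leading
   directory, insert ".tab" before the extension, and turn each "y" of
   the extension into "c" (".y" -> ".c", ".ypp" -> ".cpp",
   ".y++" -> ".c++").  BUF, of SIZE > 0 bytes, receives the result.  */
static char *
parser_file_name (char *buf, size_t size, const char *grammar)
{
  const char *base = strrchr (grammar, '/');
  base = base ? base + 1 : grammar;
  const char *dot = strrchr (base, '.');
  if (!dot)
    {
      snprintf (buf, size, "%s.tab.c", base);  /* assumption: no extension */
      return buf;
    }
  snprintf (buf, size, "%.*s.tab%s", (int) (dot - base), base, dot);
  char *ext = strrchr (buf, '.');
  if (ext)
    tr (ext, 'y', 'c');
  return buf;
}

/* Header name (option -d) derived from the implementation file name IMPL:
   turn each "c" of its extension into "h" (".c" -> ".h", ".cxx" -> ".hxx",
   ".c++" -> ".h++").  BUF, of SIZE > 0 bytes, receives the result.  */
static char *
header_file_name (char *buf, size_t size, const char *impl)
{
  snprintf (buf, size, "%s", impl);
  char *ext = strrchr (buf, '.');
  if (ext)
    tr (ext, 'c', 'h');
  return buf;
}
```

For instance, @samp{file.yxx} gives @file{file.tab.cxx} and then @file{file.tab.hxx}, while @option{-o output.c++} gives the header @file{output.h++}, matching the examples above.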
11403 For compatibility with POSIX, the standard Bison distribution also contains
11404 a shell script called @command{yacc} that invokes Bison with the @option{-y}
11409 The exit status of @command{bison} is:
11412 when there were no errors. Warnings, which are diagnostics about dubious
11413 constructs, do not change the exit status, unless they are turned into
11414 errors (@pxref{Werror,,@option{-Werror}}).
11417 when there were errors. No file was generated (except the reports generated
11418 by @option{--verbose}, etc.). In particular, the output files that possibly
11419 existed were not changed.
11421 @item 63 (mismatch)
11422 when @command{bison} does not meet the version requirements of the grammar
11423 file. @xref{Require Decl}. No file was generated or changed.
11428 * Bison Options:: All the options described in detail,
11429 in alphabetical order by short options.
11430 * Option Cross Key:: Alphabetical list of long options.
11431 * Yacc Library:: Yacc-compatible @code{yylex} and @code{main}.
11434 @node Bison Options
11435 @section Bison Options
11437 Bison supports both traditional single-letter options and mnemonic long
11438 option names. Long option names are indicated with @option{--} instead of
11439 @option{-}. Abbreviations for option names are allowed as long as they
11440 are unique. When a long option takes an argument, like
11441 @option{--file-prefix}, connect the option name and the argument with
11444 Here is a list of options that can be used with Bison. It is followed by a
11445 cross key alphabetized by long option.
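The abbreviation rule for long options can be illustrated with the C sketch below, which resolves a possibly abbreviated name against a handful of Bison's long options: an exact match wins, a unique prefix is accepted, and an ambiguous prefix such as @option{--ver} (@option{--verbose} or @option{--version}) is rejected. It is a simplified stand-in for the option parser actually used by @command{bison}, which may differ in corner cases; the function name and the option subset are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* A few of Bison's long options, enough to show abbreviations.  */
static const char *const long_options[] =
  { "debug", "define", "graph", "verbose", "version" };

#define N_LONG_OPTIONS (sizeof long_options / sizeof long_options[0])

/* Resolve NAME, a possibly abbreviated long option given without its
   leading "--".  Returns the full option name, or NULL if NAME is empty,
   matches no option, or is a prefix of several options.  */
static const char *
match_long_option (const char *name)
{
  size_t len = strlen (name);
  const char *found = NULL;
  if (len == 0)
    return NULL;
  /* An exact match wins, even if it is also a prefix of another option.  */
  for (size_t i = 0; i < N_LONG_OPTIONS; i++)
    if (strcmp (long_options[i], name) == 0)
      return long_options[i];
  /* Otherwise, the prefix must designate a single option.  */
  for (size_t i = 0; i < N_LONG_OPTIONS; i++)
    if (strncmp (long_options[i], name, len) == 0)
      {
        if (found)
          return NULL;          /* ambiguous, e.g., "ver" */
        found = long_options[i];
      }
  return found;
}
```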
11448 * Operation Modes:: Options controlling the global behavior of @command{bison}
11449 * Diagnostics:: Options controlling the diagnostics
11450 * Tuning the Parser:: Options changing the generated parsers
11451 * Output Files:: Options controlling the output
11454 @node Operation Modes
11455 @subsection Operation Modes
11457 Options controlling the global behavior of @command{bison}.
11459 @c Please, keep this ordered as in 'bison --help'.
11463 Print a summary of the command-line options to Bison and exit.
11467 Print the version number of Bison and exit.
11469 @item --print-localedir
11470 Print the name of the directory containing locale-dependent data.
11472 @item --print-datadir
11473 Print the name of the directory containing skeletons, CSS and XSLT.
11477 Update the grammar file (remove duplicates, update deprecated directives,
11478 etc.) and exit (i.e., do not generate any of the output files). Leaves a
11479 backup of the original file with a @code{~} appended. For instance:
11485 %define parse.error verbose
11490 $ @kbd{bison -u foo.y}
11491 foo.y:1.1-14: @dwarning{warning}: deprecated directive, use '%define parse.error verbose' [@dwarning{-Wdeprecated}]
11492 1 | @dwarning{%error-verbose}
11493 | @dwarning{^~~~~~~~~~~~~~}
11494 foo.y:2.1-27: @dwarning{warning}: %define variable 'parse.error' redefined [@dwarning{-Wother}]
11495 2 | @dwarning{%define parse.error verbose}
11496 | @dwarning{^~~~~~~~~~~~~~~~~~~~~~~~~~~}
11497 foo.y:1.1-14: previous definition
11498 1 | @dnotice{%error-verbose}
11499 | @dnotice{^~~~~~~~~~~~~~}
11500 bison: file 'foo.y' was updated (backup: 'foo.y~')
11504 %define parse.error verbose
11510 See the documentation of @option{--feature=fixit} below for more details.
11512 @item -f [@var{feature}]
11513 @itemx --feature[=@var{feature}]
11514 Activate miscellaneous @var{feature}s. @var{Feature} can be one of:
11517 @itemx diagnostics-show-caret
11518 Show caret errors, in a manner similar to GCC's
11519 @option{-fdiagnostics-show-caret}, or Clang's
11520 @option{-fcaret-diagnostics}. The location provided with the message is used
11521 to quote the corresponding line of the source file, underlining the
11522 important part of it with carets (@samp{^}). Here is an example, using the
11523 following file @file{in.y}:
11528 exp: exp '+' exp @{ $exp = $1 + $2; @};
11531 When invoked with @option{-fcaret} (or nothing), Bison will report:
11535 in.y:3.20-23: @derror{error}: ambiguous reference: '$exp'
11536 3 | exp: exp '+' exp @{ @derror{$exp} = $1 + $2; @};
11540 in.y:3.1-3: refers to: $exp at $$
11541 3 | @dnotice{exp}: exp '+' exp @{ $exp = $1 + $2; @};
11545 in.y:3.6-8: refers to: $exp at $1
11546 3 | exp: @dnotice{exp} '+' exp @{ $exp = $1 + $2; @};
11550 in.y:3.14-16: refers to: $exp at $3
11551 3 | exp: exp '+' @dnotice{exp} @{ $exp = $1 + $2; @};
11555 in.y:3.32-33: @derror{error}: $2 of 'exp' has no declared type
11556 3 | exp: exp '+' exp @{ $exp = $1 + @derror{$2}; @};
11561 When invoked with @option{-fno-caret}, however, Bison only reports:
11565 in.y:3.20-23: @derror{error}: ambiguous reference: '$exp'
11566 in.y:3.1-3: refers to: $exp at $$
11567 in.y:3.6-8: refers to: $exp at $1
11568 in.y:3.14-16: refers to: $exp at $3
11569 in.y:3.32-33: @derror{error}: $2 of 'exp' has no declared type
11573 This option is activated by default.
11576 @itemx diagnostics-parseable-fixits
11577 Show machine-readable fixes, in a manner similar to GCC's and Clang's
11578 @option{-fdiagnostics-parseable-fixits}.
11580 Fix-its are generated for duplicate directives:
11585 %define api.prefix @{foo@}
11586 %define api.prefix @{bar@}
11592 $ @kbd{bison -ffixit foo.y}
11593 foo.y:2.1-24: @derror{error}: %define variable 'api.prefix' redefined
11594 2 | @derror{%define api.prefix @{bar@}}
11595 | @derror{^~~~~~~~~~~~~~~~~~~~~~~~}
11596 foo.y:1.1-24: previous definition
11597 1 | @dnotice{%define api.prefix @{foo@}}
11598 | @dnotice{^~~~~~~~~~~~~~~~~~~~~~~~}
11599 fix-it:"foo.y":@{2:1-2:25@}:""
11600 foo.y: @dwarning{warning}: fix-its can be applied. Rerun with option '--update'. [@dwarning{-Wother}]
11604 They are also generated to update deprecated directives, unless
11605 @option{-Wno-deprecated} was given:
11609 $ @kbd{cat foo.y}
11616 $ @kbd{bison foo.y}
11617 foo.y:1.1-14: @dwarning{warning}: deprecated directive, use '%define parse.error verbose' [@dwarning{-Wdeprecated}]
11618 1 | @dwarning{%error-verbose}
11619 | @dwarning{^~~~~~~~~~~~~~}
11620 foo.y:2.1-18: @dwarning{warning}: deprecated directive, use '%define api.prefix @{foo@}' [@dwarning{-Wdeprecated}]
11621 2 | @dwarning{%name-prefix "foo"}
11622 | @dwarning{^~~~~~~~~~~~~~~~~~}
11623 foo.y: @dwarning{warning}: fix-its can be applied. Rerun with option '--update'. [@dwarning{-Wother}]
11627 The fix-its are applied by @command{bison} itself when given the option
11628 @option{-u}/@option{--update}. See its documentation above.
11631 Do not generate the output files. The name of this feature is somewhat
11632 misleading, as more than just checking the syntax is done: every stage is run
11633 (including checking for conflicts for instance), except the generation of
11640 @subsection Diagnostics
11642 Options controlling the diagnostics.
11644 @c Please, keep this ordered as in 'bison --help'.
11646 @item -W [@var{category}]
11647 @itemx --warnings[=@var{category}]
11648 Output warnings falling in @var{category}. @var{category} can be one
11651 @item @anchor{Wconflicts-sr}conflicts-sr
11652 @itemx @anchor{Wconflicts-rr}conflicts-rr
11653 S/R and R/R conflicts. These warnings are enabled by default. However, if
11654 the @code{%expect} or @code{%expect-rr} directive is specified, an
11655 unexpected number of conflicts is an error, and an expected number of
11656 conflicts is not reported, so @option{-W} and @option{--warnings} then have
11657 no effect on the conflict report.
11659 @item @anchor{Wcounterexamples}counterexamples
11661 Provide counterexamples for conflicts. @xref{Counterexamples}.
11662 Counterexamples take time to compute. The option @option{-Wcex} should be
11663 used by the developer when working on the grammar; it hardly makes sense to
11666 @item @anchor{Wdangling-alias}dangling-alias
11667 Report string literals that are not bound to a token symbol.
11669 String literals, which allow for better error messages, are (too) liberally
11670 accepted by Bison, which might result in silent errors. For instance
11673 %type <exVal> cond "condition"
11677 does not define ``condition'' as a string alias to @code{cond}---nonterminal
11678 symbols do not have string aliases. It is rather equivalent to
11681 %nterm <exVal> cond
11682 %token <exVal> "condition"
11686 i.e., it gives the @samp{"condition"} token the type @code{exVal}.
11688 Also, because string aliases do not need to be defined, typos such as
11689 @samp{"baz"} instead of @samp{"bar"} will not be reported.
11691 The option @option{-Wdangling-alias} catches these situations. On
11695 %type <ival> foo "foo"
11701 @samp{bison -Wdangling-alias} reports
11704 @dwarning{warning}: string literal not attached to a symbol
11705 | %type <ival> foo @dwarning{"foo"}
11707 @dwarning{warning}: string literal not attached to a symbol
11708 | foo: @dwarning{"baz"} @{@}
11712 @item @anchor{Wdeprecated}deprecated
11713 Deprecated constructs whose support will be removed in future versions of
11716 @item @anchor{Wempty-rule}empty-rule
11717 Empty rules without @code{%empty}. @xref{Empty Rules}. Disabled by
11718 default, but enabled by uses of @code{%empty}, unless
11719 @option{-Wno-empty-rule} was specified.
11721 @item @anchor{Wmidrule-values}midrule-values
11722 Warn about midrule values that are set but not used within any of the actions
11723 of the parent rule.
11724 For example, warn about unused @code{$2} in:
11727 exp: '1' @{ $$ = 1; @} '+' exp @{ $$ = $1 + $4; @};
11730 Also warn about midrule values that are used but not set.
11731 For example, warn about unset @code{$$} in the midrule action in:
11734 exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @};
11737 These warnings are not enabled by default since they sometimes prove to
11738 be false alarms in existing grammars employing the Yacc constructs
11739 @code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer).
11741 @item @anchor{Wprecedence}precedence
11742 Useless precedence and associativity directives. Disabled by default.
11744 Consider for instance the following grammar:
11773 @c cannot leave the location and the [-Wprecedence] for lack of
11777 @dwarning{warning}: useless precedence and associativity for "="
11778 | %nonassoc @dwarning{"="}
11782 @dwarning{warning}: useless associativity for "*", use %precedence
11783 | %left @dwarning{"*"}
11787 @dwarning{warning}: useless precedence for "("
11788 | %precedence @dwarning{"("}
11793 One would get the exact same parser with the following directives instead:
11802 @item @anchor{Wyacc}yacc
11803 Incompatibilities with POSIX Yacc.
11805 @item @anchor{Wother}other
11806 All warnings not categorized above. These warnings are enabled by default.
11808 This category is provided merely for the sake of completeness. Future
11809 releases of Bison may move warnings from this category to new, more specific
11812 @item @anchor{Wall}all
11813 All the warnings except @code{counterexamples}, @code{dangling-alias} and
11816 @item @anchor{Wnone}none
11817 Turn off all the warnings.
11820 See @option{-Werror}, below.
11823 A category can be turned off by prefixing its name with @samp{no-}. For
11824 instance, @option{-Wno-yacc} will hide the warnings about
11825 POSIX Yacc incompatibilities.
11827 @item @anchor{Werror}-Werror
11828 Turn enabled warnings for every @var{category} into errors, unless they are
11829 explicitly disabled by @option{-Wno-error=@var{category}}.
11831 @item -Werror=@var{category}
11832 Enable warnings falling in @var{category}, and treat them as errors.
11834 @var{category} is the same as for @option{--warnings}, with the exception that
11835 it may not be prefixed with @samp{no-} (see above).
11837 Note that the precedence of the @samp{=} and @samp{,} operators is such that
11838 the following commands are @emph{not} equivalent, as the first will not treat
11839 S/R conflicts as errors.
11842 $ @kbd{bison -Werror=yacc,conflicts-sr input.y}
11843 $ @kbd{bison -Werror=yacc,error=conflicts-sr input.y}
11847 Do not turn enabled warnings for every @var{category} into errors, unless
11848 they are explicitly enabled by @option{-Werror=@var{category}}.
11850 @item -Wno-error=@var{category}
11851 Deactivate the error treatment for this @var{category}. However, the warning
11852 itself won't be disabled, or enabled, by this option.
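The interplay of these options can be summarized by a small model. The following C++ sketch is a hypothetical illustration, not Bison's option parser. The type @code{warning_state} and its members are made up, every category starts disabled (unlike in Bison), and @samp{all} and @samp{none} are not modeled. It shows why @samp{-Werror=yacc,conflicts-sr} only turns @code{yacc} warnings into errors: the argument is split on commas first, so @samp{error=} applies to a single item.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of how -W arguments combine; not Bison's parser.
// Categories start disabled, and "all"/"none" are not modeled.
struct warning_state
{
  std::map<std::string, bool> enabled;  // category -> warning reported?
  std::map<std::string, bool> as_error; // explicit per-category choice
  bool all_errors = false;              // set by -Werror, reset by -Wno-error

  // Apply the argument of one -W option, e.g. "error=yacc,conflicts-sr".
  // The list is split on ',' first, so "error=" only covers one item.
  void apply (const std::string& arg)
  {
    std::istringstream in (arg);
    std::string item;
    while (std::getline (in, item, ','))
      apply_item (item);
  }

  // Whether a warning in CATEGORY is reported as an error.
  bool is_error (const std::string& category) const
  {
    auto on = enabled.find (category);
    if (on == enabled.end () || !on->second)
      return false; // a disabled warning is not reported at all
    auto x = as_error.find (category);
    return x != as_error.end () ? x->second : all_errors;
  }

private:
  static bool starts_with (const std::string& s, const std::string& p)
  {
    return s.compare (0, p.size (), p) == 0;
  }

  void apply_item (const std::string& item)
  {
    if (item == "error")
      all_errors = true;
    else if (item == "no-error")
      all_errors = false;
    else if (starts_with (item, "error="))
      {
        const std::string cat = item.substr (6);
        enabled[cat] = true;  // -Werror=CAT also enables CAT
        as_error[cat] = true;
      }
    else if (starts_with (item, "no-error="))
      as_error[item.substr (9)] = false; // CAT stays enabled or disabled
    else if (starts_with (item, "no-"))
      enabled[item.substr (3)] = false;
    else
      enabled[item] = true;
  }
};
```

In this model, @samp{error=yacc,conflicts-sr} enables both categories but reports only @code{yacc} as an error, whereas @samp{error=yacc,error=conflicts-sr} reports both as errors.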
11855 Equivalent to @option{--color=always}.
11857 @item --color=@var{when}
11858 Control whether diagnostics are colorized, depending on @var{when}:
11862 Enable colorized diagnostics.
11866 Disable colorized diagnostics.
11868 @item auto @r{(default)}
Diagnostics will be colorized if the output device is a tty, i.e., when the
output goes directly to a text screen or terminal emulator window.
11874 @item --style=@var{file}
Specify the CSS style @var{file} to use when colorizing. It has an effect
only when the @option{--color} option is effective. The
@file{bison-default.css} file provides a good example from which to define
your own style file. See the documentation of libtextstyle for more
11882 @node Tuning the Parser
11883 @subsection Tuning the Parser
11885 Options changing the generated parsers.
11887 @c Please, keep this ordered as in 'bison --help'.
11891 In the parser implementation file, define the macro @code{YYDEBUG} to 1 if
11892 it is not already defined, so that the debugging facilities are compiled.
11895 @item -D @var{name}[=@var{value}]
11896 @itemx --define=@var{name}[=@var{value}]
11897 @itemx -F @var{name}[=@var{value}]
11898 @itemx --force-define=@var{name}[=@var{value}]
11899 Each of these is equivalent to @samp{%define @var{name} @var{value}}
11900 (@pxref{%define Summary}). Note that the delimiters are part of
11901 @var{value}: @option{-Dapi.value.type=union},
11902 @option{-Dapi.value.type=@{union@}} and @option{-Dapi.value.type="union"}
11903 correspond to @samp{%define api.value.type union}, @samp{%define
11904 api.value.type @{union@}} and @samp{%define api.value.type "union"}.
11906 Bison processes multiple definitions for the same @var{name} as follows:
11910 Bison quietly ignores all command-line definitions for @var{name} except
11913 If that command-line definition is specified by a @option{-D} or
11914 @option{--define}, Bison reports an error for any @code{%define} definition
11917 If that command-line definition is specified by a @option{-F} or
11918 @option{--force-define} instead, Bison quietly ignores all @code{%define}
11919 definitions for @var{name}.
11921 Otherwise, Bison reports an error if there are multiple @code{%define}
11922 definitions for @var{name}.
11925 You should avoid using @option{-F} and @option{--force-define} in your
11926 make files unless you are confident that it is safe to quietly ignore
11927 any conflicting @code{%define} that may be added to the grammar file.
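To make these rules concrete, here is a hypothetical C++ model of how the definitions of a single variable are reconciled. The names @code{cli_definition} and @code{resolve_define} are invented for illustration and do not come from Bison's sources; the model assumes, as the rules state, that only the last command-line definition is considered, and it signals Bison's errors as exceptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the rules above for a single variable NAME;
// not Bison's implementation.
struct cli_definition
{
  std::string value;
  bool forced; // true for -F/--force-define, false for -D/--define
};

// Return the value that takes effect, or throw std::runtime_error
// where Bison would report an error.  CLI lists the command-line
// definitions in order, GRAMMAR the %define values in the grammar file.
std::string resolve_define (const std::vector<cli_definition>& cli,
                            const std::vector<std::string>& grammar)
{
  assert ((!cli.empty () || !grammar.empty ()) && "NAME is not defined");
  if (!cli.empty ())
    {
      // All command-line definitions but the last are ignored.
      const cli_definition& last = cli.back ();
      if (!last.forced && !grammar.empty ())
        throw std::runtime_error ("%define conflicts with -D");
      return last.value; // with -F, grammar definitions are ignored
    }
  if (grammar.size () > 1)
    throw std::runtime_error ("multiple %define definitions");
  return grammar.front ();
}
```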
11929 @item -L @var{language}
11930 @itemx --language=@var{language}
11931 Specify the programming language for the generated parser, as if
11932 @code{%language} was specified (@pxref{Decl Summary}). Currently supported
11933 languages include C, C++, D and Java. @var{language} is case-insensitive.
11936 Pretend that @code{%locations} was specified. @xref{Decl Summary}.
11938 @item -p @var{prefix}
11939 @itemx --name-prefix=@var{prefix}
11940 Pretend that @code{%name-prefix "@var{prefix}"} was specified (@pxref{Decl
11941 Summary}). The option @option{-p} is specified by POSIX. When POSIX
11942 compatibility is not a requirement, @option{-Dapi.prefix=@var{prefix}} is a
11943 better option (@pxref{Multiple Parsers}).
11947 Don't put any @code{#line} preprocessor commands in the parser
11948 implementation file. Ordinarily Bison puts them in the parser
11949 implementation file so that the C compiler and debuggers will
11950 associate errors with your source file, the grammar file. This option
11951 causes them to associate errors with the parser implementation file,
11952 treating it as an independent source file in its own right.
11954 @item -S @var{file}
11955 @itemx --skeleton=@var{file}
11956 Specify the skeleton to use, similar to @code{%skeleton}
11957 (@pxref{Decl Summary}).
11959 @c You probably don't need this option unless you are developing Bison.
11960 @c You should use @option{--language} if you want to specify the skeleton for a
11961 @c different language, because it is clearer and because it will always
11962 @c choose the correct skeleton for non-deterministic or push parsers.
11964 If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
11965 file in the Bison installation directory.
11966 If it does, @var{file} is an absolute file name or a file name relative to the
11967 current working directory.
11968 This is similar to how most shells resolve commands.
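The lookup rule can be sketched in C++ as follows. This is only an illustration: @code{skeleton_path} is a hypothetical helper, and @var{datadir} stands for the Bison installation directory, whose exact layout is not modeled here.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of how -S/--skeleton interprets FILE.
// DATADIR stands for the Bison installation directory.
std::string skeleton_path (const std::string& file,
                           const std::string& datadir)
{
  // As with shell commands, a name containing '/' is used as is:
  // either an absolute name, or one relative to the current directory.
  if (file.find ('/') != std::string::npos)
    return file;
  // A bare name designates a skeleton installed with Bison.
  return datadir + "/" + file;
}
```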
11971 @itemx --token-table
11972 Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
11975 @itemx @anchor{option-yacc} --yacc
11976 Act more like the traditional @command{yacc} command:
11979 Generate different diagnostics (it implies @option{-Wyacc}).
11981 Generate @code{#define} statements in addition to an @code{enum} to
11982 associate token codes with token kind names.
11984 If the @code{POSIXLY_CORRECT} environment variable is defined, generate
11985 prototypes for @code{yyerror} and @code{yylex}@footnote{See
11986 @url{https://austingroupbugs.net/view.php?id=1388#c5220}.} (since Bison
11990 void yyerror (const char *);
11992 As a Bison extension, additional arguments required by @code{%pure-parser},
11993 @code{%locations}, @code{%lex-param} and @code{%parse-param} are taken into
11994 account. You may disable @code{yyerror}'s prototype with @samp{#define
11995 yyerror yyerror} (as specified by POSIX), or with @samp{#define
11996 YYERROR_IS_DECLARED} (a Bison extension). Likewise for @code{yylex}.
11998 Imitate Yacc's output file name conventions, so that the parser
11999 implementation file is called @file{y.tab.c}, and the other outputs are
12000 called @file{y.output} and @file{y.tab.h}. Do not use @option{--yacc} just
12001 to change the output file names since it also triggers all the
12002 aforementioned behavior changes; rather use @samp{-o y.tab.c}.
12005 The @option{-y}/@option{--yacc} option is intended for use with traditional
12006 Yacc grammars. This option only makes sense for the default C skeleton,
@file{yacc.c}. If your grammar uses Bison extensions, Bison cannot be
Yacc-compatible, even if this option is specified.
12010 Thus, the following shell script can substitute for Yacc, and the Bison
12011 distribution contains such a @command{yacc} script for compatibility with
12021 @subsection Output Files
12023 Options controlling the output.
12025 @c Please, keep this ordered as in 'bison --help'.
12027 @item -H [@var{file}]
12028 @itemx --header=[@var{file}]
12029 Pretend that @code{%header} was specified, i.e., write an extra output file
12030 containing definitions for the token kind names defined in the grammar, as
12031 well as a few other declarations. @xref{Decl Summary}.
12033 @item --defines[=@var{file}]
12034 Historical name for option @option{--header} before Bison 3.8.
12037 This is the same as @option{--header} except @option{-d} does not accept a
12038 @var{file} argument since POSIX Yacc requires that @option{-d} can be
12039 bundled with other short options.
12041 @item -b @var{file-prefix}
12042 @itemx --file-prefix=@var{prefix}
Pretend that @code{%file-prefix} was specified, i.e., specify the prefix to
use for all Bison output file names. @xref{Decl Summary}.
12046 @item -r @var{things}
12047 @itemx --report=@var{things}
Write an extra output file containing a verbose description of the
comma-separated list of @var{things} among:
Description of the grammar, conflicts (resolved and unresolved), and the
parser's automaton.
12057 Implies @code{state} and augments the description of the automaton with
12058 the full set of items for each state, instead of its core only.
12061 Implies @code{state} and augments the description of the automaton with
12062 each rule's lookahead set.
12065 Implies @code{state}. Explain how conflicts were solved thanks to
12066 precedence and associativity directives.
12068 @item counterexamples
12070 Look for counterexamples for the conflicts. @xref{Counterexamples}.
12071 Counterexamples take time to compute. The option @option{-rcex} should be
12072 used by the developer when working on the grammar; it hardly makes sense to
12076 Enable all the items.
12079 Do not generate the report.
12082 @item --report-file=@var{file}
12083 Specify the @var{file} for the verbose description.
12087 Pretend that @code{%verbose} was specified, i.e., write an extra output
12088 file containing verbose descriptions of the grammar and
12089 parser. @xref{Decl Summary}.
12091 @item -o @var{file}
12092 @itemx --output=@var{file}
12093 Specify the @var{file} for the parser implementation file.
12095 The names of the other output files are constructed from @var{file} as
12096 described under the @option{-v} and @option{-d} options.
12098 @item -g [@var{file}]
12099 @itemx --graph[=@var{file}]
12100 Output a graphical representation of the parser's automaton computed by
12101 Bison, in @uref{https://www.graphviz.org/, Graphviz}
12102 @uref{https://www.graphviz.org/doc/info/lang.html, DOT} format.
12103 @code{@var{file}} is optional. If omitted and the grammar file is
12104 @file{foo.y}, the output file will be @file{foo.gv}.
12106 @item -x [@var{file}]
12107 @itemx --xml[=@var{file}]
12108 Output an XML report of the parser's automaton computed by Bison.
12109 @code{@var{file}} is optional.
12110 If omitted and the grammar file is @file{foo.y}, the output file will be
12113 @item -M @var{old}=@var{new}
12114 @itemx --file-prefix-map=@var{old}=@var{new}
12115 Replace prefix @var{old} with @var{new} when writing file paths in output
12119 @node Option Cross Key
12120 @section Option Cross Key
12122 Here is a list of options, alphabetized by long option, to help you find
12123 the corresponding short option and directive.
12125 @multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}}
12126 @headitem Long Option @tab Short Option @tab Bison Directive
12127 @include cross-options.texi
12131 @section Yacc Library
12133 The Yacc library contains default implementations of the @code{yyerror} and
12134 @code{main} functions. These default implementations are normally not
12135 useful, but POSIX requires them. To use the Yacc library, link your program
12136 with the @option{-ly} option. Note that Bison's implementation of the Yacc
12137 library is distributed under the terms of the GNU General Public License
12140 If you use the Yacc library's @code{yyerror} function, you should declare
12141 @code{yyerror} as follows:
12144 int yyerror (char const *);
12148 The @code{int} value returned by this @code{yyerror} is ignored.
The implementation of the Yacc library's @code{main} function is:
12155 setlocale (LC_ALL, "");
12161 so if you use it, the internationalization support is enabled (e.g., error
12162 messages are translated), and your @code{yyparse} function should have the
12163 following type signature:
12166 int yyparse (void);
12169 @c ================================================= C++ Bison
12171 @node Other Languages
12172 @chapter Parsers Written In Other Languages
12174 In addition to C, Bison can generate parsers in C++, D and Java. This chapter
12175 is devoted to these languages. The reader is expected to understand how
12176 Bison works; read the introductory chapters first if you don't.
12179 * C++ Parsers:: The interface to generate C++ parser classes
12180 * D Parsers:: The interface to generate D parser classes
12181 * Java Parsers:: The interface to generate Java parser classes
12185 @section C++ Parsers
12187 The Bison parser in C++ is an object, an instance of the class
12191 * A Simple C++ Example:: A short introduction to C++ parsers
12192 * C++ Bison Interface:: Asking for C++ parser generation
12193 * C++ Parser Interface:: Instantiating and running the parser
12194 * C++ Semantic Values:: %union vs. C++
12195 * C++ Location Values:: The position and location classes
12196 * C++ Parser Context:: You can supply a @code{report_syntax_error} function.
12197 * C++ Scanner Interface:: Exchanges between yylex and parse
12198 * A Complete C++ Example:: Demonstrating their use
12201 @node A Simple C++ Example
12202 @subsection A Simple C++ Example
This tutorial about C++ parsers is based on a simple, self-contained
example.@footnote{The sources of this example are available as
@file{examples/c++/simple.yy}.} The following sections are the reference
manual for Bison with C++, the last one showing a full-blown example
(@pxref{A Complete C++ Example}).
To look nicer, our example is written in C++14. This is not required: Bison
supports the original C++98 standard.
12213 A Bison file has three parts. In the first part, the prologue, we start by
12214 making sure we run a version of Bison which is recent enough, and that we
12218 @comment file: c++/simple.yy: 1
12220 /* Simple variant-based parser. -*- C++ -*-
12222 Copyright (C) 2018-2021 Free Software Foundation, Inc.
12224 This file is part of Bison, the GNU Compiler Compiler.
12226 This program is free software: you can redistribute it and/or modify
12227 it under the terms of the GNU General Public License as published by
12228 the Free Software Foundation, either version 3 of the License, or
12229 (at your option) any later version.
12231 This program is distributed in the hope that it will be useful,
12232 but WITHOUT ANY WARRANTY; without even the implied warranty of
12233 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12234 GNU General Public License for more details.
12236 You should have received a copy of the GNU General Public License
12237 along with this program. If not, see <https://www.gnu.org/licenses/>. */
12241 @comment file: c++/simple.yy: 1
Let's dive directly into the middle part: the grammar. Our input is a
simple list of strings, which we display once the parsing is done.
12250 @comment file: c++/simple.yy: 2
12255 list @{ std::cout << $1 << '\n'; @}
12259 %nterm <std::vector<std::string>> list;
12262 %empty @{ /* Generates an empty string list */ @}
12263 | list item @{ $$ = $1; $$.push_back ($2); @}
We used a vector of strings as a semantic value! To use genuine C++ objects,
not just PODs, as semantic values, we cannot rely on the union that Bison
uses by default to store them; we need @emph{variants} (@pxref{C++
12273 @comment file: c++/simple.yy: 1
12275 %define api.value.type variant
12278 Obviously, the rule for @code{result} needs to print a vector of strings.
12279 In the prologue, we add:
12281 @comment file: c++/simple.yy: 1
12285 // Print a list of strings.
12287 operator<< (std::ostream& o, const std::vector<std::string>& ss)
12291 const char *sep = "";
12293 for (const auto& s: ss)
You may want to move it into the @code{yy} namespace to avoid leaking it
into the global namespace. We recommend that you keep the actions simple,
and move details into auxiliary functions, as we did with @code{operator<<}.
12309 Our list of strings will be built from two types of items: numbers and
12312 @comment file: c++/simple.yy: 2
12314 %nterm <std::string> item;
12315 %token <std::string> TEXT;
12316 %token <int> NUMBER;
12320 | NUMBER @{ $$ = std::to_string ($1); @}
12325 In the case of @code{TEXT}, the implicit default action applies: @w{@code{$$
12330 Our scanner deserves some attention. The traditional interface of
@code{yylex} is not type-safe: since the token kind and the token value are
12332 not correlated, you may return a @code{NUMBER} with a string as semantic
12333 value. To avoid this, we use @emph{token constructors} (@pxref{Complete
12334 Symbols}). This directive:
12336 @comment file: c++/simple.yy: 1
12338 %define api.token.constructor
requests that Bison generate the functions @code{make_TEXT} and
@code{make_NUMBER}, as well as @code{make_YYEOF}, for the end of input.
12345 Everything is in place for our scanner:
12347 @comment file: c++/simple.yy: 1
12353 // Return the next token.
12354 auto yylex () -> parser::symbol_type
12356 static int count = 0;
12357 switch (int stage = count++)
12361 return parser::make_TEXT ("I have three numbers for you.");
12364 case 1: case 2: case 3:
12365 return parser::make_NUMBER (stage);
12369 return parser::make_TEXT ("And that's all!");
12373 return parser::make_YYEOF ();
12381 In the epilogue, the third part of a Bison grammar file, we leave simple
12382 details: the error reporting function, and the main function.
12384 @comment file: c++/simple.yy: 3
12389 // Report an error to the user.
12390 auto parser::error (const std::string& msg) -> void
12392 std::cerr << msg << '\n';
12406 $ @kbd{bison simple.yy -o simple.cc}
12407 $ @kbd{g++ -std=c++14 simple.cc -o simple}
12410 @{I have three numbers for you., 1, 2, 3, And that's all!@}
12414 @node C++ Bison Interface
12415 @subsection C++ Bison Interface
12416 @c - %skeleton "lalr1.cc"
12418 @c - initial action
12420 The C++ deterministic parser is selected using the skeleton directive,
12421 @samp{%skeleton "lalr1.cc"}. @xref{Decl Summary}.
12423 When run, @command{bison} will create several entities in the @samp{yy}
12425 @findex %define api.namespace
12426 Use the @samp{%define api.namespace} directive to change the namespace name,
12427 see @ref{%define Summary}. The various classes are generated
12428 in the following files:
12431 @item @var{file}.hh
12432 (Assuming the extension of the grammar file was @samp{.yy}.) The
12433 declaration of the C++ parser class and auxiliary types. By default, this
12434 file is not generated (@pxref{Decl Summary}).
12436 @item @var{file}.cc
12437 The implementation of the C++ parser class. The basename and extension of
12438 these two files (@file{@var{file}.hh} and @file{@var{file}.cc}) follow the
12439 same rules as with regular C parsers (@pxref{Invocation}).
12442 Generated when both @code{%header} and @code{%locations} are enabled, this
12443 file contains the definition of the classes @code{position} and
12444 @code{location}, used for location tracking. It is not generated if
@samp{%define api.location.file none} is specified, or if user-defined
locations are used. @xref{C++ Location Values}.
Useless legacy files. To get rid of them, use @samp{%require "3.2"} or
12454 All these files are documented using Doxygen; run @command{doxygen} for a
12455 complete and accurate documentation.
12457 @node C++ Parser Interface
12458 @subsection C++ Parser Interface
12460 The output files @file{@var{file}.hh} and @file{@var{file}.cc} declare and
12461 define the parser class in the namespace @code{yy}. The class name defaults
12462 to @code{parser}, but may be changed using @samp{%define api.parser.class
12463 @{@var{name}@}}. The interface of this class is detailed below. It can be
12464 extended using the @code{%parse-param} feature: its semantics is slightly
12465 changed since it describes an additional member of the parser class, and an
12466 additional argument for its constructor.
12469 @defcv {Type} {parser} {token}
12470 A structure that contains (only) the @code{token_kind_type} enumeration,
12471 which defines the tokens. To refer to the token @code{FOO}, use
12472 @code{yy::parser::token::FOO}. The scanner can use @samp{typedef
12473 yy::parser::token token;} to ``import'' the token enumeration (@pxref{Calc++
12477 @defcv {Type} {parser} {token_kind_type}
12478 An enumeration of the token kinds. Its enumerators are forged from the
12479 token names, with a possible token prefix
12480 (@pxref{api-token-prefix,,@code{api.token.prefix}}):
12486 enum token_kind_type
12488 YYEMPTY = -2, // No token.
12489 YYEOF = 0, // "end of file"
12490 YYerror = 256, // error
12491 YYUNDEF = 257, // "invalid token"
12493 MINUS = 259, // "-"
12495 VAR = 271, // "variable"
12500 /// Token kind, as returned by yylex.
12501 typedef token::token_kind_type token_kind_type;
12505 @defcv {Type} {parser} {value_type}
12506 The types for semantic values. @xref{C++ Semantic Values}.
12509 @defcv {Type} {parser} {location_type}
12510 The type of locations, if location tracking is enabled. @xref{C++ Location
12514 @defcv {Type} {parser} {syntax_error}
12515 This class derives from @code{std::runtime_error}. Throw instances of it
from the scanner or from the actions to raise parse errors. This is
equivalent to first invoking @code{error} to report the location and
message of the syntax error, and then invoking @code{YYERROR} to enter the
error-recovery mode. But contrary to @code{YYERROR}, which can only be
invoked from user actions (i.e., written in the action itself), the
exception can be thrown from functions invoked from the user action.
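As a minimal sketch of that last point, the following C++ code throws from a helper function that an action could call, something @code{YYERROR} cannot do. The @code{syntax_error} class below is a hypothetical stand-in that only mimics the message-only constructor of @code{yy::parser::syntax_error}; in a real parser you would throw @code{yy::parser::syntax_error} and let @code{parse} catch it. The helper @code{lookup} is also made up for the example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for yy::parser::syntax_error, which also
// derives from std::runtime_error.
struct syntax_error : std::runtime_error
{
  explicit syntax_error (const std::string& m)
    : std::runtime_error (m)
  {}
};

// Hypothetical helper an action might call, e.g. { $$ = lookup (vars, $1); }.
// YYERROR could not be used here: it only works in the action itself.
int lookup (const std::map<std::string, int>& vars, const std::string& name)
{
  auto i = vars.find (name);
  if (i == vars.end ())
    throw syntax_error ("undefined variable: " + name);
  return i->second;
}
```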
12524 @deftypeop {Constructor} {parser} {} parser ()
12525 @deftypeopx {Constructor} {parser} {} parser (@var{type1} @var{arg1}, ...)
12526 Build a new parser object. There are no arguments, unless
12527 @samp{%parse-param @{@var{type1} @var{arg1}@}} was used.
12530 @deftypeop {Constructor} {syntax_error} {} syntax_error (@code{const location_type&} @var{l}, @code{const std::string&} @var{m})
12531 @deftypeopx {Constructor} {syntax_error} {} syntax_error (@code{const std::string&} @var{m})
12532 Instantiate a syntax-error exception.
12535 @deftypemethod {parser} {int} operator() ()
12536 @deftypemethodx {parser} {int} parse ()
12537 Run the syntactic analysis, and return 0 on success, 1 otherwise. Both
12538 routines are equivalent, @code{operator()} being more C++ish.
12541 The whole function is wrapped in a @code{try}/@code{catch} block, so that
12542 when an exception is thrown, the @code{%destructor}s are called to release
12543 the lookahead symbol, and the symbols pushed on the stack.
12545 Exception related code in the generated parser is protected by CPP guards
12546 (@code{#if}) and disabled when exceptions are not supported (i.e., passing
12547 @option{-fno-exceptions} to the C++ compiler).
12550 @deftypemethod {parser} {std::ostream&} debug_stream ()
12551 @deftypemethodx {parser} {void} set_debug_stream (@code{std::ostream&} @var{o})
12552 Get or set the stream used for tracing the parsing. It defaults to
12556 @deftypemethod {parser} {debug_level_type} debug_level ()
12557 @deftypemethodx {parser} {void} set_debug_level (debug_level_type @var{l})
Get or set the tracing level (an integer). Currently its value is either
0, no trace, or nonzero, full tracing.
12562 @deftypemethod {parser} {void} error (@code{const location_type&} @var{l}, @code{const std::string&} @var{m})
12563 @deftypemethodx {parser} {void} error (@code{const std::string&} @var{m})
12564 The definition for this member function must be supplied by the user: the
12565 parser uses it to report a parser error occurring at @var{l}, described by
12566 @var{m}. If location tracking is not enabled, the second signature is used.
12570 @node C++ Semantic Values
12571 @subsection C++ Semantic Values
Bison supports two different means to handle semantic values in C++. One is
similar to the C interface, and relies on unions. As C++ practitioners know,
unions are inconvenient in C++, therefore another approach is provided,
12579 * C++ Unions:: Semantic values cannot be objects
12580 * C++ Variants:: Using objects as semantic values
12584 @subsubsection C++ Unions
The @code{%union} directive works as for C, see @ref{Union Decl}. In
particular it produces a genuine @code{union}, which has a few specific
12591 The value type is @code{yy::parser::value_type}, not @code{YYSTYPE}.
Non-POD (Plain Old Data) types cannot be used. C++98 forbids any instance
of classes with constructors in unions: only @emph{pointers} to such objects
are allowed. C++11 relaxed this constraint, but at the cost of safety.
12598 Because objects have to be stored via pointers, memory is not
12599 reclaimed automatically: using the @code{%destructor} directive is the
12600 only means to avoid leaks. @xref{Destructor Decl}.
12603 @subsubsection C++ Variants
12605 Bison provides a @emph{variant} based implementation of semantic values for
12606 C++. This alleviates all the limitations reported in the previous section,
12607 and in particular, object types can be used without pointers.
12609 To enable variant-based semantic values, set the @code{%define} variable
12610 @code{api.value.type} to @code{variant} (@pxref{%define Summary}). Then
12611 @code{%union} is ignored; instead of using the name of the fields of the
12612 @code{%union} to ``type'' the symbols, use genuine types.
12614 For instance, instead of:
12622 %token <ival> NUMBER;
12623 %token <sval> STRING;
12630 %token <int> NUMBER;
12631 %token <std::string> STRING;
@code{STRING} is no longer a pointer, which should considerably simplify the user
12635 actions in the grammar and in the scanner (in particular the memory
12638 Since C++ features destructors, and since it is customary to specialize
12639 @code{operator<<} to support uniform printing of values, variants also
12640 typically simplify Bison printers and destructors.
12642 Variants are stricter than unions. When based on unions, you may play any
12643 dirty game with @code{yylval}, say storing an @code{int}, reading a
12644 @code{char*}, and then storing a @code{double} in it. This is no longer
12645 possible with variants: they must be initialized, then assigned to, and
12646 eventually, destroyed. As a matter of fact, Bison variants forbid the use
12647 of alternative types such as @samp{$<int>2} or @samp{$<std::string>$}, even
12648 in midrule actions. It is mandatory to use typed midrule actions
12649 (@pxref{Typed Midrule Actions}).
12651 @deftypemethod {value_type} {T&} {emplace<T>} ()
12652 @deftypemethodx {value_type} {T&} {emplace<T>} (@code{const T&} @var{t})
12653 Available in C++98/C++03 only. Default construct/copy-construct from
12654 @var{t}. Return a reference to where the actual value may be stored.
12655 Requires that the variant was not initialized yet.
12658 @deftypemethod {value_type} {T&} {emplace<T, U>} (@code{U&&...} @var{u})
12659 Available in C++11 and later only. Build a variant of type @code{T} from
12660 the variadic forwarding references @var{u...}.
12663 @strong{Warning}: We do not use Boost.Variant, for two reasons. First, it
12664 appeared unacceptable to require Boost on the user's machine (i.e., the
12665 machine on which the generated parser will be compiled, not the machine on
12666 which @command{bison} was run). Second, for each possible semantic value,
12667 Boost.Variant not only stores the value, but also a tag specifying its
12668 type. But the parser already ``knows'' the type of the semantic value, so
12669 that would be duplicating the information.
12671 We do not use C++17's @code{std::variant} either: we want to support all the
12672 C++ standards, and of course @code{std::variant} also stores a tag to record
Therefore we developed lightweight variants whose type tag is external (so
they are really more like @code{unions} for C++). There are a number of
limitations in (the current implementation of) variants:
12680 Alignment must be enforced: values should be aligned in memory according to
12681 the most demanding type. Computing the smallest alignment possible requires
12682 meta-programming techniques that are not currently implemented in Bison, and
12683 therefore, since, as far as we know, @code{double} is the most demanding
12684 type on all platforms, alignments are enforced for @code{double} whatever
12685 types are actually used. This may waste space in some cases.
12688 There might be portability issues we are not aware of.
12691 As far as we know, these limitations @emph{can} be alleviated. All it takes
12692 is some time and/or some talented C++ hacker willing to contribute to Bison.
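The idea of an external type tag can be illustrated with a deliberately simplified C++11 sketch. The class @code{raw_variant} and its @code{as} and @code{destroy} members are hypothetical and much smaller than what Bison generates. Like Bison's variants, however, its buffer is aligned for @code{double}, its @code{emplace} forwards its arguments, values must be built, used, and destroyed explicitly, and the caller, not the object, remembers which type is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Simplified sketch of a variant with an external type tag: the object
// never records which type it holds, the caller must remember it.
// Hypothetical code, much smaller than the variant Bison generates.
template <std::size_t Size>
class raw_variant
{
public:
  raw_variant () = default;
  raw_variant (const raw_variant&) = delete;
  raw_variant& operator= (const raw_variant&) = delete;

  // Values must be destroyed explicitly before the variant goes away.
  ~raw_variant () { assert (!obj_ && "value not destroyed"); }

  // Build a T in place from ARGS (the C++11 form of emplace).
  template <typename T, typename... U>
  T& emplace (U&&... args)
  {
    static_assert (sizeof (T) <= Size, "buffer too small");
    static_assert (alignof (T) <= alignof (double), "over-aligned type");
    assert (!obj_ && "variant already initialized");
    obj_ = new (buffer_) T (std::forward<U> (args)...);
    return as<T> ();
  }

  // Access the stored value; T must be the type that was emplaced.
  template <typename T>
  T& as ()
  {
    assert (obj_ && "variant not initialized");
    return *static_cast<T*> (obj_);
  }

  // Destroy the stored value; again the caller supplies its type.
  template <typename T>
  void destroy ()
  {
    as<T> ().~T ();
    obj_ = nullptr;
  }

private:
  // Aligned for double, whatever types are actually stored.
  alignas (double) unsigned char buffer_[Size];
  // Address of the live object, or null; this is not a type tag.
  void* obj_ = nullptr;
};
```

Note that nothing prevents calling @code{as<int>} on a variant holding a @code{std::string}: with an external tag, such mistakes are the caller's responsibility, which is why Bison relies on its knowledge of each symbol's type.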
12694 @node C++ Location Values
12695 @subsection C++ Location Values
12697 When the directive @code{%locations} is used, the C++ parser supports
12698 location tracking, see @ref{Tracking Locations}.
12700 By default, two auxiliary classes define a @code{position}, a single point
12701 in a file, and a @code{location}, a range composed of a pair of
12702 @code{position}s (possibly spanning several files). If the @code{%define}
12703 variable @code{api.location.type} is defined, then these classes will not be
12704 generated, and the user defined type will be used.
12707 * C++ position:: One point in the source file
12708 * C++ location:: Two points in the source file
12709 * Exposing the Location Classes:: Using the Bison location class in your
12711 * User Defined Location Type:: Required interface for locations
12715 @subsubsection C++ @code{position}
12717 @defcv {Type} {position} {filename_type}
12718 The base type for file names. Defaults to @code{const std::string}.
12719 @xref{api-filename-type,,@code{api.filename.type}}, to change its definition.
12722 @defcv {Type} {position} {counter_type}
12723 The type used to store line and column numbers. Defined as @code{int}.
12726 @deftypeop {Constructor} {position} {} position (@code{filename_type*} @var{file} = nullptr, @code{counter_type} @var{line} = 1, @code{counter_type} @var{col} = 1)
Create a @code{position} denoting a given point. Note that @code{file} is
not reclaimed when the @code{position} is destroyed: memory management must be
12732 @deftypemethod {position} {void} initialize (@code{filename_type*} @var{file} = nullptr, @code{counter_type} @var{line} = 1, @code{counter_type} @var{col} = 1)
12733 Reset the position to the given values.
12736 @deftypeivar {position} {filename_type*} file
12737 The name of the file. It will always be handled as a pointer, the parser
12738 will never duplicate nor deallocate it.
12741 @deftypeivar {position} {counter_type} line
12742 The line, starting at 1.
12745 @deftypemethod {position} {void} lines (@code{counter_type} @var{height} = 1)
If @var{height} is not zero, advance by @var{height} lines, resetting the
column number. The resulting line number cannot be less than 1.
12750 @deftypeivar {position} {counter_type} column
12751 The column, starting at 1.
12754 @deftypemethod {position} {void} columns (@code{counter_type} @var{width} = 1)
12755 Advance by @var{width} columns, without changing the line number. The
12756 resulting column number cannot be less than 1.
12759 @deftypemethod {position} {position&} operator+= (@code{counter_type} @var{width})
12760 @deftypemethodx {position} {position} operator+ (@code{counter_type} @var{width})
12761 @deftypemethodx {position} {position&} operator-= (@code{counter_type} @var{width})
12762 @deftypemethodx {position} {position} operator- (@code{counter_type} @var{width})
12763 Various forms of syntactic sugar for @code{columns}.
12766 @deftypemethod {position} {bool} operator== (@code{const position&} @var{that})
12767 @deftypemethodx {position} {bool} operator!= (@code{const position&} @var{that})
12768 Whether @code{*this} and @code{that} denote equal/different positions.
12771 @deftypefun {std::ostream&} operator<< (@code{std::ostream&} @var{o}, @code{const position&} @var{p})
12772 Report @var{p} on @var{o} like this:
12773 @samp{@var{file}:@var{line}.@var{column}}, or
12774 @samp{@var{line}.@var{column}} if @var{file} is null.
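To make the behavior of @code{lines}, @code{columns} and @code{operator<<} concrete, here is a minimal sketch of a position type. It is a hypothetical stand-in written for illustration (@code{toy_position} is not a Bison name), assuming the file name is a borrowed @code{const std::string*}; it is not the code Bison generates.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the generated position class, for illustration
// only: the file name is borrowed, never copied nor freed.
struct toy_position
{
  const std::string* filename = nullptr;
  int line = 1;
  int column = 1;

  // A nonzero HEIGHT moves down HEIGHT lines and resets the column to 1.
  void lines (int height = 1)
  {
    if (height)
      {
        column = 1;
        line = at_least_one (line + height);
      }
  }

  // Move WIDTH columns right (left if negative), on the same line.
  void columns (int width = 1)
  {
    column = at_least_one (column + width);
  }

  // Lines and columns never drop below 1.
  static int at_least_one (int n) { return n < 1 ? 1 : n; }
};

// Print "file:line.column", or "line.column" when there is no file name.
std::ostream& operator<< (std::ostream& o, const toy_position& p)
{
  if (p.filename)
    o << *p.filename << ':';
  return o << p.line << '.' << p.column;
}
```

For instance, a default @code{toy_position} prints as @samp{1.1}, and after @code{columns (4)} with a file name set it prints as @samp{@var{file}:1.5}.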
12778 @subsubsection C++ @code{location}
12780 @deftypeop {Constructor} {location} {} location (@code{const position&} @var{begin}, @code{const position&} @var{end})
12781 Create a @code{location} from the endpoints of the range.
12784 @deftypeop {Constructor} {location} {} location (@code{const position&} @var{pos} = position())
12785 @deftypeopx {Constructor} {location} {} location (@code{filename_type*} @var{file}, @code{counter_type} @var{line}, @code{counter_type} @var{col})
12786 Create a @code{location} denoting an empty range located at a given point.
12789 @deftypemethod {location} {void} initialize (@code{filename_type*} @var{file} = nullptr, @code{counter_type} @var{line} = 1, @code{counter_type} @var{col} = 1)
12790 Reset the location to an empty range at the given values.
12793 @deftypeivar {location} {position} begin
12794 @deftypeivarx {location} {position} end
12795 The first position of the range (inclusive), and the first position beyond it (exclusive).
12798 @deftypemethod {location} {void} columns (@code{counter_type} @var{width} = 1)
12799 @deftypemethodx {location} {void} lines (@code{counter_type} @var{height} = 1)
12800 Forwarded to the @code{end} position.
12803 @deftypemethod {location} {location} operator+ (@code{counter_type} @var{width})
12804 @deftypemethodx {location} {location} operator+= (@code{counter_type} @var{width})
12805 @deftypemethodx {location} {location} operator- (@code{counter_type} @var{width})
12806 @deftypemethodx {location} {location} operator-= (@code{counter_type} @var{width})
12807 Various forms of syntactic sugar for @code{columns}.
12810 @deftypemethod {location} {location} operator+ (@code{const location&} @var{end})
12811 @deftypemethodx {location} {location} operator+= (@code{const location&} @var{end})
12812 Join two locations: the result starts at the @code{begin} position of the first
12813 one, and ends at the @code{end} position of the second.
12816 @deftypemethod {location} {void} step ()
12817 Move @code{begin} onto @code{end}.
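The typical way a scanner combines @code{step}, @code{columns}, @code{lines} and joining can be sketched with simplified, hypothetical stand-ins (@code{toy_pos} and @code{toy_loc}, without file names); this only mirrors the documented behavior and is not the generated class.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for position and location (no file
// names) showing how a scanner typically drives a location.
struct toy_pos
{
  int line = 1;
  int column = 1;
  void lines (int n = 1)
  {
    if (n)
      {
        column = 1;
        line = line + n < 1 ? 1 : line + n;
      }
  }
  void columns (int n = 1) { column = column + n < 1 ? 1 : column + n; }
};

struct toy_loc
{
  toy_pos begin;  // First position of the range.
  toy_pos end;    // First position beyond the range.
  // Both are forwarded to the end position.
  void columns (int n = 1) { end.columns (n); }
  void lines (int n = 1) { end.lines (n); }
  // Move begin onto end: the next token starts where this one stopped.
  void step () { begin = end; }
};

// Join: begin of the first location, end of the second.
toy_loc operator+ (toy_loc first, const toy_loc& last)
{
  first.end = last.end;
  return first;
}
```

Calling @code{step} before each token, then @code{columns} with the token's width, leaves @code{begin} on the token's first column and @code{end} just past its last one.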
12820 @deftypemethod {location} {bool} operator== (@code{const location&} @var{that})
12821 @deftypemethodx {location} {bool} operator!= (@code{const location&} @var{that})
12822 Whether @code{*this} and @code{that} denote equal/different ranges of
12826 @deftypefun {std::ostream&} operator<< (@code{std::ostream&} @var{o}, @code{const location&} @var{p})
12827 Report @var{p} on @var{o}, taking care of special cases such as an undefined
12828 file name, or endpoints that share the same file name, line, or column.
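One way to handle these special cases is sketched below with hypothetical simplified types (@code{toy_point}, @code{toy_range}). It assumes, as above, that @code{end} is the first position beyond the range, so the last column printed is @code{end.column - 1}. It illustrates the idea and is not guaranteed to match the generated operator character for character.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical simplified types: END is the first point beyond the range.
struct toy_point { const std::string* file = nullptr; int line = 1; int column = 1; };
struct toy_range { toy_point begin; toy_point end; };

std::ostream& operator<< (std::ostream& o, const toy_point& p)
{
  if (p.file)
    o << *p.file << ':';
  return o << p.line << '.' << p.column;
}

// Print the begin point in full, then only what differs at the end.
std::ostream& operator<< (std::ostream& o, const toy_range& r)
{
  int last = 0 < r.end.column ? r.end.column - 1 : 0;  // Last column inside.
  o << r.begin;
  if (r.end.file && (!r.begin.file || *r.begin.file != *r.end.file))
    o << '-' << *r.end.file << ':' << r.end.line << '.' << last;
  else if (r.begin.line < r.end.line)
    o << '-' << r.end.line << '.' << last;
  else if (r.begin.column < last)
    o << '-' << last;
  return o;
}
```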
12831 @node Exposing the Location Classes
12832 @subsubsection Exposing the Location Classes
12834 When both @code{%header} and @code{%locations} are enabled, Bison generates
12835 an additional file: @file{location.hh}. If you don't use locations outside
12836 of the parser, you may avoid its creation with @samp{%define
12837 api.location.file none}.
12839 However, this file is useful if, for instance, your parser builds an abstract
12840 syntax tree decorated with locations: you may use Bison's @code{location}
12841 type independently of Bison's parser. You may name the file differently,
12842 e.g., @samp{%define api.location.file "include/ast/location.hh"}: this name
12843 can have directory components, or even be absolute. The way the location
12844 file is included is controlled by @code{api.location.include}.
12846 This way it is possible to have several parsers share the same location
12849 For instance, in @file{src/foo/parser.yy}, generate the
12850 @file{include/ast/loc.hh} file:
12853 // src/foo/parser.yy
12855 %define api.namespace @{foo@}
12856 %define api.location.file "include/ast/loc.hh"
12857 %define api.location.include @{<ast/loc.hh>@}
12861 and use it in @file{src/bar/parser.yy}:
12864 // src/bar/parser.yy
12866 %define api.namespace @{bar@}
12867 %code requires @{#include <ast/loc.hh>@}
12868 %define api.location.type @{bar::location@}
12871 Absolute file names are supported; it is safe in your @file{Makefile} to
12873 @option{-Dapi.location.file='"$(top_srcdir)/include/ast/loc.hh"'} to
12874 @command{bison} for @file{src/foo/parser.yy}. The generated file will not
12875 have references to this absolute path, thanks to @samp{%define
12876 api.location.include @{<ast/loc.hh>@}}. Adding @samp{-I
12877 $(top_srcdir)/include} to your @code{CPPFLAGS} will suffice for the compiler
12878 to find @file{ast/loc.hh}.
12880 @node User Defined Location Type
12881 @subsubsection User Defined Location Type
12882 @findex %define api.location.type
12884 Instead of using the built-in types you may use the @code{%define} variable
12885 @code{api.location.type} to specify your own type:
12888 %define api.location.type @{@var{LocationType}@}
12891 The requirements on your @var{LocationType} are:
12894 it must be copyable;
12897 in order to compute the (default) value of @code{@@$} in a reduction, the
12898 parser basically runs
12900 @@$.begin = @@1.begin;
12901 @@$.end = @@@var{N}.end; // The location of last right-hand side symbol.
12904 so there must be copyable @code{begin} and @code{end} members;
12907 alternatively you may redefine the computation of the default location, in
12908 which case these members are not required (@pxref{Location Default Action});
12911 if traces are enabled, then there must exist an @samp{std::ostream&
12912 operator<< (std::ostream& o, const @var{LocationType}& s)} function.
12917 In programs with several C++ parsers, you may also use the @code{%define}
12918 variable @code{api.location.type} to share a common set of built-in
12919 definitions for @code{position} and @code{location}. For instance, one
12920 parser @file{master/parser.yy} might use:
12925 %define api.namespace @{master::@}
12929 to generate the @file{master/position.hh} and @file{master/location.hh}
12930 files, reused by other parsers as follows:
12933 %define api.location.type @{master::location@}
12934 %code requires @{ #include <master/location.hh> @}
12938 @node C++ Parser Context
12939 @subsection C++ Parser Context
12941 When @samp{%define parse.error custom} is used (@pxref{Syntax Error
12942 Reporting Function}), the user must define the following function.
12944 @deftypemethod {parser} {void} report_syntax_error (@code{const context&} @var{ctx}) @code{const}
12945 Report a syntax error to the user. Whether it uses @code{yyerror} is up to
12949 Use the following types and functions to build the error message.
12951 @defcv {Type} {parser} {context}
12952 A type that captures the circumstances of the syntax error.
12955 @defcv {Type} {parser} {symbol_kind_type}
12956 An enum of all the grammar symbols, tokens and nonterminals. Its
12957 enumerators are forged from the symbol names:
12962 enum symbol_kind_type
12964 S_YYEMPTY = -2, // No symbol.
12965 S_YYEOF = 0, // "end of file"
12966 S_YYERROR = 1, // error
12967 S_YYUNDEF = 2, // "invalid token"
12969 S_MINUS = 4, // "-"
12971 S_VAR = 14, // "variable"
12973 S_YYACCEPT = 16, // $accept
12975 S_input = 18 // input
12978 typedef symbol_kind::symbol_kind_type symbol_kind_type;
12982 @deftypemethod {context} {const symbol_type&} lookahead () @code{const}
12983 The ``unexpected'' token: the lookahead that caused the syntax error.
12986 @deftypemethod {context} {symbol_kind_type} token () @code{const}
12987 The symbol kind of the lookahead token that caused the syntax error. Returns
12988 @code{symbol_kind::S_YYEMPTY} if there is no lookahead.
12991 @deftypemethod {context} {const location&} location () @code{const}
12992 The location of the syntax error (that of the lookahead).
12995 @deftypemethod {context} int expected_tokens (@code{symbol_kind_type} @var{argv}@code{[]}, @code{int} @var{argc}) @code{const}
12996 Fill @var{argv} with the expected tokens, which never include
12997 @code{symbol_kind::S_YYEMPTY}, @code{symbol_kind::S_YYERROR}, or
12998 @code{symbol_kind::S_YYUNDEF}.
13000 Never put more than @var{argc} elements into @var{argv}, and on success
13001 return the number of tokens stored in @var{argv}. If there are more
13002 expected tokens than @var{argc}, fill @var{argv} up to @var{argc} and return
13003 0. If there are no expected tokens, also return 0, but set @code{argv[0]}
13004 to @code{symbol_kind::S_YYEMPTY}.
13006 If @var{argv} is null, return the size needed to store all the possible
13007 values, which is always less than @code{YYNTOKENS}.
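The contract of @code{expected_tokens} has several corner cases. The sketch below reproduces them on a plain list of candidate tokens. @code{toy_kind} and the candidate list are hypothetical: in a real parser the candidates are computed from the parser tables.

```cpp
#include <cassert>
#include <vector>

// Hypothetical symbol kinds; in a real parser they come from symbol_kind.
enum toy_kind { S_YYEMPTY = -2, S_PLUS = 3, S_MINUS = 4, S_NUMBER = 5 };

// Sketch of the documented contract. CANDIDATES is assumed to hold the
// tokens acceptable in the current state (the parser computes them).
int expected_tokens (const std::vector<toy_kind>& candidates,
                     toy_kind argv[], int argc)
{
  int n = static_cast<int> (candidates.size ());
  if (!argv)
    return n;                   // Only report the size needed.
  if (n == 0)
    {
      if (0 < argc)
        argv[0] = S_YYEMPTY;    // Nothing is expected.
      return 0;
    }
  for (int i = 0; i < n && i < argc; ++i)
    argv[i] = candidates[i];
  return n <= argc ? n : 0;     // 0 signals that ARGV was too small.
}
```

Because both "too many tokens" and "no token" return 0, a caller can distinguish them by checking whether @code{argv[0]} is @code{S_YYEMPTY}.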
13010 @deftypemethod {parser} {const char *} symbol_name (@code{symbol_kind_type} @var{symbol}) @code{const}
13011 The name of the symbol whose kind is @var{symbol}, possibly translated.
13013 Returns a @code{std::string} when @code{parse.error} is @code{verbose}.
13016 A custom syntax error function looks as follows. This implementation is
13017 inappropriate for internationalization; see the @file{c/bistromathic}
13018 example for a better alternative.
13022 yy::parser::report_syntax_error (const context& ctx) const
13025 std::cerr << ctx.location () << ": syntax error";
13026 // Report the tokens expected at this point.
13028 enum @{ TOKENMAX = 5 @};
13029 symbol_kind_type expected[TOKENMAX];
13030 int n = ctx.expected_tokens (expected, TOKENMAX);
13031 for (int i = 0; i < n; ++i)
13032 std::cerr << (i == 0 ? ": expected " : " or ")
13033 << symbol_name (expected[i]);
13035 // Report the unexpected token.
13037 symbol_kind_type lookahead = ctx.token ();
13038 if (lookahead != symbol_kind::S_YYEMPTY)
13039 std::cerr << " before " << symbol_name (lookahead);
13045 You still must provide a @code{yyerror} function, used for instance to
13046 report memory exhaustion.
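The message assembled by the custom @code{report_syntax_error} above can be checked in isolation. In this sketch, @code{syntax_error_message} is a hypothetical helper that receives the printed location and the token names as plain strings instead of calling @code{symbol_name}.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the message built above: WHERE is the
// printed location, EXPECTED the names of the expected tokens, UNEXPECTED
// the name of the lookahead (null when there is none).
std::string syntax_error_message (const std::string& where,
                                  const std::vector<std::string>& expected,
                                  const std::string* unexpected)
{
  std::ostringstream o;
  o << where << ": syntax error";
  for (std::size_t i = 0; i < expected.size (); ++i)
    // The parentheses are needed: ?: binds more loosely than <<.
    o << (i == 0 ? ": expected " : " or ") << expected[i];
  if (unexpected)
    o << " before " << *unexpected;
  return o.str ();
}
```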
13049 @node C++ Scanner Interface
13050 @subsection C++ Scanner Interface
13051 @c - prefix for yylex.
13052 @c - Pure interface to yylex
13055 The parser invokes the scanner by calling @code{yylex}. Unlike C
13056 parsers, C++ parsers are always pure: there is no point in using the
13057 @samp{%define api.pure} directive. The actual interface with @code{yylex}
13058 depends on whether you use unions or variants.
13061 * Split Symbols:: Passing symbols as two/three components
13062 * Complete Symbols:: Making symbols a whole
13065 @node Split Symbols
13066 @subsubsection Split Symbols
13068 The generated parser expects @code{yylex} to have the following prototype.
13070 @deftypefun {int} yylex (@code{value_type*} @var{yylval}, @code{location_type*} @var{yylloc}, @var{type1} @var{arg1}, @dots{})
13071 @deftypefunx {int} yylex (@code{value_type*} @var{yylval}, @var{type1} @var{arg1}, @dots{})
13072 Return the next token. Its kind is the return value; its semantic value and
13073 location (if enabled) are stored through @var{yylval} and @var{yylloc}. Invocations of
13074 @samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments.
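As an illustration of this calling convention, here is a sketch of a hand-written pure scanner. The token kinds, @code{toy_value}, @code{toy_location} and @code{toy_input} are hypothetical stand-ins for @code{value_type}, @code{location_type} and an argument passed via @code{%lex-param}; a real scanner would use the types and token kinds of the generated parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-ins: a real scanner would use the parser's value_type,
// location_type and token kinds instead of these.
enum toy_token { TOK_EOF = 0, TOK_INTEGER = 258, TOK_PLUS = 259 };
union toy_value { int ival; };
struct toy_location { int begin = 1; int end = 1; };  // Columns; END is exclusive.

// Hypothetical extra argument, as if passed with %lex-param.
struct toy_input { std::string text; std::size_t pos = 0; };

// Pure scanner: the kind is returned, the value and location are stored
// through the pointers; no global state is used.
int yylex (toy_value* yylval, toy_location* yylloc, toy_input& in)
{
  while (in.pos < in.text.size () && in.text[in.pos] == ' ')
    ++in.pos;
  yylloc->begin = static_cast<int> (in.pos) + 1;
  if (in.pos == in.text.size ())
    {
      yylloc->end = yylloc->begin;
      return TOK_EOF;
    }
  char c = in.text[in.pos];
  if (std::isdigit (static_cast<unsigned char> (c)))
    {
      int n = 0;
      while (in.pos < in.text.size ()
             && std::isdigit (static_cast<unsigned char> (in.text[in.pos])))
        n = n * 10 + (in.text[in.pos++] - '0');
      yylval->ival = n;
      yylloc->end = static_cast<int> (in.pos) + 1;
      return TOK_INTEGER;
    }
  ++in.pos;
  yylloc->end = static_cast<int> (in.pos) + 1;
  // Other characters are returned as themselves, Yacc-style.
  return c == '+' ? TOK_PLUS : c;
}
```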
13077 Note that when using variants, the interface for @code{yylex} is the same,
13078 but @code{yylval} is handled differently.
13080 Regular union-based code in a Lex scanner typically looks like:
13084 yylval->ival = text_to_int (yytext);
13085 return yy::parser::token::INTEGER;
13088 yylval->sval = new std::string (yytext);
13089 return yy::parser::token::IDENTIFIER;
13093 Using variants, @code{yylval} is already constructed, but it is not
13094 initialized. So the code would look like:
13098 yylval->emplace<int> () = text_to_int (yytext);
13099 return yy::parser::token::INTEGER;
13102 yylval->emplace<std::string> () = yytext;
13103 return yy::parser::token::IDENTIFIER;
13112 yylval->emplace (text_to_int (yytext));
13113 return yy::parser::token::INTEGER;
13116 yylval->emplace (yytext);
13117 return yy::parser::token::IDENTIFIER;
13122 @node Complete Symbols
13123 @subsubsection Complete Symbols
13125 With both @code{%define api.value.type variant} and @code{%define
13126 api.token.constructor}, the parser defines the type @code{symbol_type}, and
13127 expects @code{yylex} to have the following prototype.
13129 @deftypefun {parser::symbol_type} yylex ()
13130 @deftypefunx {parser::symbol_type} yylex (@var{type1} @var{arg1}, @dots{})
13131 Return a @emph{complete} symbol, aggregating its kind (i.e., the traditional
13132 value returned by @code{yylex}), its semantic value, and possibly its
13133 location. Invocations of @samp{%lex-param @{@var{type1} @var{arg1}@}} yield
13134 additional arguments.
13137 @defcv {Type} {parser} {symbol_type}
13138 A ``complete symbol'', that binds together its kind, value and (when
13139 applicable) location.
13142 @deftypemethod {symbol_type} {symbol_kind_type} kind () @code{const}
13143 The kind of this symbol.
13146 @deftypemethod {symbol_type} {const char *} name () @code{const}
13147 The name of the kind of this symbol.
13149 Returns a @code{std::string} when @code{parse.error} is @code{verbose}.
13154 Bison generates the following @code{symbol_type} constructors.
13156 @deftypeop {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value}, @code{const location_type&} @var{location})
13157 @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const location_type&} @var{location})
13158 @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token}, @code{const @var{value_type}&} @var{value})
13159 @deftypeopx {Constructor} {parser::symbol_type} {} {symbol_type} (@code{int} @var{token})
13160 Build a complete terminal symbol for the token kind @var{token} (including
13161 the @code{api.token.prefix}), whose semantic value, if it has one, is
13162 @var{value} of adequate @var{value_type}. Pass the @var{location} iff
13163 location tracking is enabled.
13165 Consistency between @var{token} and @var{value_type} is checked via an
13169 For instance, given the following declarations:
13172 %define api.token.prefix @{TOK_@}
13173 %token <std::string> IDENTIFIER;
13174 %token <int> INTEGER;
13179 you may use these constructors:
13182 symbol_type (int token, const std::string&, const location_type&);
13183 symbol_type (int token, const int&, const location_type&);
13184 symbol_type (int token, const location_type&);
13187 Correct matching between token kinds and value types is checked via
13188 @code{assert}; for instance, @samp{symbol_type (ID, 42)} would abort. Named
13189 constructors are preferable (see below), as they offer better type safety
13190 (for instance @samp{make_ID (42)} would not even compile), but @code{symbol_type}
13191 constructors may help when token kinds are discovered at run-time, e.g.,
13196 if (auto i = lookup_keyword (yytext))
13197 return yy::parser::symbol_type (i, loc);
13199 return yy::parser::make_ID (yytext, loc);
13206 Note that it is possible to generate and compile type-incorrect code
13207 (e.g., @samp{symbol_type (':', yytext, loc)}). It will fail at run time,
13208 provided the assertions are enabled (i.e., @option{-DNDEBUG} was not passed
13209 to the compiler). Bison supports an alternative that guarantees that
13210 type-incorrect code will not even compile. Indeed, it generates @emph{named
13211 constructors} as follows.
13213 @deftypemethod {parser} {symbol_type} {make_@var{token}} (@code{const @var{value_type}&} @var{value}, @code{const location_type&} @var{location})
13214 @deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const location_type&} @var{location})
13215 @deftypemethodx {parser} {symbol_type} {make_@var{token}} (@code{const @var{value_type}&} @var{value})
13216 @deftypemethodx {parser} {symbol_type} {make_@var{token}} ()
13217 Build a complete terminal symbol for the token kind @var{token} (not
13218 including the @code{api.token.prefix}), whose semantic value, if it has one,
13219 is @var{value} of adequate @var{value_type}. Pass the @var{location} iff
13220 location tracking is enabled.
13223 For instance, given the following declarations:
13226 %define api.token.prefix @{TOK_@}
13227 %token <std::string> IDENTIFIER;
13228 %token <int> INTEGER;
13237 symbol_type make_IDENTIFIER (const std::string&, const location_type&);
13238 symbol_type make_INTEGER (const int&, const location_type&);
13239 symbol_type make_COLON (const location_type&);
13240 symbol_type make_EOF (const location_type&);
13244 which should be used in a scanner as follows.
13247 [a-z]+ return yy::parser::make_IDENTIFIER (yytext, loc);
13248 [0-9]+ return yy::parser::make_INTEGER (text_to_int (yytext), loc);
13249 ":" return yy::parser::make_COLON (loc);
13250 <<EOF>> return yy::parser::make_EOF (loc);
13253 Tokens that do not have an identifier are not accessible: you cannot simply
13254 use characters such as @code{':'}; they must be declared with @code{%token},
13255 including the end-of-file token.
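The difference between the run-time checked @code{symbol_type} constructors and the named constructors can be sketched without Bison. In this C++17 sketch, @code{toy_symbol}, @code{toy_kind} and @code{make_symbol} are hypothetical, @code{std::variant} stands in for the variant Bison generates, and locations are reduced to an @code{int}.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical token kinds and symbol type; std::variant (C++17) stands in
// for the variant that Bison generates. Locations are reduced to an int.
enum toy_kind { TOK_EOF, TOK_IDENTIFIER, TOK_INTEGER, TOK_COLON };

struct toy_symbol
{
  toy_kind kind;
  std::variant<std::monostate, std::string, int> value;
  int loc;
};

// Named constructors: each parameter list fixes the value type, so a call
// such as make_INTEGER ("42", 1) is rejected at compile time.
toy_symbol make_IDENTIFIER (const std::string& s, int loc)
{
  return {TOK_IDENTIFIER, s, loc};
}

toy_symbol make_INTEGER (int n, int loc)
{
  return {TOK_INTEGER, n, loc};
}

toy_symbol make_COLON (int loc)
{
  return {TOK_COLON, std::monostate{}, loc};
}

// Run-time alternative for kinds discovered while scanning (e.g., keywords):
// only an assertion catches a kind that requires a semantic value.
toy_symbol make_symbol (toy_kind k, int loc)
{
  assert (k == TOK_EOF || k == TOK_COLON);
  return {k, std::monostate{}, loc};
}
```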
13258 @node A Complete C++ Example
13259 @subsection A Complete C++ Example
13261 This section demonstrates the use of a C++ parser with a simple but complete
13262 example. This example should be available on your system, ready to compile,
13263 in the directory @file{examples/c++/calc++}. It focuses on
13264 the use of Bison; therefore, the design of the various C++ classes is very
13265 naive: no accessors, no encapsulation of members, etc. We will use a Lex
13266 scanner, and more precisely, a Flex scanner, to demonstrate the various
13267 interactions. A hand-written scanner is actually easier to interface with.
13270 * Calc++ --- C++ Calculator:: The specifications
13271 * Calc++ Parsing Driver:: An active parsing context
13272 * Calc++ Parser:: A parser class
13273 * Calc++ Scanner:: A pure C++ Flex scanner
13274 * Calc++ Top Level:: Conducting the band
13277 @node Calc++ --- C++ Calculator
13278 @subsubsection Calc++ --- C++ Calculator
13280 Of course, the grammar is dedicated to arithmetic: a single expression,
13281 possibly preceded by variable assignments. An environment containing
13282 possibly predefined variables such as @code{one} and @code{two} is
13283 exchanged with the parser. An example of valid input follows.
13287 seven := one + two * three
13291 @node Calc++ Parsing Driver
13292 @subsubsection Calc++ Parsing Driver
13294 @c - A place to store error messages
13295 @c - A place for the result
13297 To support a pure interface with the parser (and the scanner), the technique
13298 of the ``parsing context'' is convenient: a structure containing all the
13299 data to exchange. Since, in addition to simply launching the parsing, there
13300 are several auxiliary tasks to execute (open the file for scanning,
13301 instantiate the parser, etc.), we recommend transforming the simple parsing
13302 context structure into a full-blown @dfn{parsing driver} class.
13304 The declaration of this driver class, in @file{driver.hh}, is as follows.
13305 The first part includes the CPP guard, the required standard
13306 library components, and the declaration of the parser class.
13309 @comment file: c++/calc++/driver.hh
13311 /* Driver for calc++. -*- C++ -*-
13313 Copyright (C) 2005-2015, 2018-2021 Free Software Foundation, Inc.
13315 This file is part of Bison, the GNU Compiler Compiler.
13317 This program is free software: you can redistribute it and/or modify
13318 it under the terms of the GNU General Public License as published by
13319 the Free Software Foundation, either version 3 of the License, or
13320 (at your option) any later version.
13322 This program is distributed in the hope that it will be useful,
13323 but WITHOUT ANY WARRANTY; without even the implied warranty of
13324 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13325 GNU General Public License for more details.
13327 You should have received a copy of the GNU General Public License
13328 along with this program. If not, see <https://www.gnu.org/licenses/>. */
13332 @comment file: c++/calc++/driver.hh
13338 # include "parser.hh"
13343 Then comes the declaration of the scanning function. Flex expects the
13344 signature of @code{yylex} to be defined in the macro @code{YY_DECL}, and the
13345 C++ parser expects it to be declared. We can factor both as follows.
13347 @comment file: c++/calc++/driver.hh
13349 // Give Flex the prototype of yylex we want ...
13351 yy::parser::symbol_type yylex (driver& drv)
13352 // ... and declare it for the parser's sake.
13357 The @code{driver} class is then declared with its most obvious members.
13359 @comment file: c++/calc++/driver.hh
13361 // Conducting the whole scanning and parsing of Calc++.
13367 std::map<std::string, int> variables;
13373 The main routine, of course, calls the parser.
13375 @comment file: c++/calc++/driver.hh
13377 // Run the parser on file F. Return 0 on success.
13378 int parse (const std::string& f);
13379 // The name of the file being parsed.
13381 // Whether to generate parser debug traces.
13382 bool trace_parsing;
13386 To encapsulate the coordination with the Flex scanner, it is useful to have
13387 member functions to open and close the scanning phase.
13389 @comment file: c++/calc++/driver.hh
13391 // Handling the scanner.
13392 void scan_begin ();
13394 // Whether to generate scanner debug traces.
13395 bool trace_scanning;
13396 // The token's location used by the scanner.
13397 yy::location location;
13399 #endif // ! DRIVER_HH
13402 The implementation of the driver (@file{driver.cc}) is straightforward.
13405 @comment file: c++/calc++/driver.cc
13407 /* Driver for calc++. -*- C++ -*-
13409 Copyright (C) 2005-2015, 2018-2021 Free Software Foundation, Inc.
13411 This file is part of Bison, the GNU Compiler Compiler.
13413 This program is free software: you can redistribute it and/or modify
13414 it under the terms of the GNU General Public License as published by
13415 the Free Software Foundation, either version 3 of the License, or
13416 (at your option) any later version.
13418 This program is distributed in the hope that it will be useful,
13419 but WITHOUT ANY WARRANTY; without even the implied warranty of
13420 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13421 GNU General Public License for more details.
13423 You should have received a copy of the GNU General Public License
13424 along with this program. If not, see <https://www.gnu.org/licenses/>. */
13428 @comment file: c++/calc++/driver.cc
13430 #include "driver.hh"
13431 #include "parser.hh"
13435 : trace_parsing (false), trace_scanning (false)
13437 variables["one"] = 1;
13438 variables["two"] = 2;
13443 The @code{parse} member function deserves some attention.
13445 @comment file: c++/calc++/driver.cc
13449 driver::parse (const std::string &f)
13452 location.initialize (&file);
13454 yy::parser parse (*this);
13455 parse.set_debug_level (trace_parsing);
13456 int res = parse ();
13463 @node Calc++ Parser
13464 @subsubsection Calc++ Parser
13466 The grammar file @file{parser.yy} starts by asking for the C++ deterministic
13467 parser skeleton and the creation of the parser header file. Because the C++
13468 skeleton changed several times, it is safer to require the version you
13469 designed the grammar for.
13472 @comment file: c++/calc++/parser.yy
13474 /* Parser for calc++. -*- C++ -*-
13476 Copyright (C) 2005-2015, 2018-2021 Free Software Foundation, Inc.
13478 This file is part of Bison, the GNU Compiler Compiler.
13480 This program is free software: you can redistribute it and/or modify
13481 it under the terms of the GNU General Public License as published by
13482 the Free Software Foundation, either version 3 of the License, or
13483 (at your option) any later version.
13485 This program is distributed in the hope that it will be useful,
13486 but WITHOUT ANY WARRANTY; without even the implied warranty of
13487 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13488 GNU General Public License for more details.
13490 You should have received a copy of the GNU General Public License
13491 along with this program. If not, see <https://www.gnu.org/licenses/>. */
13495 @comment file: c++/calc++/parser.yy
13497 %skeleton "lalr1.cc" // -*- C++ -*-
13498 %require "@value{VERSION}"
13503 @findex %define api.token.raw
13504 Because our scanner returns only genuine tokens and never simple characters
13505 (i.e., it returns @samp{PLUS}, not @samp{'+'}), we can avoid the conversion from token kinds to symbol kinds.
13507 @comment file: c++/calc++/parser.yy
13509 %define api.token.raw
13513 @findex %define api.token.constructor
13514 @findex %define api.value.type variant
13515 This example uses genuine C++ objects as semantic values; therefore, we
13516 require the variant-based storage of semantic values. To make sure we
13517 properly use it, we enable assertions. To fully benefit from type safety
13518 and a more natural definition of ``symbol'', we enable
13519 @code{api.token.constructor}.
13521 @comment file: c++/calc++/parser.yy
13523 %define api.token.constructor
13524 %define api.value.type variant
13525 %define parse.assert
13529 @findex %code requires
13530 Then come the declarations/inclusions needed by the semantic values.
13531 Because the parser uses the parsing driver and vice versa, both would like
13532 to include the header of the other, which is, of course, insane. This
13533 mutual dependency will be broken using forward declarations. Because the
13534 driver's header needs detailed knowledge about the parser class (in
13535 particular its inner types), it is the parser's header which will use a
13536 forward declaration of the driver. @xref{%code Summary}.
13538 @comment file: c++/calc++/parser.yy
13549 The driver is passed by reference to the parser and to the scanner.
13550 This provides a simple but effective pure interface, not relying on
13553 @comment file: c++/calc++/parser.yy
13555 // The parsing context.
13556 %param @{ driver& drv @}
13560 Then we request location tracking.
13562 @comment file: c++/calc++/parser.yy
13568 Use the following directives to enable parser tracing, detailed error
13569 messages, and lookahead correction. Detailed error messages can contain incorrect
13570 information if lookahead correction is not enabled (@pxref{LAC}).
13572 @comment file: c++/calc++/parser.yy
13574 %define parse.trace
13575 %define parse.error detailed
13576 %define parse.lac full
13581 The code between @samp{%code @{} and @samp{@}} is output in the @file{*.cc}
13582 file; it needs detailed knowledge about the driver.
13584 @comment file: c++/calc++/parser.yy
13588 # include "driver.hh"
13595 User-friendly names are provided for each symbol. To avoid name clashes in
13596 the generated files (@pxref{Calc++ Scanner}), prefix tokens with @code{TOK_}
13597 (@pxref{%define Summary}).
13599 @comment file: c++/calc++/parser.yy
13601 %define api.token.prefix @{TOK_@}
13614 Since we use variant-based semantic values, @code{%union} is not used, and
13615 @code{%token}, @code{%nterm} and @code{%type} expect genuine types, not type
13618 @comment file: c++/calc++/parser.yy
13620 %token <std::string> IDENTIFIER "identifier"
13621 %token <int> NUMBER "number"
13626 No @code{%destructor} is needed to enable memory deallocation during error
13627 recovery; the memory, for strings for instance, will be reclaimed by the
13628 regular destructors. All the values are printed using their
13629 @code{operator<<} (@pxref{Printer Decl}).
13631 @comment file: c++/calc++/parser.yy
13633 %printer @{ yyo << $$; @} <*>;
13637 The grammar itself is straightforward (@pxref{Location Tracking Calc}).
13639 @comment file: c++/calc++/parser.yy
13643 unit: assignments exp @{ drv.result = $2; @};
13647 | assignments assignment @{@};
13650 "identifier" ":=" exp @{ drv.variables[$1] = $3; @};
13656 | "identifier" @{ $$ = drv.variables[$1]; @}
13657 | exp "+" exp @{ $$ = $1 + $3; @}
13658 | exp "-" exp @{ $$ = $1 - $3; @}
13659 | exp "*" exp @{ $$ = $1 * $3; @}
13660 | exp "/" exp @{ $$ = $1 / $3; @}
13661 | "(" exp ")" @{ $$ = $2; @}
13666 Finally, the @code{error} member function reports the errors.
13668 @comment file: c++/calc++/parser.yy
13671 yy::parser::error (const location_type& l, const std::string& m)
13673 std::cerr << l << ": " << m << '\n';
13677 @node Calc++ Scanner
13678 @subsubsection Calc++ Scanner
13680 In addition to standard headers, the Flex scanner includes the driver's header,
13681 then the parser's, to get the set of defined tokens.
13684 @comment file: c++/calc++/scanner.ll
13686 /* Scanner for calc++. -*- C++ -*-
13688 Copyright (C) 2005-2015, 2018-2021 Free Software Foundation, Inc.
13690 This file is part of Bison, the GNU Compiler Compiler.
13692 This program is free software: you can redistribute it and/or modify
13693 it under the terms of the GNU General Public License as published by
13694 the Free Software Foundation, either version 3 of the License, or
13695 (at your option) any later version.
13697 This program is distributed in the hope that it will be useful,
13698 but WITHOUT ANY WARRANTY; without even the implied warranty of
13699 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13700 GNU General Public License for more details.
13702 You should have received a copy of the GNU General Public License
13703 along with this program. If not, see <https://www.gnu.org/licenses/>. */
13707 @comment file: c++/calc++/scanner.ll
13709 %@{ /* -*- C++ -*- */
13711 # include <climits>
13712 # include <cstdlib>
13713 # include <cstring> // strerror
13715 # include "driver.hh"
13716 # include "parser.hh"
13721 @comment file: c++/calc++/scanner.ll
13724 #if defined __clang__
13725 # define CLANG_VERSION (__clang_major__ * 100 + __clang_minor__)
13728 // Clang and ICC like to pretend they are GCC.
13729 #if defined __GNUC__ && !defined __clang__ && !defined __ICC
13730 # define GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
13733 // Pacify warnings in yy_init_buffer (observed with Flex 2.6.4)
13734 // and GCC 6.4.0, 7.3.0 with -O3.
13735 #if defined GCC_VERSION && 600 <= GCC_VERSION
13736 # pragma GCC diagnostic ignored "-Wnull-dereference"
13739 // This example uses Flex's C back end, yet compiles it as C++.
13740 // So expect warnings about C style casts and NULL.
13741 #if defined CLANG_VERSION && 500 <= CLANG_VERSION
13742 # pragma clang diagnostic ignored "-Wold-style-cast"
13743 # pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
13744 #elif defined GCC_VERSION && 407 <= GCC_VERSION
13745 # pragma GCC diagnostic ignored "-Wold-style-cast"
13746 # pragma GCC diagnostic ignored "-Wzero-as-null-pointer-constant"
13749 #define FLEX_VERSION (YY_FLEX_MAJOR_VERSION * 100 + YY_FLEX_MINOR_VERSION)
13751 // Old versions of Flex (2.5.35) generate an incomplete documentation comment.
13753 // In file included from src/scan-code-c.c:3:
13754 // src/scan-code.c:2198:21: error: empty paragraph passed to '@param' command
13755 // [-Werror,-Wdocumentation]
13756 // * @param line_number
13757 // ~~~~~~~~~~~~~~~~~^
13758 // 1 error generated.
13759 #if FLEX_VERSION < 206 && defined CLANG_VERSION
13760 # pragma clang diagnostic ignored "-Wdocumentation"
13763 // Old versions of Flex (2.5.35) use 'register'. Warnings introduced in
13764 // GCC 7 and Clang 6.
13765 #if FLEX_VERSION < 206
13766 # if defined CLANG_VERSION && 600 <= CLANG_VERSION
13767 # pragma clang diagnostic ignored "-Wdeprecated-register"
13768 # elif defined GCC_VERSION && 700 <= GCC_VERSION
13769 # pragma GCC diagnostic ignored "-Wregister"
13773 #if FLEX_VERSION < 206
13774 # if defined CLANG_VERSION
13775 # pragma clang diagnostic ignored "-Wconversion"
13776 # pragma clang diagnostic ignored "-Wdocumentation"
13777 # pragma clang diagnostic ignored "-Wshorten-64-to-32"
13778 # pragma clang diagnostic ignored "-Wsign-conversion"
13779 # elif defined GCC_VERSION
13780 # pragma GCC diagnostic ignored "-Wconversion"
13781 # pragma GCC diagnostic ignored "-Wsign-conversion"
13785 // Flex 2.6.4, GCC 9
13786 // warning: useless cast to type 'int' [-Wuseless-cast]
13787 // 1361 | YY_CURRENT_BUFFER_LVALUE->yy_buf_size = (int) (new_size - 2);
13789 #if defined GCC_VERSION && 900 <= GCC_VERSION
13790 # pragma GCC diagnostic ignored "-Wuseless-cast"
13797 Since our calculator has no @code{#include}-like feature, we don't need
13798 @code{yywrap}. We don't need the @code{unput} and @code{input} functions
13799 either, and since we parse an actual file rather than an interactive session
13800 with the user, we ask for batch mode. Finally, we enable scanner tracing.
13802 @comment file: c++/calc++/scanner.ll
13804 %option noyywrap nounput noinput batch debug
13808 The following function will be handy to convert a string denoting a number
13809 into a @code{NUMBER} token.
13811 @comment file: c++/calc++/scanner.ll
13814 // A number symbol corresponding to the value in S.
13815 yy::parser::symbol_type
13816 make_NUMBER (const std::string &s, const yy::parser::location_type& loc);
13821 Abbreviations allow for more readable rules.
13823 @comment file: c++/calc++/scanner.ll
13825 id [a-zA-Z][a-zA-Z_0-9]*
13831 The following paragraph suffices to track locations accurately. Each time
13832 @code{yylex} is invoked, the begin position is moved onto the end position.
13833 Then when a pattern is matched, its width is added to the end column. When
13834 matching ends of lines, the end cursor is adjusted, and each time blanks are
13835 matched, the begin cursor is moved onto the end cursor to effectively ignore
13836 the blanks preceding tokens. Comments would be handled the same way.
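The tab handling in @code{add_columns} below advances the end column to the next tab stop, that is, to the next column of the form 8k+1. The following C++ sketch isolates that arithmetic so it can be checked on its own. @code{char_width} and @code{advance_columns} are hypothetical helpers, not part of the calc++ sources, and columns are assumed to be 1-based, as in Bison locations.

```cpp
#include <cassert>  // assert, for quick sanity checks

// Number of columns occupied by character C when it starts at the
// 1-based column COL.  A tab advances to the next tab stop (columns
// 1, 9, 17, ...); any other character occupies a single column.
// This is the expression used in add_columns.
int char_width (char c, int col)
{
  return c == '\t' ? 8 - ((col - 1) & 7) : 1;
}

// Column reached after reading the BUFSIZE characters of BUF,
// starting at column COL.
int advance_columns (int col, const char *buf, int bufsize)
{
  for (int i = 0; i < bufsize; ++i)
    col += char_width (buf[i], col);
  return col;
}
```

For instance, a tab read at column 3 advances to column 9, while a tab read at column 9 advances to column 17.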
13838 @comment file: c++/calc++/scanner.ll
13842 // Take 8-space tabulations into account.
13843 void add_columns (yy::location& loc, const char *buf, int bufsize)
13845 for (int i = 0; i < bufsize; ++i)
13846 loc.columns (buf[i] == '\t' ? 8 - ((loc.end.column - 1) & 7) : 1);
13848 // Code run each time a pattern is matched.
13849 #define YY_USER_ACTION add_columns (loc, yytext, yyleng);
13855 // A handy shortcut to the location held by the driver.
13856 yy::location& loc = drv.location;
13857 // Code run each time yylex is called.
13861 @{blank@}+ loc.step ();
13862 \n+ loc.lines (yyleng); loc.step ();
13866 The rules are simple. Invalid input is reported by throwing a @code{yy::parser::syntax_error} at the current location.
13868 @comment file: c++/calc++/scanner.ll
13870 "-" return yy::parser::make_MINUS (loc);
13871 "+" return yy::parser::make_PLUS (loc);
13872 "*" return yy::parser::make_STAR (loc);
13873 "/" return yy::parser::make_SLASH (loc);
13874 "(" return yy::parser::make_LPAREN (loc);
13875 ")" return yy::parser::make_RPAREN (loc);
13876 ":=" return yy::parser::make_ASSIGN (loc);
13878 @{int@} return make_NUMBER (yytext, loc);
13879 @{id@} return yy::parser::make_IDENTIFIER (yytext, loc);
13882 throw yy::parser::syntax_error
13883 (loc, "invalid character: " + std::string(yytext));
13886 <<EOF>> return yy::parser::make_YYEOF (loc);
13891 You should keep your rules simple, both in the parser and in the scanner.
13892 Throwing from the auxiliary functions is then very handy to report errors.
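The range check performed by @code{make_NUMBER} below can be tried in isolation. This is a minimal sketch built around a plain function, @code{to_int_checked} (a hypothetical name), which throws @code{std::out_of_range} where calc++ throws @code{yy::parser::syntax_error}. It resets @code{errno} first, because @code{strtol} sets @code{errno} to @code{ERANGE} on overflow but never clears it on success.

```cpp
#include <cassert>    // assert, for quick sanity checks
#include <cerrno>     // errno, ERANGE
#include <climits>    // INT_MIN, INT_MAX
#include <cstdlib>    // std::strtol
#include <stdexcept>  // std::out_of_range
#include <string>

// Convert the decimal string S to an int, throwing std::out_of_range
// when the value does not fit.  Same tests as make_NUMBER; a standard
// exception stands in for yy::parser::syntax_error.
int to_int_checked (const std::string &s)
{
  errno = 0;  // strtol sets errno on overflow, but never resets it
  long n = std::strtol (s.c_str (), nullptr, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    throw std::out_of_range ("integer is out of range: " + s);
  return static_cast<int> (n);
}
```

Where @code{long} is 64 bits wide, @code{"2147483648"} is parsed without overflow and rejected by the @code{INT_MAX} comparison; where @code{long} is 32 bits wide, @code{strtol} itself reports @code{ERANGE}. Both paths raise the same exception.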
13894 @comment file: c++/calc++/scanner.ll
13897 yy::parser::symbol_type
13898 make_NUMBER (const std::string &s, const yy::parser::location_type& loc)
13901 long n = strtol (s.c_str(), NULL, 10);
13902 if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
13903 throw yy::parser::syntax_error (loc, "integer is out of range: " + s);
13904 return yy::parser::make_NUMBER ((int) n, loc);
13910 Finally, because the driver's scanner-related member functions depend
13911 on the scanner's data, it is simpler to implement them in this file.
13913 @comment file: c++/calc++/scanner.ll
13917 driver::scan_begin ()
13919 yy_flex_debug = trace_scanning;
13920 if (file.empty () || file == "-")
13922 else if (!(yyin = fopen (file.c_str (), "r")))
13924 std::cerr << "cannot open " << file << ": " << strerror (errno) << '\n';
13925 exit (EXIT_FAILURE);
13932 driver::scan_end ()
13939 @node Calc++ Top Level
13940 @subsubsection Calc++ Top Level
13942 The top level file, @file{calc++.cc}, poses no problem.
13945 @comment file: c++/calc++/calc++.cc
13947 /* Main for calc++. -*- C++ -*-
13949 Copyright (C) 2005-2015, 2018-2021 Free Software Foundation, Inc.
13951 This file is part of Bison, the GNU Compiler Compiler.
13953 This program is free software: you can redistribute it and/or modify
13954 it under the terms of the GNU General Public License as published by
13955 the Free Software Foundation, either version 3 of the License, or
13956 (at your option) any later version.
13958 This program is distributed in the hope that it will be useful,
13959 but WITHOUT ANY WARRANTY; without even the implied warranty of
13960 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13961 GNU General Public License for more details.
13963 You should have received a copy of the GNU General Public License
13964 along with this program. If not, see <https://www.gnu.org/licenses/>. */
13968 @comment file: c++/calc++/calc++.cc
13970 #include <iostream>
13971 #include "driver.hh"
13975 main (int argc, char *argv[])
13979 for (int i = 1; i < argc; ++i)
13980 if (argv[i] == std::string ("-p"))
13981 drv.trace_parsing = true;
13982 else if (argv[i] == std::string ("-s"))
13983 drv.trace_scanning = true;
13984 else if (!drv.parse (argv[i]))
13985 std::cout << drv.result << '\n';
13997 * D Bison Interface:: Asking for D parser generation
13998 * D Semantic Values:: %token and %nterm vs. D
13999 * D Location Values:: The position and location classes
14000 * D Parser Interface:: Instantiating and running the parser
14001 * D Parser Context Interface:: Circumstances of a syntax error
14002 * D Scanner Interface:: Specifying the scanner for the parser
14003 * D Action Features:: Special features for use in actions
14004 * D Push Parser Interface:: Instantiating and running the push parser
14005 * D Complete Symbols:: Using token constructors
14008 @node D Bison Interface
14009 @subsection D Bison Interface
14012 The D parser skeletons are selected using the @code{%language "D"}
14013 directive or the @option{-L D}/@option{--language=D} option.
14015 @c FIXME: Documented bug.
14016 When generating a D parser, @samp{bison @var{basename}.y} will create a
14017 single D source file named @file{@var{basename}.d} containing the
14018 parser implementation. Using a grammar file without a @file{.y} suffix is
14019 currently broken. The basename of the parser implementation file can be
14020 changed by the @code{%file-prefix} directive or the
14021 @option{-b}/@option{--file-prefix} option. The entire parser implementation
14022 file name can be changed by the @code{%output} directive or the
14023 @option{-o}/@option{--output} option. The parser implementation file
14024 contains a single class for the parser.
14026 You can create documentation for generated parsers using Ddoc.
14028 GLR parsers are currently unsupported in D. Do not use the
14029 @code{%glr-parser} directive.
14031 No header file can be generated for D parsers. Do not use the
14032 @code{%header} directive or the @option{-d}/@option{--header} options.
14034 @node D Semantic Values
14035 @subsection D Semantic Values
14037 Semantic types are handled by @code{%union} and @samp{%define api.value.type
14038 union}, similar to C/C++ parsers. In the latter case, the union of the
14039 values is handled by the backend. In D, unions can hold classes, structs,
14040 etc., so this directive is more similar to @samp{%define api.value.type
14043 D parsers do not support @code{%destructor}, since the language
14044 adopts garbage collection. The parser will try to hold references
14045 to semantic values for as little time as needed.
14047 D parsers support @code{%printer}. Here is an example for values of type
14048 @code{int}, where @code{yyo} is the parser's debug output:
14051 %printer @{ yyo.write($$); @} <int>
14055 @node D Location Values
14056 @subsection D Location Values
14058 @c - class Position
14059 @c - class Location
14061 When the directive @code{%locations} is used, the D parser supports location
14062 tracking; see @ref{Tracking Locations}. Both the position and the location
14063 structures are provided.
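Both structures follow a half-open convention: @code{begin} is the first position of the range and @code{end} is the first position beyond it, so an empty range has equal endpoints. The C++ sketch below models that convention with hypothetical @code{Pos} and @code{Loc} types; it illustrates the semantics only, is not the D implementation, and assumes single-line ranges when computing a width.

```cpp
#include <cassert>  // assert, for quick sanity checks

// Hypothetical position: a line and a column, both 1-based.
struct Pos
{
  Pos (int l = 1, int c = 1) : line (l), column (c) {}
  int line;
  int column;
};

bool operator== (const Pos &a, const Pos &b)
{
  return a.line == b.line && a.column == b.column;
}

// Hypothetical location: the half-open range [begin, end).
struct Loc
{
  // Empty range located at P.
  explicit Loc (Pos p) : begin (p), end (p) {}
  // Range given by its endpoints.
  Loc (Pos b, Pos e) : begin (b), end (e) {}

  bool empty () const { return begin == end; }

  Pos begin;  // first position of the range, inclusive
  Pos end;    // first position beyond the range
};

// Number of columns covered by a range that does not span lines.
int width_on_line (const Loc &l)
{
  return l.end.column - l.begin.column;
}
```

With this convention, a three-character token starting at column 1 ends at column 4, and its width is simply the difference of the two columns.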
14065 @deftypeivar {Location} {Position} begin
14066 @deftypeivarx {Location} {Position} end
14067 The first, inclusive, position of the range, and the first beyond.
14070 @deftypeop {Constructor} {Location} {} this(@code{Position} @var{loc})
14071 Create a @code{Location} denoting an empty range located at a given point.
14074 @deftypeop {Constructor} {Location} {} this(@code{Position} @var{begin}, @code{Position} @var{end})
14075 Create a @code{Location} from the endpoints of the range.
14078 @deftypemethod {Location} {string} toString()
14079 The range represented by the location as a string.
14083 @node D Parser Interface
14084 @subsection D Parser Interface
14086 The name of the generated parser class defaults to @code{YYParser}. The
14087 @code{YY} prefix may be changed using the @samp{%define api.prefix}.
14088 Alternatively, use @samp{%define api.parser.class @{@var{name}@}} to give a
14089 custom name to the class. The interface of this class is detailed below.
14091 By default, the parser class has public visibility. To add modifiers to the
14092 parser class, use @code{%define} with @code{api.parser.public},
14093 @code{api.parser.abstract} and/or @code{api.parser.final}.
14095 The superclass and the implemented interfaces of the parser class can be
14096 specified with the @samp{%define api.parser.extends} and @samp{%define
14097 api.parser.implements} directives.
14099 The parser class defines an interface, @code{Lexer} (@pxref{D Scanner
14100 Interface}). Other than this interface and the members described
14101 below, all the other members and fields are preceded with a
14102 @code{yy} or @code{YY} prefix to avoid clashes with user code.
14104 The parser class can be extended using the @code{%parse-param}
14105 directive. Each occurrence of the directive adds a field, public by
14106 default, to the parser class, and an argument to its constructor, which
14107 initializes that field automatically.
14109 @deftypeop {Constructor} {YYParser} {} this(@var{lex_param}, @dots{}, @var{parse_param}, @dots{})
14110 Build a new parser object with embedded @samp{%code lexer}. There are no
14111 parameters, unless @code{%param}s and/or @code{%parse-param}s and/or
14112 @code{%lex-param}s are used.
14115 @deftypeop {Constructor} {YYParser} {} this(@code{Lexer} @var{lexer}, @var{parse_param}, @dots{})
14116 Build a new parser object using the specified scanner. There are no
14117 additional parameters unless @code{%param}s and/or @code{%parse-param}s are
14121 @deftypemethod {YYParser} {bool} parse()
14122 Run the syntactic analysis, and return @code{true} on success,
14123 @code{false} otherwise.
14126 @deftypemethod {YYParser} {bool} getErrorVerbose()
14127 @deftypemethodx {YYParser} {void} setErrorVerbose(bool @var{verbose})
14128 Get or set the option to produce verbose error messages. These are only
14129 available with @samp{%define parse.error detailed},
14130 which also turns on verbose error messages.
14133 @deftypemethod {YYParser} {void} yyerror(@code{string} @var{msg})
14134 @deftypemethodx {YYParser} {void} yyerror(@code{Location} @var{loc}, @code{string} @var{msg})
14135 Print an error message using the @code{yyerror} method of the scanner
14136 instance in use. The @code{Location} parameter is
14137 available only if location tracking is active.
14140 @deftypemethod {YYParser} {bool} recovering()
14141 During the syntactic analysis, return @code{true} if recovering
14142 from a syntax error.
14143 @xref{Error Recovery}.
14146 @deftypemethod {YYParser} {File} getDebugStream()
14147 @deftypemethodx {YYParser} {void} setDebugStream(@code{File} @var{o})
14148 Get or set the stream used for tracing the parsing. It defaults to
14152 @deftypemethod {YYParser} {int} getDebugLevel()
14153 @deftypemethodx {YYParser} {void} setDebugLevel(@code{int} @var{l})
14154 Get or set the tracing level. Currently its value is either 0 (no trace)
14155 or nonzero (full tracing).
14158 @deftypecv {Constant} {YYParser} {string} {bisonVersion}
14159 @deftypecvx {Constant} {YYParser} {string} {bisonSkeleton}
14160 Identify the Bison version and skeleton used to generate this parser.
14163 Internationalization in D is very similar to that in C. The D
14164 parser uses @code{dgettext} for translating Bison messages.
14166 To enable internationalization, compile using @samp{-version ENABLE_NLS
14167 -version YYENABLE_NLS} and import @code{bindtextdomain} and
14168 @code{textdomain} from C:
14171 extern(C) char* bindtextdomain(const char* domainname, const char* dirname);
14172 extern(C) char* textdomain(const char* domainname);
14175 The main function should load the translation catalogs, similarly to the
14176 @file{c/bistromathic} example:
14181 import core.stdc.locale;
14183 // Set up internationalization.
14184 setlocale(LC_ALL, "");
14185 // Use Bison's standard translation catalog for error messages
14186 // (the generated messages).
14187 bindtextdomain("bison-runtime", BISON_LOCALEDIR);
14188 // For the translation catalog of your own project, use the
14189 // name of your project.
14190 bindtextdomain("bison", LOCALEDIR);
14191 textdomain("bison");
14193 // usual main content
14198 For user message translations, the user must implement the @samp{string
14199 _(const char* @var{msg})} function. It is recommended to use
14204 static if (!is(typeof(_)))
14206 version(ENABLE_NLS)
14208 extern(C) char* gettext(const char*);
14209 string _(const char* s)
14211 return to!string(gettext(s));
14215 static if (!is(typeof(_)))
14217 pragma(inline, true)
14218 string _(string msg) @{ return msg; @}
14223 @node D Parser Context Interface
14224 @subsection D Parser Context Interface
14225 The parser context provides information to build error reports when you
14226 invoke @samp{%define parse.error custom}.
14228 @defcv {Type} {YYParser} {SymbolKind}
14229 A struct containing an enum of all the grammar symbols, tokens and
14230 nonterminals. Its enumerators are forged from the symbol names. Use
14231 @samp{void toString(W)(W sink)} to get the symbol names.
14234 @deftypemethod {YYParser.Context} {YYParser.SymbolKind} getToken()
14235 The kind of the lookahead. Return @code{null} iff there is no lookahead.
14238 @deftypemethod {YYParser.Context} {YYParser.Location} getLocation()
14239 The location of the lookahead.
14242 @deftypemethod {YYParser.Context} {int} getExpectedTokens(@code{YYParser.SymbolKind[]} @var{argv}, @code{int} @var{argc})
14243 Fill @var{argv} with the expected tokens, which never includes
14244 @code{SymbolKind.YYERROR} or @code{SymbolKind.YYUNDEF}.
14246 Never put more than @var{argc} elements into @var{argv}, and on success
14247 return the number of tokens stored in @var{argv}. If there are more
14248 expected tokens than @var{argc}, fill @var{argv} up to @var{argc} and return
14249 0. If there are no expected tokens, also return 0, but set @code{argv[0]}
14252 If @var{argv} is null, return the size needed to store all the possible
14253 values, which is always less than @code{YYNTOKENS}.
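The rules above may be easier to follow as code. The following C++ sketch applies the same contract to a plain list of expected tokens. @code{Kind}, its enumerators and @code{get_expected_tokens} are hypothetical; @code{Kind::None} stands in for the null marker, and the input list is assumed to already exclude the error and undefined tokens.

```cpp
#include <cassert>  // assert, for quick sanity checks
#include <vector>

// Hypothetical symbol kinds; real ones are generated from the grammar.
enum class Kind { None, Plus, Minus, Num };

// Mimics the documented contract of getExpectedTokens.
//   expected: tokens acceptable in the current state.
//   argv:     output buffer, or nullptr to query the size needed.
int get_expected_tokens (const std::vector<Kind> &expected,
                         Kind *argv, int argc)
{
  int count = static_cast<int> (expected.size ());
  if (!argv)
    return count;                 // size needed to store them all

  // Never store more than argc elements.
  int n = count < argc ? count : argc;
  for (int i = 0; i < n; ++i)
    argv[i] = expected[i];

  // No expected tokens: return 0 and mark the first slot.
  if (count == 0 && argc > 0)
    argv[0] = Kind::None;

  // Too many tokens: the buffer is filled, but 0 is returned.
  return count <= argc ? count : 0;
}
```

Returning 0 when the buffer is too small lets a caller with a fixed-size buffer omit the list of expected tokens instead of printing a truncated one.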
14257 @node D Scanner Interface
14258 @subsection D Scanner Interface
14261 @c - Lexer interface
14263 There are two possible ways to interface a Bison-generated D parser
14264 with a scanner: the scanner may be defined by @code{%code lexer}, or
14265 defined elsewhere. In either case, the scanner has to implement the
14266 @code{Lexer} inner interface of the parser class. This interface also
14267 contains constants for all user-defined token names and the predefined
14268 @code{YYEOF} token.
14270 In the first case, the body of the scanner class is placed in
14271 @code{%code lexer} blocks. If you want to pass parameters from the
14272 parser constructor to the scanner constructor, specify them with
14273 @code{%lex-param}; they are passed before @code{%parse-param}s to the
14276 In the second case, the scanner has to implement the @code{Lexer} interface,
14277 which is defined within the parser class (e.g., @code{YYParser.Lexer}).
14278 The constructor of the parser object will then accept an object
14279 implementing the interface; @code{%lex-param} is not used in this
14282 In both cases, the scanner has to implement the following methods.
14284 @deftypemethod {Lexer} {void} yyerror(@code{Location} @var{loc}, @code{string} @var{msg})
14285 This method is defined by the user to emit an error message. The first
14286 parameter is omitted if location tracking is not active.
14289 @deftypemethod {Lexer} {Symbol} yylex()
14290 Return the next token. The return value is of type @code{Symbol}, which
14291 binds together the kind, the semantic value and the location.
14294 @deftypemethod {Lexer} {void} reportSyntaxError(@code{YYParser.Context} @var{ctx})
14295 If you invoke @samp{%define parse.error custom} (@pxref{Bison
14296 Declarations}), then the parser no longer passes syntax error messages to
14297 @code{yyerror}; instead, it delegates that task to the user by calling the
14298 @code{reportSyntaxError} function.
14300 Whether it uses @code{yyerror} is up to the user.
14302 Here is an example of a reporting function (@pxref{D Parser Context
14306 public void reportSyntaxError(YYParser.Context ctx)
14308 stderr.write(ctx.getLocation(), ": syntax error");
14309 // Report the expected tokens.
14311 immutable int TOKENMAX = 5;
14312 YYParser.SymbolKind[] arg = new YYParser.SymbolKind[TOKENMAX];
14313 int n = ctx.getExpectedTokens(arg, TOKENMAX);
14315 for (int i = 0; i < n; ++i)
14316 stderr.write((i == 0 ? ": expected " : " or "), arg[i]);
14318 // Report the unexpected token which triggered the error.
14320 YYParser.SymbolKind lookahead = ctx.getToken();
14321 stderr.writeln(" before ", lookahead);
14327 This implementation is inappropriate for internationalization, see
14328 the @file{c/bistromathic} example for a better alternative.
14331 @node D Action Features
14332 @subsection Special Features for Use in D Actions
14334 Here is a table of Bison constructs, variables and functions that are useful in
14337 @deffn {Variable} $$
14338 Acts like a variable that contains the semantic value for the
14339 grouping made by the current rule. @xref{Actions}.
14342 @deffn {Variable} $@var{n}
14343 Acts like a variable that contains the semantic value for the
14344 @var{n}th component of the current rule. @xref{Actions}.
14347 @deffn {Function} yyerrok
14348 Resume generating error messages immediately for subsequent syntax
14349 errors. This is useful primarily in error rules.
14350 @xref{Error Recovery}.
14353 @node D Push Parser Interface
14354 @subsection D Push Parser Interface
14355 @c - define push_parse
14356 @findex %define api.push-pull
14358 Normally, Bison generates a pull parser for D.
14359 The following Bison declaration says that you want the parser to be a push
14360 parser (@pxref{%define Summary}):
14363 %define api.push-pull push
14366 Most of the discussion about the D pull parser interface (@pxref{D
14367 Parser Interface}) applies to the push parser interface as well.
14369 When generating a push parser, the method @code{pushParse} is created with
14370 the following signature:
14372 @deftypemethod {YYParser} {int} pushParse (@code{Symbol} @var{sym})
14375 The primary difference with respect to a pull parser is that the parser
14376 method @code{pushParse} is invoked repeatedly, once per token. This
14377 function is available if either the @samp{%define api.push-pull push} or
14378 @samp{%define api.push-pull both} declaration is used (@pxref{%define
14381 The value returned by the @code{pushParse} method is one of the following:
14382 @code{ACCEPT}, @code{ABORT}, or @code{PUSH_MORE}. This new value,
14383 @code{PUSH_MORE}, may be returned if more input is required to finish
14386 If @code{api.push-pull} is defined as @code{both}, then the generated parser
14387 class will also implement the @code{parse} method. This method's body is a
14388 loop that repeatedly invokes the scanner and then passes the values obtained
14389 from the scanner to the @code{pushParse} method.
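The division of labor between @code{parse} and @code{pushParse} can be sketched in C++. In this toy example, which is not the generated D code, @code{SumPushParser::pushParse} consumes exactly one token per call and answers with one of the three statuses, while @code{pull_parse} plays the role of @code{parse}: it feeds tokens from a scanner until the status is no longer @code{PUSH_MORE}. All names are hypothetical, and the ``grammar'' merely sums single-digit numbers until end of input.

```cpp
#include <cassert>  // assert, for quick sanity checks
#include <cctype>   // std::isdigit
#include <cstddef>  // std::size_t
#include <string>
#include <utility>  // std::move

// Statuses named after the values returned by pushParse.
enum class Status { ACCEPT, ABORT, PUSH_MORE };

// Hypothetical token: 'n' for a number, 0 for end of input,
// otherwise the offending character itself.
struct Token
{
  char kind;
  int value;
};

// Toy push parser: sums numbers until end of input.
class SumPushParser
{
public:
  Status pushParse (Token t)
  {
    if (t.kind == 0)
      return Status::ACCEPT;   // end of input: success
    if (t.kind != 'n')
      return Status::ABORT;    // unexpected token: give up
    sum += t.value;
    return Status::PUSH_MORE;  // need another token
  }
  int sum = 0;
};

// Toy scanner: each digit of the input is a number token.
class Scanner
{
public:
  explicit Scanner (std::string s) : input (std::move (s)) {}
  Token yylex ()
  {
    if (pos == input.size ())
      return {0, 0};
    char c = input[pos++];
    if (std::isdigit (static_cast<unsigned char> (c)))
      return {'n', c - '0'};
    return {c, 0};
  }
private:
  std::string input;
  std::size_t pos = 0;
};

// Pull-style driver: fetch tokens from the scanner and push them
// until the parser stops asking for more.
bool pull_parse (Scanner &lexer, SumPushParser &parser)
{
  Status s;
  do
    s = parser.pushParse (lexer.yylex ());
  while (s == Status::PUSH_MORE);
  return s == Status::ACCEPT;
}
```

Because the parser never calls the scanner itself, the same @code{pushParse} can also be fed from an event loop or a network callback, which is the point of the push interface.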
14391 @node D Complete Symbols
14392 @subsection D Complete Symbols
14394 To build return values for @code{yylex}, call the @code{Symbol} method of
14395 the same name as the token kind reported, and add the parameters for
14396 value and location if necessary. These methods generate compile-time errors
14397 if the parameters are inconsistent. Token constructors work with both
14398 @code{%union} and @samp{%define api.value.type union}.
14400 The order of the parameters is the same as for the @code{Symbol}
14401 constructor. Here is an example for the token kind @code{NUM}, whose value is
14402 @code{ival}, with location tracking activated:
14405 Symbol.NUM(ival, location);
14409 @section Java Parsers
14412 * Java Bison Interface:: Asking for Java parser generation
14413 * Java Semantic Values:: %token and %nterm vs. Java
14414 * Java Location Values:: The position and location classes
14415 * Java Parser Interface:: Instantiating and running the parser
14416 * Java Parser Context Interface:: Circumstances of a syntax error
14417 * Java Scanner Interface:: Specifying the scanner for the parser
14418 * Java Action Features:: Special features for use in actions
14419 * Java Push Parser Interface:: Instantiating and running the push parser
14420 * Java Differences:: Differences between C/C++ and Java Grammars
14421 * Java Declarations Summary:: List of Bison declarations used with Java
14424 @node Java Bison Interface
14425 @subsection Java Bison Interface
14426 @c - %language "Java"
14428 The Java parser skeletons are selected using the @code{%language "Java"}
14429 directive or the @option{-L java}/@option{--language=java} option.
14431 @c FIXME: Documented bug.
14432 When generating a Java parser, @samp{bison @var{basename}.y} will create a
14433 single Java source file named @file{@var{basename}.java} containing the
14434 parser implementation. Using a grammar file without a @file{.y} suffix is
14435 currently broken. The basename of the parser implementation file can be
14436 changed by the @code{%file-prefix} directive or the
14437 @option{-b}/@option{--file-prefix} option. The entire parser implementation
14438 file name can be changed by the @code{%output} directive or the
14439 @option{-o}/@option{--output} option. The parser implementation file
14440 contains a single class for the parser.
14442 You can create documentation for generated parsers using Javadoc.
14444 Contrary to C parsers, Java parsers do not use global variables; the state
14445 of the parser is always local to an instance of the parser class.
14446 Therefore, all Java parsers are ``pure'', and the @code{%define api.pure}
14447 directive does nothing when used in Java.
14449 GLR parsers are currently unsupported in Java. Do not use the
14450 @code{%glr-parser} directive.
14452 No header file can be generated for Java parsers. Do not use the
14453 @code{%header} directive or the @option{-d}/@option{-H}/@option{--header}
14456 @c FIXME: Possible code change.
14457 Currently, support for tracing is always compiled in. Thus the
14458 @samp{%define parse.trace} and @samp{%token-table} directives and the
14459 @option{-t}/@option{--debug} and @option{-k}/@option{--token-table} options
14460 have no effect. This may change in the future to eliminate unused code in
14461 the generated parser, so use @samp{%define parse.trace} explicitly if
14462 needed. Also, in the future the @code{%token-table} directive might enable
14463 a public interface to access the token names and codes.
14465 Getting a ``code too large'' error from the Java compiler means the code hit
14466 the 64KB bytecode per method limitation of the Java class file. Try
14467 reducing the amount of code in actions and static initializers; otherwise,
14468 report a bug so that the parser skeleton will be improved.
14471 @node Java Semantic Values
14472 @subsection Java Semantic Values
14474 There is no @code{%union} directive in Java parsers. Instead, the semantic
14475 values' types (class names) should be specified in the @code{%nterm} or
14476 @code{%token} directive:
14479 %nterm <Expression> expr assignment_expr term factor
14480 %nterm <Integer> number
14483 By default, the semantic stack is declared to have @code{Object} members,
14484 which means that the class types you specify can be of any class.
14485 To improve the type safety of the parser, you can declare the common
14486 superclass of all the semantic values using the @samp{%define api.value.type}
14487 directive. For example, after the following declaration:
14490 %define api.value.type @{ASTNode@}
14494 any @code{%token}, @code{%nterm} or @code{%type} specifying a semantic type
14495 which is not a subclass of @code{ASTNode} will cause a compile-time error.
14497 @c FIXME: Documented bug.
14498 Types used in the directives may be qualified with a package name.
14499 Primitive data types are accepted for Java version 1.5 or later. Note
14500 that in this case the autoboxing feature of Java 1.5 will be used.
14501 Generic types may not be used; this is due to a limitation in the
14502 implementation of Bison, and may change in future releases.
14504 Java parsers do not support @code{%destructor}, since the language
14505 adopts garbage collection. The parser will try to hold references
14506 to semantic values for as little time as needed.
14508 Java parsers do not support @code{%printer}, as @code{toString()}
14509 can be used to print the semantic values. This however may change
14510 (in a backwards-compatible way) in future versions of Bison.
14513 @node Java Location Values
14514 @subsection Java Location Values
14516 @c - class Position
14517 @c - class Location
14519 When the directive @code{%locations} is used, the Java parser supports
14520 location tracking, see @ref{Tracking Locations}. An auxiliary user-defined
14521 class defines a @dfn{position}, a single point in a file; Bison itself
14522 defines a class representing a @dfn{location}, a range composed of a pair of
14523 positions (possibly spanning several files). The location class is an inner
14524 class of the parser; the name is @code{Location} by default, and may also be
14525 renamed using @code{%define api.location.type @{@var{class-name}@}}.
14527 The location class treats the position as a completely opaque value.
14528 By default, the class name is @code{Position}, but this can be changed
14529 with @code{%define api.position.type @{@var{class-name}@}}. This class must
14530 be supplied by the user.
14533 @deftypeivar {Location} {Position} begin
14534 @deftypeivarx {Location} {Position} end
14535 The first, inclusive, position of the range, and the first beyond.
14538 @deftypeop {Constructor} {Location} {} Location (@code{Position} @var{loc})
14539 Create a @code{Location} denoting an empty range located at a given point.
14542 @deftypeop {Constructor} {Location} {} Location (@code{Position} @var{begin}, @code{Position} @var{end})
14543 Create a @code{Location} from the endpoints of the range.
14546 @deftypemethod {Location} {String} toString ()
14547 Return the range represented by the location as a string. For this to work
14548 properly, the position class should override the @code{equals} and
14549 @code{toString} methods appropriately.
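The need for @code{equals} presumably comes from the location printing a single position when its range is empty, and @samp{@var{begin}-@var{end}} otherwise. The C++ sketch below models that behavior; the @code{Position} struct with @code{line} and @code{column} fields and the @code{location_to_string} function are hypothetical stand-ins for the user-supplied class and the generated method.

```cpp
#include <cassert>  // assert, for quick sanity checks
#include <string>

// Hypothetical user-supplied position class.
struct Position
{
  int line;
  int column;

  // Value-based comparison, as the location class expects.
  bool equals (const Position &o) const
  {
    return line == o.line && column == o.column;
  }

  std::string toString () const
  {
    return std::to_string (line) + "." + std::to_string (column);
  }
};

// Print a range as a single position when it is empty, and as
// "begin-end" otherwise.  With mere identity comparison, equal but
// distinct positions would be printed as a range.
std::string location_to_string (const Position &begin, const Position &end)
{
  if (begin.equals (end))
    return begin.toString ();
  return begin.toString () + "-" + end.toString ();
}
```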
14553 @node Java Parser Interface
14554 @subsection Java Parser Interface
14556 The name of the generated parser class defaults to @code{YYParser}. The
14557 @code{YY} prefix may be changed using the @samp{%define api.prefix}.
14558 Alternatively, use @samp{%define api.parser.class @{@var{name}@}} to give a
14559 custom name to the class. The interface of this class is detailed below.
14561 By default, the parser class has package visibility. A declaration
14562 @samp{%define api.parser.public} changes it to public visibility. Remember
14563 that, according to the Java language specification, the name of the
14564 @file{.java} file should match the name of the class in this case.
14565 Similarly, you can use @code{api.parser.abstract}, @code{api.parser.final}
14566 and @code{api.parser.strictfp} with the @code{%define} declaration to add
14567 other modifiers to the parser class. A single @samp{%define
14568 api.parser.annotations @{@var{annotations}@}} directive can be used to add
14569 any number of annotations to the parser class.
14571 The Java package name of the parser class can be specified using the
14572 @samp{%define package} directive. The superclass and the implemented
14573 interfaces of the parser class can be specified with the @samp{%define
14574 api.parser.extends} and @samp{%define api.parser.implements} directives.
14576 The parser class defines an inner class, @code{Location}, that is used
14577 for location tracking (see @ref{Java Location Values}), and an inner
14578 interface, @code{Lexer} (see @ref{Java Scanner Interface}). Other than
14579 this inner class and interface, and the members described in the interface
14580 below, all the other members and fields are preceded with a @code{yy} or
14581 @code{YY} prefix to avoid clashes with user code.
14583 The parser class can be extended using the @code{%parse-param}
14584 directive. Each occurrence of the directive adds a @code{protected
14585 final} field to the parser class, and an argument to its constructor,
14586 which initializes that field automatically.
14588 @deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{})
14589 Build a new parser object with embedded @code{%code lexer}. There are
14590 no parameters, unless @code{%param}s and/or @code{%parse-param}s and/or
14591 @code{%lex-param}s are used.
14593 Use @code{%code init} for code added to the start of the constructor
14594 body. This is especially useful to initialize superclasses. Use
14595 @samp{%define init_throws} to specify any uncaught exceptions.
14598 @deftypeop {Constructor} {YYParser} {} YYParser (@code{Lexer} @var{lexer}, @var{parse_param}, @dots{})
14599 Build a new parser object using the specified scanner. There are no
14600 additional parameters unless @code{%param}s and/or @code{%parse-param}s are
14603 If the scanner is defined by @code{%code lexer}, this constructor is
14604 declared @code{protected} and is called automatically with a scanner
14605 created with the correct @code{%param}s and/or @code{%lex-param}s.
14607 Use @code{%code init} for code added to the start of the constructor
14608 body. This is especially useful to initialize superclasses. Use
14609 @samp{%define init_throws} to specify any uncaught exceptions.
14612 @deftypemethod {YYParser} {boolean} parse ()
14613 Run the syntactic analysis, and return @code{true} on success,
14614 @code{false} otherwise.
14617 @deftypemethod {YYParser} {boolean} getErrorVerbose ()
14618 @deftypemethodx {YYParser} {void} setErrorVerbose (boolean @var{verbose})
14619 Get or set the option to produce verbose error messages. These are only
14620 available with @samp{%define parse.error detailed} (or @samp{verbose}),
14621 which also turns on verbose error messages.
14624 @deftypemethod {YYParser} {void} yyerror (@code{String} @var{msg})
14625 @deftypemethodx {YYParser} {void} yyerror (@code{Position} @var{pos}, @code{String} @var{msg})
14626 @deftypemethodx {YYParser} {void} yyerror (@code{Location} @var{loc}, @code{String} @var{msg})
14627 Print an error message using the @code{yyerror} method of the scanner
14628 instance in use. The @code{Location} and @code{Position} parameters are
14629 available only if location tracking is active.
14632 @deftypemethod {YYParser} {boolean} recovering ()
14633 During the syntactic analysis, return @code{true} if recovering
14634 from a syntax error.
14635 @xref{Error Recovery}.
14638 @deftypemethod {YYParser} {java.io.PrintStream} getDebugStream ()
14639 @deftypemethodx {YYParser} {void} setDebugStream (@code{java.io.PrintStream} @var{o})
14640 Get or set the stream used for tracing the parsing. It defaults to
14644 @deftypemethod {YYParser} {int} getDebugLevel ()
14645 @deftypemethodx {YYParser} {void} setDebugLevel (@code{int} @var{l})
14646 Get or set the tracing level. Currently its value is either 0 (no trace)
14647 or nonzero (full tracing).
14650 @deftypecv {Constant} {YYParser} {String} {bisonVersion}
14651 @deftypecvx {Constant} {YYParser} {String} {bisonSkeleton}
14652 Identify the Bison version and skeleton used to generate this parser.
14655 If you enabled token internationalization (@pxref{Token I18n}), you must
14656 provide the parser with the following function:
14658 @deftypecv {Static Method} {YYParser} {String} {i18n} (@code{String} @var{s})
14659 Return the translation of @var{s} in the user's language. As an example:
14663 static ResourceBundle myResources
14664 = ResourceBundle.getBundle("domain-name");
14665 static final String i18n(String s) @{
14666 return myResources.getString(s);
14672 @node Java Parser Context Interface
14673 @subsection Java Parser Context Interface
14675 The parser context provides information to build error reports when you
14676 invoke @samp{%define parse.error custom}.
14678 @defcv {Type} {YYParser} {SymbolKind}
14679 An enum of all the grammar symbols, tokens and nonterminals. Its
14680 enumerators are forged from the symbol names:
14683 public enum SymbolKind
14685 S_YYEOF(0), /* "end of file" */
14686 S_YYERROR(1), /* error */
14687 S_YYUNDEF(2), /* "invalid token" */
14688 S_BANG(3), /* "!" */
14689 S_PLUS(4), /* "+" */
14690 S_MINUS(5), /* "-" */
14692 S_NUM(13), /* "number" */
14693 S_NEG(14), /* NEG */
14694 S_YYACCEPT(15), /* $accept */
14695 S_input(16), /* input */
14696 S_line(17); /* line */
14701 @deftypemethod {YYParser.SymbolKind} {String} getName ()
14702 The name of this symbol, possibly translated.
14705 @deftypemethod {YYParser.Context} {YYParser.SymbolKind} getToken ()
14706 The kind of the lookahead. Return @code{null} iff there is no lookahead.
14709 @deftypemethod {YYParser.Context} {YYParser.Location} getLocation ()
14710 The location of the lookahead.
14713 @deftypemethod {YYParser.Context} {int} getExpectedTokens (@code{YYParser.SymbolKind[]} @var{argv}, @code{int} @var{argc})
14714 Fill @var{argv} with the expected tokens, which never include
14715 @code{SymbolKind.S_YYERROR} or @code{SymbolKind.S_YYUNDEF}.
14717 Never put more than @var{argc} elements into @var{argv}, and on success
14718 return the number of tokens stored in @var{argv}. If there are more
14719 expected tokens than @var{argc}, fill @var{argv} up to @var{argc} and return
14720 0. If there are no expected tokens, also return 0, but set @code{argv[0]}
14723 If @var{argv} is null, return the size needed to store all the possible
14724 values, which is always less than @code{YYNTOKENS}.
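As an illustration of this contract, a caller that does not want to guess a buffer size can first pass @code{null} to obtain an upper bound, then allocate an array of that size so that the second call cannot run out of room. The following sketch shows that pattern against hypothetical stand-ins: @code{DemoKind}, @code{ExpectedTokensSource}, @code{FakeContext} and @code{ExpectedTokens} are illustrative names, not part of the generated parser. With a real parser you would call @code{getExpectedTokens} on the @code{YYParser.Context} you receive.

```java
import java.util.Arrays;

// Hypothetical stand-in for YYParser.SymbolKind (illustration only).
enum DemoKind { S_PLUS, S_MINUS, S_NUM }

// Hypothetical interface with the same shape as Context.getExpectedTokens.
interface ExpectedTokensSource {
    int getExpectedTokens(DemoKind[] argv, int argc);
}

// Hypothetical fake context; it models only the documented cases used here
// (null argv, enough room, too little room), not the "no tokens" case.
final class FakeContext implements ExpectedTokensSource {
    private final int bound;            // size reported for a null argv
    private final DemoKind[] expected;  // tokens this fake "expects"

    FakeContext(int bound, DemoKind... expected) {
        this.bound = bound;
        this.expected = expected;
    }

    public int getExpectedTokens(DemoKind[] argv, int argc) {
        if (argv == null)
            return bound;
        int n = Math.min(argc, expected.length);
        System.arraycopy(expected, 0, argv, 0, n);
        // More expected tokens than argc: argv is filled, but 0 is returned.
        return expected.length > argc ? 0 : n;
    }
}

final class ExpectedTokens {
    // Ask for the bound first, so the second call can never truncate.
    static DemoKind[] all(ExpectedTokensSource ctx) {
        int bound = ctx.getExpectedTokens(null, 0);
        DemoKind[] buf = new DemoKind[bound];
        int n = ctx.getExpectedTokens(buf, bound);
        return Arrays.copyOf(buf, n);
    }
}
```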
14728 @node Java Scanner Interface
14729 @subsection Java Scanner Interface
14732 @c - Lexer interface
14734 There are two possible ways to interface a Bison-generated Java parser
14735 with a scanner: the scanner may be defined by @code{%code lexer}, or
14736 defined elsewhere. In either case, the scanner has to implement the
14737 @code{Lexer} inner interface of the parser class. This interface also
14738 contains constants for all user-defined token names and the predefined
14739 @code{YYEOF} token.
14741 In the first case, the body of the scanner class is placed in
14742 @code{%code lexer} blocks. If you want to pass parameters from the
14743 parser constructor to the scanner constructor, specify them with
14744 @code{%lex-param}; they are passed before @code{%parse-param}s to the
14747 In the second case, the scanner has to implement the @code{Lexer} interface,
14748 which is defined within the parser class (e.g., @code{YYParser.Lexer}).
14749 The constructor of the parser object will then accept an object
14750 implementing the interface; @code{%lex-param} is not used in this
14753 In both cases, the scanner has to implement the following methods.
14755 @deftypemethod {Lexer} {void} yyerror (@code{Location} @var{loc}, @code{String} @var{msg})
14756 This method is defined by the user to emit an error message. The first
14757 parameter is omitted if location tracking is not active. Its type can be
14758 changed using @code{%define api.location.type @{@var{class-name}@}}.
14761 @deftypemethod {Lexer} {int} yylex ()
14762 Return the next token. Its kind is the return value; its semantic value and
14763 location are saved and returned by the corresponding methods of the interface. Not
14764 needed for push-only parsers.
14766 Use @samp{%define lex_throws} to specify any uncaught exceptions.
14767 Default is @code{java.io.IOException}.
14770 @deftypemethod {Lexer} {Position} getStartPos ()
14771 @deftypemethodx {Lexer} {Position} getEndPos ()
14772 Return respectively the first position of the last token that @code{yylex}
14773 returned, and the first position beyond it. These methods are not needed
14774 unless location tracking and pull parsing are active.
14776 They should return a new object on each call, so that the symbols do not all
14777 share the same @code{Position} boundaries.
14779 The return type can be changed using @code{%define api.position.type
14780 @{@var{class-name}@}}.
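For instance, a minimal user-supplied position class and a scanner-side helper that hands out fresh copies might look like the sketch below. The fields (@code{line}, @code{column}), the copy constructor and the @code{PositionTracker} helper are assumptions made for this illustration; only the requirement to return new objects comes from the description above.

```java
// Minimal user-supplied position class (fields are illustrative).
final class Position {
    int line;
    int column;

    Position(int line, int column) {
        this.line = line;
        this.column = column;
    }

    // Copy constructor used to hand out independent snapshots.
    Position(Position other) {
        this(other.line, other.column);
    }

    public boolean equals(Object o) {
        if (!(o instanceof Position))
            return false;
        Position p = (Position) o;
        return line == p.line && column == p.column;
    }

    public int hashCode() { return 31 * line + column; }

    public String toString() { return line + "." + column; }
}

// Hypothetical scanner-side helper that tracks token boundaries.
final class PositionTracker {
    private final Position cursor = new Position(1, 1); // mutated while reading
    private Position tokenStart = new Position(1, 1);

    // Record the start of a token, then advance the cursor over its text.
    void consume(String text) {
        tokenStart = new Position(cursor);
        for (char c : text.toCharArray()) {
            if (c == '\n') {
                cursor.line++;
                cursor.column = 1;
            } else {
                cursor.column++;
            }
        }
    }

    // Fresh objects on each call: returning 'cursor' itself would let later
    // calls to consume() move the end boundary of earlier symbols.
    Position getStartPos() { return new Position(tokenStart); }
    Position getEndPos() { return new Position(cursor); }
}
```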
14783 @deftypemethod {Lexer} {Object} getLVal ()
14784 Return the semantic value of the last token that @code{yylex} returned. Not needed
14785 for push-only parsers.
14787 The return type can be changed using @samp{%define api.value.type
14788 @{@var{class-name}@}}.
14791 @deftypemethod {Lexer} {void} reportSyntaxError (@code{YYParser.Context} @var{ctx})
14792 If you use @samp{%define parse.error custom} (@pxref{Bison
14793 Declarations}), then the parser no longer passes syntax error messages to
14794 @code{yyerror}; rather, it delegates that task to the user by calling the
14795 @code{reportSyntaxError} method.
14797 Whether it uses @code{yyerror} is up to the user.
14799 Here is an example of a reporting function (@pxref{Java Parser Context
14803 public void reportSyntaxError(YYParser.Context ctx) @{
14804 System.err.print(ctx.getLocation() + ": syntax error");
14805 // Report the expected tokens.
14807 final int TOKENMAX = 5;
14808 YYParser.SymbolKind[] arg = new YYParser.SymbolKind[TOKENMAX];
14809 int n = ctx.getExpectedTokens(arg, TOKENMAX);
14810 for (int i = 0; i < n; ++i)
14811 System.err.print((i == 0 ? ": expected " : " or ")
14812 + arg[i].getName());
14814 // Report the unexpected token which triggered the error.
14816 YYParser.SymbolKind lookahead = ctx.getToken();
14817 if (lookahead != null)
14818 System.err.print(" before " + lookahead.getName());
14820 System.err.println("");
14825 This implementation is inappropriate for internationalization; see the
14826 @file{c/bistromathic} example for a better alternative.
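Putting the @code{Lexer} methods together, here is a sketch of a small hand-written scanner for integers and @samp{+}, without location tracking. The @code{Lexer} interface below is a hypothetical stand-in for the one Bison generates inside the parser class, and its token codes (@code{NUM}, @code{PLUS}) are illustrative values; in a real project, implement @code{YYParser.Lexer} and use the constants it provides. Messages passed to @code{yyerror} are collected in a buffer here instead of being printed, which keeps the sketch easy to check.

```java
// Hypothetical mirror of a generated YYParser.Lexer; Bison generates the
// real interface, including its own token codes.
interface Lexer {
    int YYEOF = 0;   // end of input
    int NUM = 258;   // illustrative token code
    int PLUS = 259;  // illustrative token code

    int yylex() throws java.io.IOException;
    Object getLVal();
    void yyerror(String msg);
}

// Hand-written scanner for integers and '+' (no location tracking).
final class CalcLexer implements Lexer {
    private final String input;
    private int pos = 0;
    private Object lval;
    final StringBuilder errors = new StringBuilder(); // stands in for System.err

    CalcLexer(String input) { this.input = input; }

    // An override may drop the throws clause when it never throws.
    public int yylex() {
        while (true) {
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos)))
                pos++;
            if (pos >= input.length()) {
                lval = null;
                return YYEOF;
            }
            char c = input.charAt(pos);
            if (Character.isDigit(c)) {
                int start = pos;
                while (pos < input.length() && Character.isDigit(input.charAt(pos)))
                    pos++;
                lval = Integer.valueOf(input.substring(start, pos));
                return NUM;
            }
            pos++;
            if (c == '+') {
                lval = null;
                return PLUS;
            }
            yyerror("invalid character: " + c); // report, skip, keep scanning
        }
    }

    public Object getLVal() { return lval; }

    public void yyerror(String msg) { errors.append(msg).append('\n'); }
}
```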
14829 @node Java Action Features
14830 @subsection Special Features for Use in Java Actions
14832 The following special constructs can be used in Java actions.
14833 Other analogous C action features are currently unavailable for Java.
14835 Use @samp{%define throws} to specify any uncaught exceptions from parser
14836 actions, and initial actions specified by @code{%initial-action}.
14839 The semantic value for the @var{n}th component of the current rule.
14840 This may not be assigned to.
14841 @xref{Java Semantic Values}.
14844 @defvar $<@var{typealt}>@var{n}
14845 Like @code{$@var{n}} but specifies an alternative type @var{typealt}.
14846 @xref{Java Semantic Values}.
14850 The semantic value for the grouping made by the current rule. As a
14851 value, it has the base type (@code{Object} or as specified by
14852 @samp{%define api.value.type}), i.e., it is not cast to the declared subtype, because
14853 casts are not allowed on the left-hand side of Java assignments.
14854 Use an explicit Java cast if the correct subtype is needed.
14855 @xref{Java Semantic Values}.
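The sketch below models this in plain Java, assuming the default base type @code{Object}, for a rule such as @samp{exp: exp '+' exp}. It is a conceptual illustration, not the code Bison generates: reading @code{$<Integer>1} behaves like a cast applied to an @code{Object}, whereas @code{$$} is assigned as a plain @code{Object}, so getting an @code{Integer} back requires an explicit cast. The class name @code{ValueStackSketch} is hypothetical.

```java
// Conceptual sketch, not Bison's generated code: semantic values share the
// base type Object (the default for api.value.type).
final class ValueStackSketch {
    // Models the action of a rule like  exp: exp '+' exp  { $$ = $1 + $3; }
    // rhs[0], rhs[1], rhs[2] hold the values of $1, $2, $3.
    static Object reduceAdd(Object[] rhs) {
        // Typed $<Integer>1 and $<Integer>3 behave like casts when read ...
        Integer left = (Integer) rhs[0];
        Integer right = (Integer) rhs[2];
        // ... but $$ has the base type, so the result is stored as an Object.
        Object result = left + right;
        return result;
    }
}
```

A caller that needs the subtype writes the cast itself, e.g. @code{int v = (Integer) ValueStackSketch.reduceAdd(values);}.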
14858 @defvar $<@var{typealt}>$
14859 Same as @code{$$} since Java always allows assigning to the base type.
14860 Perhaps we should use this and @code{$<>$} for the value and @code{$$}
14861 for setting the value, but there is currently no easy way to distinguish
14863 @xref{Java Semantic Values}.
14867 The location information of the @var{n}th component of the current rule.
14868 This may not be assigned to.
14869 @xref{Java Location Values}.
14873 The location information of the grouping made by the current rule.
14874 @xref{Java Location Values}.
14877 @deftypefn {Statement} return YYABORT @code{;}
14878 Return immediately from the parser, indicating failure.
14879 @xref{Java Parser Interface}.
14882 @deftypefn {Statement} return YYACCEPT @code{;}
14883 Return immediately from the parser, indicating success.
14884 @xref{Java Parser Interface}.
14887 @deftypefn {Statement} {return} YYERROR @code{;}
14888 Start error recovery (without printing an error message).
14889 @xref{Error Recovery}.
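Because these statements are plain Java @code{return} statements, the parser has to interpret the value an action returns. The following conceptual model shows why only @code{YYACCEPT} and @code{YYABORT} end the parse while @code{YYERROR} lets it continue: each action is a method that returns a code, and only the driver turns that code into a result. The class @code{ActionModel} and the constant values are hypothetical; the real definitions in a generated parser are opaque and may change.

```java
// Conceptual model only: constant names and values are hypothetical; the
// real definitions in a Bison-generated parser are opaque and may change.
final class ActionModel {
    static final int YYNEWSTATE = 0; // action finished, keep parsing
    static final int YYACCEPT = 1;   // stop parsing, report success
    static final int YYABORT = 2;    // stop parsing, report failure
    static final int YYERROR = 3;    // would start error recovery (not modeled)

    // Stand-in for the method holding the rule actions.
    static int action(String what) {
        switch (what) {
            case "accept": return YYACCEPT;   // like 'return YYACCEPT;'
            case "abort":  return YYABORT;    // like 'return YYABORT;'
            case "error":  return YYERROR;    // like 'return YYERROR;'
            default:       return YYNEWSTATE; // action ran to completion
        }
    }

    // Stand-in for parse(): only the driver turns a code into a result.
    static boolean parse(String... actions) {
        for (String a : actions) {
            int code = action(a);
            if (code == YYACCEPT)
                return true;
            if (code == YYABORT)
                return false;
            // YYERROR and YYNEWSTATE do not end the parse.
        }
        return true; // model: running out of actions counts as success
    }
}
```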
14892 @deftypefn {Function} {boolean} recovering ()
14893 Return whether error recovery is being done. In this state, the parser
14894 reads tokens until it reaches a known state, and then restarts normal
14896 @xref{Error Recovery}.
14899 @deftypefn {Function} {void} yyerror (@code{String} @var{msg})
14900 @deftypefnx {Function} {void} yyerror (@code{Position} @var{pos}, @code{String} @var{msg})
14901 @deftypefnx {Function} {void} yyerror (@code{Location} @var{loc}, @code{String} @var{msg})
14902 Print an error message using the @code{yyerror} method of the scanner
14903 instance in use. The @code{Location} and @code{Position} parameters are
14904 available only if location tracking is active.
14907 @node Java Push Parser Interface
14908 @subsection Java Push Parser Interface
14909 @c - define push_parse
14910 @findex %define api.push-pull
14912 Normally, Bison generates a pull parser for Java.
14913 The following Bison declaration says that you want the parser to be a push
14914 parser (@pxref{%define Summary}):
14917 %define api.push-pull push
14920 Most of the discussion about the Java pull parser interface (@pxref{Java
14921 Parser Interface}) applies to the push parser interface as well.
14923 When generating a push parser, Bison creates the method @code{push_parse} with
14924 one of the following signatures, depending on whether locations are enabled.
14926 @deftypemethod {YYParser} {int} push_parse (@code{int} @var{token}, @code{Object} @var{yylval})
14927 @deftypemethodx {YYParser} {int} push_parse (@code{int} @var{token}, @code{Object} @var{yylval}, @code{Location} @var{yyloc})
14928 @deftypemethodx {YYParser} {int} push_parse (@code{int} @var{token}, @code{Object} @var{yylval}, @code{Position} @var{yypos})
14931 The primary difference with respect to a pull parser is that the parser
14932 method @code{push_parse} is invoked repeatedly, once for each token. This
14933 method is available if either the @samp{%define api.push-pull push} or
14934 @samp{%define api.push-pull both} declaration is used (@pxref{%define
14935 Summary}). The @code{Location} and @code{Position} parameters are available
14936 only if location tracking is active.
14938 The value returned by the @code{push_parse} method is one of the following:
14939 0 (success), 1 (abort), 2 (memory exhaustion), or @code{YYPUSH_MORE}. This
14940 new value, @code{YYPUSH_MORE}, may be returned if more input is required to
14941 finish parsing the grammar.
14943 If @code{api.push-pull} is defined as @code{both}, then the generated parser
14944 class will also implement the @code{parse} method. This method's body is a
14945 loop that repeatedly invokes the scanner and then passes the values obtained
14946 from the scanner to the @code{push_parse} method.
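The loop described above can be sketched as follows. The @code{PushParser} and @code{TokenSource} interfaces, the toy @code{SummingPushParser} and @code{ArrayTokens} classes, and the value chosen for @code{YYPUSH_MORE} are hypothetical stand-ins so that the sketch runs on its own; with Bison you would call @code{push_parse} on the generated parser and compare against its own @code{YYPUSH_MORE} constant.

```java
// Hypothetical stand-ins so that this sketch runs on its own; with Bison,
// use the generated parser and its own YYPUSH_MORE constant.
interface PushParser {
    int YYPUSH_MORE = 4; // illustrative value only
    int push_parse(int token, Object yylval);
}

interface TokenSource {
    int yylex();
    Object getLVal();
}

final class PushDriver {
    // Ask the scanner for a token and push it, until the parser stops
    // requesting more input; then return its final status.
    static int run(PushParser parser, TokenSource lexer) {
        int status;
        do {
            int token = lexer.yylex();
            status = parser.push_parse(token, lexer.getLVal());
        } while (status == PushParser.YYPUSH_MORE);
        return status; // 0 (success), 1 (abort), ...
    }
}

// Toy parser: sums NUM tokens and accepts at end of input (token 0).
final class SummingPushParser implements PushParser {
    static final int NUM = 258; // illustrative token code
    int sum = 0;

    public int push_parse(int token, Object yylval) {
        if (token == 0)
            return 0;               // end of input: success
        if (token == NUM) {
            sum += (Integer) yylval;
            return YYPUSH_MORE;     // need more input
        }
        return 1;                   // anything else: abort
    }
}

// Toy scanner returning one NUM per array element, then end of input.
final class ArrayTokens implements TokenSource {
    private final int[] values;
    private int next = 0;
    private Object lval;

    ArrayTokens(int... values) { this.values = values; }

    public int yylex() {
        if (next >= values.length) {
            lval = null;
            return 0;
        }
        lval = values[next++];
        return SummingPushParser.NUM;
    }

    public Object getLVal() { return lval; }
}
```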
14948 There is one additional complication. Technically, the push parser does not
14949 need to know about the scanner (i.e., an object implementing the
14950 @code{YYParser.Lexer} interface), but it does need access to the
14951 @code{yyerror} method. Currently, the @code{yyerror} method is defined in
14952 the @code{YYParser.Lexer} interface. Hence, an implementation of that
14953 interface is still required in order to provide an implementation of
14954 @code{yyerror}. The current approach (subject to change) is to require
14955 the @code{YYParser} constructor to be given an object implementing the
14956 @code{YYParser.Lexer} interface. This object need only implement the
14957 @code{yyerror} method; the other methods can be stubbed since they will
14958 never be invoked. The simplest way to do this is to add a trivial scanner
14959 implementation to your grammar file using whatever implementation of
14960 @code{yyerror} is desired. The following code sample shows a simple way to
14966 public Object getLVal () @{return null;@}
14967 public int yylex () @{return 0;@}
14968 public void yyerror (String s) @{System.err.println(s);@}
14972 @node Java Differences
14973 @subsection Differences between C/C++ and Java Grammars
14975 The different structure of the Java language forces several differences
14976 between C/C++ grammars and grammars designed for Java parsers. This
14977 section summarizes these differences.
14981 Java has no preprocessor, so obviously the @code{YYERROR},
14982 @code{YYACCEPT}, @code{YYABORT} symbols (@pxref{Table of Symbols}) cannot be
14983 macros. Instead, they should be preceded by @code{return} when they appear
14984 in an action. The actual definition of these symbols is opaque to the Bison
14985 grammar, and it might change in the future. The only meaningful operation
14986 you can perform on them is to return them. @xref{Java Action Features}.
14988 Note that of these three symbols, only @code{YYACCEPT} and
14989 @code{YYABORT} cause a return from the @code{parse}
14990 method@footnote{Java parsers include the actions in a method separate
14991 from @code{parse} in order to have an intuitive syntax that
14992 corresponds to these C macros.}.
14995 Java lacks unions, so @code{%union} has no effect. Instead, semantic
14996 values have a common base type: @code{Object} or as specified by
14997 @samp{%define api.value.type}. Angle brackets on @code{%token}, @code{%type},
14998 @code{$@var{n}} and @code{$$} specify subtypes rather than fields of
14999 a union. The type of @code{$$}, even with angle brackets, is the base
15000 type since Java casts are not allowed on the left-hand side of assignments.
15001 Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the
15002 left-hand side of assignments. @xref{Java Semantic Values}, and
15003 @ref{Java Action Features}.
15006 The prologue declarations have a different meaning than in C/C++ code.
15008 @item @code{%code imports}
15009 blocks are placed at the beginning of the Java source code. They may
15010 include copyright notices. For a @code{package} declaration, use
15011 @samp{%define api.package} instead.
15013 @item unqualified @code{%code}
15014 blocks are placed inside the parser class.
15016 @item @code{%code lexer}
15017 blocks, if specified, should include the implementation of the
15018 scanner. If there is no such block, the scanner can be any class
15019 that implements the appropriate interface (@pxref{Java Scanner
15023 Other @code{%code} blocks are not supported in Java parsers.
15024 In particular, @code{%@{ @dots{} %@}} blocks should not be used
15025 and may give an error in future versions of Bison.
15027 The epilogue has the same meaning as in C/C++ code and it can
15028 be used to define other classes used by the parser @emph{outside}
15033 @node Java Declarations Summary
15034 @subsection Java Declarations Summary
15036 This summary only includes declarations that are specific to Java or that have
15037 a special meaning when used in a Java parser.
15039 @deffn {Directive} {%language "Java"}
15040 Generate a Java class for the parser.
15043 @deffn {Directive} %lex-param @{@var{type} @var{name}@}
15044 A parameter for the lexer class defined by @code{%code lexer}
15045 @emph{only}, added as parameters to the lexer constructor and the parser
15046 constructor that @emph{creates} a lexer. Default is none.
15047 @xref{Java Scanner Interface}.
15050 @deffn {Directive} %parse-param @{@var{type} @var{name}@}
15051 A parameter for the parser class added as parameters to constructor(s)
15052 and as fields initialized by the constructor(s). Default is none.
15053 @xref{Java Parser Interface}.
15056 @deffn {Directive} %token <@var{type}> @var{token} @dots{}
15057 Declare tokens. Note that the angle brackets enclose a Java @emph{type}.
15058 @xref{Java Semantic Values}.
15061 @deffn {Directive} %nterm <@var{type}> @var{nonterminal} @dots{}
15062 Declare the type of nonterminals. Note that the angle brackets enclose
15063 a Java @emph{type}.
15064 @xref{Java Semantic Values}.
15067 @deffn {Directive} %code @{ @var{code} @dots{} @}
15068 Code appended to the inside of the parser class.
15069 @xref{Java Differences}.
15072 @deffn {Directive} {%code imports} @{ @var{code} @dots{} @}
15073 Code inserted just after the @code{package} declaration.
15074 @xref{Java Differences}.
15077 @deffn {Directive} {%code init} @{ @var{code} @dots{} @}
15078 Code inserted at the beginning of the parser constructor body.
15079 @xref{Java Parser Interface}.
15082 @deffn {Directive} {%code lexer} @{ @var{code} @dots{} @}
15083 Code added to the body of an inner lexer class within the parser class.
15084 @xref{Java Scanner Interface}.
15087 @deffn {Directive} %% @var{code} @dots{}
15088 Code (after the second @code{%%}) appended to the end of the file,
15089 @emph{outside} the parser class.
15090 @xref{Java Differences}.
15093 @deffn {Directive} %@{ @var{code} @dots{} %@}
15094 Not supported. Use @code{%code imports} instead.
15095 @xref{Java Differences}.
15098 @deffn {Directive} {%define api.prefix} @{@var{prefix}@}
15099 The prefix of the parser class name @code{@var{prefix}Parser} if
15100 @samp{%define api.parser.class} is not used. Default is @code{YY}.
15101 @xref{Java Bison Interface}.
15104 @deffn {Directive} {%define api.parser.abstract}
15105 Whether the parser class is declared @code{abstract}. Default is false.
15106 @xref{Java Bison Interface}.
15109 @deffn {Directive} {%define api.parser.annotations} @{@var{annotations}@}
15110 The Java annotations for the parser class. Default is none.
15111 @xref{Java Bison Interface}.
15114 @deffn {Directive} {%define api.parser.class} @{@var{name}@}
15115 The name of the parser class. Default is @code{YYParser} or
15116 @code{@var{api.prefix}Parser}. @xref{Java Bison Interface}.
15119 @deffn {Directive} {%define api.parser.extends} @{@var{superclass}@}
15120 The superclass of the parser class. Default is none.
15121 @xref{Java Bison Interface}.
15124 @deffn {Directive} {%define api.parser.final}
15125 Whether the parser class is declared @code{final}. Default is false.
15126 @xref{Java Bison Interface}.
15129 @deffn {Directive} {%define api.parser.implements} @{@var{interfaces}@}
15130 The implemented interfaces of the parser class, a comma-separated list.
15132 @xref{Java Bison Interface}.
15135 @deffn {Directive} {%define api.parser.public}
15136 Whether the parser class is declared @code{public}. Default is false.
15137 @xref{Java Bison Interface}.
15140 @deffn {Directive} {%define api.parser.strictfp}
15141 Whether the parser class is declared @code{strictfp}. Default is false.
15142 @xref{Java Bison Interface}.
15145 @deffn {Directive} {%define init_throws} @{@var{exceptions}@}
15146 The exceptions thrown by @code{%code init} from the parser class
15147 constructor. Default is none.
15148 @xref{Java Parser Interface}.
15151 @deffn {Directive} {%define lex_throws} @{@var{exceptions}@}
15152 The exceptions thrown by the @code{yylex} method of the lexer, a
15153 comma-separated list. Default is @code{java.io.IOException}.
15154 @xref{Java Scanner Interface}.
15157 @deffn {Directive} {%define api.location.type} @{@var{class}@}
15158 The name of the class used for locations (a range between two
15159 positions). This class is generated as an inner class of the parser
15160 class by @command{bison}. Default is @code{Location}.
15161 Formerly named @code{location_type}.
15162 @xref{Java Location Values}.
15165 @deffn {Directive} {%define api.package} @{@var{package}@}
15166 The package to put the parser class in. Default is none.
15167 @xref{Java Bison Interface}.
15168 Renamed from @code{package} in Bison 3.7.
15171 @deffn {Directive} {%define api.position.type} @{@var{class}@}
15172 The name of the class used for positions. This class must be supplied by
15173 the user. Default is @code{Position}.
15174 Formerly named @code{position_type}.
15175 @xref{Java Location Values}.
15178 @deffn {Directive} {%define api.value.type} @{@var{class}@}
15179 The base type of semantic values. Default is @code{Object}.
15180 @xref{Java Semantic Values}.
15183 @deffn {Directive} {%define throws} @{@var{exceptions}@}
15184 The exceptions thrown by user-supplied parser actions and
15185 @code{%initial-action}, a comma-separated list. Default is none.
15186 @xref{Java Parser Interface}.
15190 @c ================================================= History
15193 @chapter A Brief History of the Greater Ungulates
15198 * Yacc:: The original Yacc
15199 * yacchack:: An obscure early implementation of reentrancy
15200 * Byacc:: Berkeley Yacc
15201 * Bison:: This program
15202 * Other Ungulates:: Similar programs
15206 @section The ancestral Yacc
15208 Bison originated as a workalike of a program called Yacc --- Yet Another
15209 Compiler Compiler.@footnote{Because of the acronym, the name is sometimes
15210 given as ``YACC'', but Johnson used ``Yacc'' in the descriptive paper
15212 @url{https://s3.amazonaws.com/plan9-bell-labs/7thEdMan/v7vol2b.pdf, Version
15213 7 Unix Manual}.} Yacc was written at Bell Labs as part of the very early
15214 development of Unix; one of its first uses was to develop the original
15215 Portable C Compiler, pcc. The same person, Steven C. Johnson, wrote Yacc and
15218 According to the author
15219 @footnote{@url{https://lists.gnu.org/r/bison-patches/2019-02/msg00061.html}},
15220 Yacc was first invented in 1971 and reached a form recognizably similar to
15221 the C version in 1973. Johnson published @cite{A Portable Compiler: Theory
15222 and Practice} @pcite{Johnson 1978}.
15224 Yacc was not itself originally written in C but in its predecessor language,
15225 B. This goes far to explain its odd interface, which exposes a large number
15226 of global variables rather than bundling them into a C struct. All other
15227 Yacc-like programs are descended from the C port of Yacc.
15229 Yacc, through both its deployment in pcc and as a standalone tool for
15230 generating other parsers, helped drive the early spread of Unix. Yacc
15231 itself, however, passed out of use after around 1990 when workalikes
15232 with less restrictive licenses and more features became available.
15234 Original Yacc became generally available when Caldera released the sources
15235 of old versions of Unix up to V7 and 32V in 2002. By that time it had been
15236 long superseded in practical use by Bison even on Yacc's native Unix
15243 One of the deficiencies of original Yacc was its inability to produce
15244 reentrant parsers. This was first remedied by a set of drop-in
15245 modifications called ``yacchack'', published by Eric S. Raymond on USENET
15246 around 1983. This code was quickly forgotten when zoo and Berkeley Yacc
15247 became available a few years later.
15250 @section Berkeley Yacc
15253 Berkeley Yacc was written in 1985 by Robert Corbett @pcite{Corbett
15254 1984}. It was originally named ``zoo'', but by October 1989 it became
15255 known as Berkeley Yacc or byacc.
15257 Berkeley Yacc had three advantages over the ancestral Yacc: it generated
15258 faster parsers, it could generate reentrant parsers, and the source code was
15259 released to the public domain rather than being under an AT&T proprietary
15260 license. The better performance came from implementing techniques from
15261 DeRemer and Penello's seminal paper on LALR parsing @pcite{DeRemer 1982}.
15263 Use of byacc spread rapidly due to its public domain license. However, once
15264 Bison became available, byacc itself passed out of general use.
15270 Robert Corbett actually wrote two (closely related) LALR parsers in 1985,
15271 both using the DeRemer/Penello techniques. One was ``zoo'', the other was
15272 ``Byson''. In 1987 Richard Stallman began working on Byson; the name changed
15273 to Bison and the interface became Yacc-compatible.
15275 The main visible difference between Yacc and Byson/Bison at the time of
15276 Byson's first release was that Byson supported the @code{@@@var{n}} construct
15277 (giving access to the starting and ending line number and character number
15278 associated with any of the symbols in the current rule).
15280 There was also the command @samp{%expect @var{n}} which said not to mention the
15281 conflicts if there are @var{n} shift/reduce conflicts and no reduce/reduce
15282 conflicts. In more recent versions of Bison, @code{%expect} and its
15283 @code{%expect-rr} variant for reduce/reduce conflicts can be applied to
15286 Later versions of Bison added many more new features.
15288 Bison error reporting has been improved in various ways. Notably, ancestral
15289 Yacc and Byson did not have carets in error messages.
15291 Compared to Yacc, Bison uses a faster but less space-efficient encoding for
15292 the parse tables @pcite{Corbett 1984}, and more modern techniques for
15293 generating the lookahead sets @pcite{DeRemer 1982}. This approach has been
15294 standard ever since.
15296 (It has also been plausibly alleged that the differences in the algorithms stem
15297 mainly from the horrible kludges that Johnson had to perpetrate to make
15298 the original Yacc fit on a PDP-11.)
15300 Named references, semantic predicates, @code{%locations},
15301 @code{%glr-parser}, @code{%printer}, @code{%destructor}, dumps to DOT,
15302 @code{%parse-param}, @code{%lex-param}, and dumps to XSLT, LAC, and IELR(1)
15303 generation are new in Bison.
15305 Bison also has many features to support C++ that were not present in the
15306 ancestral Yacc or Byson.
15308 Bison obsolesced all previous Yacc variants and workalikes generating C by
15311 @node Other Ungulates
15312 @section Other Ungulates
15314 The Yacc concept has frequently been ported to other languages. Some of the
15315 early ports are extinct along with the languages that hosted them; others
15316 have been superseded by parser skeletons shipped with Bison.
15318 However, independent implementations persist. One of the best-known
15319 still in use is David Beazley's ``PLY'' (Python Lex-Yacc) for
15320 Python. Another is goyacc, supporting the Go language. An ``ocamlyacc''
15321 is shipped as part of the OCaml compiler suite.
15323 @c ================================================= Version Compatibility
15326 @chapter Bison Version Compatibility: Best Practices
15328 @cindex compatibility
15330 Bison provides a Yacc compatibility mode in which it strives to conform with
15331 the POSIX standard. Grammar files which are written to the POSIX standard, and
15332 do not take advantage of any of the special capabilities of Bison, should
15333 work with many versions of Bison without modification.
15335 All other features of Bison are particular to Bison, and are changing. Bison
15336 is actively maintained and continuously evolving. It should come as no
15337 surprise that an older version of Bison will not accept Bison source code which
15338 uses newer features that do not exist at all in the older Bison.
15339 Regrettably, in spite of reasonable effort to maintain compatibility, the
15340 reverse situation may also occur: it may happen that code developed using an
15341 older version of Bison does not build with a newer version of Bison without
15344 Because Bison is a code generation tool, it is possible to retain its output
15345 and distribute that to the users of the program. The users are then not
15346 required to have Bison installed at all, only an implementation of the
15347 programming language, such as C, which is required for processing the generated
15350 It is the output of Bison that is intended to be of the utmost portability.
15351 That is, whereas the Bison grammar source code may have a dependency
15352 on specific versions of Bison, the generated parser from any version of Bison
15353 should work with a large number of implementations of C, or whatever
15354 language is applicable.
15356 The recommended best practice for using Bison (in the context of software that
15357 is distributed in source code form) is to ship the generated parser to the
15358 downstream users. Only those downstream users who actively develop the program
15359 and need to make changes to the grammar file need to have Bison
15360 installed at all, and those users can install the specific version of Bison
15363 Following this recommended practice also makes it possible to use a more recent
15364 Bison than what is available to users through operating system distributions,
15365 thereby taking advantage of the latest techniques that Bison allows.
15367 Some features of Bison have been, or are being, adopted into other Yacc-like
15368 programs. Therefore it might seem that it is a good idea to write grammar code
15369 which targets multiple implementations, similarly to the way C programs are
15370 often written to target multiple compilers and language versions. Other than
15371 the Yacc subset described by POSIX, the Bison language is not rigorously
15372 standardized. When a Bison feature is adopted by another parser generator, it
15373 may be initially compatible with that version of Bison on which it was based,
15374 but the compatibility may degrade going forward. Developers who strive to make
15375 their Bison code simultaneously compatible with other parser generators are
15376 encouraged to nevertheless use specific versions of all generators, and still
15377 follow the recommended practice of shipping generated output. For example,
15378 a project can internally maintain compatibility with multiple generators,
15379 and choose the output of a particular one to ship to the users. Or else,
15380 the project could ship all of the outputs, arranging for a way for the user
15381 to specify which one is used to build the program.
15383 @c ================================================= FAQ
15386 @chapter Frequently Asked Questions
15387 @cindex frequently asked questions
15390 Several questions about Bison come up occasionally. Here some of them
15394 * Memory Exhausted:: Breaking the Stack Limits
15395 * How Can I Reset the Parser:: @code{yyparse} Keeps some State
15396 * Strings are Destroyed:: @code{yylval} Loses Track of Strings
15397 * Implementing Gotos/Loops:: Control Flow in the Calculator
15398 * Multiple start-symbols:: Factoring closely related grammars
15399 * Secure? Conform?:: Is Bison POSIX safe?
15400 * Enabling Relocatability:: Moving Bison/using it through network shares
15401 * I can't build Bison:: Troubleshooting
15402 * Where can I find help?:: Troubleshooting
15403 * Bug Reports:: Troublereporting
15404 * More Languages:: Parsers in C++, Java, and so on
15405 * Beta Testing:: Experimenting with development versions
15406 * Mailing Lists:: Meeting other Bison users
15409 @node Memory Exhausted
15410 @section Memory Exhausted
15413 My parser returns with error with a @samp{memory exhausted}
15414 message. What can I do?
15417 This question is already addressed elsewhere, see @ref{Recursion}.
15419 @node How Can I Reset the Parser
15420 @section How Can I Reset the Parser
15422 The following phenomenon has several symptoms, resulting in the
15423 following typical questions:
15426 I invoke @code{yyparse} several times, and on correct input it works
15427 properly; but when a parse error is found, all the other calls fail
15428 too. How can I reset the error flag of @code{yyparse}?
15435 My parser includes support for an @samp{#include}-like feature, in which
15436 case I run @code{yyparse} from @code{yyparse}. This fails although I did
15437 specify @samp{%define api.pure full}.
15440 These problems typically come not from Bison itself, but from
15441 Lex-generated scanners. Because these scanners use large buffers for
15442 speed, they might not notice a change of input file. As a
15443 demonstration, consider the following source file,
15444 @file{first-line.l}:
15450 #include <stdlib.h>
15454 .*\n ECHO; return 1;
15458 yyparse (char const *file)
15460 yyin = fopen (file, "r");
15464 exit (EXIT_FAILURE);
15468 /* One token only. */
15470 if (fclose (yyin) != 0)
15473 exit (EXIT_FAILURE);
15491 If the file @file{input} contains
15499 then instead of getting the first line twice, you get:
15502 $ @kbd{flex -ofirst-line.c first-line.l}
15503 $ @kbd{gcc -ofirst-line first-line.c -ll}
15504 $ @kbd{./first-line}
15509 Therefore, whenever you change @code{yyin}, you must tell the
15510 Lex-generated scanner to discard its current buffer and switch to the
15511 new one. This depends upon your implementation of Lex; see its
15512 documentation for more. For Flex, it suffices to call
15513 @samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your
15514 Flex-generated scanner needs to read from several input streams to
15515 handle features like include files, you might consider using Flex
15516 functions like @samp{yy_switch_to_buffer} that manipulate multiple
15519 If your Flex-generated scanner uses start conditions (@pxref{Start
15520 conditions, , Start conditions, flex, The Flex Manual}), you might
15521 also want to reset the scanner's state, i.e., go back to the initial
15522 start condition, through a call to @samp{BEGIN (0)}.
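To see why assigning a new value to @code{yyin} is not enough, here is a minimal sketch in plain C of a reader that, like a Lex-generated scanner, fills a private buffer in chunks. It is not Flex's implementation. The names @code{struct reader}, @code{reader_switch}, @code{reader_flush}, @code{reader_getc} and @code{reader_line} are hypothetical. @code{reader_switch} plays the role of changing @code{yyin}, and @code{reader_flush} plays the role of @code{YY_FLUSH_BUFFER}.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffered reader that, like a Lex-generated scanner,
   reads its input in chunks of CHUNK bytes into a private buffer.  */
enum { CHUNK = 8 };

struct reader
{
  const char *src;   /* Current input (plays the role of yyin).  */
  size_t pos;        /* Next byte of src not yet copied to buf.  */
  char buf[CHUNK];   /* Private buffer.  */
  size_t len;        /* Number of valid bytes in buf.  */
  size_t next;       /* Next byte of buf to return.  */
};

/* Switch to a new input WITHOUT discarding the buffer, like
   assigning yyin and forgetting YY_FLUSH_BUFFER.  Must be called
   once before the first read.  */
static void
reader_switch (struct reader *r, const char *src)
{
  r->src = src;
  r->pos = 0;
}

/* Discard the buffered bytes (the analogue of YY_FLUSH_BUFFER).  */
static void
reader_flush (struct reader *r)
{
  r->len = r->next = 0;
}

/* Return the next byte, or -1 at end of input.  */
static int
reader_getc (struct reader *r)
{
  if (r->next == r->len)
    {
      /* Buffer exhausted: refill it from the current input.  */
      size_t avail = strlen (r->src + r->pos);
      r->len = avail < CHUNK ? avail : CHUNK;
      memcpy (r->buf, r->src + r->pos, r->len);
      r->pos += r->len;
      r->next = 0;
      if (r->len == 0)
        return -1;
    }
  return (unsigned char) r->buf[r->next++];
}

/* Read one line, without its newline, into OUT (SIZE > 0).  */
static void
reader_line (struct reader *r, char *out, size_t size)
{
  size_t n = 0;
  int c;
  while ((c = reader_getc (r)) != -1 && c != '\n')
    if (n + 1 < size)
      out[n++] = (char) c;
  out[n] = '\0';
}
```
@end verbatim

Start with @code{struct reader r = @{0@};} and @code{reader_switch (&r, "one\ntwo\n")}. The first @code{reader_line} then yields @samp{one}. If you switch to @code{"new\n"} without a flush, the next call still yields @samp{two}, which is the stale end of the previous input. If you call @code{reader_flush} right after the switch, the next call yields @samp{new}.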
15524 @node Strings are Destroyed
15525 @section Strings are Destroyed
15528 My parser seems to destroy old strings, or maybe it loses track of
15529 them. Instead of reporting @samp{"foo", "bar"}, it reports
15530 @samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
This error is probably the single most frequent ``bug report'' sent to
Bison lists, but it stems only from a misunderstanding of the role of
the scanner. Consider the following Lex code:
15541 char *yylval = NULL;
15546 .* yylval = yytext; return 1;
15554 /* Similar to using $1, $2 in a Bison action. */
15555 char *fst = (yylex (), yylval);
15556 char *snd = (yylex (), yylval);
15557 printf ("\"%s\", \"%s\"\n", fst, snd);
15563 If you compile and run this code, you get:
15566 $ @kbd{flex -osplit-lines.c split-lines.l}
15567 $ @kbd{gcc -osplit-lines split-lines.c -ll}
15568 $ @kbd{printf 'one\ntwo\n' | ./split-lines}
15574 this is because @code{yytext} is a buffer provided for @emph{reading}
15575 in the action, but if you want to keep it, you have to duplicate it
15576 (e.g., using @code{strdup}). Note that the output may depend on how
15577 your implementation of Lex handles @code{yytext}. For instance, when
given the Lex compatibility option @option{-l} (which triggers the
option @samp{%array}), Flex behaves differently:
15582 $ @kbd{flex -l -osplit-lines.c split-lines.l}
15583 $ @kbd{gcc -osplit-lines split-lines.c -ll}
15584 $ @kbd{printf 'one\ntwo\n' | ./split-lines}
15589 @node Implementing Gotos/Loops
15590 @section Implementing Gotos/Loops
15593 My simple calculator supports variables, assignments, and functions,
15594 but how can I implement gotos, or loops?
Although very pedagogical, the examples included in this document blur
the distinction between the parser, whose job is to recover the
structure of a text and to transmit it to subsequent modules of the
program, and the processing (such as the execution) of this
structure. This works well with so-called straight-line programs,
i.e., precisely those that have a straightforward execution model:
execute simple instructions one after the other.
15605 @cindex abstract syntax tree
15607 If you want a richer model, you will probably need to use the parser
15608 to construct a tree that does represent the structure it has
15609 recovered; this tree is usually called the @dfn{abstract syntax tree},
15610 or @dfn{AST} for short. Then, walking through this tree,
15611 traversing it in various ways, will enable treatments such as its
15612 execution or its translation, which will result in an interpreter or a
15615 This topic is way beyond the scope of this manual, and the reader is
15616 invited to consult the dedicated literature.
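That said, a minimal sketch may help fix ideas. Assume a toy language with integer variables, assignments, @code{while} loops and statement sequences. The grammar actions would only build nodes with a helper such as @code{mk}, and a separate tree walk, @code{eval}, would execute them. Because a loop body is a subtree, it can be evaluated as many times as needed. The node layout and all the names are hypothetical and are not part of Bison.

@verbatim
```c
#include <stdlib.h>

/* Hypothetical AST node kinds for a tiny imperative language.  */
enum kind { NUM, VAR, ADD, LESS, ASSIGN, SEQ, WHILE };

struct node
{
  enum kind kind;
  int value;                  /* NUM: constant; VAR, ASSIGN: variable index.  */
  struct node *left, *right;  /* Operands, condition/body, or statements.  */
};

/* Allocate a node; in a grammar, the rule actions would call this.  */
static struct node *
mk (enum kind kind, int value, struct node *left, struct node *right)
{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->kind = kind;
  n->value = value;
  n->left = left;
  n->right = right;
  return n;
}

/* Walk the tree and execute it; VARS holds the variable values.  */
static int
eval (const struct node *n, int *vars)
{
  switch (n->kind)
    {
    case NUM:    return n->value;
    case VAR:    return vars[n->value];
    case ADD:    return eval (n->left, vars) + eval (n->right, vars);
    case LESS:   return eval (n->left, vars) < eval (n->right, vars);
    case ASSIGN: return vars[n->value] = eval (n->left, vars);
    case SEQ:    eval (n->left, vars); return eval (n->right, vars);
    case WHILE:
      /* The same subtrees are evaluated repeatedly: this is what
         executing actions once, during parsing, cannot do.  */
      while (eval (n->left, vars))
        eval (n->right, vars);
      return 0;
    }
  return 0;
}

/* Release a tree built with mk.  */
static void
free_tree (struct node *n)
{
  if (n)
    {
      free_tree (n->left);
      free_tree (n->right);
      free (n);
    }
}
```
@end verbatim

For instance, build the tree for @samp{i = 0; s = 0; while (i < 5) @{ s = s + i; i = i + 1; @}}, with @code{i} in slot 0 and @code{s} in slot 1 of the variable array. Evaluating it leaves @code{s} equal to 10.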
15619 @node Multiple start-symbols
15620 @section Multiple start-symbols
15623 I have several closely related grammars, and I would like to share their
15624 implementations. In fact, I could use a single grammar but with multiple
15628 Bison does not support multiple start-symbols, but there is a very simple
15629 means to simulate them. If @code{foo} and @code{bar} are the two pseudo
15630 start-symbols, then introduce two new tokens, say @code{START_FOO} and
15631 @code{START_BAR}, and use them as switches from the real start-symbol:
15634 %token START_FOO START_BAR;
15641 These tokens prevent the introduction of new conflicts. As far as the
15642 parser goes, that is all that is needed.
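On the scanner side, all that is required is to return the selected switch token once, before any real token. The following is a minimal sketch for a hand-written scanner. It assumes hypothetical token codes (in a real parser they come from the header generated by Bison). A trivial @code{next_real_token} stands in for the actual scanning code.

@verbatim
```c
/* Hypothetical token codes; a real parser gets them from the
   header generated by Bison.  */
enum { END = 0, START_FOO = 258, START_BAR, WORD };

/* Which pseudo start-symbol to parse; set it before calling yyparse.  */
static int start_token = START_FOO;

/* Hypothetical stand-in for the real scanning code: returns WORD
   for each character of the input, then END.  */
static const char *input = "ab";

static int
next_real_token (void)
{
  if (*input == '\0')
    return END;
  input++;
  return WORD;
}

int
yylex (void)
{
  /* Emit the switch token exactly once, before any real token.  */
  if (start_token)
    {
      int t = start_token;
      start_token = 0;
      return t;
    }
  return next_real_token ();
}
```
@end verbatim

With @code{start_token} set to @code{START_FOO}, successive calls to @code{yylex} return @code{START_FOO}, then @code{WORD} twice, then @code{END}. Setting it to @code{START_BAR} instead selects the other pseudo start-symbol.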
Now the difficult part is ensuring that the scanner sends one of these
tokens first. If your scanner is hand-written, that should be
straightforward. If your scanner is generated by Lex, then there is a
simple means to do it: recall that anything between @samp{%@{ ... %@}}
after the first @code{%%} is copied verbatim to the top of the generated
@code{yylex} function. Make sure a variable @code{start_token} is
available in the scanner (e.g., a global variable or using
@code{%lex-param} etc.), and use the following:
15653 /* @r{Prologue.} */
15658 int t = start_token;
15663 /* @r{The rules.} */
15667 @node Secure? Conform?
15668 @section Secure? Conform?
15671 Is Bison secure? Does it conform to POSIX?
15674 If you're looking for a guarantee or certification, we don't provide it.
15675 However, Bison is intended to be a reliable program that conforms to the
15676 POSIX specification for Yacc. If you run into problems, please send us a
15679 @include relocatable.texi
15681 @node I can't build Bison
15682 @section I can't build Bison
15685 I can't build Bison because @command{make} complains that
15686 @code{msgfmt} is not found.
15690 Like most GNU packages with internationalization support, that feature
15691 is turned on by default. If you have problems building in the @file{po}
15692 subdirectory, it indicates that your system's internationalization
15693 support is lacking. You can re-configure Bison with
15694 @option{--disable-nls} to turn off this support, or you can install GNU
15695 gettext from @url{https://ftp.gnu.org/gnu/gettext/} and re-configure
15696 Bison. See the file @file{ABOUT-NLS} for more information.
15699 I can't build Bison because my C compiler is too old.
15702 Except for GLR parsers (which require C99), the C code that Bison generates
15703 requires only C89 or later. However, Bison itself requires common C99
15704 features such as declarations after statements. Bison's @code{configure}
15705 script attempts to enable C99 (or later) support on compilers that default
15706 to pre-C99. If your compiler lacks these C99 features entirely, GCC may
15707 well be a better choice; or you can try upgrading to your compiler's latest
15710 @node Where can I find help?
15711 @section Where can I find help?
15714 I'm having trouble using Bison. Where can I find help?
15717 First, read this fine manual. Beyond that, you can send mail to
15718 @email{help-bison@@gnu.org}. This mailing list is intended to be
15719 populated with people who are willing to answer questions about using
15720 and installing Bison. Please keep in mind that (most of) the people on
15721 the list have aspects of their lives which are not related to Bison (!),
15722 so you may not receive an answer to your question right away. This can
15723 be frustrating, but please try not to honk them off; remember that any
15724 help they provide is purely voluntary and out of the kindness of their
15728 @section Bug Reports
15731 I found a bug. What should I include in the bug report?
15734 Before sending a bug report, make sure you are using the latest
15735 version. Check @url{https://ftp.gnu.org/pub/gnu/bison/} or one of its
15736 mirrors. Be sure to include the version number in your bug report. If
15737 the bug is present in the latest version but not in a previous version,
15738 try to determine the most recent version which did not contain the bug.
15740 If the bug is parser-related, you should include the smallest grammar
15741 you can which demonstrates the bug. The grammar file should also be
complete (i.e., we should be able to run it through Bison without having
15743 to edit or add anything). The smaller and simpler the grammar, the
15744 easier it will be to fix the bug.
15746 Include information about your compilation environment, including your
15747 operating system's name and version and your compiler's name and
15748 version. If you have trouble compiling, you should also include a
15749 transcript of the build session, starting with the invocation of
15750 @code{configure}. Depending on the nature of the bug, you may be asked to
15751 send additional files as well (such as @file{config.h} or @file{config.cache}).
15753 Patches are most welcome, but not required. That is, do not hesitate to
15754 send a bug report just because you cannot provide a fix.
15756 Send bug reports to @email{bug-bison@@gnu.org}.
15758 @node More Languages
15759 @section More Languages
15762 Will Bison ever have C++ and Java support? How about @var{insert your
15763 favorite language here}?
15766 C++, D and Java are supported. We'd love to add other languages;
15767 contributions are welcome.
15770 @section Beta Testing
15773 What is involved in being a beta tester?
15776 It's not terribly involved. Basically, you would download a test
15777 release, compile it, and use it to build and run a parser or two. After
15778 that, you would submit either a bug report or a message saying that
15779 everything is okay. It is important to report successes as well as
15780 failures because test releases eventually become mainstream releases,
15781 but only if they are adequately tested. If no one tests, development is
15782 essentially halted.
15784 Beta testers are particularly needed for operating systems to which the
15785 developers do not have easy access. They currently have easy access to
15786 recent GNU/Linux and Solaris versions. Reports about other operating
15787 systems are especially welcome.
15789 @node Mailing Lists
15790 @section Mailing Lists
15793 How do I join the help-bison and bug-bison mailing lists?
15796 See @url{https://lists.gnu.org/}.
15798 @c ================================================= Table of Symbols
15800 @node Table of Symbols
15801 @appendix Bison Symbols
15802 @cindex Bison symbols, table of
15803 @cindex symbols in Bison, table of
15805 @deffn {Variable} @@$
15806 In an action, the location of the left-hand side of the rule.
15807 @xref{Tracking Locations}.
15810 @deffn {Variable} @@@var{n}
15811 @deffnx {Symbol} @@@var{n}
15812 In an action, the location of the @var{n}-th symbol of the right-hand side
15813 of the rule. @xref{Tracking Locations}.
15815 In a grammar, the Bison-generated nonterminal symbol for a midrule action
15816 with a semantic value. @xref{Midrule Action Translation}.
15819 @deffn {Variable} @@@var{name}
15820 @deffnx {Variable} @@[@var{name}]
15821 In an action, the location of a symbol addressed by @var{name}.
15822 @xref{Tracking Locations}.
15825 @deffn {Symbol} $@@@var{n}
15826 In a grammar, the Bison-generated nonterminal symbol for a midrule action
with no semantic value. @xref{Midrule Action Translation}.
15830 @deffn {Variable} $$
15831 In an action, the semantic value of the left-hand side of the rule.
15835 @deffn {Variable} $@var{n}
15836 In an action, the semantic value of the @var{n}-th symbol of the
15837 right-hand side of the rule. @xref{Actions}.
15840 @deffn {Variable} $@var{name}
15841 @deffnx {Variable} $[@var{name}]
15842 In an action, the semantic value of a symbol addressed by @var{name}.
15846 @deffn {Delimiter} %%
15847 Delimiter used to separate the grammar rule section from the
15848 Bison declarations section or the epilogue.
15849 @xref{Grammar Layout}.
15852 @c Don't insert spaces, or check the DVI output.
15853 @deffn {Delimiter} %@{@var{code}%@}
15854 All code listed between @samp{%@{} and @samp{%@}} is copied verbatim
15855 to the parser implementation file. Such code forms the prologue of
15856 the grammar file. @xref{Grammar Outline}.
15859 @deffn {Directive} %?@{@var{expression}@}
15860 Predicate actions. This is a type of action clause that may appear in
15861 rules. The expression is evaluated, and if false, causes a syntax error. In
15862 GLR parsers during nondeterministic operation,
15863 this silently causes an alternative parse to die. During deterministic
operation, it has the same effect as @code{YYERROR}.
15865 @xref{Semantic Predicates}.
15868 @deffn {Construct} /* @dots{} */
15869 @deffnx {Construct} // @dots{}
15870 Comments, as in C/C++.
15873 @deffn {Delimiter} :
15874 Separates a rule's result from its components. @xref{Rules}.
15877 @deffn {Delimiter} ;
15878 Terminates a rule. @xref{Rules}.
15881 @deffn {Delimiter} |
15882 Separates alternate rules for the same result nonterminal.
15886 @deffn {Directive} <*>
15887 Used to define a default tagged @code{%destructor} or default tagged
15890 @xref{Destructor Decl}.
15893 @deffn {Directive} <>
15894 Used to define a default tagless @code{%destructor} or default tagless
15897 @xref{Destructor Decl}.
15900 @deffn {Symbol} $accept
15901 The predefined nonterminal whose only rule is @samp{$accept: @var{start}
15902 $end}, where @var{start} is the start symbol. @xref{Start Decl}. It cannot
15903 be used in the grammar.
15906 @deffn {Directive} %code @{@var{code}@}
15907 @deffnx {Directive} %code @var{qualifier} @{@var{code}@}
15908 Insert @var{code} verbatim into the output parser source at the
15909 default location or at the location specified by @var{qualifier}.
15910 @xref{%code Summary}.
15913 @deffn {Directive} %debug
15914 Equip the parser for debugging. @xref{Decl Summary}.
15918 @deffn {Directive} %default-prec
15919 Assign a precedence to rules that lack an explicit @samp{%prec}
15920 modifier. @xref{Contextual Precedence}.
15924 @deffn {Directive} %define @var{variable}
15925 @deffnx {Directive} %define @var{variable} @var{value}
15926 @deffnx {Directive} %define @var{variable} @{@var{value}@}
15927 @deffnx {Directive} %define @var{variable} "@var{value}"
15928 Define a variable to adjust Bison's behavior. @xref{%define Summary}.
15931 @deffn {Directive} %defines
15932 @deffnx {Directive} %defines @var{defines-file}
15933 Historical name for @code{%header}.
15934 @xref{Decl Summary}.
15937 @deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated with
discarded symbols. @xref{Destructor Decl}.
15942 @deffn {Directive} %dprec
15943 Bison declaration to assign a precedence to a rule that is used at parse
15944 time to resolve reduce/reduce conflicts. @xref{GLR Parsers}.
15947 @deffn {Directive} %empty
Bison declaration to make explicit that a rule has an empty
right-hand side. @xref{Empty Rules}.
15952 @deffn {Symbol} $end
15953 The predefined token marking the end of the token stream. It cannot be
15954 used in the grammar.
15957 @deffn {Symbol} error
15958 A token name reserved for error recovery. This token may be used in
grammar rules so as to allow the Bison parser to recognize a syntax error in
the input without halting the process. In effect, a sentence
15961 containing an error may be recognized as valid. On a syntax error, the
15962 token @code{error} becomes the current lookahead token. Actions
15963 corresponding to @code{error} are then executed, and the lookahead
15964 token is reset to the token that originally caused the violation.
15965 @xref{Error Recovery}.
15968 @deffn {Directive} %error-verbose
15969 An obsolete directive standing for @samp{%define parse.error verbose}.
15972 @deffn {Directive} %file-prefix "@var{prefix}"
15973 Bison declaration to set the prefix of the output files. @xref{Decl
15977 @deffn {Directive} %glr-parser
15978 Bison declaration to produce a GLR parser. @xref{GLR
15982 @deffn {Directive} %header
15983 Bison declaration to create a parser header file, which is usually
15984 meant for the scanner. @xref{Decl Summary}.
15987 @deffn {Directive} %header @var{header-file}
15988 Same as above, but save in the file @var{header-file}.
15989 @xref{Decl Summary}.
15992 @deffn {Directive} %initial-action
15993 Run user code before parsing. @xref{Initial Action Decl}.
15996 @deffn {Directive} %language
15997 Specify the programming language for the generated parser.
15998 @xref{Decl Summary}.
16001 @deffn {Directive} %left
16002 Bison declaration to assign precedence and left associativity to token(s).
16003 @xref{Precedence Decl}.
16006 @deffn {Directive} %lex-param @{@var{argument-declaration}@} @dots{}
Bison declaration to specify additional arguments that
16008 @code{yylex} should accept. @xref{Pure Calling}.
16011 @deffn {Directive} %merge
16012 Bison declaration to assign a merging function to a rule. If there is a
16013 reduce/reduce conflict with a rule having the same merging function, the
16014 function is applied to the two semantic values to get a single result.
16015 @xref{GLR Parsers}.
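The merging function is plain C code. It receives the two competing semantic values and returns the one to keep. The following is a minimal sketch under three assumptions. It uses the @code{YYSTYPE}-based signature from the GLR examples of this manual. It uses a hypothetical @code{struct tree} as the semantic value. It also follows a hypothetical policy of keeping both interpretations under an explicit ambiguity node.

@verbatim
```c
#include <stdlib.h>

/* Hypothetical parse-tree node: a label and, for an ambiguity
   node, the two alternative subtrees.  */
struct tree
{
  const char *label;
  struct tree *alt[2];
};

/* Hypothetical semantic value type for this sketch.  */
typedef union
{
  struct tree *tree;
} YYSTYPE;

/* Label used for ambiguity nodes.  */
static const char ambiguity_label[] = "<OR>";

/* Merge two interpretations of the same input: keep both under an
   ambiguity node so that a later pass can choose or report.  */
static YYSTYPE
stmt_merge (YYSTYPE x0, YYSTYPE x1)
{
  YYSTYPE res;
  res.tree = malloc (sizeof *res.tree);
  if (!res.tree)
    abort ();
  res.tree->label = ambiguity_label;
  res.tree->alt[0] = x0.tree;
  res.tree->alt[1] = x1.tree;
  return res;
}
```
@end verbatim

In the grammar, each competing rule would then carry @samp{%merge <stmt_merge>}.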
16018 @deffn {Directive} %name-prefix "@var{prefix}"
16019 Obsoleted by the @code{%define} variable @code{api.prefix} (@pxref{Multiple
16022 Rename the external symbols (variables and functions) used in the parser so
that they start with @var{prefix} instead of @samp{yy}. Unlike
@code{api.prefix}, it does not rename types and macros.
16026 The precise list of symbols renamed in C parsers is @code{yyparse},
16027 @code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar},
16028 @code{yydebug}, and (if locations are used) @code{yylloc}. If you use a
16029 push parser, @code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
16030 @code{yypstate_new} and @code{yypstate_delete} will also be renamed. For
16031 example, if you use @samp{%name-prefix "c_"}, the names become
@code{c_parse}, @code{c_lex}, and so on. For C++ parsers, see the
documentation of the @code{%define} variable @code{api.namespace}
(@pxref{%define Summary}).
16038 @deffn {Directive} %no-default-prec
16039 Do not assign a precedence to rules that lack an explicit @samp{%prec}
16040 modifier. @xref{Contextual Precedence}.
16044 @deffn {Directive} %no-lines
16045 Bison declaration to avoid generating @code{#line} directives in the
16046 parser implementation file. @xref{Decl Summary}.
16049 @deffn {Directive} %nonassoc
16050 Bison declaration to assign precedence and nonassociativity to token(s).
16051 @xref{Precedence Decl}.
16054 @deffn {Directive} %nterm
16055 Bison declaration to declare nonterminals. @xref{Type Decl}.
16058 @deffn {Directive} %output "@var{file}"
16059 Bison declaration to set the name of the parser implementation file.
16060 @xref{Decl Summary}.
16063 @deffn {Directive} %param @{@var{argument-declaration}@} @dots{}
16064 Bison declaration to specify additional arguments that both
16065 @code{yylex} and @code{yyparse} should accept. @xref{Parser Function}.
16068 @deffn {Directive} %parse-param @{@var{argument-declaration}@} @dots{}
16069 Bison declaration to specify additional arguments that @code{yyparse}
16070 should accept. @xref{Parser Function}.
16073 @deffn {Directive} %prec
16074 Bison declaration to assign a precedence to a specific rule.
16075 @xref{Contextual Precedence}.
16078 @deffn {Directive} %precedence
Bison declaration to assign precedence to token(s), but no associativity.
16080 @xref{Precedence Decl}.
16083 @deffn {Directive} %pure-parser
16084 Deprecated version of @samp{%define api.pure} (@pxref{%define
16085 Summary}), for which Bison is more careful to warn about
16086 unreasonable usage.
16089 @deffn {Directive} %require "@var{version}"
16090 Require version @var{version} or higher of Bison. @xref{Require Decl}.
16093 @deffn {Directive} %right
16094 Bison declaration to assign precedence and right associativity to token(s).
16095 @xref{Precedence Decl}.
16098 @deffn {Directive} %skeleton
16099 Specify the skeleton to use; usually for development.
16100 @xref{Decl Summary}.
16103 @deffn {Directive} %start
16104 Bison declaration to specify the start symbol. @xref{Start Decl}.
16107 @deffn {Directive} %token
16108 Bison declaration to declare token(s) without specifying precedence.
16112 @deffn {Directive} %token-table
16113 Bison declaration to include a token name table in the parser implementation
16114 file. @xref{Decl Summary}.
16117 @deffn {Directive} %type
16118 Bison declaration to declare symbol value types. @xref{Type Decl}.
16121 @deffn {Symbol} $undefined
16122 The predefined token onto which all undefined values returned by
@code{yylex} are mapped. It cannot be used in the grammar; rather, use
16127 @deffn {Directive} %union
16128 Bison declaration to specify several possible data types for semantic
16129 values. @xref{Union Decl}.
16132 @deffn {Macro} YYABORT
16133 Macro to pretend that an unrecoverable syntax error has occurred, by making
16134 @code{yyparse} return 1 immediately. The error reporting function
16135 @code{yyerror} is not called. @xref{Parser Function}.
16137 For Java parsers, this functionality is invoked using @code{return YYABORT;}
16141 @deffn {Macro} YYACCEPT
16142 Macro to pretend that a complete utterance of the language has been
16143 read, by making @code{yyparse} return 0 immediately.
16144 @xref{Parser Function}.
16146 For Java parsers, this functionality is invoked using @code{return YYACCEPT;}
16150 @deffn {Macro} YYBACKUP
16151 Macro to discard a value from the parser stack and fake a lookahead
16152 token. @xref{Action Features}.
16155 @deffn {Macro} YYBISON
16156 The version of Bison as an integer, for instance 30704 for version 3.7.4.
16157 Defined in @file{yacc.c} only. Before version 3.7.4, @code{YYBISON} was
16161 @deffn {Variable} yychar
16162 External integer variable that contains the integer value of the
16163 lookahead token. (In a pure parser, it is a local variable within
16164 @code{yyparse}.) Error-recovery rule actions may examine this variable.
16165 @xref{Action Features}.
@deffn {Macro} yyclearin
16169 Macro used in error-recovery rule actions. It clears the previous
16170 lookahead token. @xref{Error Recovery}.
16173 @deffn {Macro} YYDEBUG
16174 Macro to define to equip the parser with tracing code. @xref{Tracing}.
16177 @deffn {Variable} yydebug
16178 External integer variable set to zero by default. If @code{yydebug}
16179 is given a nonzero value, the parser will output information on input
16180 symbols and parser action. @xref{Tracing}.
16183 @deffn {Value} YYEMPTY
16184 The pseudo token kind when there is no lookahead token.
16187 @deffn {Value} YYEOF
The token kind denoting the end of the input stream.
16191 @deffn {Macro} yyerrok
Macro to cause the parser to recover immediately to its normal mode
16193 after a syntax error. @xref{Error Recovery}.
16196 @deffn {Macro} YYERROR
16197 Cause an immediate syntax error. This statement initiates error
16198 recovery just as if the parser itself had detected an error; however, it
16199 does not call @code{yyerror}, and does not print any message. If you
16200 want to print an error message, call @code{yyerror} explicitly before
16201 the @samp{YYERROR;} statement. @xref{Error Recovery}.
16203 For Java parsers, this functionality is invoked using @code{return YYERROR;}
16207 @deffn {Function} yyerror
16208 User-supplied function to be called by @code{yyparse} on error.
16209 @xref{Error Reporting Function}.
16212 @deffn {Macro} YYFPRINTF
16213 Macro used to output run-time traces in C.
16214 @xref{Enabling Traces}.
16217 @deffn {Macro} YYINITDEPTH
16218 Macro for specifying the initial size of the parser stack.
16219 @xref{Memory Management}.
16222 @deffn {Function} yylex
16223 User-supplied lexical analyzer function, called with no arguments to get
16224 the next token. @xref{Lexical}.
16227 @deffn {Variable} yylloc
16228 External variable in which @code{yylex} should place the line and column
16229 numbers associated with a token. (In a pure parser, it is a local
16230 variable within @code{yyparse}, and its address is passed to
16232 You can ignore this variable if you don't use the @samp{@@} feature in the
16234 @xref{Token Locations}.
16235 In semantic actions, it stores the location of the lookahead token.
16236 @xref{Actions and Locations}.
16239 @deffn {Type} YYLTYPE
16240 Data type of @code{yylloc}. By default in C, a structure with four members
16241 (start/end line/column). @xref{Location Type}.
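The sketch below reproduces the default C layout of @code{YYLTYPE}, with its four @code{int} members, so that it compiles on its own. A real scanner gets the type from the generated header. The hypothetical helper @code{track} shows one common way for a scanner to update @code{yylloc} for each matched token. The new token starts where the previous one ended. Its end advances one column per character, or moves to the next line on a newline. This sketch treats @code{last_column} as the column just past the token; other conventions are equally valid.

@verbatim
```c
/* Default layout of YYLTYPE in C parsers, reproduced so that this
   sketch compiles on its own.  */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical helper: update LOC for a token whose text is TEXT.
   The token starts where the previous one ended; LAST_COLUMN is
   kept one past the token's last character.  */
static void
track (YYLTYPE *loc, const char *text)
{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (; *text; ++text)
    if (*text == '\n')
      {
        ++loc->last_line;
        loc->last_column = 1;
      }
    else
      ++loc->last_column;
}
```
@end verbatim

Start from a location whose four members are all 1. Matching @samp{abc} then covers columns 1 to 4 of line 1. A following newline moves @code{last_line} to 2 and resets @code{last_column} to 1.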
16244 @deffn {Variable} yylval
16245 External variable in which @code{yylex} should place the semantic
16246 value associated with a token. (In a pure parser, it is a local
16247 variable within @code{yyparse}, and its address is passed to
16249 @xref{Token Values}.
16250 In semantic actions, it stores the semantic value of the lookahead token.
16254 @deffn {Macro} YYMAXDEPTH
16255 Macro for specifying the maximum size of the parser stack. @xref{Memory
16259 @deffn {Variable} yynerrs
16260 Global variable which Bison increments each time it reports a syntax error.
16261 (In a pure parser, it is a local variable within @code{yyparse}. In a
16262 pure push parser, it is a member of @code{yypstate}.)
16263 @xref{Error Reporting Function}.
16266 @deffn {Macro} YYNOMEM
16267 Macro to pretend that memory is exhausted, by making @code{yyparse} return 2
16268 immediately. The error reporting function @code{yyerror} is called.
16269 @xref{Parser Function}.
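In practice, callers dispatch on the value returned by @code{yyparse}. It is 0 when parsing succeeds or @code{YYACCEPT} is invoked, 1 on invalid input or @code{YYABORT}, and 2 on memory exhaustion or @code{YYNOMEM}. The following is a minimal sketch in which @code{report_parse_status} is a hypothetical helper, not part of Bison.

@verbatim
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report the status returned by yyparse on OUT
   and turn it into a process exit status.  */
static int
report_parse_status (int status, FILE *out)
{
  switch (status)
    {
    case 0:                     /* End of input reached, or YYACCEPT.  */
      return EXIT_SUCCESS;
    case 1:                     /* Invalid input, or YYABORT.  */
      fputs ("parsing failed\n", out);
      break;
    case 2:                     /* Memory exhausted, or YYNOMEM.  */
      fputs ("parser memory exhausted\n", out);
      break;
    default:
      fputs ("unexpected status from yyparse\n", out);
      break;
    }
  return EXIT_FAILURE;
}
```
@end verbatim

A @code{main} function could then end with @code{return report_parse_status (yyparse (), stderr);}.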
16272 @deffn {Function} yyparse
16273 The parser function produced by Bison; call this function to start
16274 parsing. @xref{Parser Function}.
16277 @deffn {Function} yypstate_delete
16278 The function to delete a parser instance, produced by Bison in push mode;
16279 call this function to delete the memory associated with a parser.
16280 @xref{yypstate_delete,,@code{yypstate_delete}}. Does nothing when called
16281 with a null pointer.
16284 @deffn {Function} yypstate_new
16285 The function to create a parser instance, produced by Bison in push mode;
16286 call this function to create a new parser.
16287 @xref{yypstate_new,,@code{yypstate_new}}.
16290 @deffn {Function} yypull_parse
16291 The parser function produced by Bison in push mode; call this function to
16292 parse the rest of the input stream.
16293 @xref{yypull_parse,,@code{yypull_parse}}.
16296 @deffn {Function} yypush_parse
16297 The parser function produced by Bison in push mode; call this function to
16298 parse a single token.
16299 @xref{yypush_parse,,@code{yypush_parse}}.
16302 @deffn {Macro} YYRECOVERING
16303 The expression @code{YYRECOVERING ()} yields 1 when the parser
16304 is recovering from a syntax error, and 0 otherwise.
16305 @xref{Action Features}.
16308 @deffn {Macro} YYSTACK_USE_ALLOCA
16309 Macro used to control the use of @code{alloca} when the
16310 deterministic parser in C needs to extend its stacks. If defined to 0,
16311 the parser will use @code{malloc} to extend its stacks and memory exhaustion
16312 occurs if @code{malloc} fails (@pxref{Memory Management}). If defined to
16313 1, the parser will use @code{alloca}. Values other than 0 and 1 are
16314 reserved for future Bison extensions. If not defined,
16315 @code{YYSTACK_USE_ALLOCA} defaults to 0.
16317 In the all-too-common case where your code may run on a host with a
16318 limited stack and with unreliable stack-overflow checking, you should
16319 set @code{YYMAXDEPTH} to a value that cannot possibly result in
16320 unchecked stack overflow on any of your target hosts when
16321 @code{alloca} is called. You can inspect the code that Bison
16322 generates in order to determine the proper numeric values. This will
16323 require some expertise in low-level implementation details.
16326 @deffn {Type} YYSTYPE
16327 In C, data type of semantic values; @code{int} by default.
16328 Deprecated in favor of the @code{%define} variable @code{api.value.type}.
16332 @deffn {Type} yysymbol_kind_t
16333 An enum of all the symbols, tokens and nonterminals, of the grammar.
16334 @xref{Syntax Error Reporting Function}. The symbol kinds are used
16335 internally by the parser, and should not be confused with the token kinds:
16336 the symbol kind of a terminal symbol is not equal to its token kind! (Unless
16337 @samp{%define api.token.raw} was used.)
16340 @deffn {Type} yytoken_kind_t
16341 An enum of all the @dfn{token kinds} declared with @code{%token}
16342 (@pxref{Token Decl}). These are the return values for @code{yylex}. They
16343 should not be confused with the @emph{symbol kinds}, used internally by the
16347 @deffn {Value} YYUNDEF
16348 The token kind denoting an unknown token.
16357 @item Accepting state
16358 A state whose only action is the accept action.
16359 The accepting state is thus a consistent state.
16360 @xref{Understanding}.
16362 @item Backus-Naur Form (BNF; also called ``Backus Normal Form'')
16363 Formal method of specifying context-free grammars originally proposed
16364 by John Backus, and slightly improved by Peter Naur in his 1960-01-02
16365 committee document contributing to what became the Algol 60 report.
16366 @xref{Language and Grammar}.
16368 @item Consistent state
16369 A state containing only one possible action. @xref{Default Reductions}.
16371 @item Context-free grammars
16372 Grammars specified as rules that can be applied regardless of context.
16373 Thus, if there is a rule which says that an integer can be used as an
16374 expression, integers are allowed @emph{anywhere} an expression is
16375 permitted. @xref{Language and Grammar}.
16377 @item Counterexample
16378 A sequence of tokens and/or nonterminals, with one dot, that demonstrates a
16379 conflict. The dot marks the place where the conflict occurs.
16381 @cindex unifying counterexample
16382 @cindex counterexample, unifying
16383 @cindex nonunifying counterexample
16384 @cindex counterexample, nonunifying
16385 A @emph{unifying} counterexample is a single string that has two different
16386 parses; its existence proves that the grammar is ambiguous. When a unifying
16387 counterexample cannot be found in reasonable time, a @emph{nonunifying}
counterexample is built: @emph{two} different strings sharing the prefix up
@xref{Counterexamples}.
16393 @item Default reduction
16394 The reduction that a parser should perform if the current parser state
16395 contains no other action for the lookahead token. In permitted parser
16396 states, Bison declares the reduction with the largest lookahead set to be
16397 the default reduction and removes that lookahead set. @xref{Default
16400 @item Defaulted state
16401 A consistent state with a default reduction. @xref{Default Reductions}.
16403 @item Dynamic allocation
16404 Allocation of memory that occurs during execution, rather than at
16405 compile time or on entry to a function.
16408 Analogous to the empty set in set theory, the empty string is a
16409 character string of length zero.
16411 @item Finite-state stack machine
16412 A ``machine'' that has discrete states in which it is said to exist at
16413 each instant in time. As input to the machine is processed, the
16414 machine moves from state to state as specified by the logic of the
16415 machine. In the case of the parser, the input is the language being
16416 parsed, and the states correspond to various stages in the grammar
16417 rules. @xref{Algorithm}.
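As an illustration, here is a minimal sketch of such a machine for the hypothetical toy grammar @samp{S: '(' S ')' | 'x'}. It uses a hand-built LR(0) automaton, not tables generated by Bison. The stack holds states, and each state corresponds to a position in the grammar rules, as listed in the comment. A shift pushes a state. A reduction pops one state per right-hand-side symbol, then pushes the state reached on @code{S}.

@verbatim
```c
#include <stddef.h>

/* Toy grammar:
     0. $accept: S $end
     1. S: '(' S ')'
     2. S: 'x'
   States of its hand-built LR(0) automaton:
     0: $accept: . S $end   S: . '(' S ')'   S: . 'x'
     1: $accept: S . $end
     2: S: '(' . S ')'   S: . '(' S ')'   S: . 'x'
     3: S: 'x' .                        (reduce rule 2)
     4: S: '(' S . ')'
     5: S: '(' S ')' .                  (reduce rule 1)  */

/* State reached by shifting character C in state S, or -1.  */
static int
shift (int s, char c)
{
  if ((s == 0 || s == 2) && c == '(')
    return 2;
  if ((s == 0 || s == 2) && c == 'x')
    return 3;
  if (s == 4 && c == ')')
    return 5;
  return -1;
}

/* State reached after reducing to S in state S0 (the "goto" table).
   Only states 0 and 2 expect an S.  */
static int
goto_s (int s0)
{
  return s0 == 0 ? 1 : 4;
}

/* Return 1 if INPUT is a sentence of the grammar, 0 otherwise.  */
static int
recognize (const char *input)
{
  int stack[256];
  size_t top = 0;
  stack[0] = 0;
  for (;;)
    {
      int s = stack[top];
      if (s == 3 || s == 5)
        {
          /* Reduce: pop the right-hand side, then go to on S.  */
          top -= s == 3 ? 1 : 3;
          stack[top + 1] = goto_s (stack[top]);
          ++top;
        }
      else if (s == 1)
        return *input == '\0';   /* Accept only at end of input.  */
      else
        {
          int next = shift (s, *input);
          if (next < 0 || top + 1 >= sizeof stack / sizeof *stack)
            return 0;            /* Syntax error (or stack full).  */
          stack[++top] = next;
          ++input;
        }
    }
}
```
@end verbatim

For instance, @code{recognize ("((x))")} returns 1, while @code{recognize ("(x")} and @code{recognize ("xx")} return 0.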
16419 @item Generalized LR (GLR)
16420 A parsing algorithm that can handle all context-free grammars, including those
that are not LR(1). It resolves situations that Bison's deterministic parsing
algorithm cannot handle by effectively splitting off multiple parsers, trying
all possible parsers, and discarding those that fail in the light of
additional right context. @xref{Generalized LR Parsing}.
16428 A language construct that is (in general) grammatically divisible;
16429 for example, `expression' or `declaration' in C@.
16430 @xref{Language and Grammar}.
16432 @item IELR(1) (Inadequacy Elimination LR(1))
16433 A minimal LR(1) parser table construction algorithm. That is, given any
16434 context-free grammar, IELR(1) generates parser tables with the full
16435 language-recognition power of canonical LR(1) but with nearly the same
number of parser states as LALR(1). Compared with canonical LR(1), this
reduction in parser states is often an order of magnitude. More importantly,
because canonical LR(1)'s extra parser states may contain duplicate conflicts
in the case of non-LR(1) grammars, the number of conflicts for IELR(1) is
often an order of magnitude smaller as well. This can significantly reduce
the complexity of developing a
16441 grammar. @xref{LR Table Construction}.
16443 @item Infix operator
16444 An arithmetic operator that is placed between the operands on which it
16445 performs some operation.
16448 A continuous flow of data between devices or programs.
16451 ``Token'' and ``symbol'' are each overloaded to mean either a grammar symbol
16452 (kind) or all parse info (kind, value, location) associated with occurrences
16453 of that grammar symbol from the input. To disambiguate,
16457 we use ``token kind'' and ``symbol kind'' to mean both grammar symbols and
16458 the values that represent them in a base programming language (C, C++,
16459 etc.). The names of the types of these values are typically
16460 @code{token_kind_t}, or @code{token_kind_type}, or @code{TokenKind},
16461 depending on the programming language.
16464 we use ``token'' and ``symbol'' without the word ``kind'' to mean parsed
16465 occurrences, and we append the word ``type'' to refer to the types that
16466 represent them in a base programming language.
16469 In summary: When you see ``kind'', interpret ``symbol'' or ``token'' to mean
16470 a @emph{grammar symbol}. When you don't see ``kind'' (including when you
16471 see ``type''), interpret ``symbol'' or ``token'' to mean a @emph{parsed
16474 @item LAC (Lookahead Correction)
16475 A parsing mechanism that fixes the problem of delayed syntax error
16476 detection, which is caused by LR state merging, default reductions, and the
16477 use of @code{%nonassoc}. Delayed syntax error detection results in
16478 unexpected semantic actions, initiation of error recovery in the wrong
16479 syntactic context, and an incorrect list of expected tokens in a verbose
16480 syntax error message. @xref{LAC}.
16482 @item Language construct
16483 One of the typical usage schemas of the language. For example, one of
16484 the constructs of the C language is the @code{if} statement.
16485 @xref{Language and Grammar}.
16487 @item Left associativity
16488 Operators having left associativity are analyzed from left to right:
16489 @samp{a+b+c} first computes @samp{a+b} and then combines with
16490 @samp{c}. @xref{Precedence}.
16492 @item Left recursion
16493 A rule whose result symbol is also its first component symbol; for
example, @samp{expseq1: expseq1 ',' exp;}. @xref{Recursion}.
16496 @item Left-to-right parsing
16497 Parsing a sentence of a language by analyzing it token by token from
16498 left to right. @xref{Algorithm}.
16500 @item Lexical analyzer (scanner)
16501 A function that reads an input stream and returns tokens one by one.
16504 @item Lexical tie-in
16505 A flag, set by actions in the grammar rules, which alters the way
16506 tokens are parsed. @xref{Lexical Tie-ins}.
16508 @item Literal string token
16509 A token which consists of two or more fixed characters. @xref{Symbols}.
16511 @item Lookahead token
16512 A token already read but not yet shifted. @xref{Lookahead}.
16515 The class of context-free grammars that Bison (like most other parser
16516 generators) can handle by default; a subset of LR(1).
16517 @xref{Mysterious Conflicts}.
16520 The class of context-free grammars in which at most one token of
16521 lookahead is needed to disambiguate the parsing of any piece of input.
16523 @item Nonterminal symbol
16524 A grammar symbol standing for a grammatical construct that can
16525 be expressed through rules in terms of smaller constructs; in other
16526 words, a construct that is not a token. @xref{Symbols}.
16529 A function that recognizes valid sentences of a language by analyzing
16530 the syntax structure of a set of tokens passed to it from a lexical
16533 @item Postfix operator
16534 An arithmetic operator that is placed after the operands upon which it
16535 performs some operation.
16538 Replacing a string of nonterminals and/or terminals with a single
16539 nonterminal, according to a grammar rule. @xref{Algorithm}.
A reentrant subprogram is a subprogram which can be invoked any
16543 number of times in parallel, without interference between the various
16544 invocations. @xref{Pure Decl}.
16546 @item Reverse Polish Notation
16547 A language in which all operators are postfix operators.
16549 @item Right recursion
16550 A rule whose result symbol is also its last component symbol; for
16551 example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion}.
16554 In computer languages, the semantics are specified by the actions
16555 taken for each instance of the language, i.e., the meaning of
16556 each statement. @xref{Semantics}.
16559 A parser is said to shift when it makes the choice of analyzing
further input from the stream rather than immediately reducing an
already-recognized rule. @xref{Algorithm}.
16563 @item Single-character literal
16564 A single character that is recognized and interpreted as is.
16565 @xref{Grammar in Bison}.
16568 The nonterminal symbol that stands for a complete valid utterance in
16569 the language being parsed. The start symbol is usually listed as the
16570 first nonterminal symbol in a language specification.
16574 A (finite) enumeration of the grammar symbols, as processed by the parser.
16578 A data structure where symbol names and associated data are stored during
16579 parsing to allow for recognition and use of existing information in repeated
16580 uses of a symbol. @xref{Multi-function Calc}.
16583 An error encountered during parsing of an input stream due to invalid
16584 syntax. @xref{Error Recovery}.
16586 @item Terminal symbol
16587 A grammar symbol that has no rules in the grammar and therefore is
16588 grammatically indivisible. The piece of text it represents is a token.
16589 @xref{Language and Grammar}.
16592 A basic, grammatically indivisible unit of a language. The symbol that
16593 describes a token in the grammar is a terminal symbol. The input of the
16594 Bison parser is a stream of tokens which comes from the lexical analyzer.
16598 A (finite) enumeration of the grammar terminals, as discriminated by the
16599 scanner. @xref{Symbols}.
16601 @item Unreachable state
16602 A parser state to which there does not exist a sequence of transitions from
16603 the parser's start state. A state can become unreachable during conflict
16604 resolution. @xref{Unreachable States}.
16607 @node GNU Free Documentation License
16608 @appendix GNU Free Documentation License
16613 @unnumbered Bibliography
16615 @c Please follow the following canvas to add more references.
16616 @c And keep sorted alphabetically.
16619 @anchor{Corbett 1984}
16620 @item [Corbett 1984]
16622 Robert Paul Corbett,
16624 Static Semantics in Compiler Error Recovery
16626 Ph.D. Dissertation, Report No. UCB/CSD 85/251,
Department of Electrical Engineering and Computer Science, Computer Science
16629 Division, University of California, Berkeley, California
16633 @uref{https://digicoll.lib.berkeley.edu/record/135875}
16635 @anchor{Denny 2008}
16637 Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables
16638 for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the
16639 2008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA,
16640 pp.@: 240--245. @uref{https://dx.doi.org/10.1145/1363686.1363747}
16642 @anchor{Denny 2010 May}
16643 @item [Denny 2010 May]
16644 Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the
16645 Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson
16646 University, Clemson, SC, USA (May 2010).
16647 @uref{https://tigerprints.clemson.edu/all_dissertations/519/}
16649 @anchor{Denny 2010 November}
16650 @item [Denny 2010 November]
16651 Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating
16652 Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution,
16653 in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November
16654 2010), pp.@: 943--979. @uref{https://dx.doi.org/10.1016/j.scico.2009.08.001}
16656 @anchor{DeRemer 1982}
16657 @item [DeRemer 1982]
16658 Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1)
16659 Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and
16660 Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@:
16661 615--649. @uref{https://dx.doi.org/10.1145/69622.357187}
16663 @anchor{Isradisaikul 2015}
16664 @item [Isradisaikul 2015]
Chinawat Isradisaikul and Andrew Myers,
16666 Finding Counterexamples from Parsing Conflicts,
16667 in @cite{Proceedings of the 36th ACM SIGPLAN Conference on
16668 Programming Language Design and Implementation} (PLDI '15),
16669 ACM, pp.@: 555--564.
16670 @uref{https://www.cs.cornell.edu/andru/papers/cupex/cupex.pdf}
16672 @anchor{Johnson 1978}
16673 @item [Johnson 1978]
16675 A portable compiler: theory and practice,
16676 in @cite{Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on
16677 Principles of programming languages} (POPL '78),
16679 @uref{https://dx.doi.org/10.1145/512760.512771}.
16681 @anchor{Knuth 1965}
16683 Donald E. Knuth, On the Translation of Languages from Left to Right, in
16684 @cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@:
16685 607--639. @uref{https://dx.doi.org/10.1016/S0019-9958(65)90426-2}
16687 @anchor{Scott 2000}
16689 Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain,
16690 @cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of
16691 London, Department of Computer Science, TR-00-12 (December 2000).
16692 @uref{https://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}
16695 @node Index of Terms
16696 @unnumbered Index of Terms
16702 @c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF
16703 @c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's
16704 @c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur
16705 @c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa Multi
16706 @c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc multi
16707 @c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex defaultprec Donnelly Gotos
16708 @c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref yypush
16709 @c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex lr
16710 @c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init POSIX ODR
16711 @c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG yypull
16712 @c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree
16713 @c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr
16714 @c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor
16715 @c LocalWords: symrec val tptr FUN func struct sym enum IEC syntaxes Byacc
16716 @c LocalWords: fun putsym getsym arith funs atan ptr malloc sizeof Lex pcc
16717 @c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT
16718 @c LocalWords: ptypes itype trigraphs yytname expseq vindex dtype Unary usr
16719 @c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs nonterminal
16720 @c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES reentrant
16721 @c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param yypstate
16722 @c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange
16723 @c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc
16724 @c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH inline
16725 @c LocalWords: YYINITDEPTH stmts ref initdcl maybeasm notype Lookahead ctx
16726 @c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf
16727 @c LocalWords: ypp yxx itemx tex leaderfill Troubleshouting sqrt Graphviz
16728 @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
16729 @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
16730 @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
16731 @c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
16732 @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs sr
16733 @c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC nterm LR's
16734 @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
16735 @c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative Ph
16736 @c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env
16737 @c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR
16738 @c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer
16739 @c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz ACM
16740 @c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno
16741 @c LocalWords: multitable headitem hh basename Doxygen fno filename gdef de
16742 @c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx
16743 @c LocalWords: Ctor defcv defcvx arg accessors CPP ifndef CALCXX YYerror
16744 @c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits
16745 @c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng
16746 @c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc PSLR
16747 @c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls
16748 @c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp
16749 @c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv
16750 @c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url
16751 @c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos
16752 @c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's
16753 @c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints
16754 @c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE
16755 @c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate LocationType yyo
16756 @c LocalWords: parsers parser's documentencoding documentlanguage Wempty ss
16757 @c LocalWords: associativity subclasses precedences unresolvable runnable
16758 @c LocalWords: allocators subunit initializations unreferenced untyped dir
16759 @c LocalWords: errorVerbose subtype subtypes Wmidrule midrule's src rvalues
16760 @c LocalWords: automove evolutions Wother Wconflicts PNG lookaheads Acc sep
16761 @c LocalWords: xsltproc XSL xsl xhtml html num Wprecedence Werror fcaret gv
16762 @c LocalWords: fdiagnostics setlocale nullptr ast srcdir iff drv rgbWarning
16763 @c LocalWords: deftypefunx pragma Wnull dereference Wdocumentation elif ish
16764 @c LocalWords: Wdeprecated Wregister noinput yyloc yypos PODs sstream Wsign
16765 @c LocalWords: typename emplace Wconversion Wshorten yacchack reentrancy ou
16766 @c LocalWords: Relocatability exprs fixit Wyacc parseable fixits ffixit svg
16767 @c LocalWords: DNDEBUG cstring Wzero workalike POPL workalikes byacc UCB
16768 @c LocalWords: Penello's Penello Byson Byson's Corbett's CSD TOPLAS PDP cex
16769 @c LocalWords: Beazley's goyacc ocamlyacc SIGACT SIGPLAN colorWarning exVal
16770 @c LocalWords: setcolor rgbError colorError rgbNotice colorNotice derror
16771 @c LocalWords: colorOff maincolor inlineraw darkviolet darkcyan dwarning
16772 @c LocalWords: dnotice copyable stdint ptrdiff bufsize yyreport invariants
16773 @c LocalWords: xrefautomaticsectiontitle yysyntax yysymbol ARGMAX cond RTTI
16774 @c LocalWords: Wdangling yytoken erreur syntaxe inattendu attendait nombre
16775 @c LocalWords: YYUNDEF SymbolKind yypcontext YYENOMEM TOKENMAX getBundle
16776 @c LocalWords: ResourceBundle myResources getString getName getToken ylwrap
16777 @c LocalWords: getLocation getExpectedTokens reportSyntaxError bistromathic
16778 @c LocalWords: TokenKind Automake's rtti Wcounterexamples Chinawat PLDI buf
16779 @c LocalWords: Isradisaikul tcite pcite rgbGreen colorGreen rgbYellow Wcex
16780 @c LocalWords: colorYellow rgbRed colorRed rgbBlue colorBlue rgbPurple Ddoc
16781 @c LocalWords: colorPurple ifhtml ifnothtml situ rcex MERCHANTABILITY Wnone
16782 @c LocalWords: diagError diagNotice diagWarning diagOff danglingElseCex
16783 @c LocalWords: nonunifying YYNOMEM Wuseless dgettext textdomain domainname
16784 @c LocalWords: dirname typeof writeln YYBISON YYLOCATION backend structs
16785 @c LocalWords: pushParse
16787 @c Local Variables:
16788 @c ispell-dictionary: "american"