1 <html><head><link type=
"text/css" href=
"./screen.css" rel=
"stylesheet" />
2 <script src=
"http://www.google-analytics.com/urchin.js" type=
"text/javascript">
4 <script type=
"text/javascript">
5 _uacct
= "UA-3418876-1";
8 </head><body><div id=
"top"><div id=
"main_navigation"><ul><li>Documentation
</li><li><a href=
"contribute.html">Contribute
</a></li><li><a href=
"index.html">Home
</a></li></ul></div></div><div id=
"middle"><div id=
"content"><div id=
"secondary_navigation"><ul><li>Syntax
</li><li><a href=
"semantic_interpretation.html">Semantics
</a></li><li><a href=
"using_in_ruby.html">Using In Ruby
</a></li><li><a href=
"pitfalls_and_advanced_techniques.html">Advanced Techniques
</a></li></ul></div><div id=
"documentation_content"><h1>Syntactic Recognition
</h1>
10 <p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of
<a href=
"http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars
</a> is useful in writing Treetop grammars.
</p>
12 <h1>Grammar Structure
</h1>
14 <p>Treetop grammars look like this:
</p>
16 <pre><code>grammar GrammarName
29 <p>The main keywords are:
</p>
32 <li><p><code>grammar
</code> : This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.
</p></li>
33 <li><p><code>rule
</code> : This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.
</p></li>
36 <h1>Parsing Expressions
</h1>
38 <p>Each rule associates a name with a
<em>parsing expression
</em>. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.
</p>
40 <h2>Terminal Symbols
</h2>
44 <p>Strings are surrounded in double or single quotes and must be matched exactly.
</p>
47 <li><code>"foo"</code></li>
48 <li><code>'foo'
</code></li>
51 <h3>Character Classes
</h3>
53 <p>Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.
</p>
56 <li><code>[a-zA-Z]
</code></li>
57 <li><code>[
0-
9]
</code></li>
60 <h3>The Anything Symbol
</h3>
62 <p>The anything symbol is represented by a dot (
<code>.
</code>) and matches any single character.
</p>
64 <h2>Nonterminal Symbols
</h2>
66 <p>Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.
</p>
77 <p>The above grammar is equivalent to:
</p>
84 <h2>Ordered Choice
</h2>
86 <p>Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.
</p>
88 <pre><code>"foobar" /
"foo" /
"bar"
91 <p>Note that if
<code>"foo"</code> in the above expression came first,
<code>"foobar"</code> would never be matched.
</p>
95 <p>Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence.
</p>
97 <pre><code>"foo" "bar" (
"baz" /
"bop")
100 <h2>Zero or More
</h2>
102 <p>Parsers will greedily match an expression zero or more times if it is followed by the star (
<code>*
</code>) symbol.
</p>
105 <li><code>'foo'*
</code> matches the empty string,
<code>"foo"</code>,
<code>"foofoo"</code>, etc.
</li>
110 <p>Parsers will greedily match an expression one or more times if it is followed by the star (
<code>+
</code>) symbol.
</p>
113 <li><code>'foo'+
</code> does not match the empty string, but matches
<code>"foo"</code>,
<code>"foofoo"</code>, etc.
</li>
116 <h2>Optional Expressions
</h2>
118 <p>An expression can be declared optional by following it with a question mark (
<code>?
</code>).
</p>
121 <li><code>'foo'?
</code> matches
<code>"foo"</code> or the empty string.
</li>
124 <h2>Lookahead Assertions
</h2>
126 <p>Lookahead assertions can be used to give parsing expressions a limited degree of context-sensitivity. The parser will look ahead into the buffer and attempt to match an expression without consuming input.
</p>
128 <h3>Positive Lookahead Assertion
</h3>
130 <p>Preceding an expression with an ampersand
<code>(
&)
</code> indicates that it must match, but no input will be consumed in the process of determining whether this is true.
</p>
133 <li><code>"foo" &"bar"</code> matches
<code>"foobar"</code> but only consumes up to the end
<code>"foo"</code>. It will not match
<code>"foobaz"</code>.
</li>
136 <h3>Negative Lookahead Assertion
</h3>
138 <p>Preceding an expression with a bang
<code>(!)
</code> indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.
</p>
141 <li><code>"foo" !
"bar"</code> matches
<code>"foobaz"</code> but only consumes up to the end
<code>"foo"</code>. It will not match
<code>"foobar"</code>.
</li>
142 </ul></div></div></div><div id=
"bottom"></div></body></html>