<html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-3418876-1";
urchinTracker();
</script>
</head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li>Semantics</li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Semantic Interpretation</h1>
<p>Let's use the grammar below as an example. It describes parentheses wrapping a single character to an arbitrary depth.</p>
<pre><code>grammar ParenLanguage
  rule parenthesized_letter
    '(' parenthesized_letter ')'
    /
    [a-z]
  end
end
</code></pre>
<p>Matches:</p>
<ul>
<li><code>'a'</code></li>
<li><code>'(a)'</code></li>
<li><code>'((a))'</code></li>
<li>etc.</li>
</ul>
<p>Output from a parser for this grammar looks like this:</p>
<p><img src="./images/paren_language_output.png" alt="Tree Returned By ParenLanguageParser"/></p>
<p>This is a parse tree whose nodes are instances of <code>Treetop::Runtime::SyntaxNode</code>. What if we could define methods on these node objects? We would then have an object-oriented program whose structure corresponded to the structure of our language. Treetop provides two techniques for doing just this.</p>
<h2>Associating Methods with Node-Instantiating Expressions</h2>
<p>Sequences and all types of terminals are node-instantiating expressions. When they match, they create instances of <code>Treetop::Runtime::SyntaxNode</code>. Methods can be added to these nodes in the following ways:</p>
<h3>Inline Method Definition</h3>
<p>Methods can be added to the nodes instantiated by the successful match of an expression:</p>
<pre><code>grammar ParenLanguage
  rule parenthesized_letter
    '(' parenthesized_letter ')' {
      def depth
        parenthesized_letter.depth + 1
      end
    }
    /
    [a-z] {
      def depth
        0
      end
    }
  end
end
</code></pre>
<p>Note that each alternative expression is followed by a block containing a method definition. A <code>depth</code> method is defined on both expressions. The recursive <code>depth</code> method defined in the block following the first expression determines the depth of the nested parentheses and adds one to it. The base case is implemented in the block following the second expression; a single character has a depth of 0.</p>
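<p>The recursion can be traced outside of Treetop with a plain-Ruby sketch. <code>StandInNode</code> and its <code>inner</code> field are hypothetical stand-ins for illustration only, not part of Treetop's API; they merely mimic how each nested node delegates to its child and adds one.</p>
<pre><code># Hypothetical stand-in for a syntax node; this is NOT Treetop's SyntaxNode.
# `inner` is the nested node for a '(' ... ')' match, or nil for a bare letter.
StandInNode = Struct.new(:text_value, :inner) do
  def depth
    # Base case: a single letter has a depth of 0.
    return 0 if inner.nil?

    # Recursive case: the depth of the nested node plus one.
    inner.depth + 1
  end
end

letter = StandInNode.new('a', nil)
once   = StandInNode.new('(a)', letter)
twice  = StandInNode.new('((a))', once)

puts twice.depth
</code></pre>
<p>Each wrapping level contributes exactly one to the result, so the value printed for <code>'((a))'</code> is the number of parenthesis pairs around the letter.</p>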
<h3>Custom <code>SyntaxNode</code> Subclass Declarations</h3>
<p>You can instruct the parser to instantiate a custom subclass of <code>Treetop::Runtime::SyntaxNode</code> for an expression by following it with the name of that class enclosed in angle brackets (<code>&lt;&gt;</code>). The above inline method definitions could have been moved out into a single class like so:</p>
<pre><code># in .treetop file
grammar ParenLanguage
  rule parenthesized_letter
    '(' parenthesized_letter ')' &lt;ParenNode&gt;
    /
    [a-z] &lt;ParenNode&gt;
  end
end

# in separate .rb file
class ParenNode &lt; Treetop::Runtime::SyntaxNode
  def depth
    if nonterminal?
      parenthesized_letter.depth + 1
    else
      0
    end
  end
end
</code></pre>
<h2>Automatic Extension of Results</h2>
<p>Nonterminal and ordered choice expressions do not instantiate new nodes, but rather pass through nodes that are instantiated by other expressions. They can extend the nodes they propagate with anonymous or declared modules, using constructs similar to those used with expressions that instantiate their own syntax nodes.</p>
<h3>Extending a Propagated Node with an Anonymous Module</h3>
<pre><code>rule parenthesized_letter
  ('(' parenthesized_letter ')' / [a-z]) {
    def depth
      if nonterminal?
        parenthesized_letter.depth + 1
      else
        0
      end
    end
  }
end
</code></pre>
<p>The parenthesized choice above can result in a node matching either of the two choices. That node will be extended with the methods defined in the subsequent block. Note that a choice must always be parenthesized to be associated with a following block.</p>
<h3>Extending a Propagated Node with a Declared Module</h3>
<pre><code># in .treetop file
rule parenthesized_letter
  ('(' parenthesized_letter ')' / [a-z]) &lt;ParenNode&gt;
end

# in separate .rb file
module ParenNode
  def depth
    if nonterminal?
      parenthesized_letter.depth + 1
    else
      0
    end
  end
end
</code></pre>
<p>Here the result is extended with the <code>ParenNode</code> module. Note that, unlike in the earlier example for node-instantiating expressions, the constant in the declaration must be a module, because the result is extended with it.</p>
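<p>The module requirement follows from how Ruby's <code>Object#extend</code> works, which the plain-Ruby sketch below illustrates. <code>DepthMethods</code> and <code>DepthClass</code> are hypothetical names used only for this illustration, and a bare <code>Object</code> stands in for a propagated node; none of this is Treetop's own code.</p>
<pre><code># Hypothetical names for illustration; not part of Treetop.
module DepthMethods
  def depth
    0
  end
end

class DepthClass
  def depth
    0
  end
end

node = Object.new          # stands in for a node propagated by a choice
node.extend(DepthMethods)  # a module's methods are mixed into this one object
puts node.depth

begin
  Object.new.extend(DepthClass)  # a class is not accepted by extend
rescue TypeError
  puts "extend rejected the class"
end
</code></pre>
<p>Because an already-instantiated node can only gain methods through <code>extend</code>, a class constant cannot be used in this position.</p>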
<h2>Automatically-Defined Element Accessor Methods</h2>
<h3>Default Accessors</h3>
<p>Nodes instantiated upon the matching of sequences have methods automatically defined for any nonterminals in the sequence.</p>
<pre><code>rule abc
  a b c {
    def to_s
      a.to_s + b.to_s + c.to_s
    end
  }
end
</code></pre>
<p>In the above code, the <code>to_s</code> method calls automatically-defined element accessors for the nodes returned by parsing nonterminals <code>a</code>, <code>b</code>, and <code>c</code>.</p>
<h3>Labels</h3>
<p>Subexpressions can be given an explicit label to have an element accessor method defined for them. This is useful in cases of ambiguity between two references to the same nonterminal or when you need to access an unnamed subexpression.</p>
<pre><code>rule labels
  first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
    def letters
      [first_letter] + rest_letters.elements.map do |comma_and_letter|
        comma_and_letter.letter
      end
    end
  }
end
</code></pre>
<p>The above grammar uses label-derived accessors to determine the letters in a comma-delimited list of letters. The labeled expressions <em>could</em> have been extracted to their own rules, but if they aren't used elsewhere, labels still enable them to be referenced by a name within the expression's methods.</p>
<h3>Overriding Element Accessors</h3>
<p>The module containing automatically defined element accessor methods is an ancestor of the module in which you define your own methods, meaning you can override them with access to the <code>super</code> keyword. Here's an example of how this fact can improve the readability of the example above.</p>
<pre><code>rule labels
  first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
    def letters
      [first_letter] + rest_letters
    end

    def rest_letters
      super.elements.map { |comma_and_letter| comma_and_letter.letter }
    end
  }
end
</code></pre>
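<p>The ancestor relationship that makes <code>super</code> work here can be sketched in plain Ruby. <code>GeneratedAccessors</code> and <code>UserMethods</code> are hypothetical stand-ins, and the accessor returns plain strings instead of syntax nodes; the modules Treetop actually generates are named and built differently.</p>
<pre><code># Hypothetical stand-ins; Treetop generates its own accessor modules.
module GeneratedAccessors
  def rest_letters
    %w[b c]  # pretend these are the raw matched elements
  end
end

module UserMethods
  # The accessor module becomes an ancestor of the user's module.
  include GeneratedAccessors

  def rest_letters
    # super reaches the generated accessor, whose result is then refined.
    super.map { |letter| letter.upcase }
  end
end

node = Object.new
node.extend(UserMethods)

p node.rest_letters
p UserMethods.ancestors
</code></pre>
<p>Method lookup finds the override first and falls through to the generated accessor on <code>super</code>, so callers only ever see the refined result.</p>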
<h2>Methods Available on <code>Treetop::Runtime::SyntaxNode</code></h2>
<table>
  <tr>
    <td>
      <code>terminal?</code>
    </td>
    <td>
      Was this node produced by the matching of a terminal symbol?
    </td>
  </tr>
  <tr>
    <td>
      <code>nonterminal?</code>
    </td>
    <td>
      Was this node produced by the matching of a nonterminal symbol?
    </td>
  </tr>
  <tr>
    <td>
      <code>text_value</code>
    </td>
    <td>
      The substring of the input represented by this node.
    </td>
  </tr>
  <tr>
    <td>
      <code>elements</code>
    </td>
    <td>
      Available only on nonterminal nodes, returns the nodes parsed by the elements of the matched sequence.
    </td>
  </tr>
</table></div></div></div><div id="bottom"></div></body></html>