# Copyright (C) 2001-2006, The Perl Foundation.

docs/pdds/pdd20_lexical_vars.pod - Lexical variables

This document defines the requirements and implementation strategy for
lexically scoped variables.
        print P0            # prints 13013

        P0 = find_lex "$a"  # may succeed; depends on closure creation

        P0 = find_lex "$a"  # guaranteed to fail: no .lex, no :outer()

    .end                    # no .lex and no :lex, thus: no LexInfo, no LexPad

    # Lexical behavior varies by HLL. For example,
    # Tcl's lexicals are not declared at compile time.

    .HLL "Tcl", "tcl_group"

    .sub grault :lex        # without ":lex", Tcl subs have no lexicals
        P0 = find_lex "x"   # FAILS

        P0 = new Integer    # really TclInteger

        store_lex "x", P0   # creates lexical "x"

        P0 = find_lex "x"   # SUCCEEDS
For Parrot purposes, "lexical variables" are variables stored in a
hash (or hash-like) PMC associated with a subroutine invocation,

=head2 Conceptual Model
LexInfo PMCs contain what is known at compile time about lexical variables of a
given subroutine: their names (for most languages), perhaps their types, etc.
They are the interface through which the PIR compiler stores and validates
compile-time information about lexical variables.

At compile time, each newly created Subroutine (or Subroutine derivative,
e.g. Closure) that uses lexical variables will be populated with a PMC of
HLL-mapped type LexInfo. (Note that this type may actually be Null in some

LexPads hold what becomes known at run time about lexical variables of a given
invocation of a given subroutine: their values, of course, and for some
languages (e.g. Tcl) their names. They are the interface through which the
Parrot runtime stores and fetches lexical variables.

At run time, each call frame for a Subroutine (or Subroutine derivative) that
uses lexical variables will be populated with a PMC of HLL-mapped
type LexPad. Note that call frames for subroutines without lexical
variables will omit the LexPad.
From the interface perspective, LexPads are basically Hashes, with strings as
keys and PMCs as values. They extend the basic Hash interface with
specialized initialization (requiring a reference to an associated LexInfo)
and the query method C<get_lexinfo()> (to return it).

LexPad keys are unique. Therefore, in each subroutine, there can be only one
lexical variable with a given name.
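To make the interface concrete, here is a minimal C model of a LexPad. It is a sketch under stated assumptions, not Parrot's implementation: PMC values are reduced to C<int>s, the Hash is a small linear table, and C<LexInfo>, C<lexpad_init>, C<lexpad_set>, C<lexpad_get>, and C<lexpad_get_lexinfo> are hypothetical names. It shows the two properties described above: initialization requires a LexInfo, and storing to an existing name overwrites that lexical instead of creating a second one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: a "PMC" value is reduced to an int and a
   LexInfo to a tag naming its subroutine. Not Parrot's real types. */
typedef struct LexInfo { const char *sub_name; } LexInfo;

#define MAX_LEX 8

typedef struct LexPad {
    const LexInfo *lexinfo;       /* fixed at initialization */
    const char *names[MAX_LEX];   /* string keys */
    int values[MAX_LEX];          /* "PMC" values */
    size_t count;
} LexPad;

/* Specialized initialization: a LexPad cannot exist without its LexInfo. */
void lexpad_init(LexPad *pad, const LexInfo *info) {
    assert(info != NULL);
    pad->lexinfo = info;
    pad->count = 0;
}

/* The extra query method: return the associated LexInfo. */
const LexInfo *lexpad_get_lexinfo(const LexPad *pad) {
    return pad->lexinfo;
}

/* Hash-style key search; returns the slot index, or -1 if absent. */
static int lexpad_find(const LexPad *pad, const char *name) {
    for (size_t i = 0; i < pad->count; i++)
        if (strcmp(pad->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Storing to an existing key overwrites it, so a name can never
   denote two different lexicals in the same pad. */
void lexpad_set(LexPad *pad, const char *name, int value) {
    int i = lexpad_find(pad, name);
    if (i < 0) {
        assert(pad->count < MAX_LEX);
        i = (int)pad->count++;
        pad->names[i] = name;
    }
    pad->values[i] = value;
}

/* Returns 1 and writes *out if the key exists, otherwise 0. */
int lexpad_get(const LexPad *pad, const char *name, int *out) {
    int i = lexpad_find(pad, name);
    if (i < 0)
        return 0;
    *out = pad->values[i];
    return 1;
}
```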
In the normal use case, LexPads are not exposed to user code (not for any
special reason; it just worked out that way). Instead, specialized opcodes
implement the common use cases. Specialized opcodes are particularly a Good
Idea here because most lexical usage involves searching more than one LexPad,
so a single LexPad reference wouldn't be as useful as one might expect. And,
of course, opcodes can cheat ... er, can be written in optimized C. :-)

TODO: Describe how the lexical naming system interacts with non-ASCII character
=head3 Lexical Lookup Algorithm

If Parrot is asked to access a lexical variable named $var, Parrot
uses the following strategy. Note that fetch and store use the

Parrot starts with the currently executing subroutine $sub, then loops

1. Starting at the current call frame, walk back until an active frame is
found that is executing $sub. Call it $frame.

(NOTE: The first time through, $sub is the current subroutine and $frame
is the currently live frame.)

2. Look for $var in $frame.get_lexpad using standard Hash methods.

3. If the given pad contains $var, fetch/store it and REPORT SUCCESS.

4. Set $sub to $sub.outer. (That is, the textually enclosing subroutine.)
But if $sub has no outer sub, REPORT FAILURE.
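The lookup loop above can be sketched in C as follows. The types C<Sub>, C<Pad>, and C<Frame> and the function C<find_lex_slot> are hypothetical stand-ins with PMC values reduced to C<int>s, and the sketch assumes (the text above does not specify it) that an outer sub with no active frame is simply skipped. Returning a pointer to the variable's storage lets the same search serve both fetch and store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model types; not Parrot's real structures. */
typedef struct Sub {
    const char *name;
    const struct Sub *outer;      /* from :outer(), or NULL */
} Sub;

typedef struct Pad {              /* stand-in for a LexPad */
    const char *names[4];
    int values[4];
    size_t count;
} Pad;

typedef struct Frame {
    const Sub *sub;               /* subroutine this frame is executing */
    const struct Frame *caller;   /* next frame down the call stack */
    Pad *pad;                     /* NULL when the sub has no lexicals */
} Frame;

/* Step 2: standard hash-style lookup within one pad. */
static int *pad_slot(Pad *pad, const char *name) {
    if (pad == NULL)
        return NULL;
    for (size_t i = 0; i < pad->count; i++)
        if (strcmp(pad->names[i], name) == 0)
            return &pad->values[i];
    return NULL;
}

/* Returns the variable's storage so the caller can fetch or store
   through it (REPORT SUCCESS), or NULL (REPORT FAILURE). */
int *find_lex_slot(const Frame *current, const char *var) {
    assert(current != NULL);
    for (const Sub *sub = current->sub; sub != NULL; sub = sub->outer) {
        /* Step 1: walk back from the current frame to one running sub.
           Assumption: if no active frame runs sub, that sub is skipped. */
        const Frame *frame = current;
        while (frame != NULL && frame->sub != sub)
            frame = frame->caller;
        /* Steps 2-3: search that frame's pad. */
        int *slot = frame ? pad_slot(frame->pad, var) : NULL;
        if (slot != NULL)
            return slot;
        /* Step 4 is the loop increment: move to the enclosing sub. */
    }
    return NULL;
}
```

With C<bar> declared C<:outer(foo)> and a C<foo> frame still active, a lookup of C<$a> from C<bar> finds C<foo>'s pad. A sub with neither lexicals nor an C<:outer()> fails even if C<foo> is its caller, because the search follows C<outer> links, not the call chain.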
=head3 LexPad and LexInfo are optional; the ":lex" attribute

Parrot does not assume that every subroutine needs lexical variables.
Therefore, Parrot defaults to I<not> creating LexInfo or LexPad PMCs. It only
creates a LexInfo when it first encounters a ".lex" directive in the
subroutine. If no such directive is found, Parrot does not create a LexInfo
for it at compile time, and therefore cannot create a LexPad for it at run

However, the absence of ".lex" directives is normal for languages such as
Tcl that lack compile-time knowledge of lexicals. For these languages, the
additional Subroutine attribute ":lex" should be specified. It forces Parrot
to create a LexInfo for the subroutine and a LexPad for each of its call
frames.
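The rule for when a LexInfo exists reduces to a single predicate. This hypothetical C sketch (C<SubDecl> and C<needs_lexinfo> are invented names, not part of the PIR compiler) treats a sub like C<grault> in the Tcl example, which has no C<.lex> directives but carries C<:lex>, the same as one that declares lexicals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical compile-time summary of one subroutine. */
typedef struct SubDecl {
    const char *name;
    size_t lex_directives;   /* number of ".lex" directives seen */
    bool lex_attribute;      /* declared with the ":lex" attribute */
} SubDecl;

/* A LexInfo is created only if the sub declares at least one lexical
   or forces one with ":lex"; without a LexInfo, no LexPad can be
   created for the sub's call frames at run time. */
bool needs_lexinfo(const SubDecl *sub) {
    assert(sub != NULL);
    return sub->lex_directives > 0 || sub->lex_attribute;
}
```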
NOTE: This section should be read under the "as-if" rule: Parrot behaves as
if this section were literally true. As always, shortcuts (in development and
at runtime) may be taken.

Closures are specialized Subroutines that carry their I<lexical environment>
along with them. A lexical environment, which we will call a "LexEnv" for
brevity, is a list of LexPads to be searched when looking for lexical
variables. Its implementation may be as simple as a basic PMC array, but any
ordered integer-indexed collection will do.
=head3 Closure creation: Capturing the lexical environment

The C<newclosure> op creates a Closure from a Subroutine and gives that
Closure a new LexEnv attribute. The LexEnv is then populated with pointers to
the current I<enclosing> LexPads. The definition of "enclosing" is not

The algorithm used to find "enclosing" LexPads is a loop of the following
steps, starting with $sub set to the running Subroutine (which is a Closure):

1. Starting at the current call frame, walk back until an active frame is
found that is executing $sub. Call it $frame.

(NOTE: The first time through, $sub is the current subroutine and $frame
is the currently live frame.)

2. Append $frame's LexPad to the LexEnv.

3. If $sub has a LexEnv, append $sub's LexEnv to the LexEnv being built,
and END LOOP. Otherwise:

4. Set $sub to $sub.outer. (That is, the textually enclosing subroutine.)
But if $sub has no outer sub, END LOOP.
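A C sketch of the capture loop follows. C<Pad>, C<LexEnv>, C<Sub>, C<Frame>, and C<capture_lexenv> are hypothetical types and names, and the sketch assumes that a sub with no active frame (or a frame without a LexPad) contributes nothing. Step 3 is what keeps capture cheap for nested closures: once a sub that already has a LexEnv is reached, its captured pads are spliced in and the walk stops.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENV 8

typedef struct Pad { int id; } Pad;   /* opaque stand-in for a LexPad */

typedef struct LexEnv {               /* ordered list of LexPads */
    Pad *pads[MAX_ENV];
    size_t count;
} LexEnv;

typedef struct Sub {
    const struct Sub *outer;          /* from :outer(), or NULL */
    const LexEnv *lexenv;             /* non-NULL only for Closures */
} Sub;

typedef struct Frame {
    const Sub *sub;                   /* subroutine this frame runs */
    const struct Frame *caller;       /* next frame down the stack */
    Pad *pad;                         /* NULL if the sub has no lexicals */
} Frame;

static void env_append(LexEnv *env, Pad *pad) {
    assert(env->count < MAX_ENV);
    env->pads[env->count++] = pad;
}

/* Fills env with the LexEnv that newclosure would attach, starting
   from the running subroutine whose frame is `current`. */
void capture_lexenv(const Frame *current, LexEnv *env) {
    env->count = 0;
    for (const Sub *sub = current->sub; sub != NULL; sub = sub->outer) {
        /* Step 1: walk back from the current frame to one running sub. */
        const Frame *frame = current;
        while (frame != NULL && frame->sub != sub)
            frame = frame->caller;
        /* Step 2: append that frame's LexPad (assumption: frames
           without a LexPad, or no active frame, add nothing). */
        if (frame != NULL && frame->pad != NULL)
            env_append(env, frame->pad);
        /* Step 3: a Closure's own LexEnv already holds everything
           further out, so splice it in and stop. */
        if (sub->lexenv != NULL) {
            for (size_t i = 0; i < sub->lexenv->count; i++)
                env_append(env, sub->lexenv->pads[i]);
            return;
        }
        /* Step 4 is the loop increment: move to the enclosing sub. */
    }
}
```

For a running sub whose C<:outer()> is a Closure C<mid>, the resulting LexEnv holds the running sub's pad, then C<mid>'s pad, then whatever C<mid> itself captured.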
NOTE: The C<newclosure> opcode should check that the target Subroutine has an
C<:outer()> attribute that points back to the currently running Subroutine.
This is a requirement for closures.

=head3 Closure runtime: Using the lexical environment

At runtime, the C<find_lex> opcode behaves differently in closures. It has no
need to walk the call stack finding LexPads, because they have all already
been collected in the LexEnv. Therefore, in a Closure, C<find_lex> I<ignores>
the call stack and instead searches (1) the current call frame's LexPad (i.e.
the Closure's own lexicals) and then (2) the LexPads in the LexEnv.
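In C terms, a closure's lookup is just a linear scan over pads, with no stack walking. This is a hypothetical sketch (C<Pad>, C<LexEnv>, and C<closure_find_lex> are invented names), and it assumes the first match wins, so a Closure's own lexical shadows an outer one with the same name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Pad {            /* hypothetical stand-in for a LexPad */
    const char *names[4];
    int values[4];
    size_t count;
} Pad;

typedef struct LexEnv {         /* pads captured at newclosure time */
    Pad *pads[8];
    size_t count;
} LexEnv;

static int *pad_slot(Pad *pad, const char *name) {
    if (pad == NULL)
        return NULL;
    for (size_t i = 0; i < pad->count; i++)
        if (strcmp(pad->names[i], name) == 0)
            return &pad->values[i];
    return NULL;
}

/* find_lex inside a Closure: no call-stack walk. Search (1) the
   Closure's own pad, then (2) each captured pad in LexEnv order.
   Assumption: the first match wins, so inner names shadow outer ones. */
int *closure_find_lex(Pad *own, const LexEnv *env, const char *name) {
    assert(env != NULL);
    int *slot = pad_slot(own, name);
    for (size_t i = 0; slot == NULL && i < env->count; i++)
        slot = pad_slot(env->pads[i], name);
    return slot;
}
```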
=head3 HLL Type Mapping

The implementation of lexical variables in the PIR compiler depends on two new
PMCs: LexPad and LexInfo. However, the default Parrot LexPad and LexInfo PMCs
will not meet the needs of all languages. They should suit Perl 6, for
example, but not Tcl.

Therefore, it is expected that HLLs will map the LexPad and LexInfo types to
something more appropriate (e.g. TclLexPad and TclLexInfo). That mapping will
automatically occur when the appropriate ".HLL" directive is in force.

Using Tcl as an extreme example: TclLexPad will likely be a thin veneer on
PMCHash. Meanwhile, TclLexInfo will likely map to Null: Tcl provides no
reliable compile-time information about lexicals; without any compile-time
information to store, there's no need for TclLexInfo to do anything
=head3 Nested Subroutines Have Outies; the ":outer" attribute

For HLLs that support nested subroutines, Parrot provides a way to denote that
a given subroutine is conceptually "inside" another. Lookup for lexical
variables starts at the current call frame and proceeds through call frames
that invoke "outer" subroutines. The specific meaning of "outer" is defined
below, but it's designed to support the common linguistic structure of nested
subroutines where inner subs refer to lexical variables contained in outer
Note that "outer" and "caller" are very different concepts! For example,
given the Perl 6 code:

        my sub a { eval '$a' }

The C<&foo> subroutine is the outer subroutine of C<&a>, but it is not the
In the above example, the definition of the Parrot subroutine implementing
C<&a> must include a notation that it is textually enclosed within C<&foo>.
This is a static attribute of a Subroutine, set at compile time and never
changed thereafter. (Unless you're evil, or Damian. But I repeat myself.)
This information is given through an C<:outer()> subroutine attribute, e.g.:

Note that the "foo" sub B<must> be compiled first; in other words, "foo" must
appear before "a" in the source text. Compilers can easily do this via a
preorder traversal of lexically nested subs.
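A preorder traversal is enough to satisfy the ordering rule. In this hypothetical C sketch, C<SubNode>, C<compile_order>, and C<print_sub_headers> are invented for illustration; the printed headers only mimic the C<:outer()> notation and are not real compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical compiler-side node for one subroutine. */
typedef struct SubNode {
    const char *name;
    const struct SubNode *outer;      /* enclosing sub, or NULL */
    const struct SubNode *inner[4];   /* subs nested directly inside */
    size_t ninner;
} SubNode;

/* Preorder traversal: record a sub before any sub nested inside it,
   so every :outer() target is compiled before its inner subs.
   Returns the new number of entries in out. */
size_t compile_order(const SubNode *node, const SubNode **out,
                     size_t cap, size_t n) {
    assert(n < cap);
    out[n++] = node;
    for (size_t i = 0; i < node->ninner; i++)
        n = compile_order(node->inner[i], out, cap, n);
    return n;
}

/* Prints PIR-style .sub headers in compile order. */
void print_sub_headers(const SubNode **order, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (order[i]->outer != NULL)
            printf(".sub %s :outer(%s)\n", order[i]->name, order[i]->outer->name);
        else
            printf(".sub %s\n", order[i]->name);
    }
}
```

For C<&foo> containing C<&a>, C<compile_order> records C<foo> before C<a>, and C<print_sub_headers> prints C<.sub foo> followed by C<.sub a :outer(foo)>.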
=head2 Required Interfaces: LexPad, LexInfo, Closure

Below are the standard LexInfo methods that all HLL LexInfo PMCs may support.
Each LexInfo PMC should define only the methods that it can usefully
implement, so the compiler can use method lookup failure to generate useful
diagnostics (e.g. "register aliasing not supported by Tcl lexicals").

Each language's LexInfo will implement methods that are helpful to that
language's LexPad. In the extreme case, LexInfo can be Null; if it is,
the given HLL should not generate any ".lex*" directives.
=item B<void init_pmc(PMC *sub)>

=item B<PMC *get_sub()>

Return the associated Subroutine.

=item B<void declare_lex_preg(STRING *name, INTVAL preg)>

Declare a lexical variable that is an alias for a PMC register. The PIR
compiler calls this method in response to a C<.lex STRING, PREG> directive.
For example, given this preamble:

These two opcodes have an identical effect:

And these two opcodes also have an identical effect:
LexPads start by implementing the Hash interface: variable names are string
keys, and variable values are PMCs.

In addition, LexPads must implement the following methods:

=item B<void init_pmc(PMC *lexinfo)>

Called exactly once. Note that Parrot guarantees that this method will be
called after the new Context object is made current. It is recommended that
any LexPad that aliases registers take a pointer to the current Context at

=item B<PMC *get_lexinfo()>

Return the associated LexInfo.
For debugging and introspection, the Closure PMC should support:

=item B<PMC *get_lexenv()>

Return the associated LexEnv, an ordered integer-indexed collection (e.g. an
Array) of LexPads captured at C<newclosure> time.
=head2 Default Parrot LexPad and LexInfo

The default LexInfo supports lexicals only as aliases for PMC registers. It
therefore implements C<declare_lex_preg()>. (Internally, it could be a Hash of
some kind, where keys are String variable names and values are integer

The default LexPad (like all LexPads) implements the Hash interface. When
asked to look up a variable, it finds the corresponding register number by
querying its associated LexInfo. It then gets or sets the given numbered
register in its associated Parrot Context structure.
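The division of labor between the default LexInfo and LexPad can be modeled in a few lines of C. This is a sketch, not Parrot's source: C<Context> is reduced to an array of C<int> registers, and C<lexpad_get>, C<lexpad_set>, and C<lookup_preg> are hypothetical names (only C<declare_lex_preg> comes from the interface above, here as a plain function rather than a method). The LexPad holds no values of its own, so a store through it and a direct register write are visible to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_PREGS 16
#define MAX_LEX 8

/* Hypothetical model: a Context is just an array of PMC registers,
   with PMC values reduced to ints. */
typedef struct Context { int pmc_regs[NUM_PREGS]; } Context;

/* Default LexInfo: variable name -> PMC register number. */
typedef struct LexInfo {
    const char *names[MAX_LEX];
    int pregs[MAX_LEX];
    size_t count;
} LexInfo;

/* Default LexPad: stores no values of its own, only its LexInfo and
   the Context whose registers the lexicals alias. */
typedef struct LexPad {
    const LexInfo *info;
    Context *ctx;
} LexPad;

/* Compile time: what the compiler would record for a directive
   such as .lex "$a", P0 */
void declare_lex_preg(LexInfo *info, const char *name, int preg) {
    assert(info->count < MAX_LEX && preg >= 0 && preg < NUM_PREGS);
    info->names[info->count] = name;
    info->pregs[info->count] = preg;
    info->count++;
}

static int lookup_preg(const LexInfo *info, const char *name) {
    for (size_t i = 0; i < info->count; i++)
        if (strcmp(info->names[i], name) == 0)
            return info->pregs[i];
    return -1;
}

/* Run time: resolve the name through the LexInfo, then read or write
   the numbered register. Both return 0 if the name is unknown. */
int lexpad_get(const LexPad *pad, const char *name, int *out) {
    int r = lookup_preg(pad->info, name);
    if (r < 0)
        return 0;
    *out = pad->ctx->pmc_regs[r];
    return 1;
}

int lexpad_set(LexPad *pad, const char *name, int value) {
    int r = lookup_preg(pad->info, name);
    if (r < 0)
        return 0;
    pad->ctx->pmc_regs[r] = value;
    return 1;
}
```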
=head2 Introspection without Call Frame PMCs

Due to implementation concerns, call frames may not be available to user code
as PMCs until late in Parrot development, if ever. Until then, the interpreter
and continuation PMCs are the interface for getting frame information.

For example, to get the immediate caller's LexPad, use:

    $P1 = $P0["lexpad"; 1]

It's likely that this interface will continue to be available even once call
frames become visible as PMCs.

TODO: Full interpreter introspection interface.