misc/pm.txt

   1 This file contains miscellaneous questions about design+spec
   2 that Pm has come up with that are awaiting answers from TimToady
   3 and/or others.  We'll likely record the answers here as well.
   4
   5 Unanswered questions:
   6
   7 Pm-20:  What's the result of something like...?
   8             my $x = [<a b c d>];  'a b c d' ~~ / $x /;
   9
  10 Pm-23:  Should TOP (or any other grammar regex) be considered
  11     to have implied $-anchors when invoked from .parse?
  12     (See RT #77022.) Example:
  13
  14          grammar B { regex TOP { b } };
  15          say B.parse('bcd');                # failed match?
  16
  17 Pm-24:  Is               token { <abc>* }
  18         the same as      regex { [ <abc>* ]: }
  19         or               regex { [ <abc>: ]*: }  ?
  20
  21 ===========================================================
  22
  23 Answered questions:
  24
  25
  26 Pm-1:  In STD.pm, what is the semantic or key difference between <noun>
  27     and <term>?
  28
  29 None; they are now unified under <term>.
  30
  31 ----------------------
  32
  33 Pm-2:  Are calls to subrules in other grammars still valid as
  34     C<<  / abc <OtherGrammar::xyz> def / >> ?  If so, then
  35     for the invocation of the subrule, do we construct a new
  36     cursor of type OtherGrammar and invoke the 'xyz' method on it?
  37     (Pm's preference is "yes" and "yes", but want confirmation.)
  38
  39 Yes, and yes.  That's essentially what STD is already doing for
  40 Regexen and such.
  41
  42 ----------------------
  43
  44 Pm-3:  When we generate a metaop, where does it live?  Lexical?
  45     Package?  If package, then what package?
  46
  47 Good question.  If CORE is immutable, we can't add to it.  Probably
  48 UNIT, though that might prevent sharing of common definitions
  49 among different compilation units.  On the other hand, being sure
  50 you're based on the same underlying semantics is difficult anyway
  51 without a lot of lifting, so maybe UNIT is good enough for now.
  52
  53 ----------------------
  54
  55 Pm-4:  The C<.ast> method on a Match object returns the matched
  56     text if no abstract object has been set.  Is there (or should
  57     there be) a method to determine if an abstract object has been
  58     set?  (Currently I'm using C<.peek_ast> in nqp-rx for this.)
  59
  60 Uh, surely .ast is false if there isn't one.
  61
  62 [S05:2434 r28936 currently says that .ast returns the matched text,
  63 I expect this to change shortly.  --Pm]
  64
  65 ----------------------
  66
  67 Pm-5:  How "read-only" are subroutine parameters?  For example, given
  68     a subroutine like  C<<  sub abc($x) { ... }  >>, I know we can't
  69     assign to C<$x>, but can it be rebound using &infix:<:=> in the
  70     body of the sub?
  71
  72 That seems okay to me, since the intent is to not modify the
  73 passed in object, and the rebinding effectively anonymizes the
  74 argument that was bound.
  75
  76 ----------------------
  77
  78 Pm-6:  Is there a syntax that allows a (trusted) routine to access
  79     the private attributes of another object without going through
  80     an accessor?  For example, if object $b has a private attribute
  81     of "$!xyz", is there a syntax for me to get to that attribute?
  82
  83 See r28932.
  84
  85 [Answer:  as of r28932, C<< $b!SomeClass::xyz >>.  --Pm]
  86
  87 ----------------------
  88
  89 Pm-7:  S05 says that a match's reduction object is given by the
  90     C<:action> parameter, STD.pm's initparse has C<:actions> and
  91     $*ACTIONS.  Should we pick one and stick with it?  PGE and
  92     Rakudo have traditionally used C<:action> -- if there's to
  93     be a change, now would be a good time for it.  :-)
  94     (After looking at the way I typically use it, I'm leaning
  95     towards the plural form. --Pm)
  96
  97 Now pluralized in specland.
  98
  99 ----------------------
 100
 101 Pm-8: Are closures embedded in regexes creating a new lexical scope,
 102     or do they share the same scope as the regex block itself?
 103     (Currently I'm assuming they create a new scope, to be consistent
  104     with other uses of curlies. --Pm)
 105
 106 Yes.
 107
 108 ----------------------
 109
 110 Pm-9: Inside of a regex, what happens with C<< <[z..a]> >> ?  Is it
 111     a compile-time error, an empty range, or ... ?
 112
 113 Let's make it a compile-time error for direct code, and a failure
 114 with warning for code compiled indirectly via <$code>.
 115
 116 ----------------------
 117
 118 Pm-10:  Subs are canonically considered to be stored in symbol
 119     tables (lexpads, namespaces) with the & sigil.  Is the same
 120     true for methods?  If we ask a method for its name or otherwise
 121     obtain a list of method names from an object, would we expect
 122     those names to have a & sigil as well?  (For HLL interop reasons
 123     Pm tends to want methods to not include a & sigil, but it's not
 124     a strong tendency.)
 125
 126 Whether there's the & sigil or not depends on where you're storing
 127 the name.  The MOP doesn't keep the sigil, but if a method is declared
 128 "my" or "our", the alias in the symbol table does have the &, since as
 129 far as the symbol table is concerned, the method is just a subroutine.
 130 (Also note that, while the MOP doesn't track the sigil on methods,
 131 it probably does track the sigil on attributes, such as $!foo, &!foo.)
 132 One could view exportation of methods as multi as a two-step process; first,
 133 declare my or our to make the & alias in the current lexpad, then
 134 "is export" does no more magic than it ordinarily does.  I suppose
 135 method aliases are always considered multi when they show up in a
 136 symbol table.
 137
 138 ----------------------
 139
 140 Kh-1: Is it true that Bool::True.name should work?  Should it return
 141     the string 'True'?
 142     I ask because of http://rt.perl.org/rt3/Public/Bug/Display.html?id=66576
 143
 144 Yes, because Bool is an object type, so it knows its .name.  (A native
 145 bool would not know its name, according to S12:1649.)
 146
 147 ----------------------
 148
 149 Jw-1: What is the difference between :(\$x) and :($x is ref)? Both
 150 would need to leave the original argument untouched. Since C<is ref>
 151 is meant to be used for e.g. map, which needs to work on lists like
 152 (1,2,3), it can't cut constants out, so it seems no more constraining
 153 than :(\$x) too. Should one of them go away?
 154
 155 No difference that I can see offhand.  I'm inclined to make 'is ref' go away
 156 and keep the backslashed form.
 157
 158 ----------------------
 159
 160 Pm-11: S11:257 says "Without an import list, C<import> imports
 161     the C<:DEFAULT> imports."  How does one import one of the
 162     other tagsets?  (I think I'm missing something obvious here.)
 163
 164 I believe the idea is simply to use the tagset pair directly as an argument.
 165 And since lexical importation is assumed, there's an implicit :MY() around it.
 166 So 'use Foo :tag' is probably short for 'use Foo :MY(:tag)'.
 167
 168 ----------------------
 169
 170 Ml-1: In my $x; say($x~'a'), what's the output? "a\n" or "Object()\n"?
 171       If it's "a\n", what kind of undef is stored in $x?
 172
 173 Here is perhaps a decent compromise, based on the recent Num vs
 174 Numeric distinction.  If we assume a similar Str vs Stringy difference
 175 and give the ~ operators Stringy semantics, we could distinguish basic
 176 stringification from the more abstract ~ like so:
 177
 178     my $a;              # always stores Object()
 179
 180     say $a.perl         # produces 'Object'
 181
 182     say $a.Str          # stringifies to 'Object()'
  183     say $a              # uses .Str internally, so also stringifies to 'Object()'
 184
 185     say $a.Stringy      # says '' with warning
 186     say ~$a             # says '' with warning
 187     say $a ~ 'tion'     # says 'tion' with warning
 188
 189 As with type objects, junctions also need different abstraction levels,
 190 so we could say that
 191
 192     any(1,2,3).Str      # 'any(1,2,3)'
 193     ~any(1,2,3)         # any('1','2','3')
 194
 195 Note that most methods would still autothread on junctions by virtue of
 196 not being defined in the junction class.
 197
 198 ----------------------
 199
 200 Pm-12: S05:2121 says that smartmatching against regex/token/rule
 201     automatically anchors the match at both ends.  What construct is
 202     actually responsible for performing the anchor checks?  Is it the
 203     smart match operator (infix:<~~>), the .ACCEPTS method on the
 204     regex/token/rule, an option/flag passed to the regex/token/rule,
 205     or something else?
 206
 207 S05:2114 says that these are of type Method, not Regex, and therefore
 208 it's probable that Method.ACCEPTS sets up the .parse with :p semantics
 209 and also checks that the returned Cursor ends at the end of the string.
 210
 211 Or maybe we add a flag to .parse that checks anchoring on both ends,
 212 if it seems generally useful.  That'd help with the / ^ a | b | c $/,
 213 if people instead got into the habit of saying mt/ a | b | c / or
 214 some such, where :t would mean "totally", or "token", or some such.
 215 (Unfortunately, most of the other letters are taken, like :a (for
 216 :all, but already means :accent), or would at least be confusing
 217 if overloaded.  For instance, we could use :x without an arg to mean
 218 match exactly, but :x(1) would mean something rather different.  mx//
 219 reads well though.  Could also make a case for overloading :p without
 220 an argument, but mp doesn't do much for me.)
 221
 222 Anyway, it's Method.ACCEPTS that makes the decision, I suspect,
 223 regardless of how it's implemented.  We can always switch to
 224 a .parse option later.
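
[Illustration only, not part of the answer.  The precedence trap behind
/ ^ a | b | c $/ is not specific to Perl 6, so here is a minimal sketch
using Python's re module as a stand-in.  The helper names loose_match and
total_match are hypothetical, and re.fullmatch merely approximates the
"anchor at both ends" behavior being discussed for Method.ACCEPTS.]

```python
import re

# Anchors bind tighter than alternation, so this pattern means
# (^a) | (b) | (c$): only the first and last branches are anchored.
loose = re.compile(r"^a|b|c$")

# No anchors in the pattern; the caller supplies them for the whole
# alternation, which is the role the answer assigns to ACCEPTS.
whole = re.compile(r"a|b|c")


def loose_match(text):
    """True if the hand-anchored pattern matches anywhere in text."""
    return loose.search(text) is not None


def total_match(text):
    """True only if the entire text is one of the alternatives."""
    return whole.fullmatch(text) is not None


for sample in ("b", "xbx", "ab"):
    print(sample, loose_match(sample), total_match(sample))
```

The hand-anchored form accepts 'xbx' because the middle branch is not
anchored at all, while anchoring applied from outside rejects it.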
 225
 226 ----------------------
 227
 228 Pm-13:  What's the parent type of C<Regex>?  (I hope it's C<Method>.)
 229
 230 Don't see any reason why not, offhand, unless Regex is really a role
 231 that gets mixed into ordinary methods (hence we could mix it into
 232 such methods as the opp as well).
 233
 234 ----------------------
 235
 236 Pm-14:  Is the :c modifier allowed on token/regex/rule?  I.e., could
 237     someone do...?
 238         'abcdef' ~~ token :c(1) { cdef }
 239
 240 :c and :p are intended primarily as mods to the Cursor constructor, not to
 241 the matcher method, so I don't think so, unless we decide to support them
 242 anywhere within a regex, which seems relatively useless and confusing.  (Well,
 243 a better case can be made for internal :p(1) than for :c(1), in any case.)
 244
 245 ----------------------
 246
 247 Pm-15:  S11:300 gives the following as an example of importing
 248     a tagset into package scope:
 249
 250         require Sense :OUR<ALL> # but this works
 251
 252     Should this be :OUR(:ALL) instead?  It seems to me that
 253     :OUR<ALL> would attempt to import a &ALL symbol (since
 254     :MY<common> imports the &common symbol).
 255
 256 Yes, just a typo.
 257
 258 ----------------------
 259
 260 Pm-16:  S03:1996 looks like a fossil, or at least inconsistent
 261     with S07:72.  Any clarifications?
 262
 263 See r29571.
 264
 265     [Pm]  Okay, that helps, but there still seems to be an
 266     inconsistency.  r29571 says that list assignment is "mostly
 267     eager", but then says it evaluates leading iterators as long as
 268     they're known to be finite.  In other words, it suspends on the
 269     first iterator that is not provably finite.  But S07 says that
 270     "mostly eager" obtains all items until reaching a value that is
 271     known to be infinite.  "Not provably finite" is not the same as
 272     "known to be infinite"; the S03 interpretation would stop obtaining
 273     values on anything that _might_ be infinite, the S07 interpretation
 274     would eagerly obtain values until it reaches something that is
 275     _known_ to be (or declares itself to be) infinite.
 276
  277 "Mostly Eager" is now allowed to slack off after chewing on (or refusing
  278 to chew on) something that is 'not provably finite'.
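
[Illustration only.  The two readings Pm contrasts above can be modelled
in a few lines of Python.  Everything here is an assumption made for the
sketch: the three finiteness tags, the mostly_eager helper, and the
stop_on parameter do not come from S03 or S07.]

```python
from itertools import count

FINITE, UNKNOWN, INFINITE = "finite", "unknown", "infinite"


def mostly_eager(sources, stop_on):
    """Drain leading sources eagerly until one has a tag in stop_on.

    Returns the values obtained and the index of the first source that
    was left suspended (len(sources) if none was).
    """
    values = []
    for index, (kind, iterator) in enumerate(sources):
        if kind in stop_on:
            return values, index
        values.extend(iterator)
    return values, len(sources)


def make_sources():
    return [
        (FINITE, iter([1, 2, 3])),
        (UNKNOWN, iter([4, 5])),   # happens to end, but does not say so
        (INFINITE, count(6)),      # declares itself infinite
    ]


# Reading 1: suspend on anything not provably finite.
s03_values, s03_stop = mostly_eager(make_sources(), {UNKNOWN, INFINITE})
# Reading 2: suspend only on something known to be infinite.
s07_values, s07_stop = mostly_eager(make_sources(), {INFINITE})

print(s03_values, s03_stop)
print(s07_values, s07_stop)
```

The two policies differ only on the UNKNOWN source: the first leaves it
suspended, the second drains it (and would hang if it never ended).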
 279
 280 ----------------------
 281
 282 Pm-17:  Are the builtin types such as C<Num>, C<Int>, C<Rat>,
 283     C<Str>, C<List>, etc.  subclasses of C<Cool> or do they
 284     rely on C<Cool>'s method fallback mechanisms for the
 285     common methods?  (Or, another way of asking:  is
 286     C<List ~~ Cool> true?)
 287
 288 As it currently stands the fallback is only in the multi dispatcher.
 289 For single dispatch BuiltinType isa Cool.
 290
 291 ----------------------
 292
 293 Pm-18:  (Confirmation request)   S03:2111 indicates that the C< @(...) >
 294     sigil contextualizer is the same as the C<list> listop.  Is this correct?
 295     In particular, given C< my $a = [1,2,3]; >, then C< +@($a) >  is 1 and
 296     C< +@$a > is 3 ?   (For some reason I had been thinking that @($a) would
 297     act more like $a.list  than list($a).)
 298
 299 This part of the design seems rather confused in several dimensions.
 300 We will need to detangle some things.  First off, we will probably
 301 need to decouple the sigils from specific flattening/non-flattening
 302 behavior.  The @ sigil is probably going to mean some combination of
 303 Positional/Iterable, and not imply anything further such as .list.
 304 (Further context is supplied by how you use it.)  The @@ sigil as
 305 slice context is almost certainly going away just for being too ugly,
 306 and slices are also fundamentally ordered just like flat lists.
 307
 308 I think that @($a) and @$a must be made to mean the same thing.
  309 The form is fundamentally macroish, not function-callish,
 310 so @($a,$b) is not making a Capture out of $a and $b.  It does
 311 get a parcel, and whatever processes it treats that as an item,
 312 not as a list.  Parcel is iterable, so @ is basically a no-op.
 313 For @($a) or @$a, we have a degenerate parcel which would unwrap
 314 to $a in either case, so in both cases the make-iterable
  315 code sees $a directly, and not the parcel around it.
 316
 317 Similarly for any other coercion, we want to force macroization to
 318 treat both Foo(...) and (...).Foo the same, so ... should be
 319 treated as bare Parcel without any additional assumptions.
 320
 321 Therefore list($a) and $a.list should do the same thing, if indeed
 322 it's a coercion.  But again, that thing it's doing is something
 323 different from @.  We won't use the sigils to indicate the
 324 difference between flat/slice, just as we don't with lazy/eager/hyper.
 325 These contexts are looking more like dynamic variables or parameters
 326 internally, but attaching dynamic context to sigils is just as
 327 bad as trying to make other data structures intrinsically carry
 328 context they shouldn't know.
 329
 330 I'm trying to come up with a new slice notation that isn't sigil
 331 based.
 332
 333 ----------------------
 334
 335 Pm-19:  In each statement below, how many times is the block argument
 336     to .map() executed?  (Assume the block has arity/count of 1.)
 337
 338         my @b  = (1,2,3 Z 4,5,6).map({ ... });
 339         my @@c = (1,2,3 Z 4,5,6).map({ ... });
 340
 341         my ($x, $y, @@z) = (1,2,3 Z 4,5,6).map({ ... });
 342
 343 Iterating subsignatures need a rethink, as discussed on IRC.  If people
 344 expect a map block of -> $x { $x } to flatten, then $x is being
 345 bound in the variadic/flattening part of a bind, not in the positional,
 346 which defaults to getobj.  So either we force people to write
 347
 348     -> *$x, *$y {...}
 349
 350 (ick, doesn't extend to $^x or $_ easily) or we find some way
 351 of flipping the default binding of some block sigs to being slurpy
 352 scalars that use .get by default, and then use whatever slice notation
 353 we come up with to change the default back to .getobj.  So there's
 354 some hidden transmogrifier that map/grep use on their block parameter
 355 to make its sig behave, or they don't call the block directly, but
 356 rely on something else to do the transformation to a subpattern match.
 357
 358 The former approach:
 359
 360     sub map (&block, \$orig) {
 361         my &flatblock := &block.default_to_slurpy;
 362         my $cursor = $orig.get_iter_cursor;
 363         gather loop {
 364             take &flatblock($cursor, $newcursor) E last;
 365             $cursor = $newcursor; # or some such
 366         }
 367     }
 368
 369 where .default_to_slurpy takes the sig of -> $x and turns it into -> *$x,
 370 and get_iter_cursor gets a thread of pattern-matching iteration from the original
 371 capture.  (Which we don't strictly have to keep around here, since there's
 372 no nextsame, but still.)
 373
 374 The latter approach would be more like:
 375
 376     sub map (&block, \$orig) {
 377         my $cursor = $orig.get_iter_cursor;
 378         gather loop {
 379             take $cursor.apply(&block) E last;
 380         }
 381     }
 382
 383 where .apply would both mutate the block's sig semantics somehow and also
 384 mutate the cursor so we don't have to track it.
 385
 386 Update: more recent discussion on IRC suggests that the instruction for whether
 387 to anchor the sig match at the end should not come in either of those ways,
 388 but as a flag within the cursor itself: whereas a normal capture always
 389 anchors, a CaptureCursor has a flag that can go either way, but is set by
 390 map/grep to not anchor, but just advance the pointer.  (Or possibly they never
 391 anchor, and you have to convert back to a Capture, if we want it type based.)
 392 I will assume there's a Capture method, .get_unanchored_cursor for the nonce,
 393 which does the right thing.
 394
  395 With a mutable cursor, the notation is rather simple (but the semantics
 396 are more problematic):
 397
 398     sub map (&block, \$cap) {
 399         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 400         gather loop {
 401             take block(|$cursor) E last;
 402         }
 403     }
 404
 405 (That's assuming that |$cursor does the right thing there, and doesn't
 406 convert back to Capture.)
 407
 408 Alternately, we could use immutable cursors, but then we need to tap in
 409 at a lower level for the invocation, in order to return both a new capture
 410 and a result parcel, something like:
 411
 412     sub map (&block, \$cap) {
 413         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 414         gather loop {
 415             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor);
 416             take |$parcel E last;
 417         }
 418     }
 419
 420 Or maybe the last goes on the invoke:
 421
 422     sub map (&block, \$cap) {
 423         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 424         gather loop {
 425             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor) // last;
 426             take |$parcel;
 427         }
 428     }
 429
 430 The immutable form is more amenable to invocation that has to track
 431 how many elements were consumed (like 'reduce' maybe?), since you
 432 could just compare the position of old cursor with the new, without
 433 having to remember the old position specially.  (I'm imagining cursors
 434 as lightweight objects pointing into the original capture here, not
 435 as heavy capture copies, much as a grammar parser doesn't pass the
 436 original text around, but just a match position into the original.)
 437
 438 Multiple return values make it harder to do fancy junctional tricks
 439 on Z without temporaries, though.  Self-mutating cursors jangle my
 440 FP nerves, but maybe I can argue myself into liking them from an
 441 OO perspective.  Maybe a mutable cursor is just a container of
 442 the current immutable cursor, and we could perhaps have it both ways,
 443 modulo yet another level of indirection.  Performance suffers, maybe.
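
[Illustration only.  A rough Python model of the last idea above: an
immutable cursor that returns a new cursor plus a result, and a mutable
cursor that is just a container for the current immutable one.  The names
Cursor, MutableCursor, invoke, apply, map_immutable and DONE are
hypothetical, the arity is passed explicitly instead of being read from a
signature, and leftover items that cannot fill a full call are simply
dropped.  None of this is specced behavior.]

```python
from dataclasses import dataclass

DONE = object()  # sentinel: nothing left to bind


@dataclass(frozen=True)
class Cursor:
    """Lightweight position into a shared tuple of items."""
    items: tuple
    pos: int = 0

    def invoke(self, block, arity):
        """Bind the next `arity` items; return (new_cursor, result) or DONE."""
        end = self.pos + arity
        if end > len(self.items):
            return DONE
        return Cursor(self.items, end), block(*self.items[self.pos:end])


def map_immutable(block, arity, items):
    """Caller threads the cursor by hand, as in the INVOKE_WITH_CURSOR form."""
    cursor = Cursor(tuple(items))
    results = []
    while (step := cursor.invoke(block, arity)) is not DONE:
        new_cursor, result = step
        # Items consumed by this call: new_cursor.pos - cursor.pos
        results.append(result)
        cursor = new_cursor
    return results


class MutableCursor:
    """Container of the current immutable cursor (one more indirection)."""

    def __init__(self, items):
        self.current = Cursor(tuple(items))

    def apply(self, block, arity):
        step = self.current.invoke(block, arity)
        if step is DONE:
            return DONE
        self.current, result = step
        return result


zipped = [1, 4, 2, 5, 3, 6]  # stand-in for the items of (1,2,3 Z 4,5,6)
print(map_immutable(lambda x: x * 2, 1, zipped))
print(map_immutable(lambda a, b: a + b, 2, zipped))
```

With the immutable form, the number of items consumed falls out of
comparing the old and new positions; the mutable wrapper hides that
bookkeeping at the cost of the extra level of indirection.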
 444
 445 ----------------------
 446
  447 Pm-21:  Does a regex in a grammar definition "hide" an outer-scoped
 448     lexical regex of the same name?  (I'm guessing "yes" but want
 449     to confirm.)
 450
 451         my regex abc { ... };
 452
 453         grammar XYZ {
 454             regex TOP { <abc> };
 455             regex abc { ... };
 456         }
 457
 458         # XYZ.TOP ends up calling the XYZ.abc, and not &abc, yes?
 459
 460 Pm-21 and Pm-22 have been abandoned; regexes outside of grammars
 461 are expected to make use of an anonymous Grammar slang that is
 462 updated with the new regexes.
 463
 464 Pm-22:  What about something like ... ?
 465
 466         my regex alpha { <[A..Z]> }
 467
 468         grammar XYZ {
 469             regex TOP { <alpha> };   # uses the lexical &alpha?
 470         }
 471
 472 Pm-21 and Pm-22 have been abandoned; regexes outside of grammars
 473 are expected to make use of an anonymous Grammar slang that is
 474 updated with the new regexes.
 475
 476 ----------