misc/pm.txt

   1 This file contains miscellaneous questions about design+spec
   2 that Pm has come up with that are awaiting answers from TimToady
   3 and/or others.  We'll likely record the answers here as well.
   4
   5 Unanswered questions:
   6
   7 Pm-20:  What's the result of something like...?
   8             my $x = [<a b c d>];  'a b c d' ~~ / $x /;
   9
  10 ===========================================================
  11
  12 Answered questions:
  13
  14
  15 Pm-1:  In STD.pm, what is the semantic or key difference between <noun>
  16     and <term>?
  17
  18 None, they are now unified under <term>.
  19
  20 ----------------------
  21
  22 Pm-2:  Are calls to subrules in other grammars still valid as
  23     C<<  / abc <OtherGrammar::xyz> def / >> ?  If so, then
  24     for the invocation of the subrule, do we construct a new
  25     cursor of type OtherGrammar and invoke the 'xyz' method on it?
  26     (Pm's preference is "yes" and "yes", but want confirmation.)
  27
  28 Yes, and yes.  That's essentially what STD is already doing for
  29 Regexen and such.
  30
  31 ----------------------
  32
  33 Pm-3:  When we generate a metaop, where does it live?  Lexical?
  34     Package?  If package, then what package?
  35
  36 Good question.  If CORE is immutable, we can't add to it.  Probably
  37 UNIT, though that might prevent sharing of common definitions
  38 among different compilation units.  On the other hand, being sure
  39 you're based on the same underlying semantics is difficult anyway
  40 without a lot of lifting, so maybe UNIT is good enough for now.
  41
  42 ----------------------
  43
  44 Pm-4:  The C<.ast> method on a Match object returns the matched
  45     text if no abstract object has been set.  Is there (or should
  46     there be) a method to determine if an abstract object has been
  47     set?  (Currently I'm using C<.peek_ast> in nqp-rx for this.)
  48
  49 Uh, surely .ast is false if there isn't one.
  50
  51 [S05:2434 r28936 currently says that .ast returns the matched text,
  52 I expect this to change shortly.  --Pm]
  53
  54 ----------------------
  55
  56 Pm-5:  How "read-only" are subroutine parameters?  For example, given
  57     a subroutine like  C<<  sub abc($x) { ... }  >>, I know we can't
  58     assign to C<$x>, but can it be rebound using &infix:<:=> in the
  59     body of the sub?
  60
  61 That seems okay to me, since the intent is to not modify the
  62 passed-in object, and the rebinding effectively anonymizes the
  63 argument that was bound.
  64
  65 ----------------------
  66
  67 Pm-6:  Is there a syntax that allows a (trusted) routine to access
  68     the private attributes of another object without going through
  69     an accessor?  For example, if object $b has a private attribute
  70     of "$!xyz", is there a syntax for me to get to that attribute?
  71
  72 See r28932.
  73
  74 [Answer:  as of r28932, C<< $b!SomeClass::xyz >>.  --Pm]
  75
  76 ----------------------
  77
  78 Pm-7:  S05 says that a match's reduction object is given by the
  79     C<:action> parameter, STD.pm's initparse has C<:actions> and
  80     $*ACTIONS.  Should we pick one and stick with it?  PGE and
  81     Rakudo have traditionally used C<:action> -- if there's to
  82     be a change, now would be a good time for it.  :-)
  83     (After looking at the way I typically use it, I'm leaning
  84     towards the plural form. --Pm)
  85
  86 Now pluralized in specland.
  87
  88 ----------------------
  89
  90 Pm-8: Are closures embedded in regexes creating a new lexical scope,
  91     or do they share the same scope as the regex block itself?
  92     (Currently I'm assuming they create a new scope, to be consistent
  93     with other uses of curlies. --Pm)
  94
  95 Yes.
  96
  97 ----------------------
  98
  99 Pm-9: Inside of a regex, what happens with C<< <[z..a]> >> ?  Is it
 100     a compile-time error, an empty range, or ... ?
 101
 102 Let's make it a compile-time error for direct code, and a failure
 103 with warning for code compiled indirectly via <$code>.
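
The split between direct and indirect code can be modeled outside Perl 6.
The sketch below is only an analogy in Python, whose re module happens to
reject a reversed range when the pattern is compiled.  The helper names
compile_direct and compile_indirect are hypothetical, and returning None
merely stands in for "failure with warning".

```python
import re
import warnings


def compile_direct(source):
    """Direct code: let the range error propagate, like a compile-time error."""
    return re.compile(source)


def compile_indirect(source):
    """Interpolated code: warn and return None, standing in for a failed match."""
    try:
        return re.compile(source)
    except re.error as err:
        warnings.warn(f"bad pattern {source!r}: {err}")
        return None


if __name__ == "__main__":
    try:
        compile_direct("[z-a]")
    except re.error as err:
        print("compile-time error:", err)
    print(compile_indirect("[z-a]"))
```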
 104
 105 ----------------------
 106
 107 Pm-10:  Subs are canonically considered to be stored in symbol
 108     tables (lexpads, namespaces) with the & sigil.  Is the same
 109     true for methods?  If we ask a method for its name or otherwise
 110     obtain a list of method names from an object, would we expect
 111     those names to have a & sigil as well?  (For HLL interop reasons
 112     Pm tends to want methods to not include a & sigil, but it's not
 113     a strong tendency.)
 114
 115 Whether there's the & sigil or not depends on where you're storing
 116 the name.  The MOP doesn't keep the sigil, but if a method is declared
 117 "my" or "our", the alias in the symbol table does have the &, since as
 118 far as the symbol table is concerned, the method is just a subroutine.
 119 (Also note that, while the MOP doesn't track the sigil on methods,
 120 it probably does track the sigil on attributes, such as $!foo, &!foo.)
 121 One could view exportation of methods as multi as a two-step process: first,
 122 declare my or our to make the & alias in the current lexpad, then
 123 "is export" does no more magic than it ordinarily does.  I suppose
 124 method aliases are always considered multi when they show up in a
 125 symbol table.
 126
 127 ----------------------
 128
 129 Kh-1: Is it true that Bool::True.name should work?  Should it return
 130     the string 'True'?
 131     I ask because of http://rt.perl.org/rt3/Public/Bug/Display.html?id=66576
 132
 133 Yes, because Bool is an object type, so it knows its .name.  (A native
 134 bool would not know its name, according to S12:1649.)
 135
 136 ----------------------
 137
 138 Jw-1: What is the difference between :(\$x) and :($x is ref)? Both
 139 would need to leave the original argument untouched. Since C<is ref>
 140 is meant to be used for e.g. map, which needs to work on lists like
 141 (1,2,3), it can't cut constants out, so it seems no more constraining
 142 than :(\$x) too. Should one of them go away?
 143
 144 No difference that I can see offhand.  I'm inclined to make 'is ref' go away
 145 and keep the backslashed form.
 146
 147 ----------------------
 148
 149 Pm-11: S11:257 says "Without an import list, C<import> imports
 150     the C<:DEFAULT> imports."  How does one import one of the
 151     other tagsets?  (I think I'm missing something obvious here.)
 152
 153 I believe the idea is simply to use the tagset pair directly as an argument.
 154 And since lexical importation is assumed, there's an implicit :MY() around it.
 155 So 'use Foo :tag' is probably short for 'use Foo :MY(:tag)'.
 156
 157 ----------------------
 158
 159 Ml-1: In my $x; say($x~'a'), what's the output? "a\n" or "Object()\n"?
 160       If it's "a\n", what kind of undef is stored in $x?
 161
 162 Here is perhaps a decent compromise, based on the recent Num vs
 163 Numeric distinction.  If we assume a similar Str vs Stringy difference
 164 and give the ~ operators Stringy semantics, we could distinguish basic
 165 stringification from the more abstract ~ like so:
 166
 167     my $a;              # always stores Object()
 168
 169     say $a.perl         # produces 'Object'
 170
 171     say $a.Str          # stringifies to 'Object()'
 172     say $a              # uses .Str internally, so also stringifies to 'Object()'
 173
 174     say $a.Stringy      # says '' with warning
 175     say ~$a             # says '' with warning
 176     say $a ~ 'tion'     # says 'tion' with warning
 177
 178 As with type objects, junctions also need different abstraction levels,
 179 so we could say that
 180
 181     any(1,2,3).Str      # 'any(1,2,3)'
 182     ~any(1,2,3)         # any('1','2','3')
 183
 184 Note that most methods would still autothread on junctions by virtue of
 185 not being defined in the junction class.
 186
 187 ----------------------
 188
 189 Pm-12: S05:2121 says that smartmatching against regex/token/rule
 190     automatically anchors the match at both ends.  What construct is
 191     actually responsible for performing the anchor checks?  Is it the
 192     smart match operator (infix:<~~>), the .ACCEPTS method on the
 193     regex/token/rule, an option/flag passed to the regex/token/rule,
 194     or something else?
 195
 196 S05:2114 says that these are of type Method, not Regex, and therefore
 197 it's probable that Method.ACCEPTS sets up the .parse with :p semantics
 198 and also checks that the returned Cursor ends at the end of the string.
 199
 200 Or maybe we add a flag to .parse that checks anchoring on both ends,
 201 if it seems generally useful.  That'd help with the / ^ a | b | c $/ precedence trap,
 202 if people instead got into the habit of saying mt/ a | b | c / or
 203 some such, where :t would mean "totally", or "token", or some such.
 204 (Unfortunately, most of the other letters are taken, like :a (for
 205 :all, but already means :accent), or would at least be confusing
 206 if overloaded.  For instance, we could use :x without an arg to mean
 207 match exactly, but :x(1) would mean something rather different.  mx//
 208 reads well though.  Could also make a case for overloading :p without
 209 an argument, but mp doesn't do much for me.)
 210
 211 Anyway, it's Method.ACCEPTS that makes the decision, I suspect,
 212 regardless of how it's implemented.  We can always switch to
 213 a .parse option later.
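
The / ^ a | b | c $/ problem mentioned above exists in most regex engines,
so it can be shown with a small Python sketch.  This is an analogy, not
Perl 6 semantics: re.fullmatch plays the role of a match anchored at both
ends, and the helper names are hypothetical.

```python
import re

# Alternation binds more loosely than the anchors, so this pattern parses as
# (^a) | (b) | (c$), which is not "exactly a, b, or c".
TRAP = re.compile(r"^a|b|c$")
ALTS = re.compile(r"a|b|c")


def search_unanchored(pattern, text):
    """True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


def match_totally(pattern, text):
    """True only if the pattern matches the whole of text (anchored at both ends)."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(search_unanchored(TRAP, "xbx"))  # the bare "b" branch matches mid-string
    print(match_totally(ALTS, "xbx"))
    print(match_totally(ALTS, "b"))
```

Letting the caller (here match_totally, in the text Method.ACCEPTS) impose
the anchoring keeps the pattern itself free of ^ and $, which is what
sidesteps the trap.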
 214
 215 ----------------------
 216
 217 Pm-13:  What's the parent type of C<Regex>?  (I hope it's C<Method>.)
 218
 219 Don't see any reason why not, offhand, unless Regex is really a role
 220 that gets mixed into ordinary methods (hence we could mix it into
 221 such methods as the operator-precedence parser as well).
 222
 223 ----------------------
 224
 225 Pm-14:  Is the :c modifier allowed on token/regex/rule?  I.e., could
 226     someone do...?
 227         'abcdef' ~~ token :c(1) { cdef }
 228
 229 :c and :p are intended primarily as mods to the Cursor constructor, not to
 230 the matcher method, so I don't think so, unless we decide to support them
 231 anywhere within a regex, which seems relatively useless and confusing.  (Well,
 232 a better case can be made for internal :p(1) than for :c(1), in any case.)
 233
 234 ----------------------
 235
 236 Pm-15:  S11:300 gives the following as an example of importing
 237     a tagset into package scope:
 238
 239         require Sense :OUR<ALL> # but this works
 240
 241     Should this be :OUR(:ALL) instead?  It seems to me that
 242     :OUR<ALL> would attempt to import a &ALL symbol (since
 243     :MY<common> imports the &common symbol).
 244
 245 Yes, just a typo.
 246
 247 ----------------------
 248
 249 Pm-16:  S03:1996 looks like a fossil, or at least inconsistent
 250     with S07:72.  Any clarifications?
 251
 252 See r29571.
 253
 254     [Pm]  Okay, that helps, but there still seems to be an
 255     inconsistency.  r29571 says that list assignment is "mostly
 256     eager", but then says it evaluates leading iterators as long as
 257     they're known to be finite.  In other words, it suspends on the
 258     first iterator that is not provably finite.  But S07 says that
 259     "mostly eager" obtains all items until reaching a value that is
 260     known to be infinite.  "Not provably finite" is not the same as
 261     "known to be infinite"; the S03 interpretation would stop obtaining
 262     values on anything that _might_ be infinite, the S07 interpretation
 263     would eagerly obtain values until it reaches something that is
 264     _known_ to be (or declares itself to be) infinite.
 265
 266 "Mostly Eager" is now allowed to slack off after chewing on (or refusing
 267 to chew on) something that is 'not provably finite'.
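
The two readings Pm contrasts differ only in where eager evaluation stops.
A minimal Python model, assuming each source carries a hypothetical tag
saying whether it is provably finite, of unknown length, or known to be
infinite (none of these names come from S03 or S07):

```python
from itertools import count, islice

FINITE, UNKNOWN, INFINITE = "finite", "unknown", "infinite"


def mostly_eager(sources, stop_on):
    """Eagerly drain leading sources until one whose tag is in stop_on.

    sources is a list of (iterable, tag) pairs.  Returns the eagerly
    collected items plus a lazy iterator over everything that remains.
    """
    eager = []
    index = 0
    while index < len(sources) and sources[index][1] not in stop_on:
        eager.extend(sources[index][0])
        index += 1
    remaining = sources[index:]

    def lazy_rest():
        for iterable, _tag in remaining:
            yield from iterable

    return eager, lazy_rest()


def make_sources():
    # The middle source is finite in fact, but is not tagged as provably so.
    return [([1, 2], FINITE), (iter([3, 4]), UNKNOWN), (count(5), INFINITE)]


# S03 reading: suspend at the first source that is not provably finite.
s03_items, s03_rest = mostly_eager(make_sources(), stop_on={UNKNOWN, INFINITE})
# S07 reading: keep going until a source is known to be infinite.
s07_items, s07_rest = mostly_eager(make_sources(), stop_on={INFINITE})

if __name__ == "__main__":
    print(s03_items, list(islice(s03_rest, 3)))
    print(s07_items, list(islice(s07_rest, 3)))
```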
 268
 269 ----------------------
 270
 271 Pm-17:  Are the builtin types such as C<Num>, C<Int>, C<Rat>,
 272     C<Str>, C<List>, etc.  subclasses of C<Cool> or do they
 273     rely on C<Cool>'s method fallback mechanisms for the
 274     common methods?  (Or, another way of asking:  is
 275     C<List ~~ Cool> true?)
 276
 277 As it currently stands, the fallback is only in the multi dispatcher.
 278 For single dispatch, BuiltinType isa Cool.
 279
 280 ----------------------
 281
 282 Pm-18:  (Confirmation request)   S03:2111 indicates that the C< @(...) >
 283     sigil contextualizer is the same as the C<list> listop.  Is this correct?
 284     In particular, given C< my $a = [1,2,3]; >, then C< +@($a) >  is 1 and
 285     C< +@$a > is 3 ?   (For some reason I had been thinking that @($a) would
 286     act more like $a.list  than list($a).)
 287
 288 This part of the design seems rather confused in several dimensions.
 289 We will need to detangle some things.  First off, we will probably
 290 need to decouple the sigils from specific flattening/non-flattening
 291 behavior.  The @ sigil is probably going to mean some combination of
 292 Positional/Iterable, and not imply anything further such as .list.
 293 (Further context is supplied by how you use it.)  The @@ sigil as
 294 slice context is almost certainly going away just for being too ugly,
 295 and slices are also fundamentally ordered just like flat lists.
 296
 297 I think that @($a) and @$a must be made to mean the same thing.
 298 The form is fundamentally macroish, not function-callish.
 299 So @($a,$b) is not making a Capture out of $a and $b.  It does
 300 get a parcel, and whatever processes it treats that as an item,
 301 not as a list.  Parcel is iterable, so @ is basically a no-op.
 302 For @($a) or @$a, we have a degenerate parcel which would unwrap
 303 to $a in either case, so in both cases the make-iterable
 304 code sees $a directly, and not the parcel around it.
 305
 306 Similarly for any other coercion, we want to force macroization to
 307 treat both Foo(...) and (...).Foo the same, so ... should be
 308 treated as a bare Parcel without any additional assumptions.
 309
 310 Therefore list($a) and $a.list should do the same thing, if indeed
 311 it's a coercion.  But again, that thing it's doing is something
 312 different from @.  We won't use the sigils to indicate the
 313 difference between flat/slice, just as we don't with lazy/eager/hyper.
 314 These contexts are looking more like dynamic variables or parameters
 315 internally, but attaching dynamic context to sigils is just as
 316 bad as trying to make other data structures intrinsically carry
 317 context they shouldn't know.
 318
 319 I'm trying to come up with a new slice notation that isn't sigil
 320 based.
 321
 322 ----------------------
 323
 324 Pm-19:  In each statement below, how many times is the block argument
 325     to .map() executed?  (Assume the block has arity/count of 1.)
 326
 327         my @b  = (1,2,3 Z 4,5,6).map({ ... });
 328         my @@c = (1,2,3 Z 4,5,6).map({ ... });
 329
 330         my ($x, $y, @@z) = (1,2,3 Z 4,5,6).map({ ... });
 331
 332 Iterating subsignatures need a rethink, as discussed on IRC.  If people
 333 expect a map block of -> $x { $x } to flatten, then $x is being
 334 bound in the variadic/flattening part of a bind, not in the positional,
 335 which defaults to getobj.  So either we force people to write
 336
 337     -> *$x, *$y {...}
 338
 339 (ick, doesn't extend to $^x or $_ easily) or we find some way
 340 of flipping the default binding of some block sigs to being slurpy
 341 scalars that use .get by default, and then use whatever slice notation
 342 we come up with to change the default back to .getobj.  So there's
 343 some hidden transmogrifier that map/grep use on their block parameter
 344 to make its sig behave, or they don't call the block directly, but
 345 rely on something else to do the transformation to a subpattern match.
 346
 347 The former approach:
 348
 349     sub map (&block, \$orig) {
 350         my &flatblock := &block.default_to_slurpy;
 351         my $cursor = $orig.get_iter_cursor;
 352         gather loop {
 353             take &flatblock($cursor, $newcursor) E last;
 354             $cursor = $newcursor; # or some such
 355         }
 356     }
 357
 358 where .default_to_slurpy takes the sig of -> $x and turns it into -> *$x,
 359 and get_iter_cursor gets a thread of pattern-matching iteration from the original
 360 capture.  (Which we don't strictly have to keep around here, since there's
 361 no nextsame, but still.)
 362
 363 The latter approach would be more like:
 364
 365     sub map (&block, \$orig) {
 366         my $cursor = $orig.get_iter_cursor;
 367         gather loop {
 368             take $cursor.apply(&block) E last;
 369         }
 370     }
 371
 372 where .apply would both mutate the block's sig semantics somehow and also
 373 mutate the cursor so we don't have to track it.
 374
 375 Update: more recent discussion on IRC suggests that the instruction for whether
 376 to anchor the sig match at the end should not come in either of those ways,
 377 but as a flag within the cursor itself: whereas a normal capture always
 378 anchors, a CaptureCursor has a flag that can go either way, but is set by
 379 map/grep to not anchor, but just advance the pointer.  (Or possibly they never
 380 anchor, and you have to convert back to a Capture, if we want it type based.)
 381 I will assume there's a Capture method, .get_unanchored_cursor for the nonce,
 382 which does the right thing.
 383
 384 With a mutable cursor, the notation is rather simple (but the semantics
 385 are more problematic):
 386
 387     sub map (&block, \$cap) {
 388         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 389         gather loop {
 390             take block(|$cursor) E last;
 391         }
 392     }
 393
 394 (That's assuming that |$cursor does the right thing there, and doesn't
 395 convert back to Capture.)
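
To make the mutable-cursor idea concrete, here is a rough Python model.
It is a sketch under assumptions, not the Perl 6 design: the CaptureCursor
class, take_args, and cursor_map are hypothetical names, the block's arity
is read from its signature, and leftover items that cannot fill a whole
call are silently dropped.

```python
import inspect


class CaptureCursor:
    """Hypothetical mutable cursor: a position into one shared item sequence."""

    def __init__(self, items):
        self.items = tuple(items)
        self.pos = 0

    def take_args(self, wanted):
        """Return the next `wanted` items and advance, or None if too few remain."""
        if self.pos + wanted > len(self.items):
            return None
        args = self.items[self.pos:self.pos + wanted]
        self.pos += wanted  # the cursor mutates itself; the caller tracks nothing
        return args


def cursor_map(block, items):
    """Call block repeatedly, consuming as many items per call as it has parameters."""
    arity = len(inspect.signature(block).parameters)
    if arity == 0:
        raise ValueError("block must take at least one argument")
    cursor = CaptureCursor(items)
    while True:
        args = cursor.take_args(arity)
        if args is None:  # plays the role of 'last' in the sketch above
            return
        yield block(*args)


if __name__ == "__main__":
    zipped = [1, 4, 2, 5, 3, 6]  # (1,2,3 Z 4,5,6), flattened
    print(list(cursor_map(lambda x: x * 10, zipped)))
    print(list(cursor_map(lambda x, y: x + y, zipped)))
```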
 396
 397 Alternately, we could use immutable cursors, but then we need to tap in
 398 at a lower level for the invocation, in order to return both a new capture
 399 and a result parcel, something like:
 400
 401     sub map (&block, \$cap) {
 402         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 403         gather loop {
 404             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor);
 405             take |$parcel E last;
 406         }
 407     }
 408
 409 Or maybe the last goes on the invoke:
 410
 411     sub map (&block, \$cap) {
 412         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 413         gather loop {
 414             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor) // last;
 415             take |$parcel;
 416         }
 417     }
 418
 419 The immutable form is more amenable to invocation that has to track
 420 how many elements were consumed (like 'reduce' maybe?), since you
 421 could just compare the position of old cursor with the new, without
 422 having to remember the old position specially.  (I'm imagining cursors
 423 as lightweight objects pointing into the original capture here, not
 424 as heavy capture copies, much as a grammar parser doesn't pass the
 425 original text around, but just a match position into the original.)
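
The immutable variant can be modeled the same way.  In this Python sketch
(again hypothetical: Cursor, invoke_with_cursor, and cursor_map are names
invented for illustration) each invocation returns a fresh cursor plus the
result, so the number of items consumed is just the difference between
the old and new positions.

```python
import inspect
from typing import NamedTuple


class Cursor(NamedTuple):
    """Hypothetical immutable cursor: shared items plus a position."""
    items: tuple
    pos: int = 0


def invoke_with_cursor(block, cursor):
    """Return (new_cursor, result), or None when too few items remain."""
    arity = len(inspect.signature(block).parameters)
    end = cursor.pos + arity
    if arity == 0 or end > len(cursor.items):
        return None
    result = block(*cursor.items[cursor.pos:end])
    # _replace builds a new cursor; the items tuple is shared, not copied.
    return cursor._replace(pos=end), result


def cursor_map(block, items):
    """Return the results and, per call, how many items that call consumed."""
    cursor = Cursor(tuple(items))
    results, consumed = [], []
    while True:
        step = invoke_with_cursor(block, cursor)
        if step is None:
            return results, consumed
        new_cursor, result = step
        consumed.append(new_cursor.pos - cursor.pos)  # old vs. new position
        results.append(result)
        cursor = new_cursor


if __name__ == "__main__":
    print(cursor_map(lambda x, y: x + y, [1, 4, 2, 5, 3, 6]))
```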
 426
 427 Multiple return values make it harder to do fancy junctional tricks
 428 on Z without temporaries, though.  Self-mutating cursors jangle my
 429 FP nerves, but maybe I can argue myself into liking them from an
 430 OO perspective.  Maybe a mutable cursor is just a container of
 431 the current immutable cursor, and we could perhaps have it both ways,
 432 modulo yet another level of indirection.  Performance suffers, maybe.
 433
 434 ----------------------
 435