misc/pm.txt

   1 This file contains miscellaneous questions about design+spec
   2 that Pm has come up with that are awaiting answers from TimToady
   3 and/or others.  We'll likely record the answers here as well.
   4
   5 Unanswered questions:
   6
   7 ===========================================================
   8
   9 Answered questions:
  10
  11
  12 Pm-1:  In STD.pm, what is the semantic or key difference between <noun>
  13     and <term>?
  14
  15 None, they are now unified under <term>.
  16
  17 ----------------------
  18
  19 Pm-2:  Are calls to subrules in other grammars still valid as
  20     C<<  / abc <OtherGrammar::xyz> def / >> ?  If so, then
  21     for the invocation of the subrule, do we construct a new
  22     cursor of type OtherGrammar and invoke the 'xyz' method on it?
  23     (Pm's preference is "yes" and "yes", but want confirmation.)
  24
  25 Yes, and yes.  That's essentially what STD is already doing for
  26 Regexen and such.
  27
  28 ----------------------
  29
  30 Pm-3:  When we generate a metaop, where does it live?  Lexical?
  31     Package?  If package, then what package?
  32
  33 Good question, if CORE is immutable, we can't add to it.  Probably
  34 UNIT, though that might prevent sharing of common definitions
  35 among different compilation units.  On the other hand, being sure
  36 you're based on the same underlying semantics is difficult anyway
  37 without a lot of lifting, so maybe UNIT is good enough for now.
  38
  39 ----------------------
  40
  41 Pm-4:  The C<.ast> method on a Match object returns the matched
  42     text if no abstract object has been set.  Is there (or should
  43     there be) a method to determine if an abstract object has been
  44     set?  (Currently I'm using C<.peek_ast> in nqp-rx for this.)
  45
  46 Uh, surely .ast is false if there isn't one.
  47
  48 [S05:2434 r28936 currently says that .ast returns the matched text,
  49 I expect this to change shortly.  --Pm]
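
As a rough illustration of why a separate predicate is needed under the
r28936 wording, here is a minimal sketch in Python (not Perl 6).  The
Match class, its make method, and has_ast are hypothetical stand-ins;
has_ast is only loosely modeled on the use of C<.peek_ast> mentioned
above.  When .ast falls back to the matched text, its value alone cannot
say whether an abstract object was ever set.

```python
_UNSET = object()   # sentinel: distinguishes "never set" from any real value


class Match:
    """Toy match object; not the nqp-rx or Rakudo class."""

    def __init__(self, text):
        self.text = text
        self._ast = _UNSET

    def make(self, obj):
        """Attach an abstract object to this match."""
        self._ast = obj

    def has_ast(self):
        """Hypothetical predicate: was an abstract object ever set?"""
        return self._ast is not _UNSET

    @property
    def ast(self):
        """Fall back to the matched text when nothing has been set."""
        return self.text if self._ast is _UNSET else self._ast


plain = Match("42")
built = Match("42")
built.make(42)

print(repr(plain.ast), plain.has_ast())   # '42' False
print(repr(built.ast), built.has_ast())   # 42 True
```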
  50
  51 ----------------------
  52
  53 Pm-5:  How "read-only" are subroutine parameters?  For example, given
  54     a subroutine like  C<<  sub abc($x) { ... }  >>, I know we can't
  55     assign to C<$x>, but can it be rebound using &infix:<:=> in the
  56     body of the sub?
  57
  58 That seems okay to me, since the intent is to not modify the
  59 passed in object, and the rebinding effectively anonymizes the
  60 argument that was bound.
  61
  62 ----------------------
  63
  64 Pm-6:  Is there a syntax that allows a (trusted) routine to access
  65     the private attributes of another object without going through
  66     an accessor?  For example, if object $b has a private attribute
  67     of "$!xyz", is there a syntax for me to get to that attribute?
  68
  69 See r28932.
  70
  71 [Answer:  as of r28932, C<< $b!SomeClass::xyz >>.  --Pm]
  72
  73 ----------------------
  74
  75 Pm-7:  S05 says that a match's reduction object is given by the
  76     C<:action> parameter, but STD.pm's initparse has C<:actions> and
  77     $*ACTIONS.  Should we pick one and stick with it?  PGE and
  78     Rakudo have traditionally used C<:action> -- if there's to
  79     be a change, now would be a good time for it.  :-)
  80     (After looking at the way I typically use it, I'm leaning
  81     towards the plural form. --Pm)
  82
  83 Now pluralized in specland.
  84
  85 ----------------------
  86
  87 Pm-8: Are closures embedded in regexes creating a new lexical scope,
  88     or do they share the same scope as the regex block itself?
  89     (Currently I'm assuming they create a new scope, to be consistent
  90     with other uses of curlies. --Pm)
  91
  92 Yes.
  93
  94 ----------------------
  95
  96 Pm-9: Inside of a regex, what happens with C<< <[z..a]> >> ?  Is it
  97     a compile-time error, an empty range, or ... ?
  98
  99 Let's make it a compile-time error for direct code, and a failure
 100 with warning for code compiled indirectly via <$code>.
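
The two-tier policy can be sketched in Python (not Perl 6), whose re
module happens to reject a reversed range such as [z-a] when the pattern
is compiled.  compile_direct and compile_indirect are hypothetical names,
and returning None with a warning is only an assumed stand-in for
"failure with warning".

```python
import re
import warnings


def compile_direct(pattern):
    """Pattern written directly in source: let the compile error propagate."""
    return re.compile(pattern)


def compile_indirect(pattern):
    """Pattern supplied at run time: warn and return None as the 'failure'."""
    try:
        return re.compile(pattern)
    except re.error as err:
        warnings.warn(f"bad pattern {pattern!r}: {err}")
        return None


# Direct code: the reversed range is rejected before any matching happens.
try:
    compile_direct("[z-a]")
    direct_outcome = "compiled"
except re.error:
    direct_outcome = "compile-time error"

# Indirect code: the same pattern yields a warning and a failure value.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    indirect_result = compile_indirect("[z-a]")

print(direct_outcome)    # compile-time error
print(indirect_result)   # None
print(len(caught))       # 1
```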
 101
 102 ----------------------
 103
 104 Pm-10:  Subs are canonically considered to be stored in symbol
 105     tables (lexpads, namespaces) with the & sigil.  Is the same
 106     true for methods?  If we ask a method for its name or otherwise
 107     obtain a list of method names from an object, would we expect
 108     those names to have a & sigil as well?  (For HLL interop reasons
 109     Pm tends to want methods to not include a & sigil, but it's not
 110     a strong tendency.)
 111
 112 Whether there's the & sigil or not depends on where you're storing
 113 the name.  The MOP doesn't keep the sigil, but if a method is declared
 114 "my" or "our", the alias in the symbol table does have the &, since as
 115 far as the symbol table is concerned, the method is just a subroutine.
 116 (Also note that, while the MOP doesn't track the sigil on methods,
 117 it probably does track the sigil on attributes, such as $!foo, &!foo.)
 118 One could view exportation of methods as multi as a two-step process; first,
 119 declare my or our to make the & alias in the current lexpad, then
 120 "is export" does no more magic than it ordinarily does.  I suppose
 121 method aliases are always considered multi when they show up in a
 122 symbol table.
 123
 124 ----------------------
 125
 126 Kh-1: Is it true that Bool::True.name should work?  Should it return
 127     the string 'True'?
 128     I ask because of http://rt.perl.org/rt3/Public/Bug/Display.html?id=66576
 129
 130 Yes, because Bool is an object type, so it knows its .name.  (A native
 131 bool would not know its name, according to S12:1649.)
 132
 133 ----------------------
 134
 135 Jw-1: What is the difference between :(\$x) and :($x is ref)? Both
 136 would need to leave the original argument untouched. Since C<is ref>
 137 is meant to be used for e.g. map, which needs to work on lists like
 138 (1,2,3), it can't cut constants out, so it seems no more constraining
 139 than :(\$x) too. Should one of them go away?
 140
 141 No difference that I can see offhand.  I'm inclined to make 'is ref' go away
 142 and keep the backslashed form.
 143
 144 ----------------------
 145
 146 Pm-11: S11:257 says "Without an import list, C<import> imports
 147     the C<:DEFAULT> imports."  How does one import one of the
 148     other tagsets?  (I think I'm missing something obvious here.)
 149
 150 I believe the idea is simply to use the tagset pair directly as an argument.
 151 And since lexical importation is assumed, there's an implicit :MY() around it.
 152 So 'use Foo :tag' is probably short for 'use Foo :MY(:tag)'.
 153
 154 ----------------------
 155
 156 Ml-1: In my $x; say($x~'a'), what's the output? "a\n" or "Object()\n"?
 157       If it's "a\n", what kind of undef is stored in $x?
 158
 159 Here is perhaps a decent compromise, based on the recent Num vs
 160 Numeric distinction.  If we assume a similar Str vs Stringy difference
 161 and give the ~ operators Stringy semantics, we could distinguish basic
 162 stringification from the more abstract ~ like so:
 163
 164     my $a;              # always stores Object()
 165
 166     say $a.perl         # produces 'Object'
 167
 168     say $a.Str          # stringifies to 'Object()'
 169     say $a              # uses .Str internally, so also stringifies to 'Object()'
 170
 171     say $a.Stringy      # says '' with warning
 172     say ~$a             # says '' with warning
 173     say $a ~ 'tion'     # says 'tion' with warning
 174
 175 As with type objects, junctions also need different abstraction levels,
 176 so we could say that
 177
 178     any(1,2,3).Str      # 'any(1,2,3)'
 179     ~any(1,2,3)         # any('1','2','3')
 180
 181 Note that most methods would still autothread on junctions by virtue of
 182 not being defined in the junction class.
 183
 184 ----------------------
 185
 186 Pm-12: S05:2121 says that smartmatching against regex/token/rule
 187     automatically anchors the match at both ends.  What construct is
 188     actually responsible for performing the anchor checks?  Is it the
 189     smart match operator (infix:<~~>), the .ACCEPTS method on the
 190     regex/token/rule, an option/flag passed to the regex/token/rule,
 191     or something else?
 192
 193 S05:2114 says that these are of type Method, not Regex, and therefore
 194 it's probable that Method.ACCEPTS sets up the .parse with :p semantics
 195 and also checks that the returned Cursor ends at the end of the string.
 196
 197 Or maybe we add a flag to .parse that checks anchoring on both ends,
 198 if it seems generally useful.  That'd help with the / ^ a | b | c $/,
 199 if people instead got into the habit of saying mt/ a | b | c / or
 200 some such, where :t would mean "totally", or "token", or some such.
 201 (Unfortunately, most of the other letters are taken, like :a (for
 202 :all, but already means :accent), or would at least be confusing
 203 if overloaded.  For instance, we could use :x without an arg to mean
 204 match exactly, but :x(1) would mean something rather different.  mx//
 205 reads well though.  Could also make a case for overloading :p without
 206 an argument, but mp doesn't do much for me.)
 207
 208 Anyway, it's Method.ACCEPTS that makes the decision, I suspect,
 209 regardless of how it's implemented.  We can always switch to
 210 a .parse option later.
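
The / ^ a | b | c $/ problem mentioned above is not specific to Perl 6:
alternation binds more loosely than the anchors, so only the first and
last branches end up anchored.  A small sketch using Python's re module
(not Perl 6 regexes) shows the difference; matches_totally is a
hypothetical helper standing in for whatever ends up doing the anchoring,
whether Method.ACCEPTS or a .parse flag.

```python
import re

# Parses as (^a) | b | (c$): the middle branch is not anchored at all.
loose = re.compile(r"^a|b|c$")


def matches_totally(pattern, text):
    """Anchor the whole pattern at both ends of the text."""
    return re.fullmatch(pattern, text) is not None


print(loose.search("xbx") is not None)    # True: the bare 'b' branch matches
print(matches_totally(r"a|b|c", "xbx"))   # False: the alternation as a whole is anchored
print(matches_totally(r"a|b|c", "b"))     # True
```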
 211
 212 ----------------------
 213
 214 Pm-13:  What's the parent type of C<Regex>?  (I hope it's C<Method>.)
 215
 216 Don't see any reason why not, offhand, unless Regex is really a role
 217 that gets mixed into ordinary methods (hence we could mix it into
 218 such methods as the opp as well).
 219
 220 ----------------------
 221
 222 Pm-14:  Is the :c modifier allowed on token/regex/rule?  I.e., could
 223     someone do...?
 224         'abcdef' ~~ token :c(1) { cdef }
 225
 226 :c and :p are intended primarily as mods to the Cursor constructor, not to
 227 the matcher method, so I don't think so, unless we decide to support them
 228 anywhere within a regex, which seems relatively useless and confusing.  (Well,
 229 a better case can be made for internal :p(1) than for :c(1), in any case.)
 230
 231 ----------------------
 232
 233 Pm-15:  S11:300 gives the following as an example of importing
 234     a tagset into package scope:
 235
 236         require Sense :OUR<ALL> # but this works
 237
 238     Should this be :OUR(:ALL) instead?  It seems to me that
 239     :OUR<ALL> would attempt to import a &ALL symbol (since
 240     :MY<common> imports the &common symbol).
 241
 242 Yes, just a typo.
 243
 244 ----------------------
 245
 246 Pm-16:  S03:1996 looks like a fossil, or at least inconsistent
 247     with S07:72.  Any clarifications?
 248
 249 See r29571.
 250
 251     [Pm]  Okay, that helps, but there still seems to be an
 252     inconsistency.  r29571 says that list assignment is "mostly
 253     eager", but then says it evaluates leading iterators as long as
 254     they're known to be finite.  In other words, it suspends on the
 255     first iterator that is not provably finite.  But S07 says that
 256     "mostly eager" obtains all items until reaching a value that is
 257     known to be infinite.  "Not provably finite" is not the same as
 258     "known to be infinite"; the S03 interpretation would stop obtaining
 259     values on anything that _might_ be infinite, the S07 interpretation
 260     would eagerly obtain values until it reaches something that is
 261     _known_ to be (or declares itself to be) infinite.
 262
 263 "Mostly Eager" is now allowed to slack off after chewing on (or refusing
 264 to chew on) something that is 'not provably finite'.
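
The difference between the two readings Pm describes can be sketched in
Python (not Perl 6).  The three-valued finiteness tag and the function
names are assumptions made for the illustration; neither synopsis
specifies such an interface.

```python
from itertools import count


def eager_prefix(sources, stop_when):
    """Return (items obtained eagerly, number of sources left lazy).

    Each source is a (finiteness, iterable) pair, where finiteness is
    "finite", "infinite", or "unknown".
    """
    items = []
    for index, (finiteness, iterable) in enumerate(sources):
        if stop_when(finiteness):
            return items, len(sources) - index
        items.extend(iterable)
    return items, 0


def s03_stops(finiteness):
    """S03 reading: suspend on anything that is not provably finite."""
    return finiteness != "finite"


def s07_stops(finiteness):
    """S07 reading: suspend only on something known to be infinite."""
    return finiteness == "infinite"


def make_sources():
    return [
        ("finite", [1, 2, 3]),
        ("unknown", iter([4, 5])),   # happens to be finite, but not provably so
        ("infinite", count(6)),
    ]


print(eager_prefix(make_sources(), s03_stops))   # ([1, 2, 3], 2)
print(eager_prefix(make_sources(), s07_stops))   # ([1, 2, 3, 4, 5], 1)
```

Under the S07 reading, an "unknown" source that turned out to be infinite
would never return in this sketch, which is the risk the S03 reading
avoids.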
 265
 266 ----------------------
 267
 268 Pm-17:  Are the builtin types such as C<Num>, C<Int>, C<Rat>,
 269     C<Str>, C<List>, etc.  subclasses of C<Cool> or do they
 270     rely on C<Cool>'s method fallback mechanisms for the
 271     common methods?  (Or, another way of asking:  is
 272     C<List ~~ Cool> true?)
 273
 274 As it currently stands the fallback is only in the multi dispatcher.
 275 For single dispatch BuiltinType isa Cool.
 276
 277 ----------------------
 278
 279 Pm-18:  (Confirmation request)   S03:2111 indicates that the C< @(...) >
 280     sigil contextualizer is the same as the C<list> listop.  Is this correct?
 281     In particular, given C< my $a = [1,2,3]; >, then C< +@($a) >  is 1 and
 282     C< +@$a > is 3 ?   (For some reason I had been thinking that @($a) would
 283     act more like $a.list  than list($a).)
 284
 285 This part of the design seems rather confused in several dimensions.
 286 We will need to detangle some things.  First off, we will probably
 287 need to decouple the sigils from specific flattening/non-flattening
 288 behavior.  The @ sigil is probably going to mean some combination of
 289 Positional/Iterable, and not imply anything further such as .list.
 290 (Further context is supplied by how you use it.)  The @@ sigil as
 291 slice context is almost certainly going away just for being too ugly,
 292 and slices are also fundamentally ordered just like flat lists.
 293
 294 I think that @($a) and @$a must be made to mean the same thing.
 295 The form is fundamentally macroish, not function-callish,
 296 so @($a,$b) is not making a Capture out of $a and $b.  It does
 297 get a parcel, and whatever processes it treats that as an item,
 298 not as a list.  Parcel is iterable, so @ is basically a no-op.
 299 For @($a) or @$a, we have a degenerate parcel which would unwrap
 300 to $a in either case, so in both cases the make-iterable
 301 code sees $a directly, and not the parcel around it.
 302
 303 Similarly for any other coercion, we want to force macroization to
 304 treat both Foo(...) and (...).Foo the same, so ... should be
 305 treated as bare Parcel without any additional assumptions.
 306
 307 Therefore list($a) and $a.list should do the same thing, if indeed
 308 it's a coercion.  But again, that thing it's doing is something
 309 different from @.  We won't use the sigils to indicate the
 310 difference between flat/slice, just as we don't with lazy/eager/hyper.
 311 These contexts are looking more like dynamic variables or parameters
 312 internally, but attaching dynamic context to sigils is just as
 313 bad as trying to make other data structures intrinsically carry
 314 context they shouldn't know.
 315
 316 I'm trying to come up with a new slice notation that isn't sigil
 317 based.
 318
 319 ----------------------
 320
 321 Pm-19:  In each statement below, how many times is the block argument
 322     to .map() executed?  (Assume the block has arity/count of 1.)
 323
 324         my @b  = (1,2,3 Z 4,5,6).map({ ... });
 325         my @@c = (1,2,3 Z 4,5,6).map({ ... });
 326
 327         my ($x, $y, @@z) = (1,2,3 Z 4,5,6).map({ ... });
 328
 329 Iterating subsignatures need a rethink, as discussed on IRC.  If people
 330 expect a map block of -> $x { $x } to flatten, then $x is being
 331 bound in the variadic/flattening part of a bind, not in the positional,
 332 which defaults to getobj.  So either we force people to write
 333
 334     -> *$x, *$y {...}
 335
 336 (ick, doesn't extend to $^x or $_ easily) or we find some way
 337 of flipping the default binding of some block sigs to being slurpy
 338 scalars that use .get by default, and then use whatever slice notation
 339 we come up with to change the default back to .getobj.  So there's
 340 some hidden transmogrifier that map/grep use on their block parameter
 341 to make its sig behave, or they don't call the block directly, but
 342 rely on something else to do the transformation to a subpattern match.
 343
 344 The former approach:
 345
 346     sub map (&block, \$orig) {
 347         my &flatblock := &block.default_to_slurpy;
 348         my $cursor = $orig.get_iter_cursor;
 349         gather loop {
 350             take &flatblock($cursor, $newcursor) E last;
 351             $cursor = $newcursor; # or some such
 352         }
 353     }
 354
 355 where .default_to_slurpy takes the sig of -> $x and turns it into -> *$x,
 356 and get_iter_cursor gets a thread of pattern-matching iteration from the original
 357 capture.  (Which we don't strictly have to keep around here, since there's
 358 no nextsame, but still.)
 359
 360 The latter approach would be more like:
 361
 362     sub map (&block, \$orig) {
 363         my $cursor = $orig.get_iter_cursor;
 364         gather loop {
 365             take $cursor.apply(&block) E last;
 366         }
 367     }
 368
 369 where .apply would both mutate the block's sig semantics somehow and also
 370 mutate the cursor so we don't have to track it.
 371
 372 Update: more recent discussion on IRC suggests that the instruction for whether
 373 to anchor the sig match at the end should not come in either of those ways,
 374 but as a flag within the cursor itself: whereas a normal capture always
 375 anchors, a CaptureCursor has a flag that can go either way, but is set by
 376 map/grep to not anchor, but just advance the pointer.  (Or possibly they never
 377 anchor, and you have to convert back to a Capture, if we want it type based.)
 378 I will assume there's a Capture method, .get_unanchored_cursor for the nonce,
 379 which does the right thing.
 380
 381 With a mutable cursor, the notation is rather simple (but the semantics
 382 are more problematic):
 383
 384     sub map (&block, \$cap) {
 385         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 386         gather loop {
 387             take block(|$cursor) E last;
 388         }
 389     }
 390
 391 (That's assuming that |$cursor does the right thing there, and doesn't
 392 convert back to Capture.)
 393
 394 Alternately, we could use immutable cursors, but then we need to tap in
 395 at a lower level for the invocation, in order to return both a new capture
 396 and a result parcel, something like:
 397
 398     sub map (&block, \$cap) {
 399         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 400         gather loop {
 401             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor);
 402             take |$parcel E last;
 403         }
 404     }
 405
 406 Or maybe the last goes on the invoke:
 407
 408     sub map (&block, \$cap) {
 409         my CaptureCursor $cursor = $cap.get_unanchored_cursor;
 410         gather loop {
 411             my ($newcursor, $parcel) := INVOKE_WITH_CURSOR(&block, $cursor) // last;
 412             take |$parcel;
 413         }
 414     }
 415
 416 The immutable form is more amenable to invocation that has to track
 417 how many elements were consumed (like 'reduce' maybe?), since you
 418 could just compare the position of old cursor with the new, without
 419 having to remember the old position specially.  (I'm imagining cursors
 420 as lightweight objects pointing into the original capture here, not
 421 as heavy capture copies, much as a grammar parser doesn't pass the
 422 original text around, but just a match position into the original.)
 423
 424 Multiple return values make it harder to do fancy junctional tricks
 425 on Z without temporaries, though.  Self-mutating cursors jangle my
 426 FP nerves, but maybe I can argue myself into liking them from an
 427 OO perspective.  Maybe a mutable cursor is just a container of
 428 the current immutable cursor, and we could perhaps have it both ways,
 429 modulo yet another level of indirection.  Performance suffers, maybe.
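
To make the immutable-cursor variant concrete, here is a minimal sketch
in Python (not Perl 6).  CaptureCursor, invoke_with_cursor, and cursor_map
are illustrative names only, loosely following the pseudocode above.  The
sketch assumes that the block's parameter count decides how many items
each call consumes, and that leftover items too few to fill a call are
silently dropped; neither point is settled by the discussion.

```python
import inspect
from typing import NamedTuple


class CaptureCursor(NamedTuple):
    """Immutable, lightweight pointer into the original argument tuple."""
    items: tuple
    pos: int = 0


def invoke_with_cursor(block, cursor):
    """Bind as many items as the block takes.

    Returns (new_cursor, result), or None when too few items remain.
    """
    arity = len(inspect.signature(block).parameters)
    end = cursor.pos + arity
    if arity == 0 or end > len(cursor.items):
        return None
    result = block(*cursor.items[cursor.pos:end])
    # The old cursor is untouched; a new one records the advanced position.
    return cursor._replace(pos=end), result


def cursor_map(block, *args):
    """Generator playing the role of the gather/take loop."""
    cursor = CaptureCursor(tuple(args))
    while (step := invoke_with_cursor(block, cursor)) is not None:
        cursor, result = step
        yield result


zipped = (1, 4, 2, 5, 3, 6)   # what (1,2,3 Z 4,5,6) would flatten to

print(list(cursor_map(lambda x: x * 10, *zipped)))     # [10, 40, 20, 50, 30, 60]
print(list(cursor_map(lambda a, b: a + b, *zipped)))   # [5, 7, 9]
```

Because each call returns a fresh cursor, the number of items consumed is
just the difference between the new and old positions, with no extra
bookkeeping.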
 430
 431 ----------------------
 432