
   1 # XProc Workshop Minutes, Day 2, 8 February 2017
   2
   3 <a id="orgc454650"></a>
   4
   5 ## Background links:
   6
   7 -   Spec repo: <https://github.com/xproc/1.1-specification>
   8 -   Online specs: <http://spec.xproc.org/>
   9 -   Issues list: <https://github.com/xproc/1.1-specification/issues>
  10 -   Norm’s @depends proposal:
  11     -   <https://github.com/xproc/1.1-specification/pull/33>
  12     -   <http://xpspectest.nwalsh.com/depends/head/xproc11/>
  13 -   Norm’s proposal for issue #30:
  14     -   <https://github.com/xproc/1.1-specification/pull/32>
  15     -   <http://xpspectest.nwalsh.com/fix-30/head/xproc11/>
  16
  17
  18 <a id="org9199c35"></a>
  19
  20 ## Agenda / Minutes
  21
  22 -   Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert,
  23     Gerrit, Martin, Henry, Liam
  24
  25
  26 <a id="org84e4079"></a>
  27
  28 ## New Action Items
  29
  30 -   Martin to write a proposal for @p:message and how to extend
  31     the messaging to log levels and other features using possibly
  32     p:pipeinfo or a new p:message element.
  33
  34 -   Nic and Ari to discuss the status of the community group
  35     and report back on xproc-dev by 15 Feb 2017.
  36
  37 -   Norm to propose a template for the extension steps repo.
  38
  39
  40 <a id="org4bbf545"></a>
  41
  42 ### Time-boxed review of this issue
  43
  44 -   About requirements, not proposals
  45 -   How do documents figure into our relations with our XPath interface
  46 -   We need precision in the requirements
  47
  48 Henry begins by projecting a document:
  49 <https://rawgit.com/xproc/Workshop-2017-02/master/docs/docReq.html>
  50
  51 Liam: XPath is very primary; but we’re talking about
  52 doing other kinds of expressions. Yesterday,
  53 I wondered if handles could be
  54 functions. Now I’m wondering if the expression language
  55 could be pluggable.
  56 Henry: That’s been in-and-out. I think for 3.0, it’s
  57 a big step to introduce, at the language level,
  58 multiple expression languages.
  59 Norm: I think it’s a question of how to mix the languages.
  60
  61 Norm:
  62
  63 Observation: ./function() is conceptually the same as function($x)
  64
  65 1.  The output of a step can go in a variable, $x.
  66 2.  I must be able to get the properties of the variable, $x.
  67 3.  $x must have identity
  68 4.  select="p:document-properties($x)"
  69
  70 David: Why a document node, why not a string.
  71 Norm: It could be a string.
  72 Henry: The problem of collision arises.
  73 Norm: A string would work, but string-length($x) would be 5 or 19.
  74
  75 Norm: We could also just make it a view: an XPath document, a
  76 JSON object, a text node, a hexBinary node.
  77
  78 Romain: We need access to the content from the empty document
  79 node.
  80 Norm: So you want to be able to use the binary functions on
  81 the hexBinary.
  82 Romain: Yes.
  83 Henry: Stipulate the empty document node solution for a moment.
  84 And the existence of a way to ask for the type of something.
  85 If the answer isn’t XDM or JSON, you’d like to be able to say
  86 coerce to XDM (string, hexBinary).
  87
  88 It would also be useful to be able to go the other way, coerce
  89 a hexBinary into an image/jpeg.
  90
  91 Achim: if I get a text/plain document, do I get a string or
  92 a blob?
  93 Romain: You get an empty document node, you can coerce it into
  94 string.
  95 Achim: Users are going to find that confusing. But we can
  96 tell users that they have to explicitly take the risk of attempting
  97 to get a large amount of data.
  98 Henry: We also talked about trying to put some logic into
  99 p:load.
 100
 101 Norm: We’re 50 minutes into the hour.
 102 Henry: Here’s the procedural question: do we have enough of an
 103 answer to proceed. It feels to me like we do.
 104 Norm: I think we do. I propose that we proceed with some spec drafting
 105 using the empty document node answer and see if it works or raises
 106 other problems on closer inspection.
 107
 108 Any objections? None heard.
 109
 110 David: For me it’s hard to talk about coercing media types.
 111 What it means is to process an entity in a different way.
 112 It’s not coercion.
 113 Norm: It’s not really coercion; it’s changing the label.
 114 David: If you have the load step and you say you want to
 115 load it as text/plain. If you use an HTTP URI, then this would
 116 mean that the implementation would send an Accept: header
 117 of text/plain.
 118 Norm: Alternatively, you just treat the octet stream returned
 119 as text/plain.
 120 David: The other way around is just a claim.
 121 Norm: I agree, “coercion” isn’t the right word.
 122
 123
 124 <a id="org92e6260"></a>
 125
 126 ### Overrides?
 127
 128 Matthieu: I’d like to be able to override steps on p:import,
 129 is that something we could consider?
 130 Norm: Yes, but I suggest we have an issue about that and have
 131 the discussion when we’ve had a chance to think about it.
 132 Gerrit: We solve this by dynamically generating the pipeline
 133 and then importing that.
 134
 135
 136 <a id="orgf4b87f0"></a>
 137
 138 ## Review of open issues
 139
 140 On reflection, trying to do this in the face-to-face seems
 141 like it won’t end well.
 142
 143
 144 <a id="org38db409"></a>
 145
 146 ### Proposals for resolving two issues
 147
 148 Norm: The depends attribute, PR #33.
 149 Henry: Remind me, what’s it for?
 150 Norm: It’s for the case where you have a step with a side-effect
 151 which isn’t manifest in the data flow: don’t start step B until
 152 step A has finished, irrespective of what you think you know.
 153 … It’s been implemented and appears to be sufficient.
 154 Liam: If you called it ‘wait-for’ instead of ‘depends’, then
 155 it would have been clearer.
 156 Henry: The simplicity of this depends on the answer to my
 157 question about streaming tomorrow: no.
 158 Christophe: I prefer ‘wait-for’.
 159 Henry: I prefer ‘depends’ because it leaves open the question
 160 of what exactly it means; we may need that space in the future.
 161
 162 Some discussion of the semantic variance between “depends” and
 163 “wait-for”.
 164
 165 Liam: I don’t think we really have to find an answer here.
 166 Norm: Let’s try taking this one to email.
 167
 168 Proposal: Merge the PR. Any objections?
 169
 170 None heard.
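
The ordering constraint Norm describes (don't start step B until step A has finished, even when no data flows between them) can be pictured as an extra edge in the step graph. The following Python sketch is only an illustration of that idea, not the semantics in PR #33; the step names and the `schedule` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def schedule(data_flow, depends):
    """Return one valid execution order for the steps.

    data_flow: maps a step name to the steps whose output it reads.
    depends:   maps a step name to the steps it must wait for, even
               though no data flows between them (the side-effect case).
    """
    sorter = TopologicalSorter()
    for step in set(data_flow) | set(depends):
        # A step may start only after its data-flow predecessors
        # and its explicitly declared dependencies have finished.
        predecessors = set(data_flow.get(step, ())) | set(depends.get(step, ()))
        sorter.add(step, *predecessors)
    return list(sorter.static_order())


# Hypothetical pipeline: both later steps read from "load", but
# "report-b" must also wait for the side effect of "store-a".
data_flow = {"load": set(), "store-a": {"load"}, "report-b": {"load"}}
depends = {"report-b": {"store-a"}}
order = schedule(data_flow, depends)
```

Without the `depends` entry, a processor would be free to run `store-a` and `report-b` in either order; with it, `store-a` always comes first.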
 171
 172 Norm: Allow attribute value templates in extension attributes,
 173 PR #32.
 174 Achim: Some of the extension attributes are handled in static
 175 analysis and some dynamically. It only makes sense for attributes
 176 that are being evaluated dynamically.
 177 Norm: What attributes are evaluated statically?
 178 Achim: depends is an extension attribute today.
 179 Norm: Errmm…can we finesse this by saying that processors
 180 are free to forbid AVTs in any extension attributes that they
 181 wish? (Added comment.)
 182
 183 Some discussion of what the semantics of forbidding might mean.
 184 Could mean curly braces not allowed, could mean not interpreted
 185 as an AVT. Up to the implementation.
 186
 187 Proposal: Merge the PR with that note, at editor’s discretion.
 188 Any objections?
 189
 190 None heard.
 191
 192
 193 <a id="orge9df7e6"></a>
 194
 195 ### Proposed list of issues that warrant face-to-face time:
 196
 197 #### How to improve debugging
 198 ##### Issue 18
 199
 200 -   Norm: The observation that implementations should do better
 201     with error messages is a point taken.
 202 -   Achim: What do I get if an error occurs is one question.
 203     Another question is what do I get on p:catch? How good
 204     is the error vocabulary. Today we only get the name of
 205     the error. Maybe we could improve that.
 206 -   Norm: It’s very difficult to define error output. I think
 207     a proposal in this area would be a very good thing.
 208 -   Achim: The question is how useful would this be to users?
 209 -   Martin: Sometimes it would be very useful. We’re currently
 210     using Schematron that we extended with spans to handle
 211     reporting where the error occurred.
 212 -   Achim: Step names or step types both seem like they’d be
 213     valuable. Just more information.
 214 -   Norm: Someone should make a proposal.
 215 -   David: I find it hard to get the error messages at all.
 216     It was never a problem to figure out where the error was,
 217     but getting the message is hard.
 218 -   Norm: Yeah, my implementation sucks.
 219 -   David: A tutorial on p:log would be good.
 220 -   Norm: Yes, it’s a short tutorial but it would be good to have.
 221
 222 ##### @cx:message
 223
 224 -   Norm mutters on about @cx:message…proposes @p:message
 225     to avoid stepping on the “message” option name.
 226 -   Martin: What about adding a terminate attribute?
 227 -   Gerrit: Isn’t this p:error?
 228 -   Norm: Do you want to do this conditionally?
 229 -   Henry: The advantage of having it as an attribute on
 230     a step is that it’s much simpler vis-a-vis the plumbing.
 231 -   Norm: So a step with p:terminate on it runs the step
 232     and then aborts.
 233 -   Henry: I propose p:abort with a message.
 234 -   Romain: Can we add a severity to p:message so that it
 235     works like proper logging?
 236 -   Gerrit: Maybe we can have p:error work like logging and
 237     messaging.
 238 -   Norm: Risking design on the fly: if p:message is a single
 239     string then print it. If it’s a sequence of two strings
 240     then the first string is a log level and the second string
 241     is the message. What the processor does with the log
 242     level is implementation defined.
 243 -   Achim: What about a function that returns the log level?
 244 -   Geert: I’ve been experimenting with overloaded steps.
 245     We could put the message and other options in the step
 246     content. That makes the mechanism extensible.
 247 -   Norm: Yes. We also have p:pipeinfo, you could use that.
 248 -   Henry: Maybe pipeinfo is a better way to do this altogether.
 249 -   Norm: I’m still attracted to @p:message for the simplicity
 250     of printing a message, and maybe p:pipeinfo for log levels
 251     and such.
 252 -   Romain: Or p:message element for more complicated messages.
 253 -   Norm: Right. So this wasn’t simple. I think we need a proposal.
 254
 255     ACTION: Martin to write a proposal
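
Norm's "design on the fly" for @p:message (one string is the message; two strings are a log level followed by the message) could be sketched as below. This is an informal Python illustration, not a proposal text: the function names are hypothetical, and treating a one-item sequence the same as a single string is an assumption.

```python
def interpret_message(value):
    """Split a p:message value into (level, text).

    A single string has no log level; a sequence of two strings is
    read as (log level, message). Anything else is rejected here,
    which is an assumption: the minutes do not say what should happen.
    """
    if isinstance(value, str):
        return (None, value)
    items = list(value)
    if len(items) == 1:
        # Assumption: a one-item sequence behaves like a single string.
        return (None, items[0])
    if len(items) == 2:
        return (items[0], items[1])
    raise ValueError("expected one string or a (level, message) pair")


def emit(value, out=print):
    """Print the message; what to do with the level is left to the processor."""
    level, text = interpret_message(value)
    out(text if level is None else f"[{level}] {text}")
```

For example, `emit("Loading stylesheet")` prints the bare message, while `emit(["warn", "Stylesheet not found"])` prefixes it with the level.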
 256
 257
 258 <a id="org0f8d682"></a>
 259
 260 ### Return to yesterday’s discussion for a moment
 261
 262 -   Achim: There’s one more point in the specs dealing with XML
 263     vs. non-XML documents that I find inconvenient. If I have a
 264     heterogeneous sequence, when this sequence hits a select
 265     expression, an error is thrown.
 266
 267     <http://spec.xproc.org/master/head/xproc11/#err.inline.D1004>
 268
 269     …I’d prefer to ignore the select expression for non-XML.
 270 -   Henry: We now have so many things flowing through pipelines
 271     that I think this kind of defaulting behavior will be
 272     surprising. Note also that under Norm’s proposal, these
 273     will be documents and you’ll get the empty sequence.
 274 -   Achim: Ah, yes. We’ll just have to live with it. Never mind.
 275 -   Norm: We might be able to provide less verbose solutions
 276     with, for example, a function that takes a sequence of
 277     nodes and an XPath expression and does the right thing.
 278
 279
 280 <a id="orgd9073e0"></a>
 281
 282 ### Reopening the question from this morning
 283
 284 -   Henry: The fact that a sequence of pipe content types can be
 285     heterogeneous means we may need to think about new tooling.
 286     Dispatching on type will be more common so we might like to make
 287     that easy.
 288
 289
 290 <a id="orgf004310"></a>
 291
 292 ### Important workflows (for publishing use cases)
 293
 294 -   Henry: I was just wondering if there were obvious mismatches
 295     between the functionality of 1.0 (steps or architectural) from
 296     the perspective of publishing workflows. If you regularly say
 297     “oh rats, it’s that problem again”…
 298 -   Romain: In our publishing workflows, we’re dealing with file
 299     sets. The way we do that is we have the in-memory documents
 300     that can be processed by XProc and we have an XML representation
 301     of the filesets. A directory structure, for example. A lot
 302     of our steps have these two things as inputs and outputs.
 303     We have to connect them explicitly everywhere.
 304 -   Henry: Two things?
 305 -   Romain: The in-memory documents and the fileset description.
 306     It would be convenient if some (more) connections could be
 307     implicit. If outputs of a particular type were automatically
 308     connected to specific inputs, for example.
 309
 310 Some discussion of how a zip step is wrapped for this purpose.
 311
 312 -   Romain: More implicit connections would be useful.
 313 -   Gerrit: So the connections are grouped automatically? You
 314     want to have multiple primary ports? How are they connected?
 315 -   Norm mumbles something about using media types to connect
 316     ports.
 317 -   Romain: There are other ideas here: both implicitly connecting
 318     from preceding steps and for grouping connections together.
 319 -   Martin: So if you have a p:xslt step and it’s preceded by two
 320     steps that produce XML and XSLT, they’d both be connected?
 321 -   Romain: If I have a step that produces HTML and binary images
 322     and the following step receives an HTML document and binary
 323     images, I want them to be connected implicitly.
 324 -   Norm: I think this is an interesting idea; but it’s complicated
 325     and we need a proposal to review.
 326 -   Romain: That’s one thing that we do often. It depends on
 327     how we define document sets.
 328 -   Henry: That (re)raises the question of whether we need the
 329     concept of document sets. Whether this is the same as the
 330     document collection idea or whether it’s more pipeline
 331     appropriate, I’m not sure. But in any case, “in the publishing
 332     workflow we often move document collections around” is worth
 333     considering. That’s not something we directly support in XProc.
 334     Whether the idea of document sets as I have them in my head from
 335     the Markup Pipeline from 15 years ago is actually what’s needed,
 336     I’m not sure. But thinking hard about sets is worth doing.
 337     I don’t think it’s for 3.0. It’s too big a change. It raises
 338     a whole bunch of questions. Whether you have to have steps
 339     designed to work with document sets or whether there’s a story
 340     about default plumbing is unclear.
 341 -   Romain: We might be able to leverage non-XML document ideas to
 342     solve this.
 343 -   Henry: If you want to change the composition of a set, you need
 344     the whole set, and doing that on a sequence representation of
 345     the set will be very un-intuitive.
 346     … So we could imagine having document collections flowing
 347     through pipes, not sequences of documents but collections to which
 348     you have random access.
 349 -   Norm: Uh, er, maybe. :-)
 350 -   Gerrit: It’s hard to say if there are other things because we
 351     have worked around them. We have, for example, a catalog resolver
 352     for non-XML types so that we can refer to fonts and things.
 353     It’s an XProc step, we don’t need it anymore. We created
 354     extension steps to do image-metadata extraction, resizing,
 355     etc. Maybe one can have EXProc steps at some point for
 356     processing images. We have an extension step that does unzip
 357     a bit differently than the proposal for pxp:unzip. It extracts
 358     the whole archive to disk and then other steps are able to
 359     work on them (on disk). This could be improved with the new
 360     concept that you have binary data flowing through steps.
 361 -   Martin: What we currently use is a step called file-uri
 362     to work with URIs and operating system paths.
 363 -   Gerrit: This is also encapsulated in a step.
 364 -   Romain: At the language level, there’s not much missing in XProc
 365     3.0. There are a variety of utility steps that could be
 366     standardized or not.
 367 -   Henry: What about interfaces to databases?
 368 -   Gerrit: I once wrote an issue about whether it could be a
 369     good idea to have implicit validation. Could you read
 370     the xml-model PI and do the right thing. You’d also want
 371     to have an easy way to prepend the PIs to documents.
 372     And you’d want to have a way to produce an SVRL report
 373     for validated documents.
 374 -   Norm: So in addition to a general p:validate step, this
 375     includes the idea of, for example, a @validate attribute
 376     on p:output to say “validate any document with an
 377     xml-model PI”.
 378
 379 Some discussion of the xml-model PI:
 380 <https://www.w3.org/TR/2011/NOTE-xml-model-20110811/>
 381
 382 -   Jirka: It is now also an ISO standard.
 383
 384 Probably lunch. Probability, 100%.
 385
 386
 387 <a id="orgb62541f"></a>
 388
 389 ### What’s the status of the W3C community group?
 390
 391 Nic Gibson joins us by telepresence.
 392
 393 -   Norm: What about the community group?
 394 -   Nic: Ari and I both think it would be good if we tried to do something.
 396     Ari in particular seems interested. The question is, what value is there
 397     in having both a mailing list and a community group?
 399 -   Liam: I don’t think they’re mutually exclusive?
 400 -   Ari: That’s the question.
 401 -   Liam: There are a couple of lists.
 402 -   Norm: I think there’s only one, xproc-dev.
 403 -   Liam: I think the community group has a mailing list as well. I think
 405     we should keep using the xproc-dev list.
 407 -   Norm: What’s the title of the community group these days?
 408 -   Nic: Data Pipeling Use Cases.
 409 -   Norm: Does keeping a moribund community group open help us, hurt us,
 411     or is it neutral?
 413 -   Nic: The question is, do Ari or I (or anyone else) have time to make
 415     it not moribund.
 417 -   Norm: Is that something you can answer now?
 418 -   Nic: I think it would be good to talk to Ari about it. And I have some
 420     actions that I never got around to finishing: mailing various lists and
 421     anyone who’s ever posted to xproc-dev.
 423 -   Norm: You’d want to craft the message carefully.
 424 -   Nic: I reckon that Ari and I can chat by the time the XML Prague conference
 426     is over.
 427
 428 ACTION: Nic and Ari to discuss and report back on xproc-dev by next week.
 429
 430 -   Norm: Anything else?
 431 -   Liam: At this point “no”, eventually we’ll want to see some sort of a draft
 433     of use cases.
 435 -   Achim: The description of what the community group does is out of date
 437     with respect to our current plans. We need to say that XProc is still
 438     alive even if the working group no longer exists. We need to assert
 439     that the community group is the center of XProc activities.
 441 -   Norm: That is an interesting point. It sounds like we need to rewrite
 443     the description. Maybe change the name.
 445 -   Liam: I’m not sure if you can change the name of a community group.
 446 -   Norm: Ok. Let’s see what Nic and Ari come up with and then
 448     consider next steps.
 449
 450
 451 <a id="orga191c4b"></a>
 452
 453 ### What’s our thinking on the “resource manager”?
 454
 455 -   What are the *semantics* of pipelines?
 456     -   What are the lowest-level abstractions needed to
 457         describe/discuss pipelines?
 458 -   There’s metadata flowing through the pipe on the one hand
 459     and a resource manager for local copies of things being
 460     fetched and stored. And then variables are right in the
 461     middle.
 462
 463 On reflection, we all feel that we’ve covered these items sufficiently
 464 earlier today or yesterday. Henry may come back with a simplified
 465 proposal after further consideration.
 466
 467
 468 <a id="org2df16fa"></a>
 469
 470 ### Any other business?
 471
 472 -   Achim: We should discuss how we can encourage the community
 473     to suggest new steps.
 474 -   Norm: Couldn’t we just use the ‘extensions’ XProc repo?
 475 -   Romain: Yes, and we could make a custom template for that.
 476
 477 ACTION: Norm to propose a template for the extension steps repo
 478
 479 -   Norm: After we have the template, let’s avoid “blank page syndrome”
 480     by populating the repo with the existing exproc steps. Maybe
 481     then have the exproc.org redirect there.
 482 -   Romain: And then have PRs to add them to the step spec.
 483
 484 Some discussion of how to organize the specs and repos. Must have
 485 a single entry point for the user.
 486
 487 -   Achim: If we put the exproc.org steps there, we should all read
 488     them again and see if we can clarify them. Norm and I have
 489     interpreted some of them differently.
 490 -   Romain: Should the extension steps target XProc 3.0 or 1.0 or
 491     what?
 492 -   Achim: I think we should target 3.0.
 493 -   Norm: Yeah…
 494 -   Romain: When is a step considered ready to be implemented?
 495     And how can I tell if the implementation is conformant with
 496     the spec?
 497 -   Achim: We should have test cases.
 498 -   Romain: Something like semver perhaps.
 499 -   Norm: So if you want version 1.3.5 of a step, you look to see
 500     if the implementor claims to support 1.3.5.
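
The version check Norm describes could look roughly like the Python sketch below. It is an illustration only: `supports` mirrors the exact-claim lookup from the discussion, while `compatible` applies the usual semver convention (same major version, equal or newer minor/patch), which is an assumption the group did not discuss. All names are hypothetical.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def supports(claimed, wanted):
    """True if the implementor explicitly claims the wanted version."""
    return parse_version(wanted) in {parse_version(c) for c in claimed}


def compatible(claimed, wanted):
    """True if some claimed version has the same major version and is
    not older than the wanted one (semver convention, assumed here)."""
    want = parse_version(wanted)
    return any(
        have[0] == want[0] and have >= want
        for have in (parse_version(c) for c in claimed)
    )
```

Under these assumptions, an implementor claiming `["1.2.0", "1.3.5"]` supports a request for `1.3.5` exactly, and is compatible with, but does not explicitly claim, `1.3.0`.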
 501
 502
 503 <a id="org7e13b28"></a>
 504
 505 ### Next steps
 506
 507 -   How do we do this?
 508 -   Henry: The best way this works in open source projects is BDFL.
 509 -   Achim: We should divide the work up, the test suite, the
 510     documentation could be done separately.
 511 -   Norm: Achim and I seem to be signed up to do the spec editing.
 512 -   Achim: It would be nice to have one more editor who isn’t
 513     an implementor.
 514 -   Norm: Henry, you’re the obvious candidate.
 515 -   Henry: I’ll say yes, but you have to tell me if I’m doing
 516     a good job.
 517 -   Achim: It would be nice to have a user.
 518 -   Norm: Yeah. You have someone in mind?
 519 -   Gerrit: I’ll work on editing too.
 520 -   Henry: This is the SGML working group model: editors, a core
 521     group, and a broader group.
 522
 523
 524 <a id="org77d4e4e"></a>
 525
 526 #### Work items
 527
 528 -   A spec: Norm, Achim, Gerrit, Henry as time permits
 529 -   A test suite: David
 530 -   Step proposal curator: Geert
 531 -   Documentation: Christophe, Matthieu
 532
 533
 534 <a id="org94e2716"></a>
 535
 536 #### Status updates
 537
 538 -   Monthly reports to xproc-dev on the second Tuesday of each
 539     month, starting on 14 March 2017.
 540
 541
 542 <a id="orgb463c81"></a>
 543
 544 #### Where do we start?
 545
 546 -   The documents at spec.xproc.org are the current head of development.
 547
 548
 549 <a id="orgc57f83f"></a>
 550
 551 #### How long do we expect this to take?
 552
 553 -   Goal: approaching functional completeness by XML Prague 2018
 554     (Henry proposes beta release)
 555
 556
 557 <a id="org0a0362b"></a>
 558
 559 ### Next meeting?
 560
 561 -   XML Prague 2018?
 562 -   Henry: Having one at the beginning or end of the summer might
 563     be useful, but we don’t know yet.
 564 -   Maybe XML Amsterdam?
 565
 566
 567 <a id="org03972e9"></a>
 568
 569 ### Thank our hosts
 570
 571 -   Thanks to Jirka, XML Prague, and the University of Economics.
 572
 573
 574 <a id="org4816009"></a>
 575
 576 ### Adjourned
 577