1 # XProc Workshop Minutes, Day 2, 8 February 2017
3 <a id="orgc454650"></a>
7 - Spec repo: <https://github.com/xproc/1.1-specification>
8 - Online specs: <http://spec.xproc.org/>
9 - Issues list: <https://github.com/xproc/1.1-specification/issues>
10 - Norm’s @depends proposal:
11 - <https://github.com/xproc/1.1-specification/pull/33>
12 - <http://xpspectest.nwalsh.com/depends/head/xproc11/>
13 - Norm’s proposal for issue #30:
14 - <https://github.com/xproc/1.1-specification/pull/32>
15 - <http://xpspectest.nwalsh.com/fix-30/head/xproc11/>
18 <a id="org9199c35"></a>
22 - Present: Norm, Romain, Achim, Christophe, Matthieu, David, Geert,
23 Gerrit, Martin, Henry, Liam
26 <a id="org84e4079"></a>
30 - Martin to write a proposal for @p:message and how to extend
31 the messaging to log levels and other features using possibly
32 p:pipeinfo or a new p:message element.
34 - Nic and Ari to discuss the status of the community group
35 and report back on xproc-dev by 15 Feb 2017.
37 - Norm to propose a template for the extension steps repo.
40 <a id="org4bbf545"></a>
42 ### Time-boxed review of this issue
44 - About requirements, not proposals
45 - How do documents figure into our relations with our XPath interface
46 - We need precision in the requirements
48 Henry begins by projecting a document:
49 <https://rawgit.com/xproc/Workshop-2017-02/master/docs/docReq.html>
51 Liam: XPath is very primary; but we’re talking about
52 doing other kinds of expressions. Yesterday,
53 I wondered if handles could be
54 functions. Now I’m wondering if the expression language
56 Henry: That’s been in-and-out. I think for 3.0, it’s
57 a big step to introduce, at the language level,
58 multiple expression languages.
59 Norm: I think it’s a question of how to mix the languages.
63 Observation: ./function() is conceptually the same as function($x)
65 1. The output of a step can go in a variable, $x.
66 2. I must be able to get the properties of the variable, $x.
67 3. $x must have identity
68 4. select="p:document-properties($x)"
70 David: Why a document node, why not a string.
71 Norm: It could be a string.
72 Henry: The problem of collision arises.
73 Norm: A string would work, but string-length($x) would be 5 or 19.
75 Norm: We could also just make it a view: an XPath document, a
76 JSON object, a text node, a hexBinary node.
78 Romain: We need access to the content from the empty document
80 Norm: So you want to be able to use the binary functions on
83 Henry: Stipulate the empty document node solution for a moment.
84 And the existence of a way to ask for the type of something.
85 If the answer isn’t XDM or JSON, you’d like to be able to say
86 coerce to XDM (string, hexBinary).
88 It would also be useful to be able to go the other way, coerce
89 a hexBinary into an image/jpeg.
91 Achim: if I get a text/plain document, do I get a string or
93 Romain: You get an empty document node, you can coerce it into
95 Achim: Users are going to find that confusing. But we can
96 tell users that they have to explicitly take the risk of attempting
97 to get a large amount of data.
98 Henry: We also talked about trying to put some logic into
101 Norm: We’re 50 minutes into the hour.
102 Henry: Here’s the procedural question: do we have enough of an
103 answer to proceed. It feels to me like we do.
104 Norm: I think we do. I propose that we proceed with some spec drafting
105 using the empty document node answer and see if it works or raises
106 other problems on closer inspection.
108 Any objections? None heard.
110 David: For me it’s hard to talk about coercing media types.
111 What it means is to process an entity in a different way.
113 Norm: It’s not really coercion; it’s changing the label.
114 David: If you have the load step and you say you want to
115 load it as text/plain. If you use an HTTP URI, then this would
116 mean that the implementation would send an Accept: header
118 Norm: Alternatively, you just treat the octet stream returned
120 David: The other way around is just a claim.
121 Norm: I agree, “coercion” isn’t the right word.
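
The distinction drawn above between asking for the content and merely changing the label can be sketched in a few lines. This is only an illustration under stated assumptions: the `Document` class and the helpers `as_string`, `as_hex_binary`, and `relabel` are hypothetical names invented here, not part of any XProc draft or implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a non-XML document flowing through a pipe."""
    content_type: str
    data: bytes  # raw octets; nothing reads them until explicitly asked


def as_string(doc: Document, encoding: str = "utf-8") -> str:
    """Explicitly pull the whole body into memory as text (the caller takes the risk)."""
    return doc.data.decode(encoding)


def as_hex_binary(doc: Document) -> str:
    """Explicitly pull the whole body into memory as an upper-case hex string."""
    return doc.data.hex().upper()


def relabel(doc: Document, content_type: str) -> Document:
    """Change only the label; the octets are left untouched."""
    return replace(doc, content_type=content_type)


doc = Document("application/octet-stream", b"hello")
text_doc = relabel(doc, "text/plain")
print(text_doc.content_type, as_string(text_doc), as_hex_binary(doc))
```

The point of keeping `relabel` separate is that it makes a claim about the octets without processing them, which matches the remark that "coercion" is not quite the right word.
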
124 <a id="org92e6260"></a>
128 Matthieu: I’d like to be able to override steps on p:import,
129 is that something we could consider?
130 Norm: Yes, but I suggest we have an issue about that and have
131 the discussion when we’ve had a chance to think about it.
132 Gerrit: We solve this by dynamically generating the pipeline
133 and then importing that.
136 <a id="orgf4b87f0"></a>
138 ## Review of open issues
140 On reflection, trying to do this in the face-to-face seems
141 like it won’t end well.
144 <a id="org38db409"></a>
146 ### Proposals for resolving two issues
148 Norm: The depends attribute, PR #33.
149 Henry: Remind me, what’s it for?
150 Norm: It’s for the case where you have a step with a side-effect
151 which isn’t manifest in the data flow: don’t start step B until
152 step A has finished, irrespective of what you think you know.
153 … It’s been implemented and appears to be sufficient.
154 Liam: If you called it ‘wait-for’ instead of ‘depends’, then
155 it would have been clearer.
156 Henry: The simplicity of this depends on the answer to my
157 question about streaming tomorrow: no.
158 Christophe: I prefer ‘wait-for’.
159 Henry: I prefer ‘depends’ because it leaves open the question
160 of what exactly it means; we may need that space in the future.
162 Some discussion of the semantic variance between “depends” and
165 Liam: I don’t think we really have to find an answer here.
166 Norm: Let’s try taking this one to email.
168 Proposal: Merge the PR. Any objections?
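
The intent of the attribute, as described above, is an ordering constraint that the data flow alone does not express. A minimal sketch, assuming steps are plain names and that both kinds of edges can be merged into one graph; the function `execution_order` and the step names are hypothetical, and a real processor may schedule quite differently:

```python
from graphlib import TopologicalSorter


def execution_order(data_flow, depends):
    """Return one valid step order.

    Both arguments map a step name to the set of steps that must
    finish before it may start: `data_flow` from port connections,
    `depends` from explicit declarations.
    """
    graph = {}
    for edges in (data_flow, depends):
        for step, predecessors in edges.items():
            graph.setdefault(step, set()).update(predecessors)
    return list(TopologicalSorter(graph).static_order())


# "cleanup" reads nothing from "store", so the data flow alone
# would allow it to run first; the explicit edge forbids that.
data_flow = {"load": set(), "transform": {"load"}, "store": {"transform"}, "cleanup": set()}
depends = {"cleanup": {"store"}}
print(execution_order(data_flow, depends))
```
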
172 Norm: Allow attribute value templates in extension attributes,
174 Achim: Some of the extension attributes are handled in static
175 analysis and some dynamically. It only makes sense for attributes
176 that are being evaluated dynamically.
177 Norm: What attributes are evaluated statically?
178 Achim: depends are extension attributes today.
179 Norm: Errmm…can we finesse this by saying that processors
180 are free to forbid AVTs in any extension attributes that they
181 wish? (Added comment.)
183 Some discussion of what the semantics of forbidding might mean.
184 Could mean curly braces not allowed, could mean not interpreted
185 as an AVT. Up to the implementation.
187 Proposal: Merge the PR with that note, at editor’s discretion
193 <a id="orge9df7e6"></a>
195 ### Proposed list of issues that warrant face-to-face time:
197 #### How to improve debugging
200 - Norm: The observation that implementations should do better
201 with error messages is a point taken.
202 - Achim: What do I get if an error occurs is one question.
203 Another question is what do I get on p:catch? How good
204 is the error vocabulary. Today we only get the name of
205 the error. Maybe we could improve that.
206 - Norm: It’s very difficult to define error output. I think
207 a proposal in this area would be a very good thing.
208 - Achim: The question is how useful would this be to users?
209 - Martin: Sometimes it would be very useful. We’re currently
210 using Schematron that we extended with spans to handle
211 reporting where the error occurred.
212 - Achim: Step names or step types both seem like they’d be
213 valuable. Just more information.
214 - Norm: Someone should make a proposal.
215 - David: I find it hard to get the error messages at all.
216 It was never a problem to figure out where the error was,
217 but getting the message is hard.
218 - Norm: Yeah, my implementation sucks.
219 - David: A tutorial on p:log would be good.
220 - Norm: Yes, it’s a short tutorial but it would be good to have.
224 - Norm mutters on about @cx:message…proposes @p:message
225 to avoid stepping on the “message” option name.
226 - Martin: What about adding a terminate attribute?
227 - Gerrit: Isn’t this p:error?
228 - Norm: Do you want to do this conditionally?
229 - Henry: The advantage of having it as an attribute on
230 a step is that it’s much simpler vis-a-vis the plumbing.
231 - Norm: So a step with p:terminate on it runs the step
233 - Henry: I propose p:abort with a message.
234 - Romain: Can we add a severity to p:message so that it
235 works like proper logging?
236 - Gerrit: Maybe we can have p:error work like logging and
238 - Norm: Risking design on the fly: if p:message is a single
239 string then print it. If it’s a sequence of two strings
240 then the first string is a log level and the second string
241 is the message. What the processor does with the log
242 level is implementation defined.
243 - Achim: What about a function that returns the log level?
244 - Geert: I’ve been experimenting with overloaded steps.
245 We could put the message and other options in the step
246 content. That makes the mechanism extensible.
247 - Norm: Yes. We also have p:pipeinfo, you could use that.
248 - Henry: Maybe pipeinfo is a better way to do this altogether.
249 - Norm: I’m still attracted to @p:message for the simplicity
250 of printing a message, and maybe p:pipeinfo for log levels
252 - Romain: Or p:message element for more complicated messages.
253 - Norm: Right. So this wasn’t simple. I think we need a proposal.
255 ACTION: Martin to write a proposal
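
Norm's on-the-fly rule for the message value (one string is printed as is; two strings are a log level followed by the message) can be sketched as follows. This is not a proposal, only an illustration: the function name `interpret_message` is hypothetical, and treating a one-item sequence like a single string is an assumption made here because a single string and a sequence of one are the same thing in XDM.

```python
def interpret_message(value):
    """Return (level, text) for a message value.

    A single string has no level. A sequence of two strings is
    (level, text). What a processor does with the level would be
    implementation-defined, so this sketch only returns it.
    """
    if isinstance(value, str):
        return (None, value)
    items = list(value)
    if len(items) == 1:
        return (None, items[0])
    if len(items) == 2:
        return (items[0], items[1])
    raise ValueError("expected one string or a (level, text) pair")


for value in ("starting step", ["warn", "input is empty"]):
    level, text = interpret_message(value)
    print(f"[{level}] {text}" if level else text)
```
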
258 <a id="org0f8d682"></a>
260 ### Return to yesterday’s discussion for a moment
262 - Achim: There’s one more point in the specs dealing with XML
263 vs. non-XML documents that I find inconvenient. If I have a
264 heterogeneous sequence, when this sequence hits a select
265 expression, an error is thrown.
267 <http://spec.xproc.org/master/head/xproc11/#err.inline.D1004>
269 …I’d prefer to ignore the select expression for non-XML.
270 - Henry: We now have so many things flowing through pipelines
271 that I think this kind of defaulting behavior will be
272 surprising. Note also that under Norm’s proposal, these
273 will be documents and you’ll get the empty sequence.
274 - Achim: Ah, yes. We’ll just have to live with it. Never mind.
275 - Norm: We might be able to provide less verbose solutions
276 with, for example, a function that takes a sequence of
277 nodes and an XPath expression and does the right thing.
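
One possible reading of such a helper is sketched below. The assumptions are loud ones: documents are modelled as `(content_type, payload)` pairs, non-XML members contribute nothing (in line with Henry's note that they would yield the empty sequence), and the function name `select_from_sequence` is hypothetical. Python's `xml.etree.ElementTree` supports only a limited XPath subset, so this shows the shape of the idea rather than full XPath.

```python
import xml.etree.ElementTree as ET


def is_xml(content_type):
    """Rough media-type check: */xml or any +xml suffix."""
    return content_type.endswith("/xml") or content_type.endswith("+xml")


def select_from_sequence(documents, path):
    """Apply `path` to every XML document in a mixed sequence.

    Non-XML documents are skipped instead of raising an error.
    """
    results = []
    for content_type, payload in documents:
        if is_xml(content_type):
            root = ET.fromstring(payload)
            results.extend(root.findall(path))
    return results


sequence = [
    ("application/xml", "<doc><title>A</title></doc>"),
    ("image/png", b"\x89PNG"),
    ("application/xhtml+xml", "<html><title>B</title></html>"),
]
print([e.text for e in select_from_sequence(sequence, ".//title")])
```
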
280 <a id="orgd9073e0"></a>
282 ### Reopening the question from this morning
284 - Henry: The fact that a sequence of pipe content types can be
285 heterogeneous means we may need to think about new tooling.
286 Dispatching on type will be more common so we might like to make
290 <a id="orgf004310"></a>
292 ### Important workflows (for publishing use cases)
294 - Henry: I was just wondering if there were obvious mismatches
295 between the functionality of 1.0 (steps or architectural) from
296 the perspective of publishing workflows. If you regularly say
297 “oh rats, it’s that problem again”…
298 - Romain: In our publishing workflows, we’re dealing with file
299 sets. The way we do that is we have the in-memory documents
300 that can be processed by XProc and we have an XML representation
301 of the filesets. A directory structure, for example. A lot
302 of our steps have these two things as inputs and outputs.
303 We have to connect them explicitly everywhere.
305 - Romain: The in-memory documents and the fileset description.
306 It would be convenient if some (more) connections could be
307 implicit. If outputs of a particular type were automatically
308 connected to specific inputs, for example.
310 Some discussion of how a zip step is wrapped for this purpose.
312 - Romain: More implicit connections would be useful.
313 - Gerrit: So the connections are grouped automatically? You
314 want to have multiple primary ports? How are they connected?
315 - Norm mumbles something about using media types to connect
317 - Romain: There are other ideas here: both implicitly connecting
318 from preceding steps and for grouping connections together.
319 - Martin: So if you have a p:xslt step and it’s preceded by two
320 steps that produce XML and XSLT, they’d both be connected?
321 - Romain: If I have a step that produces HTML and binary images
322 and the following step receives an HTML document and binary
323 images, I want them to be connected implicitly.
324 - Norm: I think this is an interesting idea; but it’s complicated
325 and we need a proposal to review.
326 - Romain: That’s one thing that we do often. It depends on
327 how we define document sets.
328 - Henry: That (re)raises the question of whether we need the
329 concept of document sets. Whether this is the same as the
330 document collection idea or whether it’s more pipeline
331 appropriate, I’m not sure. But in any case, “in the publishing
332 workflow we often move document collections around” is worth
333 considering. That’s not something we directly support in XProc.
334 Whether the idea of document sets as I have them in my head from
335 the Markup Pipeline from 15 years ago is actually what’s needed,
336 I’m not sure. But thinking hard about sets is worth doing.
337 I don’t think it’s for 3.0. It’s too big a change. It raises
338 a whole bunch of questions. Whether you have to have steps
339 designed to work with document sets or whether there’s a story
340 about default plumbing is unclear.
341 - Romain: We might be able to leverage non-XML document ideas to
343 - Henry: If you want to change the composition of a set, you need
344 the whole set, and doing that on a sequence representation of
345 the set will be very un-intuitive.
346 … So we could imagine having document collections flowing
347 through pipes, not sequences of documents but collections to which
348 you have random access.
349 - Norm: Uh, er, maybe. :-)
350 - Gerrit: It’s hard to say if there are other things because we
351 have worked around them. We have, for example, a catalog resolver
352 for non-XML types so that we can refer to fonts and things.
353 It’s an XProc step, we don’t need it anymore. We created
354 extension steps to do image-metadata extraction, resizing,
355 etc. Maybe one can have EXProc steps at some point for
356 processing images. We have an extension step that does unzip
357 a bit differently than the proposal for pxp:unzip. It extracts
358 the whole archive to disk and then other steps are able to
359 work on them (on disk). This could be improved with the new
360 concept that you have binary data flowing through steps.
361 - Martin: What we currently use is a step called file-uri
362 to work with URIs and operating system paths.
363 - Gerrit: This is also encapsulated in a step.
364 - Romain: At the language level, there’s not much missing in XProc
365 3.0. There are a variety of utility steps that could be
367 - Henry: What about interfaces to databases?
368 - Gerrit: I once wrote an issue about whether it could be a
369 good idea to have implicit validation. Could you read
370 the xml-model PI and do the right thing. You’d also want
371 to have an easy way to prepend the PIs to documents.
372 And you’d want to have a way to produce an SVRL report
373 for validated documents.
374 - Norm: So in addition to a general p:validate step, this
375 includes the idea of, for example, a @validate attribute
376 on p:output to say “validate any document with a
379 Some discussion of the xml-model PI:
380 <https://www.w3.org/TR/2011/NOTE-xml-model-20110811/>
382 - Jirka: It is now also an ISO standard.
384 Probably lunch. Probability, 100%.
387 <a id="orgb62541f"></a>
389 ### What’s the status of the W3C community group?
391 Nic Gibson joins us by telepresence.
393 - Norm: What about the community group?
394 - Nic: Ari and I both think it would be good if we tried to do something.
396 Ari in particular seems interested. The question is, what value is there
397 in having both a mailing list and a community group?
399 - Liam: I don’t think they’re mutually exclusive?
400 - Ari: That’s the question.
401 - Liam: There are a couple of lists.
402 - Norm: I think there’s only one, xproc-dev.
403 - Liam: I think the community group has a mailing list as well. I think
405 we should keep using the xproc-dev list.
407 - Norm: What’s the title of the community group these days?
408 - Nic: Data Pipeling Use Cases.
409 - Norm: Does keeping a moribund community group open help us, hurt us,
413 - Nic: The question is, do Ari or I (or anyone else) have time to make
417 - Norm: Is that something you can answer now?
418 - Nic: I think it would be good to talk to Ari about it. And I have some
420 actions that I never got around to finishing: mailing various lists and
421 anyone who’s ever posted to xproc-dev.
423 - Norm: You’d want to craft the message carefully.
424 - Nic: I reckon that Ari and I can chat by the time the XML Prague conference
428 ACTION: Nic and Ari to discuss and report back on xproc-dev by next week.
430 - Norm: Anything else?
431 - Liam: At this point “no”, eventually we’ll want to see some sort of a draft
435 - Achim: The description of what the community group does is out of date
437 with respect to our current plans. We need to say that XProc is still
438 alive even if the working group no longer exists. We need to assert
439 that the community group is the center of XProc activities.
441 - Norm: That is an interesting point. It sounds like we need to rewrite
443 the description. Maybe change the name.
445 - Liam: I’m not sure if you can change the name of a community group.
446 - Norm: Ok. Let’s see what Nic and Ari come up with and the
451 <a id="orga191c4b"></a>
453 ### What’s our thinking on the “resource manager”?
455 - What are the *semantics* of pipelines?
456 - What are the lowest-level abstractions needed to
457 describe/discuss pipelines?
458 - There’s metadata flowing through the pipe on the one hand
459 and a resource manager for local copies of things being
460 fetched and stored. And then variables are right in the
463 On reflection, we all feel that we’ve covered these items sufficiently
464 earlier today or yesterday. Henry may come back with a simplified
465 proposal after further consideration.
468 <a id="org2df16fa"></a>
470 ### Any other business?
472 - Achim: We should discuss how we can encourage the community
473 to suggest new steps.
474 - Norm: Couldn’t we just use the ‘extensions’ XProc repo?
475 - Romain: Yes, and we could make a custom template for that.
477 ACTION: Norm to propose a template for the extension steps repo
479 - Norm: After we have the template, let’s avoid “blank page syndrome”
480 by populating the repo with the existing exproc steps. Maybe
481 then have the exproc.org redirect there.
482 - Romain: And then have PRs to add them to the step spec.
484 Some discussion of how to organize the specs and repos. Must have
485 a single entry point for the user.
487 - Achim: If we put the exproc.org steps there, we should all read
488 them again and see if we can clarify them. Norm and I have
489 interpreted some of them differently.
490 - Romain: Should the extension steps target XProc 3.0 or 1.0 or
492 - Achim: I think we should target 3.0.
494 - Romain: When is a step considered ready to be implemented?
495 And how can I tell if the implementation is conformant with
497 - Achim: We should have test cases.
498 - Romain: Something like semver perhaps.
499 - Norm: So if you want version 1.3.5 of a step, you look to see
500 if the implementor claims to support 1.3.5.
503 <a id="org7e13b28"></a>
508 - Henry: The best way this works in open source projects is BDFL.
509 - Achim: We should divide the work up, the test suite, the
510 documentation could be done separately.
511 - Norm: Achim and I seem to be signed up to do the spec editing.
512 - Achim: It would be nice to have one more editor who isn’t
514 - Norm: Henry, you’re the obvious candidate.
515 - Henry: I’ll say yes, but you have to tell me if I’m doing
517 - Achim: It would be nice to have a user.
518 - Norm: Yeah. You have someone in mind?
519 - Gerrit: I’ll work on editing too.
520 - Henry: This is the SGML working group model: editors, a core
521 group, and a broader group.
524 <a id="org77d4e4e"></a>
528 - A spec: Norm, Achim, Gerrit, Henry as time permits
529 - A test suite: David
530 - Step proposal curator: Geert
531 - Documentation: Christophe, Matthieu
534 <a id="org94e2716"></a>
538 - Monthly reports to xproc-dev on the second Tuesday of the
539 month starting on 14 March 2017.
542 <a id="orgb463c81"></a>
544 #### Where do we start?
546 - The documents at spec.xproc.org are the current head of development.
549 <a id="orgc57f83f"></a>
551 #### How long do we expect this to take?
553 - Goal: approaching functional completeness by XML Prague 2018
554 (Henry proposes beta release)
557 <a id="org0a0362b"></a>
562 - Henry: Having one at the beginning or end of the summer might
563 be useful, but we don’t know yet.
564 - Maybe XML Amsterdam?
567 <a id="org03972e9"></a>
571 - Thanks to Jirka, XML Prague, and the University of Economics.
574 <a id="org4816009"></a>