Fix spelling error in docs.
[PostgreSQL.git] / doc / src / sgml / xml2.sgml
blobe1641e8815b1ef6d90c5e3ae4036c9452934ed1b
1 <!-- $PostgreSQL$ -->
3 <sect1 id="xml2">
4 <title>xml2</title>
6 <indexterm zone="xml2">
7 <primary>xml2</primary>
8 </indexterm>
10 <para>
11 The <filename>xml2</> module provides XPath querying and
12 XSLT functionality.
13 </para>
15 <sect2>
16 <title>Deprecation notice</title>
18 <para>
19 From <productname>PostgreSQL</> 8.3 on, there is XML-related
20 functionality based on the SQL/XML standard in the core server.
21 That functionality covers XML syntax checking and XPath queries,
22 which is what this module does, and more, but the API is
23 not at all compatible. It is planned that this module will be
24 removed in PostgreSQL 8.4 in favor of the newer standard API, so
25 you are encouraged to try converting your applications. If you
26 find that some of the functionality of this module is not
27 available in an adequate form with the newer API, please explain
28 your issue to pgsql-hackers@postgresql.org so that the deficiency
29 can be addressed.
30 </para>
31 </sect2>
33 <sect2>
34 <title>Description of functions</title>
36 <para>
37 These functions provide straightforward XML parsing and XPath queries.
38 All arguments are of type <type>text</>, so for brevity that is not shown.
39 </para>
41 <table>
42 <title>Functions</title>
43 <tgroup cols="2">
44 <tbody>
45 <row>
46 <entry>
47 <synopsis>
48 xml_is_well_formed(document) returns bool
49 </synopsis>
50 </entry>
51 <entry>
52 <para>
53 This parses the document text in its parameter and returns true if the
54 document is well-formed XML. (Note: before PostgreSQL 8.2, this
55 function was called <function>xml_valid()</>. That is the wrong name
56 since validity and well-formedness have different meanings in XML.
57 The old name is still available, but is deprecated.)
58 </para>
59 </entry>
60 </row>
61 <row>
62 <entry>
63 <synopsis>
64 xpath_string(document,query) returns text
65 xpath_number(document,query) returns float4
66 xpath_bool(document,query) returns bool
67 </synopsis>
68 </entry>
69 <entry>
70 <para>
71 These functions evaluate the XPath query on the supplied document, and
72 cast the result to the specified type.
73 </para>
74 </entry>
75 </row>
76 <row>
77 <entry>
78 <synopsis>
79 xpath_nodeset(document,query,toptag,itemtag) returns text
80 </synopsis>
81 </entry>
82 <entry>
83 <para>
84 This evaluates query on document and wraps the result in XML tags. If
85 the result is multivalued, the output will look like:
86 </para>
87 <literal>
88 &lt;toptag&gt;
89 &lt;itemtag&gt;Value 1 which could be an XML fragment&lt;/itemtag&gt;
90 &lt;itemtag&gt;Value 2....&lt;/itemtag&gt;
91 &lt;/toptag&gt;
92 </literal>
93 <para>
94 If either toptag or itemtag is an empty string, the relevant tag is omitted.
95 </para>
96 </entry>
97 </row>
98 <row>
99 <entry>
100 <synopsis>
101 xpath_nodeset(document,query) returns text
102 </synopsis>
103 </entry>
104 <entry>
105 <para>
106 Like xpath_nodeset(document,query,toptag,itemtag) but result omits both tags.
107 </para>
108 </entry>
109 </row>
110 <row>
111 <entry>
112 <synopsis>
113 xpath_nodeset(document,query,itemtag) returns text
114 </synopsis>
115 </entry>
116 <entry>
117 <para>
118 Like xpath_nodeset(document,query,toptag,itemtag) but result omits toptag.
119 </para>
120 </entry>
121 </row>
122 <row>
123 <entry>
124 <synopsis>
125 xpath_list(document,query,separator) returns text
126 </synopsis>
127 </entry>
128 <entry>
129 <para>
130 This function returns multiple values separated by the specified
131 separator, for example <literal>Value 1,Value 2,Value 3</> if
132 separator is <literal>,</>.
133 </para>
134 </entry>
135 </row>
136 <row>
137 <entry>
138 <synopsis>
139 xpath_list(document,query) returns text
140 </synopsis>
141 </entry>
142 <entry>
143 This is a wrapper for the above function that uses <literal>,</>
144 as the separator.
145 </entry>
146 </row>
147 </tbody>
148 </tgroup>
149 </table>
150 </sect2>
152 <sect2>
153 <title><literal>xpath_table</literal></title>
155 <synopsis>
156 xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
157 </synopsis>
159 <para>
160 <function>xpath_table</> is a table function that evaluates a set of XPath
161 queries on each of a set of documents and returns the results as a
162 table. The primary key field from the original document table is returned
163 as the first column of the result so that the result set
164 can readily be used in joins.
165 </para>
167 <table>
168 <title>Parameters</title>
169 <tgroup cols="2">
170 <tbody>
171 <row>
172 <entry><parameter>key</parameter></entry>
173 <entry>
174 <para>
175 the name of the <quote>key</> field &mdash; this is just a field to be used as
176 the first column of the output table, i.e. it identifies the record from
177 which each output row came (see note below about multiple values)
178 </para>
179 </entry>
180 </row>
181 <row>
182 <entry><parameter>document</parameter></entry>
183 <entry>
184 <para>
185 the name of the field containing the XML document
186 </para>
187 </entry>
188 </row>
189 <row>
190 <entry><parameter>relation</parameter></entry>
191 <entry>
192 <para>
193 the name of the table or view containing the documents
194 </para>
195 </entry>
196 </row>
197 <row>
198 <entry><parameter>xpaths</parameter></entry>
199 <entry>
200 <para>
201 one or more XPath expressions, separated by <literal>|</literal>
202 </para>
203 </entry>
204 </row>
205 <row>
206 <entry><parameter>criteria</parameter></entry>
207 <entry>
208 <para>
209 the contents of the WHERE clause. This cannot be omitted, so use
210 <literal>true</literal> or <literal>1=1</literal> if you want to
211 process all the rows in the relation
212 </para>
213 </entry>
214 </row>
215 </tbody>
216 </tgroup>
217 </table>
219 <para>
220 These parameters (except the XPath strings) are just substituted
221 into a plain SQL SELECT statement, so you have some flexibility &mdash; the
222 statement is
223 </para>
225 <para>
226 <literal>
227 SELECT &lt;key&gt;, &lt;document&gt; FROM &lt;relation&gt; WHERE &lt;criteria&gt;
228 </literal>
229 </para>
231 <para>
232 so those parameters can be <emphasis>anything</> valid in those particular
233 locations. The result from this SELECT needs to return exactly two
234 columns (which it will unless you try to list multiple fields for key
235 or document). Beware that this simplistic approach requires that you
236 validate any user-supplied values to avoid SQL injection attacks.
237 </para>
239 <para>
240 The function has to be used in a <literal>FROM</> expression, with an
241 <literal>AS</> clause to specify the output columns; for example
242 </para>
244 <programlisting>
245 SELECT * FROM
246 xpath_table('article_id',
247 'article_xml',
248 'articles',
249 '/article/author|/article/pages|/article/title',
250 'date_entered > ''2003-01-01'' ')
251 AS t(article_id integer, author text, page_count integer, title text);
252 </programlisting>
254 <para>
255 The <literal>AS</> clause defines the names and types of the columns in the
256 output table. The first is the <quote>key</> field and the rest correspond
257 to the XPath queries.
258 If there are more XPath queries than result columns,
259 the extra queries will be ignored. If there are more result columns
260 than XPath queries, the extra columns will be NULL.
261 </para>
263 <para>
264 Notice that this example defines the <structname>page_count</> result
265 column as an integer. The function deals internally with string
266 representations, so when you say you want an integer in the output, it will
267 take the string representation of the XPath result and use PostgreSQL input
268 functions to transform it into an integer (or whatever type the <type>AS</>
269 clause requests). An error will result if it can't do this &mdash; for
270 example if the result is empty &mdash; so you may wish to just stick to
271 <type>text</> as the column type if you think your data has any problems.
272 </para>
274 <para>
275 The calling <command>SELECT</> statement doesn't necessarily have be
276 be just <literal>SELECT *</> &mdash; it can reference the output
277 columns by name or join them to other tables. The function produces a
278 virtual table with which you can perform any operation you wish (e.g.
279 aggregation, joining, sorting etc). So we could also have:
280 </para>
282 <programlisting>
283 SELECT t.title, p.fullname, p.email
284 FROM xpath_table('article_id', 'article_xml', 'articles',
285 '/article/title|/article/author/@id',
286 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
287 AS t(article_id integer, title text, author_id integer),
288 tblPeopleInfo AS p
289 WHERE t.author_id = p.person_id;
290 </programlisting>
292 <para>
293 as a more complicated example. Of course, you could wrap all
294 of this in a view for convenience.
295 </para>
297 <sect3>
298 <title>Multivalued results</title>
300 <para>
301 The <function>xpath_table</> function assumes that the results of each XPath query
302 might be multi-valued, so the number of rows returned by the function
303 may not be the same as the number of input documents. The first row
304 returned contains the first result from each query, the second row the
305 second result from each query. If one of the queries has fewer values
306 than the others, NULLs will be returned instead.
307 </para>
309 <para>
310 In some cases, a user will know that a given XPath query will return
311 only a single result (perhaps a unique document identifier) &mdash; if used
312 alongside an XPath query returning multiple results, the single-valued
313 result will appear only on the first row of the result. The solution
314 to this is to use the key field as part of a join against a simpler
315 XPath query. As an example:
316 </para>
318 <programlisting>
319 CREATE TABLE test (
320 id int4 NOT NULL,
321 xml text,
322 CONSTRAINT pk PRIMARY KEY (id)
325 INSERT INTO test VALUES (1, '&lt;doc num="C1"&gt;
326 &lt;line num="L1"&gt;&lt;a&gt;1&lt;/a&gt;&lt;b&gt;2&lt;/b&gt;&lt;c&gt;3&lt;/c&gt;&lt;/line&gt;
327 &lt;line num="L2"&gt;&lt;a&gt;11&lt;/a&gt;&lt;b&gt;22&lt;/b&gt;&lt;c&gt;33&lt;/c&gt;&lt;/line&gt;
328 &lt;/doc&gt;');
330 INSERT INTO test VALUES (2, '&lt;doc num="C2"&gt;
331 &lt;line num="L1"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
332 &lt;line num="L2"&gt;&lt;a&gt;111&lt;/a&gt;&lt;b&gt;222&lt;/b&gt;&lt;c&gt;333&lt;/c&gt;&lt;/line&gt;
333 &lt;/doc&gt;');
335 SELECT * FROM
336 xpath_table('id','xml','test',
337 '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
338 'true')
339 AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4, val2 int4, val3 int4)
340 WHERE id = 1 ORDER BY doc_num, line_num
342 id | doc_num | line_num | val1 | val2 | val3
343 ----+---------+----------+------+------+------
344 1 | C1 | L1 | 1 | 2 | 3
345 1 | | L2 | 11 | 22 | 33
346 </programlisting>
348 <para>
349 To get doc_num on every line, the solution is to use two invocations
350 of xpath_table and join the results:
351 </para>
353 <programlisting>
354 SELECT t.*,i.doc_num FROM
355 xpath_table('id', 'xml', 'test',
356 '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
357 'true')
358 AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
359 xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
360 AS i(id int4, doc_num varchar(10))
361 WHERE i.id=t.id AND i.id=1
362 ORDER BY doc_num, line_num;
364 id | line_num | val1 | val2 | val3 | doc_num
365 ----+----------+------+------+------+---------
366 1 | L1 | 1 | 2 | 3 | C1
367 1 | L2 | 11 | 22 | 33 | C1
368 (2 rows)
369 </programlisting>
370 </sect3>
371 </sect2>
373 <sect2>
374 <title>XSLT functions</title>
376 <para>
377 The following functions are available if libxslt is installed:
378 </para>
380 <sect3>
381 <title><literal>xslt_process</literal></title>
383 <synopsis>
384 xslt_process(text document, text stylesheet, text paramlist) returns text
385 </synopsis>
387 <para>
388 This function appplies the XSL stylesheet to the document and returns
389 the transformed result. The paramlist is a list of parameter
390 assignments to be used in the transformation, specified in the form
391 <literal>a=1,b=2</>. Note that the
392 parameter parsing is very simple-minded: parameter values cannot
393 contain commas!
394 </para>
396 <para>
397 Also note that if either the document or stylesheet values do not
398 begin with a &lt; then they will be treated as URLs and libxslt will
399 fetch them. It follows that you can use <function>xslt_process</> as a
400 means to fetch the contents of URLs &mdash; you should be aware of the
401 security implications of this.
402 </para>
404 <para>
405 There is also a two-parameter version of <function>xslt_process</> which
406 does not pass any parameters to the transformation.
407 </para>
408 </sect3>
409 </sect2>
411 <sect2>
412 <title>Author</title>
414 <para>
415 John Gray <email>jgray@azuli.co.uk</email>
416 </para>
418 <para>
419 Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
420 It has the same BSD licence as PostgreSQL.
421 </para>
422 </sect2>
424 </sect1>