Doc/library/xml.etree.elementtree.rst

   1
   2 :mod:`xml.etree.ElementTree` --- The ElementTree XML API
   3 ========================================================
   4
   5 .. module:: xml.etree.ElementTree
   6    :synopsis: Implementation of the ElementTree API.
   7 .. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>
   8
   9
  10 .. versionadded:: 2.5
  11
  12 The Element type is a flexible container object, designed to store hierarchical
  13 data structures in memory. The type can be described as a cross between a list
  14 and a dictionary.
  15
  16 Each element has a number of properties associated with it:
  17
  18 * a tag which is a string identifying what kind of data this element represents
  19   (the element type, in other words).
  20
  21 * a number of attributes, stored in a Python dictionary.
  22
  23 * a text string.
  24
  25 * an optional tail string.
  26
  27 * a number of child elements, stored in a Python sequence.
  28
  29 To create an element instance, use the :func:`Element` or :func:`SubElement` factory functions.
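
The properties listed above can be sketched as follows. This is a minimal illustration, not a canonical recipe: the tag names and strings are made up, and the ``.decode("ascii")`` call is only there so the snippet also runs on newer Python versions, where ``tostring`` returns bytes.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Build <page lang="en"><title>Example</title>trailing text</page>
root = Element("page", {"lang": "en"})   # tag plus an attribute dictionary
title = SubElement(root, "title")        # created and appended to root in one step
title.text = "Example"                   # text inside <title>...</title>
title.tail = "trailing text"             # text after </title>, before the next tag

# An element behaves partly like a list (children) and partly like a
# dictionary (attributes).
child_count = len(root)
first_child = root[0]
language = root.get("lang")

serialized = tostring(root).decode("ascii")
```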
  30
  31 The :class:`ElementTree` class can be used to wrap an element structure, and
  32 convert it from and to XML.
  33
  34 A C implementation of this API is available as :mod:`xml.etree.cElementTree`.
  35
  36 See http://effbot.org/zone/element-index.htm for tutorials and links to other
  37 docs. Fredrik Lundh's page also hosts the development version of
  38 xml.etree.ElementTree.
  39
  40 .. _elementtree-functions:
  41
  42 Functions
  43 ---------
  44
  45
  46 .. function:: Comment([text])
  47
  48    Comment element factory.  This factory function creates a special element that
  49    will be serialized as an XML comment. The comment string can be either an 8-bit
  50    ASCII string or a Unicode string. *text* is a string containing the comment
  51    string. Returns an element instance representing a comment.
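
A short sketch of how a comment element might be attached to a tree. The element name and comment text are illustrative. The exact whitespace written around the comment text can differ between library versions, so no literal output is shown.

```python
from xml.etree.ElementTree import Comment, Element, tostring

root = Element("root")
note = Comment("generated file")   # special element, serialized as <!-- ... -->
root.append(note)                  # a comment is added like any other child

serialized = tostring(root).decode("ascii")
```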
  52
  53
  54 .. function:: dump(elem)
  55
  56    Writes an element tree or element structure to sys.stdout.  This function should
  57    be used for debugging only.
  58
  59    The exact output format is implementation dependent.  In this version, it's
  60    written as an ordinary XML file.
  61
  62    *elem* is an element tree or an individual element.
  63
  64
  65 .. function:: Element(tag[, attrib][, **extra])
  66
  67    Element factory.  This function returns an object implementing the standard
  68    Element interface.  The exact class or type of that object is implementation
  69    dependent, but it will always be compatible with the _ElementInterface class in
  70    this module.
  71
  72    The element name, attribute names, and attribute values can be either 8-bit
  73    ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
  74    optional dictionary, containing element attributes. *extra* contains additional
  75    attributes, given as keyword arguments. Returns an element instance.
  76
  77
  78 .. function:: fromstring(text)
  79
  80    Parses an XML section from a string constant.  Same as :func:`XML`. *text* is a
  81    string containing XML data. Returns an Element instance.
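
As a quick check of the equivalence described above, the following sketch parses the same (made-up) string with both functions and compares the results.

```python
from xml.etree.ElementTree import XML, fromstring, tostring

text = "<point x='1' y='2'><label>origin</label></point>"

a = fromstring(text)
b = XML(text)

# Both calls produce separate but equivalent element structures.
same_output = tostring(a) == tostring(b)
label = a.find("label").text
```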
  82
  83
  84 .. function:: iselement(element)
  85
  86    Checks if an object appears to be a valid element object. *element* is an
  87    element instance. Returns a true value if this is an element object.
  88
  89
  90 .. function:: iterparse(source[, events])
  91
  92    Parses an XML section into an element tree incrementally, and reports what's
  93    going on to the user. *source* is a filename or file object containing XML data.
  94    *events* is a list of events to report back.  If omitted, only "end" events are
  95    reported. Returns an :term:`iterator` providing ``(event, elem)`` pairs.
  96
  97    .. note::
  98
  99       :func:`iterparse` only guarantees that it has seen the ">"
 100       character of a starting tag when it emits a "start" event, so the
 101       attributes are defined, but the contents of the text and tail attributes
 102       are undefined at that point.  The same applies to the element children;
 103       they may or may not be present.
 104
 105       If you need a fully populated element, look for "end" events instead.
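
A minimal sketch of incremental parsing that follows the advice in the note and reads element text only on "end" events. ``io.BytesIO`` stands in for a real file here, and the XML content is invented for the example.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse

# A file-like object holding a small, made-up document.
source = BytesIO(b"<log><entry>first</entry><entry>second</entry></log>")

seen = []
for event, elem in iterparse(source, events=("start", "end")):
    # On "start" only the tag and attributes are reliable; the text is
    # guaranteed to be complete once the matching "end" event arrives.
    if event == "end" and elem.tag == "entry":
        seen.append(elem.text)
```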
 106
 107
 108 .. function:: parse(source[, parser])
 109
 110    Parses an XML section into an element tree. *source* is a filename or file
 111    object containing XML data. *parser* is an optional parser instance.  If not
 112    given, the standard XMLTreeBuilder parser is used. Returns an ElementTree
 113    instance.
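
A brief sketch of :func:`parse` with a file object instead of a filename. The in-memory ``io.BytesIO`` buffer and its contents are assumptions made for the example; a real program would usually pass a path or an open file.

```python
from io import BytesIO
from xml.etree.ElementTree import parse

source = BytesIO(b"<config><debug>on</debug></config>")

tree = parse(source)       # returns an ElementTree wrapper, not an element
root = tree.getroot()      # the root element of the parsed document
debug = root.findtext("debug")
```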
 114
 115
 116 .. function:: ProcessingInstruction(target[, text])
 117
 118    PI element factory.  This factory function creates a special element that will
 119    be serialized as an XML processing instruction. *target* is a string containing
 120    the PI target. *text* is a string containing the PI contents, if given. Returns
 121    an element instance, representing a processing instruction.
 122
 123
 124 .. function:: SubElement(parent, tag[, attrib[, **extra]])
 125
 126    Subelement factory.  This function creates an element instance, and appends it
 127    to an existing element.
 128
 129    The element name, attribute names, and attribute values can be either 8-bit
 130    ASCII strings or Unicode strings. *parent* is the parent element. *tag* is the
 131    subelement name. *attrib* is an optional dictionary, containing element
 132    attributes. *extra* contains additional attributes, given as keyword arguments.
 133    Returns an element instance.
 134
 135
 136 .. function:: tostring(element[, encoding])
 137
 138    Generates a string representation of an XML element, including all subelements.
 139    *element* is an Element instance. *encoding* is the output encoding (default is
 140    US-ASCII). Returns an encoded string containing the XML data.
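
The effect of the *encoding* argument can be sketched with an element whose text contains a non-ASCII character. The element name and text are illustrative. Whether an XML declaration is prepended for non-default encodings varies between library versions, so the sketch does not rely on the exact output.

```python
from xml.etree.ElementTree import Element, tostring

elem = Element("name")
elem.text = u"Zo\xeb"                    # contains U+00EB, which is not ASCII

ascii_data = tostring(elem)              # default US-ASCII output
utf8_data = tostring(elem, "utf-8")      # explicit output encoding
```

With the default encoding the character is written as the character reference ``&#235;``; with ``"utf-8"`` it is written as the corresponding UTF-8 byte sequence.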
 141
 142
 143 .. function:: XML(text)
 144
 145    Parses an XML section from a string constant.  This function can be used to
 146    embed "XML literals" in Python code. *text* is a string containing XML data.
 147    Returns an Element instance.
 148
 149
 150 .. function:: XMLID(text)
 151
 152    Parses an XML section from a string constant, and also returns a dictionary
 153    which maps from element ids to elements. *text* is a string containing XML
 154    data. Returns a tuple containing an Element instance and a dictionary.
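
A small sketch of the returned tuple. The document and the id values are invented; the dictionary is keyed by the value of each element's ``id`` attribute.

```python
from xml.etree.ElementTree import XMLID

root, by_id = XMLID(
    '<doc><sec id="intro">Intro</sec><sec id="end">End</sec></doc>'
)

intro = by_id["intro"]     # direct lookup instead of searching the tree
known_ids = sorted(by_id)
```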
 155
 156
 157 .. _elementtree-element-interface:
 158
 159 The Element Interface
 160 ---------------------
 161
 162 Element objects returned by Element or SubElement have the following methods
 163 and attributes.
 164
 165
 166 .. attribute:: Element.tag
 167
 168    A string identifying what kind of data this element represents (the element
 169    type, in other words).
 170
 171
 172 .. attribute:: Element.text
 173
 174    The *text* attribute can be used to hold additional data associated with the
 175    element. As the name implies, this attribute is usually a string but may be any
 176    application-specific object. If the element is created from an XML file, the
 177    attribute will contain any text found between the start tag and the first child or end tag.
 178
 179
 180 .. attribute:: Element.tail
 181
 182    The *tail* attribute can be used to hold additional data associated with the
 183    element. This attribute is usually a string but may be any application-specific
 184    object. If the element is created from an XML file, the attribute will contain
 185    any text found after the element's end tag and before the next tag.
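
The split between *text* and *tail* is easiest to see on mixed content. The sketch below uses a made-up paragraph and reassembles its character data from the two attributes.

```python
from xml.etree.ElementTree import XML

p = XML("<p>Moved to <a>example.org</a> or <a>example.com</a>.</p>")
first, second = p[0], p[1]

pieces = [
    p.text,        # "Moved to "   (before the first child)
    first.text,    # "example.org" (inside the first <a>)
    first.tail,    # " or "        (after the first </a>)
    second.text,   # "example.com" (inside the second <a>)
    second.tail,   # "."           (after the second </a>)
]
rebuilt = "".join(pieces)
```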
 186
 187
 188 .. attribute:: Element.attrib
 189
 190    A dictionary containing the element's attributes. Note that while the *attrib*
 191    value is always a real mutable Python dictionary, an ElementTree implementation
 192    may choose to use another internal representation, and create the dictionary
 193    only if someone asks for it. To take advantage of such implementations, use the
 194    dictionary methods below whenever possible.
 195
 196 The following dictionary-like methods work on the element attributes.
 197
 198
 199 .. method:: Element.clear()
 200
 201    Resets an element.  This function removes all subelements, clears all
 202    attributes, and sets the text and tail attributes to ``None``.
 203
 204
 205 .. method:: Element.get(key[, default=None])
 206
 207    Gets the element attribute named *key*.
 208
 209    Returns the attribute value, or *default* if the attribute was not found.
 210
 211
 212 .. method:: Element.items()
 213
 214    Returns the element attributes as a sequence of (name, value) pairs. The
 215    attributes are returned in an arbitrary order.
 216
 217
 218 .. method:: Element.keys()
 219
 220    Returns the element's attribute names as a list. The names are returned in an
 221    arbitrary order.
 222
 223
 224 .. method:: Element.set(key, value)
 225
 226    Set the attribute *key* on the element to *value*.
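
A short sketch of the attribute methods above. The element and attribute names are illustrative. Because the order of ``keys()`` and ``items()`` is arbitrary, the results are sorted before use.

```python
from xml.etree.ElementTree import Element

link = Element("a", href="http://example.org/")   # attribute given as keyword
link.set("target", "blank")                       # add or overwrite an attribute

href = link.get("href")
missing = link.get("title", "n/a")                # default for absent attributes
names = sorted(link.keys())
pairs = sorted(link.items())
```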
 227
 228 The following methods work on the element's children (subelements).
 229
 230
 231 .. method:: Element.append(subelement)
 232
 233    Adds the element *subelement* to the end of this element's internal list of
 234    subelements.
 235
 236
 237 .. method:: Element.find(match)
 238
 239    Finds the first subelement matching *match*.  *match* may be a tag name or path.
 240    Returns an element instance or ``None``.
 241
 242
 243 .. method:: Element.findall(match)
 244
 245    Finds all subelements matching *match*.  *match* may be a tag name or path.
 246    Returns an iterable yielding all matching elements in document order.
 247
 248
 249 .. method:: Element.findtext(condition[, default=None])
 250
 251    Finds text for the first subelement matching *condition*.  *condition* may be a
 252    tag name or path. Returns the text content of the first matching element, or
 253    *default* if no element was found.  Note that if the matching element has no
 254    text content an empty string is returned.
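
The three search methods can be compared side by side. This sketch uses a small made-up document and only simple tag paths.

```python
from xml.etree.ElementTree import XML

root = XML(
    "<html><head><title>Example page</title></head>"
    "<body><p>one</p><hr/><p>two</p></body></html>"
)

first_p = root.find("body/p")             # first match, or None
all_p = list(root.findall("body/p"))      # every match, in document order
title = root.findtext("head/title")       # text of the first match
empty = root.findtext("body/hr")          # element found, but it has no text
missing = root.findtext("body/table", "no table")   # nothing found: default
```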
 255
 256
 257 .. method:: Element.getchildren()
 258
 259    Returns all subelements.  The elements are returned in document order.
 260
 261
 262 .. method:: Element.getiterator([tag=None])
 263
 264    Creates a tree iterator with the current element as the root.  The iterator
 265    iterates over this element and all elements below it, in document (depth first)
 266    order.  If *tag* is not ``None`` or ``'*'``, only elements whose tag equals
 267    *tag* are returned from the iterator.
 268
 269
 270 .. method:: Element.insert(index, element)
 271
 272    Inserts a subelement at the given position in this element.
 273
 274
 275 .. method:: Element.makeelement(tag, attrib)
 276
 277    Creates a new element object of the same type as this element. Do not call this
 278    method; use the :func:`SubElement` factory function instead.
 279
 280
 281 .. method:: Element.remove(subelement)
 282
 283    Removes *subelement* from the element.  Unlike the findXYZ methods, this method
 284    compares elements based on instance identity, not on tag value or contents.
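
The identity-based comparison can be illustrated as follows. The tag names are made up. The ``ValueError`` for an element that is not a child is what CPython's implementation raises and is treated as an assumption here rather than a documented guarantee.

```python
from xml.etree.ElementTree import Element, SubElement

root = Element("root")
kept = SubElement(root, "item")
dropped = SubElement(root, "item")

root.remove(dropped)              # removes exactly this object

# An element with the same tag that is not a child is not treated as a match.
lookalike = Element("item")
try:
    root.remove(lookalike)
    removed_lookalike = True
except ValueError:
    removed_lookalike = False
```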
 285
 286 Element objects also support the following sequence type methods for working
 287 with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
 288 :meth:`__len__`.
 289
 290 Caution: Because Element objects define :meth:`__len__` but no :meth:`__nonzero__`
 291 method, elements with no subelements will test as ``False``. ::
 292
 293    element = root.find('foo')
 294
 295    if not element: # careful!
 296        print "element not found, or element has no subelements"
 297
 298    if element is None:
 299        print "element not found"
 300
 301
 302 .. _elementtree-elementtree-objects:
 303
 304 ElementTree Objects
 305 -------------------
 306
 307
 308 .. class:: ElementTree([element,] [file])
 309
 310    ElementTree wrapper class.  This class represents an entire element hierarchy,
 311    and adds some extra support for serialization to and from standard XML.
 312
 313    *element* is the root element. The tree is initialized with the contents of the
 314    XML *file* if given.
 315
 316
 317    .. method:: _setroot(element)
 318
 319       Replaces the root element for this tree.  This discards the current
 320       contents of the tree, and replaces it with the given element.  Use with
 321       care. *element* is an element instance.
 322
 323
 324    .. method:: find(path)
 325
 326       Finds the first toplevel element with given tag. Same as
 327       getroot().find(path).  *path* is the element to look for. Returns the
 328       first matching element, or ``None`` if no element was found.
 329
 330
 331    .. method:: findall(path)
 332
 333       Finds all toplevel elements with the given tag. Same as
 334       getroot().findall(path).  *path* is the element to look for. Returns a
 335       list or :term:`iterator` containing all matching elements, in document
 336       order.
 337
 338
 339    .. method:: findtext(path[, default])
 340
 341       Finds the element text for the first toplevel element with given tag.
 342       Same as getroot().findtext(path). *path* is the toplevel element to look
 343       for. *default* is the value to return if the element was not
 344       found. Returns the text content of the first matching element, or the
 345       default value if no element was found.  Note that if the element is
 346       found, but has no text content, this method returns an empty string.
 347
 348
 349    .. method:: getiterator([tag])
 350
 351       Creates and returns a tree iterator for the root element.  The iterator
 352       loops over all elements in this tree, in document order. *tag* is the tag
 353       to look for (default is to return all elements).
 354
 355
 356    .. method:: getroot()
 357
 358       Returns the root element for this tree.
 359
 360
 361    .. method:: parse(source[, parser])
 362
 363       Loads an external XML section into this element tree. *source* is a file
 364       name or file object. *parser* is an optional parser instance.  If not
 365       given, the standard XMLTreeBuilder parser is used. Returns the section
 366       root element.
 367
 368
 369    .. method:: write(file[, encoding])
 370
 371       Writes the element tree to a file, as XML. *file* is a file name, or a
 372       file object opened for writing. *encoding* [#]_ is the output encoding
 373       (default is US-ASCII).
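
A minimal sketch of writing to a file object rather than a file name. The ``io.BytesIO`` buffer is a stand-in for a file opened for writing in binary mode, and the tree contents are invented. Some library versions prepend an XML declaration for non-default encodings, so the check below only looks for the serialized elements.

```python
from io import BytesIO
from xml.etree.ElementTree import Element, ElementTree, SubElement

root = Element("html")
SubElement(root, "body").text = "hi"
tree = ElementTree(root)

buffer = BytesIO()             # any binary file object opened for writing
tree.write(buffer, "utf-8")    # second argument is the output encoding
data = buffer.getvalue()
```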
 374
 375 This is the XML file that is going to be manipulated::
 376
 377     <html>
 378         <head>
 379             <title>Example page</title>
 380         </head>
 381         <body>
 382             <p>Moved to <a href="http://example.org/">example.org</a>
 383             or <a href="http://example.com/">example.com</a>.</p>
 384         </body>
 385     </html>
 386
 387 Example of changing the attribute "target" of every link in the first paragraph::
 388
 389     >>> from xml.etree.ElementTree import ElementTree
 390     >>> tree = ElementTree()
 391     >>> tree.parse("index.xhtml")
 392     <Element html at b7d3f1ec>
 393     >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
 394     >>> p
 395     <Element p at 8416e0c>
 396     >>> links = p.getiterator("a")  # Returns list of all links
 397     >>> links
 398     [<Element a at b7d4f9ec>, <Element a at b7d4fb0c>]
 399     >>> for i in links:             # Iterates through all found links
 400     ...     i.attrib["target"] = "blank"
 401     >>> tree.write("output.xhtml")
 402
 403 .. _elementtree-qname-objects:
 404
 405 QName Objects
 406 -------------
 407
 408
 409 .. class:: QName(text_or_uri[, tag])
 410
 411    QName wrapper.  This can be used to wrap a QName attribute value, in order to
 412    get proper namespace handling on output. *text_or_uri* is a string containing
 413    the QName value, in the form ``{uri}local``, or, if the *tag* argument is given, the
 414    URI part of a QName. If *tag* is given, the first argument is interpreted as a
 415    URI, and this argument is interpreted as a local name. :class:`QName` instances
 416    are opaque.
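
Both construction forms described above can be sketched like this. The namespace URI is a placeholder. Since the prefix chosen on output is up to the serializer, the sketch does not depend on it.

```python
from xml.etree.ElementTree import Element, QName, tostring

ns = "http://example.org/ns"          # hypothetical namespace URI

one_arg = QName("{%s}p" % ns)         # full name in {uri}local form
two_arg = QName(ns, "p")              # URI and local name given separately

elem = Element(two_arg)               # a QName can be used where a tag is expected
serialized = tostring(elem).decode("ascii")
```

On output the ``{uri}local`` form is replaced by a prefixed name together with a matching ``xmlns:`` declaration.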
 417
 418
 419 .. _elementtree-treebuilder-objects:
 420
 421 TreeBuilder Objects
 422 -------------------
 423
 424
 425 .. class:: TreeBuilder([element_factory])
 426
 427    Generic element structure builder.  This builder converts a sequence of start,
 428    data, and end method calls to a well-formed element structure. You can use this
 429    class to build an element structure using a custom XML parser, or a parser for
 430    some other XML-like format. If *element_factory* is given, it is called to
 431    create new Element instances.
 432
 433
 434    .. method:: close()
 435
 436       Flushes the parser buffers, and returns the toplevel document
 437       element. Returns an Element instance.
 438
 439
 440    .. method:: data(data)
 441
 442       Adds text to the current element. *data* is a string.  This should be
 443       either an 8-bit string containing ASCII text, or a Unicode string.
 444
 445
 446    .. method:: end(tag)
 447
 448       Closes the current element. *tag* is the element name. Returns the closed
 449       element.
 450
 451
 452    .. method:: start(tag, attrs)
 453
 454       Opens a new element. *tag* is the element name. *attrs* is a dictionary
 455       containing element attributes. Returns the opened element.
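
The sequence of method calls that a parser would normally make can also be issued by hand. The following sketch drives a builder directly; the tag names and text are illustrative.

```python
from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()

builder.start("list", {"kind": "demo"})   # <list kind="demo">
builder.start("item", {})                 #   <item>
builder.data("first")                     #     first
builder.end("item")                       #   </item>
builder.end("list")                       # </list>

root = builder.close()                    # the finished toplevel element
serialized = tostring(root).decode("ascii")
```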
 456
 457
 458 .. _elementtree-xmltreebuilder-objects:
 459
 460 XMLTreeBuilder Objects
 461 ----------------------
 462
 463
 464 .. class:: XMLTreeBuilder([html,] [target])
 465
 466    Element structure builder for XML source data, based on the expat parser. *html*
 467    is a flag for predefining HTML entities; it is not supported by the current
 468    implementation. *target* is the target object.  If omitted, the builder uses an
 469    instance of the standard TreeBuilder class.
 470
 471
 472    .. method:: close()
 473
 474       Finishes feeding data to the parser. Returns an element structure.
 475
 476
 477    .. method:: doctype(name, pubid, system)
 478
 479       Handles a doctype declaration. *name* is the doctype name. *pubid* is the
 480       public identifier. *system* is the system identifier.
 481
 482
 483    .. method:: feed(data)
 484
 485       Feeds data to the parser. *data* is encoded data.
 486
 487 :meth:`XMLTreeBuilder.feed` calls *target*\'s :meth:`start` method
 488 for each opening tag, its :meth:`end` method for each closing tag,
 489 and its :meth:`data` method for character data. :meth:`XMLTreeBuilder.close`
 490 calls *target*\'s :meth:`close` method.
 491 :class:`XMLTreeBuilder` is not limited to building a tree structure.
 492 This example computes the maximum depth of an XML file::
 493
 494     >>> from xml.etree.ElementTree import XMLTreeBuilder
 495     >>> class MaxDepth:                     # The target object of the parser
 496     ...     maxDepth = 0
 497     ...     depth = 0
 498     ...     def start(self, tag, attrib):   # Called for each opening tag.
 499     ...         self.depth += 1
 500     ...         if self.depth > self.maxDepth:
 501     ...             self.maxDepth = self.depth
 502     ...     def end(self, tag):             # Called for each closing tag.
 503     ...         self.depth -= 1
 504     ...     def data(self, data):
 505     ...         pass            # We do not need to do anything with data.
 506     ...     def close(self):    # Called when all data has been parsed.
 507     ...         return self.maxDepth
 508     ...
 509     >>> target = MaxDepth()
 510     >>> parser = XMLTreeBuilder(target=target)
 511     >>> exampleXml = """
 512     ... <a>
 513     ...   <b>
 514     ...   </b>
 515     ...   <b>
 516     ...     <c>
 517     ...       <d>
 518     ...       </d>
 519     ...     </c>
 520     ...   </b>
 521     ... </a>"""
 522     >>> parser.feed(exampleXml)
 523     >>> parser.close()
 524     4
 525
 526
 527 .. rubric:: Footnotes
 528
 529 .. [#] The encoding string included in XML output should conform to the
 530    appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
 531    not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
 532    and http://www.iana.org/assignments/character-sets.
 533