\section{\module{xml.dom.minidom} ---
         Lightweight DOM implementation}

\declaremodule{standard}{xml.dom.minidom}
\modulesynopsis{Lightweight Document Object Model (DOM) implementation.}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de}

\versionadded{2.0}
\module{xml.dom.minidom} is a lightweight implementation of the
Document Object Model interface. It is intended to be
simpler than the full DOM and also significantly smaller.

DOM applications typically start by parsing some XML into a DOM. With
\module{xml.dom.minidom}, this is done through the parse functions:
\begin{verbatim}
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource) # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
\end{verbatim}
The \function{parse()} function can take either a filename or an open
file object.

\begin{funcdesc}{parse}{filename_or_file\optional{, parser}}
Return a \class{Document} from the given input. \var{filename_or_file}
may be either a file name or a file-like object. \var{parser}, if
given, must be a SAX2 parser object. This function will change the
document handler of the parser and activate namespace support; other
parser configuration (like setting an entity resolver) must have been
done in advance.
\end{funcdesc}
If you have XML in a string, you can use the
\function{parseString()} function instead:

\begin{funcdesc}{parseString}{string\optional{, parser}}
Return a \class{Document} that represents the \var{string}. This
function creates a \class{StringIO} object for the string and passes
that on to \function{parse()}.
\end{funcdesc}
Both functions return a \class{Document} object representing the
content of the document.

What the \function{parse()} and \function{parseString()} functions do
is connect an XML parser with a ``DOM builder'' that can accept parse
events from any SAX parser and convert them into a DOM tree. The names
of the functions are perhaps misleading, but are easy to grasp when
learning the interfaces. The parsing of the document will be
completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.
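As a minimal sketch of supplying the parser yourself, the following
passes a SAX2 parser obtained from \function{xml.sax.make_parser()} to
\function{parse()}. The in-memory \class{BytesIO} source and the
variable names are illustrative assumptions, and the code assumes a
Python 3 interpreter:

```python
import io
import xml.sax
from xml.dom.minidom import parse

# Create a SAX2 parser. Any extra configuration (for example an entity
# resolver) would have to be applied here, before calling parse().
sax_parser = xml.sax.make_parser()

# Illustrative in-memory document standing in for a real file.
source = io.BytesIO(b"<myxml>Some data<empty/> some more data</myxml>")

# parse() installs its own document handler on the parser, turns on
# namespace support, and returns the finished Document.
dom = parse(source, sax_parser)
root_name = dom.documentElement.tagName
dom.unlink()
```

The tree is fully built by the time \function{parse()} returns, so
\code{root_name} can be read immediately.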
You can also create a \class{Document} by calling a method on a ``DOM
Implementation'' object. You can get this object by calling
the \function{getDOMImplementation()} function in either the
\refmodule{xml.dom} package or the \module{xml.dom.minidom} module.
Using the implementation from the \module{xml.dom.minidom} module will
always return a \class{Document} instance from the minidom
implementation, while the version from \refmodule{xml.dom} may provide
an alternate implementation (this is likely if you have the
\ulink{PyXML package}{http://pyxml.sourceforge.net/} installed). Once
you have a \class{Document}, you can add child nodes to it to populate
the DOM:
\begin{verbatim}
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()

newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement
text = newdoc.createTextNode('Some textual content.')
top_element.appendChild(text)
\end{verbatim}
Once you have a DOM document object, you can access the parts of your
XML document through its properties and methods. These properties are
defined in the DOM specification. The main property of the document
object is the \member{documentElement} property. It gives you the
main element in the XML document: the one that holds all others. Here
is an example program:

\begin{verbatim}
dom3 = parseString("<myxml>Some data</myxml>")
assert dom3.documentElement.tagName == "myxml"
\end{verbatim}
When you are finished with a DOM, you should clean it up. This is
necessary because some versions of Python do not support garbage
collection of objects that refer to each other in a cycle. Until this
restriction is removed from all versions of Python, it is safest to
write your code as if cycles would not be cleaned up.

The way to clean up a DOM is to call its \method{unlink()} method:

\begin{verbatim}
dom1.unlink()
dom2.unlink()
dom3.unlink()
\end{verbatim}

\method{unlink()} is a \module{xml.dom.minidom}-specific extension to
the DOM API. After calling \method{unlink()} on a node, the node and
its descendants are essentially useless.
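One way to make sure the cleanup happens even when processing fails
is to wrap the work in \keyword{try}/\keyword{finally}. This is only
a sketch; the helper name \code{root_tag_name} is hypothetical and not
part of the module:

```python
from xml.dom.minidom import parseString


def root_tag_name(xml_text):
    """Return the tag name of the document element of xml_text."""
    dom = parseString(xml_text)
    try:
        # Copy out everything that is needed before the tree is unlinked;
        # the nodes themselves are not usable afterwards.
        return dom.documentElement.tagName
    finally:
        # Runs on success and on error alike.
        dom.unlink()
```

Returning a plain string rather than a node keeps the caller from
holding on to part of a tree that has already been unlinked.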
\begin{seealso}
  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
            Model (DOM) Level 1 Specification}
           {The W3C recommendation for the
            DOM supported by \module{xml.dom.minidom}.}
\end{seealso}
\subsection{DOM Objects \label{dom-objects}}

The definition of the DOM API for Python is given as part of the
\refmodule{xml.dom} module documentation. This section lists the
differences between the API and \refmodule{xml.dom.minidom}.

\begin{methoddesc}[Node]{unlink}{}
Break internal references within the DOM so that it will be garbage
collected on versions of Python without cyclic GC. Even when cyclic
GC is available, using this can make large amounts of memory available
sooner, so calling this on DOM objects as soon as they are no longer
needed is good practice. This only needs to be called on the
\class{Document} object, but may be called on child nodes to discard
children of that node.
\end{methoddesc}
\begin{methoddesc}[Node]{writexml}{writer\optional{,indent=""\optional{,addindent=""\optional{,newl=""}}}}
Write XML to the writer object. The writer should have a
\method{write()} method which matches that of the file object
interface. The \var{indent} parameter is the indentation of the current
node. The \var{addindent} parameter is the incremental indentation to use
for subnodes of the current one. The \var{newl} parameter specifies the
string to use to terminate lines.

\versionchanged[The optional keyword parameters
\var{indent}, \var{addindent}, and \var{newl} were added to support pretty
output]{2.1}

\versionchanged[For the \class{Document} node, an additional keyword
argument \var{encoding} can be used to specify the encoding field of the XML
header]{2.3}
\end{methoddesc}
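The following sketch shows how the three layout parameters of
\method{writexml()} work together. The \class{StringIO} buffer stands
in for any object with a \method{write()} method, the sample document
is illustrative, and a Python 3 interpreter is assumed:

```python
import io
from xml.dom.minidom import parseString

dom = parseString("<root><child/></root>")

# Any object with a write() method will do; a StringIO keeps the
# example self-contained.
buffer = io.StringIO()

# indent:    prefix for the node being written (none for the top level)
# addindent: extra prefix added for each level of nesting
# newl:      string written at the end of every line
dom.writexml(buffer, indent="", addindent="  ", newl="\n")
written = buffer.getvalue()
dom.unlink()
```

Leaving all three parameters at their empty-string defaults writes the
document on a single line instead.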
\begin{methoddesc}[Node]{toxml}{\optional{encoding}}
Return the XML that the DOM represents as a string.

With no argument, the XML header does not specify an encoding, and the
result is a Unicode string if the default encoding cannot represent all
characters in the document. Encoding this string in an encoding other
than UTF-8 is likely incorrect, since UTF-8 is the default encoding of
XML.

With an explicit \var{encoding} argument, the result is a byte string
in the specified encoding. It is recommended that this argument
always be specified. To avoid \exception{UnicodeError} exceptions in case of
unrepresentable text data, the encoding argument should be specified
as "utf-8".

\versionchanged[the \var{encoding} argument was introduced]{2.3}
\end{methoddesc}
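A short sketch of the difference between the two call forms follows.
It assumes a Python 3 interpreter, where the no-argument form always
returns a text string and the encoded form returns \class{bytes}; the
sample document is illustrative:

```python
from xml.dom.minidom import parseString

# A document whose text contains non-ASCII characters.
dom = parseString("<greeting>Gr\u00fc\u00dfe</greeting>")

# No argument: the XML header carries no encoding declaration.
as_text = dom.toxml()

# Explicit encoding: the header names the encoding and the result is
# already encoded, ready to be written to a binary file.
as_bytes = dom.toxml(encoding="utf-8")
dom.unlink()
```

Because UTF-8 can represent every character, the encoded call cannot
fail on unrepresentable text.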
\begin{methoddesc}[Node]{toprettyxml}{\optional{indent\optional{, newl\optional{, encoding}}}}
Return a pretty-printed version of the document. \var{indent} specifies
the indentation string and defaults to a tab character; \var{newl} specifies
the string emitted at the end of each line and defaults to \code{\e n}.

\versionadded{2.1}
\versionchanged[the \var{encoding} argument was introduced; see \method{toxml()}]{2.3}
\end{methoddesc}
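To illustrate the defaults, this sketch pretty-prints the same small
document twice, once with the default tab indentation and once with a
two-space \var{indent}. The sample document is an assumption made for
the example:

```python
from xml.dom.minidom import parseString

dom = parseString("<root><child/></root>")

# Defaults: one tab per nesting level, "\n" after every line.
default_layout = dom.toprettyxml()

# The same document indented with two spaces per level.
spaced_layout = dom.toprettyxml(indent="  ", newl="\n")
dom.unlink()
```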
The following standard DOM methods have special considerations with
\refmodule{xml.dom.minidom}:

\begin{methoddesc}[Node]{cloneNode}{deep}
Although this method was present in the version of
\refmodule{xml.dom.minidom} packaged with Python 2.0, it was seriously
broken. This has been corrected for subsequent releases.
\end{methoddesc}
\subsection{DOM Example \label{dom-example}}

This is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of
the DOM.

\verbatiminput{minidom-example.py}
\subsection{minidom and the DOM standard \label{minidom-and-dom}}

The \refmodule{xml.dom.minidom} module is essentially a DOM
1.0-compatible DOM with some DOM 2 features (primarily namespace
features).

Usage of the DOM interface in Python is straightforward. The
following mapping rules apply:
\begin{itemize}
\item Interfaces are accessed through instance objects. Applications
      should not instantiate the classes themselves; they should use
      the creator functions available on the \class{Document} object.
      Derived interfaces support all operations (and attributes) from
      the base interfaces, plus any new operations.

\item Operations are used as methods. Since the DOM uses only
      \keyword{in} parameters, the arguments are passed in normal
      order (from left to right). There are no optional
      arguments. \keyword{void} operations return \code{None}.

\item IDL attributes map to instance attributes. For compatibility
      with the OMG IDL language mapping for Python, an attribute
      \code{foo} can also be accessed through accessor methods
      \method{_get_foo()} and \method{_set_foo()}. \keyword{readonly}
      attributes must not be changed; this is not enforced at
      runtime.

\item The types \code{short int}, \code{unsigned int}, \code{unsigned
      long long}, and \code{boolean} all map to Python integer
      objects.

\item The type \code{DOMString} maps to Python strings.
      \refmodule{xml.dom.minidom} supports either byte or Unicode
      strings, but will normally produce Unicode strings. Values
      of type \code{DOMString} may also be \code{None} where allowed
      to have the IDL \code{null} value by the DOM specification from
      the W3C.

\item \keyword{const} declarations map to variables in their
      respective scope
      (e.g. \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE});
      they must not be changed.

\item \code{DOMException} is currently not supported in
      \refmodule{xml.dom.minidom}. Instead,
      \refmodule{xml.dom.minidom} uses standard Python exceptions such
      as \exception{TypeError} and \exception{AttributeError}.

\item \class{NodeList} objects are implemented using Python's built-in
      list type. Starting with Python 2.2, these objects provide the
      interface defined in the DOM specification, but with earlier
      versions of Python they do not support the official API. They
      are, however, much more ``Pythonic'' than the interface defined
      in the W3C recommendations.
\end{itemize}
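Several of the mapping rules above can be seen together in a short
sketch. The sample document and variable names are illustrative, and
the behaviour shown is that of a Python 3 interpreter:

```python
from xml.dom.minidom import Node, parseString

dom = parseString("<list><item>a</item><item>b</item></list>")

# NodeList behaves like a Python list and also offers the DOM API.
items = dom.getElementsByTagName("item")
count_pythonic = len(items)   # list-style length
count_dom = items.length      # DOM-style length attribute
first = items.item(0)         # DOM-style indexed access

# const declarations are plain class-level variables.
is_element = first.nodeType == Node.ELEMENT_NODE

# A DOMString that is null in the IDL sense is None in Python.
namespace = first.namespaceURI

# New nodes come from creator methods on the Document rather than
# from instantiating the node classes directly.
new_item = dom.createElement("item")
dom.documentElement.appendChild(new_item)
count_after = len(dom.getElementsByTagName("item"))
dom.unlink()
```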
The following interfaces have no implementation in
\refmodule{xml.dom.minidom}:

\begin{itemize}
\item \class{DOMTimeStamp}

\item \class{DocumentType} (added in Python 2.1)

\item \class{DOMImplementation} (added in Python 2.1)

\item \class{CharacterData}

\item \class{CDATASection}

\item \class{Notation}

\item \class{Entity}

\item \class{EntityReference}

\item \class{DocumentFragment}
\end{itemize}
Most of these reflect information in the XML document that is not of
general utility to most DOM users.