\documentclass{howto}
\usepackage{distutils}
% $Id$

\title{What's New in Python 2.3}
\release{1.01}
\author{A.M.\ Kuchling}
\authoraddress{
	\strong{Python Software Foundation}\\
	Email: \email{amk@amk.ca}
}

\begin{document}
\maketitle
\tableofcontents

This article explains the new features in Python 2.3. Python 2.3 was
released on July 29, 2003.

The main themes for Python 2.3 are polishing some of the features
added in 2.2, adding various small but useful enhancements to the core
language, and expanding the standard library. The new object model
introduced in the previous version has benefited from 18 months of
bugfixes and from optimization efforts that have improved the
performance of new-style classes. A few new built-in functions have
been added such as \function{sum()} and \function{enumerate()}. The
\keyword{in} operator can now be used for substring searches (e.g.
\code{"ab" in "abc"} returns \constant{True}).

Some of the many new library features include Boolean, set, heap, and
date/time data types, the ability to import modules from ZIP-format
archives, metadata support for the long-awaited Python catalog, an
updated version of IDLE, and modules for logging messages, wrapping
text, parsing CSV files, processing command-line options, using BerkeleyDB
databases... the list of new and enhanced modules is lengthy.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.3,
such as the \citetitle[../lib/lib.html]{Python Library Reference} and
the \citetitle[../ref/ref.html]{Python Reference Manual}. If you want
to understand the complete implementation and design rationale,
refer to the PEP for a particular new feature.


%======================================================================
\section{PEP 218: A Standard Set Datatype}

The new \module{sets} module contains an implementation of a set
datatype. The \class{Set} class is for mutable sets, sets that can
have members added and removed. The \class{ImmutableSet} class is for
sets that can't be modified, and instances of \class{ImmutableSet} can
therefore be used as dictionary keys. Sets are built on top of
dictionaries, so the elements within a set must be hashable.

Here's a simple example:

\begin{verbatim}
>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>
\end{verbatim}

The union and intersection of sets can be computed with the
\method{union()} and \method{intersection()} methods; an alternative
notation uses the bitwise operators \code{|} and \code{\&}, respectively.
Mutable sets also have in-place versions of these methods,
\method{union_update()} and \method{intersection_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>
\end{verbatim}

It's also possible to take the symmetric difference of two sets. This
is the set of all elements in the union that aren't in the
intersection. Another way of putting it is that the symmetric
difference contains all elements that are in exactly one
set. Again, there's an alternative notation (\code{\^}), and an
in-place version with the ungainly name
\method{symmetric_difference_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>
\end{verbatim}

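
The two descriptions of the symmetric difference given above can be checked
against each other. The following is only a sketch: it assumes that on an
interpreter without the \module{sets} module a built-in type with the same
operators can stand in for \class{Set}, which is not something this article
covers.

```python
try:
    from sets import Set      # the module described in this section
except ImportError:
    Set = set                 # assumption: stand-in where sets is unavailable

S1 = Set([1, 2, 3, 4])
S2 = Set([3, 4, 5, 6])

# Definition 1: everything in the union that is not in the intersection.
union_minus_intersection = (S1 | S2) - (S1 & S2)

# Definition 2: every element that belongs to exactly one of the two sets.
in_exactly_one = Set([x for x in S1 | S2 if (x in S1) != (x in S2)])

# Both agree with the ^ operator.
operator_result = S1 ^ S2

# The in-place version modifies S1 itself.
S1.symmetric_difference_update(S2)
print(S1 == operator_result)
```

Both definitions and the operator produce the same four-element set, and the
in-place method leaves \code{S1} holding that result.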
There are also \method{issubset()} and \method{issuperset()} methods
for checking whether one set is a subset or superset of another:

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>
\end{verbatim}


\begin{seealso}

\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
Implemented by Greg V. Wilson, Alex Martelli, and GvR.}

\end{seealso}


%======================================================================
\section{PEP 255: Simple Generators\label{section-generators}}

In Python 2.2, generators were added as an optional feature, to be
enabled by a \code{from __future__ import generators} directive. In
2.3 generators no longer need to be specially enabled, and are now
always present; this means that \keyword{yield} is now always a
keyword. The rest of this section is a copy of the description of
generators from the ``What's New in Python 2.2'' document; if you read
it back when Python 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C.
When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler which
compiles the function specially as a result.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints()} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.

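
The two alternative spellings just mentioned can be tried as a small script.
This is only an illustrative sketch; the names \code{collected}, \code{a},
\code{b} and \code{c} are made up for the example.

```python
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop keeps resuming the generator until it is exhausted.
collected = []
for i in generate_ints(5):
    collected.append(i)

# Tuple unpacking consumes exactly as many values as there are targets,
# so the generator must produce three values here.
a, b, c = generate_ints(3)

print(collected)
print((a, b, c))
```

In both cases the iteration machinery calls the generator's next-value method
for you and stops when \exception{StopIteration} is raised.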
Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

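
As a hedged illustration of ending a generator early, the sketch below uses a
bare \keyword{return}; the helper \function{leading_positives()} is
hypothetical and exists only for this example.

```python
def leading_positives(values):
    """Yield values until the first non-positive one is seen."""
    for v in values:
        if v <= 0:
            return      # bare return: no further values will be produced
        yield v
    # Falling off the end of the function finishes the generator as well.

run = list(leading_positives([3, 1, 4, 0, 5, 9]))
everything = list(leading_positives([2, 7]))
print(run)
print(everything)
```

The first call stops at the zero and never reaches 5 or 9; the second simply
runs off the bottom of the function.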
You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}

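
To make the comparison with a hand-written class concrete, here is a minimal
sketch. The \class{CountUp} and \class{Tree} classes are hypothetical and are
not taken from \file{test_generators.py}; the \code{__next__} alias is an
assumption added so the class also works on interpreters that spell the
method that way.

```python
class CountUp(object):
    """Hand-written iterator: state lives in instance variables."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.limit:
            raise StopIteration
        self.count += 1
        return self.count

    __next__ = next     # assumption: alternate method name on other versions


def count_up(limit):
    """The same behaviour as a generator: state lives in local variables."""
    count = 0
    while count < limit:
        count += 1
        yield count


class Tree(object):
    """Tiny binary tree node used to exercise inorder()."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


tree = Tree(2, Tree(1), Tree(3, None, Tree(4)))
print(list(CountUp(3)))
print(list(count_up(3)))
print(list(inorder(tree)))
```

The class needs explicit bookkeeping for even this trivial counter, while the
recursive traversal would be considerably harder to express without
\keyword{yield}.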
Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
(the iterator) that can be passed around to other functions or stored
in a data structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 263: Source Code Encodings \label{section-encodings}}

Python source files can now be declared as being in different
character set encodings. Encodings are declared by including a
specially formatted comment in the first or second line of the source
file. For example, a UTF-8 file can be declared with:

\begin{verbatim}
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
\end{verbatim}

Without such an encoding declaration, the default encoding used is
7-bit ASCII. Executing or importing modules that contain string
literals with 8-bit characters and have no encoding declaration will result
in a \exception{DeprecationWarning} being signalled by Python 2.3; in
2.4 this will be a syntax error.

The encoding declaration only affects Unicode string literals, which
will be converted to Unicode using the specified encoding. Note that
Python identifiers are still restricted to ASCII characters, so you
can't have variable names that use characters outside of the usual
alphanumerics.

\begin{seealso}

\seepep{263}{Defining Python Source Code Encodings}{Written by
Marc-Andr\'e Lemburg and Martin von~L\"owis; implemented by Suzuki
Hisao and Martin von~L\"owis.}

\end{seealso}


%======================================================================
\section{PEP 273: Importing Modules from Zip Archives}

The new \module{zipimport} module adds support for importing
modules from a ZIP-format archive. You don't need to import the
module explicitly; it will be automatically imported if a ZIP
archive's filename is added to \code{sys.path}. For example:

\begin{verbatim}
amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive:  /tmp/example.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
     8467  11-26-02 22:30   jwzthreading.py
 --------                   -------
     8467                   1 file
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>
\end{verbatim}

An entry in \code{sys.path} can now be the filename of a ZIP archive.
The ZIP archive can contain any kind of files, but only files named
\file{*.py}, \file{*.pyc}, or \file{*.pyo} can be imported. If an
archive only contains \file{*.py} files, Python will not attempt to
modify the archive by adding the corresponding \file{*.pyc} file, meaning
that if a ZIP archive doesn't contain \file{*.pyc} files, importing may be
rather slow.

A path within the archive can also be specified to only import from a
subdirectory; for example, the path \file{/tmp/example.zip/lib/}
would only import from the \file{lib/} subdirectory within the
archive.

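
The whole round trip, including the subdirectory form of the path, can be
sketched in a few lines. Everything named here (the temporary archive and the
modules \module{zipdemo_top} and \module{zipdemo_inner}) is made up for the
illustration, and the archive is built with the standard \module{zipfile}
module rather than an external \program{zip} tool.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one top-level module and one
# module inside a lib/ subdirectory.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, 'example.zip')

zf = zipfile.ZipFile(archive, 'w')
zf.writestr('zipdemo_top.py',
            "def greet():\n    return 'hello from the archive'\n")
zf.writestr('lib/zipdemo_inner.py', "VALUE = 42\n")
zf.close()

# An archive filename on sys.path makes its top-level modules importable;
# a path inside the archive restricts imports to that subdirectory.
sys.path.insert(0, archive)
sys.path.insert(0, os.path.join(archive, 'lib'))

import zipdemo_top
import zipdemo_inner

print(zipdemo_top.greet())
print(zipdemo_top.__file__)
print(zipdemo_inner.VALUE)
```

Note that the archive here holds only \file{*.py} files, so this is exactly
the slower case described above: nothing is written back into the archive.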
\begin{seealso}

\seepep{273}{Import Modules from Zip Archives}{Written by James C. Ahlstrom,
who also provided an implementation.
Python 2.3 follows the specification in \pep{273},
but uses an implementation written by Just van~Rossum
that uses the import hooks described in \pep{302}.
See section~\ref{section-pep302} for a description of the new import hooks.
}

\end{seealso}

%======================================================================
\section{PEP 277: Unicode file name support for Windows NT}

On Windows NT, 2000, and XP, the system stores file names as Unicode
strings. Traditionally, Python has represented file names as byte
strings, which is inadequate because it renders some file names
inaccessible.

Python now allows using arbitrary Unicode strings (within the
limitations of the file system) for all functions that expect file
names, most notably the \function{open()} built-in function. If a Unicode
string is passed to \function{os.listdir()}, Python now returns a list
of Unicode strings. A new function, \function{os.getcwdu()}, returns
the current directory as a Unicode string.

Byte strings still work as file names, and on Windows Python will
transparently convert them to Unicode using the \code{mbcs} encoding.

Other systems also allow Unicode strings as file names but convert
them to byte strings before passing them to the system, which can
cause a \exception{UnicodeError} to be raised. Applications can test
whether arbitrary Unicode strings are supported as file names by
checking \member{os.path.supports_unicode_filenames}, a Boolean value.

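
A portable program might branch on that flag before deciding how to list a
directory. The following is only a sketch of the idea; listing the current
directory is an arbitrary choice made for the example.

```python
import os

# True when arbitrary Unicode strings can be used as file names.
unicode_ok = os.path.supports_unicode_filenames

if unicode_ok:
    # Passing a Unicode string asks for Unicode names back.
    names = os.listdir(u'.')
else:
    # Fall back to the platform's ordinary string type.
    names = os.listdir('.')

print(unicode_ok)
print(len(names))
```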
Under MacOS, \function{os.listdir()} may now return Unicode filenames.

\begin{seealso}

\seepep{277}{Unicode file name support for Windows NT}{Written by Neil
Hodgson; implemented by Neil Hodgson, Martin von~L\"owis, and Mark
Hammond.}

\end{seealso}


%======================================================================
\section{PEP 278: Universal Newline Support}

The three major operating systems used today are Microsoft Windows,
Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor
irritation of cross-platform work
is that these three platforms all use different characters
to mark the ends of lines in text files. \UNIX\ uses the linefeed
(ASCII character 10), MacOS uses the carriage return (ASCII
character 13), and Windows uses a two-character sequence of a
carriage return plus a linefeed.

Python's file objects can now support end of line conventions other
than the one followed by the platform on which Python is running.
Opening a file with the mode \code{'U'} or \code{'rU'} will open a file
for reading in universal newline mode. All three line ending
conventions will be translated to a \character{\e n} in the strings
returned by the various file methods such as \method{read()} and
\method{readline()}.

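
The translation can be observed by writing a file that mixes all three
conventions and reading it back. This is a sketch under one loud assumption:
on interpreters from the 3.x series the plain \code{'r'} text mode is assumed
to perform the same translation, so the \code{'rU'} mode described here is
only requested on older versions.

```python
import os
import sys
import tempfile

# Write raw bytes using the Unix, Mac and Windows line endings in turn.
fd, path = tempfile.mkstemp()
os.write(fd, b"unix\nmac\rdos\r\n")
os.close(fd)

if sys.version_info[0] < 3:
    mode = 'rU'     # universal newline mode, as described in this section
else:
    mode = 'r'      # assumption: text mode already translates newlines

f = open(path, mode)
lines = f.readlines()
f.close()
os.remove(path)

print(lines)
```

Every line comes back ending in a single \character{\e n}, whatever
convention was used on disk.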
Universal newline support is also used when importing modules and when
executing a file with the \function{execfile()} function. This means
that Python modules can be shared between all three operating systems
without needing to convert the line-endings.

This feature can be disabled when compiling Python by specifying
the \longprogramopt{without-universal-newlines} switch when running Python's
\program{configure} script.

\begin{seealso}

\seepep{278}{Universal Newline Support}{Written
and implemented by Jack Jansen.}

\end{seealso}


%======================================================================
\section{PEP 279: enumerate()\label{section-enumerate}}

A new built-in function, \function{enumerate()}, will make
certain loops a bit clearer. \code{enumerate(thing)}, where
\var{thing} is either an iterator or a sequence, returns an iterator
that will return \code{(0, \var{thing}[0])}, \code{(1,
\var{thing}[1])}, \code{(2, \var{thing}[2])}, and so forth.

A common idiom to change every element of a list looks like this:

\begin{verbatim}
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}

This can be rewritten using \function{enumerate()} as:

\begin{verbatim}
for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}

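
Filling in the elided computation gives a version that can be run directly.
Squaring each item is an arbitrary stand-in for ``some result based on
item'', chosen only for illustration.

```python
L = [3, 1, 4, 1, 5]

for i, item in enumerate(L):
    result = item * item    # placeholder computation for the example
    L[i] = result

# enumerate() yields (index, value) pairs for any sequence or iterator.
pairs = list(enumerate(['a', 'b', 'c']))

print(L)
print(pairs)
```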

\begin{seealso}

\seepep{279}{The enumerate() built-in function}{Written
and implemented by Raymond D. Hettinger.}

\end{seealso}


%======================================================================
\section{PEP 282: The logging Package}

A standard package for writing logs, \module{logging}, has been added
to Python 2.3. It provides a powerful and flexible mechanism for
generating logging output which can then be filtered and processed in
various ways. A configuration file written in a standard format can
be used to control the logging behavior of a program. Python
includes handlers that will write log records to
standard error or to a file or socket, send them to the system log, or
even e-mail them to a particular address; of course, it's also
possible to write your own handler classes.

The \class{Logger} class is the primary class.
Most application code will deal with one or more \class{Logger}
objects, each one used by a particular subsystem of the application.
Each \class{Logger} is identified by a name, and names are organized
into a hierarchy using \samp{.} as the component separator. For
example, you might have \class{Logger} instances named \samp{server},
\samp{server.auth} and \samp{server.network}. The latter two
instances are below \samp{server} in the hierarchy. This means that
if you turn up the verbosity for \samp{server} or direct \samp{server}
messages to a different handler, the changes will also apply to
records logged to \samp{server.auth} and \samp{server.network}.
There's also a root \class{Logger} that's the parent of all other
loggers.

For simple uses, the \module{logging} package contains some
convenience functions that always use the root logger:

\begin{verbatim}
import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
\end{verbatim}

This produces the following output:

\begin{verbatim}
WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down
\end{verbatim}

In the default configuration, informational and debugging messages are
suppressed and the output is sent to standard error. You can enable
the display of informational and debugging messages by calling the
\method{setLevel()} method on the root logger.

Notice the \function{warning()} call's use of string formatting
operators; all of the functions for logging messages take the
arguments \code{(\var{msg}, \var{arg1}, \var{arg2}, ...)} and log the
string resulting from \code{\var{msg} \% (\var{arg1}, \var{arg2},
...)}.

There's also an \function{exception()} function that records the most
recent traceback. Any of the other functions will also record the
traceback if you specify a true value for the keyword argument
\var{exc_info}.

\begin{verbatim}
def f():
    try:    1/0
    except: logging.exception('Problem recorded')

f()
\end{verbatim}

This produces the following output:

\begin{verbatim}
ERROR:root:Problem recorded
Traceback (most recent call last):
  File "t.py", line 6, in f
    1/0
ZeroDivisionError: integer division or modulo by zero
\end{verbatim}

Slightly more advanced programs will use a logger other than the root
logger. The \function{getLogger(\var{name})} function is used to get
a particular logger, creating it if it doesn't exist yet.
\function{getLogger(None)} returns the root logger.


\begin{verbatim}
log = logging.getLogger('server')
 ...
log.info('Listening on port %i', port)
 ...
log.critical('Disk full')
 ...
\end{verbatim}

Log records are usually propagated up the hierarchy, so a message
logged to \samp{server.auth} is also seen by \samp{server} and
\samp{root}, but a \class{Logger} can prevent this by setting its
\member{propagate} attribute to \constant{False}.

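
Propagation is easiest to see with a handler that simply remembers what it
was given. In this sketch the \class{ListHandler} class and the logger names
\samp{demo_server} and \samp{demo_server.auth} are hypothetical, chosen so
the example doesn't disturb any real logging configuration.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that stores formatted messages in a list."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

parent = logging.getLogger('demo_server')
child = logging.getLogger('demo_server.auth')

# The handler and level are set on the parent only; the child inherits
# the level and passes its records up the hierarchy.
parent.setLevel(logging.INFO)
collector = ListHandler()
parent.addHandler(collector)

child.info('login ok for %s', 'alice')     # reaches the parent's handler

child.propagate = False
child.info('this record stays local')      # no longer passed upward

print(collector.messages)
```

Only the first message is collected, because the second was logged after
propagation had been switched off on the child.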
There are more classes provided by the \module{logging} package that
can be customized. When a \class{Logger} instance is told to log a
message, it creates a \class{LogRecord} instance that is sent to any
number of different \class{Handler} instances. Loggers and handlers
can also have an attached list of filters, and each filter can cause
the \class{LogRecord} to be ignored or can modify the record before
passing it along. When they're finally output, \class{LogRecord}
instances are converted to text by a \class{Formatter} class. All of
these classes can be replaced by your own specially-written classes.

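
The pipeline described above (record, filter, handler, formatter) can be
sketched with small subclasses. The classes \class{ListHandler} and
\class{DropHeartbeats}, the logger name and the format string are all
invented for this example.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted records in memory."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        # self.format() applies the Formatter attached to this handler.
        self.lines.append(self.format(record))

class DropHeartbeats(logging.Filter):
    """Hypothetical filter: ignore any record mentioning 'heartbeat'."""
    def filter(self, record):
        return 'heartbeat' not in record.getMessage()

log = logging.getLogger('demo_pipeline')
log.setLevel(logging.DEBUG)
log.propagate = False       # keep the example away from the root logger

handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s|%(name)s|%(message)s'))
handler.addFilter(DropHeartbeats())
log.addHandler(handler)

log.debug('heartbeat %d', 1)                # rejected by the filter
log.warning('disk usage at %d%%', 91)       # formatted and stored

print(handler.lines)
```

The filter here is attached to the handler; attaching it to the logger with
\method{addFilter()} would instead drop the record before any handler saw it.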
With all of these features the \module{logging} package should provide
enough flexibility for even the most complicated applications. This
is only an incomplete overview of its features, so please see the
\ulink{package's reference documentation}{../lib/module-logging.html}
for all of the details. Reading \pep{282} will also be helpful.


\begin{seealso}

\seepep{282}{A Logging System}{Written by Vinay Sajip and Trent Mick;
implemented by Vinay Sajip.}

\end{seealso}


%======================================================================
\section{PEP 285: A Boolean Type\label{section-bool}}

A Boolean type was added to Python 2.3. Two new constants were added
to the \module{__builtin__} module, \constant{True} and
\constant{False}. (\constant{True} and
\constant{False} constants were added to the built-ins
in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of
1 and 0 and aren't a different type.)

The type object for this new type is named
\class{bool}; the constructor for it takes any Python value and
converts it to \constant{True} or \constant{False}.

\begin{verbatim}
>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True
\end{verbatim}

Most of the standard library modules and built-in functions have been
changed to return Booleans.

\begin{verbatim}
>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False
\end{verbatim}

Python's Booleans were added with the primary goal of making code
clearer. For example, if you're reading a function and encounter the
statement \code{return 1}, you might wonder whether the \code{1}
represents a Boolean truth value, an index, or a
coefficient that multiplies some other quantity. If the statement is
\code{return True}, however, the meaning of the return value is quite
clear.

Python's Booleans were \emph{not} added for the sake of strict
type-checking. A very strict language such as Pascal would also
prevent you performing arithmetic with Booleans, and would require
that the expression in an \keyword{if} statement always evaluate to a
Boolean result. Python is not this strict and never will be, as
\pep{285} explicitly says. This means you can still use any
expression in an \keyword{if} statement, even ones that evaluate to a
list or tuple or some random object. The Boolean type is a
subclass of the \class{int} class so that arithmetic using a Boolean
still works.

\begin{verbatim}
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
\end{verbatim}

To sum up \constant{True} and \constant{False} in a sentence: they're
alternative ways to spell the integer values 1 and 0, with the single
difference that \function{str()} and \function{repr()} return the
strings \code{'True'} and \code{'False'} instead of \code{'1'} and
\code{'0'}.

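
That one-sentence summary can be verified directly. The list \code{flags}
below is made up for the illustration; it shows one practical consequence of
Booleans being integers, namely that they can be summed to count true values.

```python
# bool is a subclass of int, so Booleans compare equal to 1 and 0 ...
is_int = isinstance(True, int)
is_subclass = issubclass(bool, int)
equal_to_ints = (True == 1) and (False == 0)

# ... and differ only in how they are displayed.
as_text = (str(True), repr(False))

# Arithmetic works, so summing Booleans counts the true ones.
flags = [True, False, True, True]
true_count = sum(flags)

print(is_int, is_subclass, equal_to_ints)
print(as_text)
print(true_count)
```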

\begin{seealso}

\seepep{285}{Adding a bool type}{Written and implemented by GvR.}

\end{seealso}


%======================================================================
\section{PEP 293: Codec Error Handling Callbacks}

When encoding a Unicode string into a byte string, unencodable
characters may be encountered. So far, Python has allowed specifying
the error processing as either ``strict'' (raising
\exception{UnicodeError}), ``ignore'' (skipping the character), or
``replace'' (using a question mark in the output string), with
``strict'' being the default behavior. It may be desirable to specify
alternative processing of such errors, such as inserting an XML
character reference or HTML entity reference into the converted
string.

Python now has a flexible framework to add different processing
strategies. New error handlers can be added with
\function{codecs.register_error}, and codecs then can access the error
handler with \function{codecs.lookup_error}. An equivalent C API has
been added for codecs written in C. The error handler gets the
necessary state information such as the string being converted, the
position in the string where the error was detected, and the target
encoding. The handler can then either raise an exception or return a
replacement string.

Two additional error handlers have been implemented using this
framework: ``backslashreplace'' uses Python backslash quoting to
represent unencodable characters and ``xmlcharrefreplace'' emits
XML character references.

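
Here is a brief sketch of both the built-in handlers and a custom one. The
handler name \samp{demo.underscore} and the function
\function{underscore_errors()} are invented for the example; the handler
follows the convention of returning a replacement string together with the
position at which encoding should resume.

```python
import codecs

text = u'caf\xe9'      # the last character cannot be encoded as ASCII

# The two new built-in handlers.
escaped = text.encode('ascii', 'backslashreplace')
xml_ref = text.encode('ascii', 'xmlcharrefreplace')

def underscore_errors(exc):
    """Hypothetical handler: replace each unencodable character with '_'."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc       # this sketch only deals with encoding errors
    replacement = u'_' * (exc.end - exc.start)
    return (replacement, exc.end)

codecs.register_error('demo.underscore', underscore_errors)
custom = text.encode('ascii', 'demo.underscore')

# Codecs can retrieve a registered handler by name.
looked_up = codecs.lookup_error('demo.underscore')

print(escaped)
print(xml_ref)
print(custom)
```

The exception object passed to the handler carries the state mentioned above:
the string being converted, the start and end of the offending span, and the
target encoding.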

\begin{seealso}

\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
Walter D\"orwald.}

\end{seealso}


%======================================================================
\section{PEP 301: Package Index and Metadata for
Distutils\label{section-pep301}}

Support for the long-requested Python catalog makes its first
appearance in 2.3.

The heart of the catalog is the new Distutils \command{register} command.
Running \code{python setup.py register} will collect the metadata
describing a package, such as its name, version, maintainer,
description, \&c., and send it to a central catalog server. The
resulting catalog is available from \url{http://www.python.org/pypi}.

To make the catalog a bit more useful, a new optional
\var{classifiers} keyword argument has been added to the Distutils
\function{setup()} function. A list of
\ulink{Trove}{http://catb.org/\textasciitilde esr/trove/}-style
strings can be supplied to help classify the software.

Here's an example \file{setup.py} with classifiers, written to be compatible
with older versions of the Distutils:

\begin{verbatim}
from distutils import core
kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

# Only pass classifiers if this version of the Distutils accepts them.
if (hasattr(core, 'setup_keywords') and
    'classifiers' in core.setup_keywords):
    kw['classifiers'] = \
        ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
         'Environment :: No Input/Output (Daemon)',
         'Intended Audience :: Developers']

core.setup(**kw)
\end{verbatim}

The full list of classifiers can be obtained by running
\verb|python setup.py register --list-classifiers|.

\begin{seealso}

\seepep{301}{Package Index and Metadata for Distutils}{Written and
implemented by Richard Jones.}

\end{seealso}


%======================================================================
\section{PEP 302: New Import Hooks \label{section-pep302}}

While it's been possible to write custom import hooks ever since the
\module{ihooks} module was introduced in Python 1.3, no one has ever
been really happy with it because writing new import hooks is
difficult and messy. There have been various proposed alternatives
such as the \module{imputil} and \module{iu} modules, but none of them
has ever gained much acceptance, and none of them were easily usable
from \C{} code.

\pep{302} borrows ideas from its predecessors, especially from
Gordon McMillan's \module{iu} module. Three new items
are added to the \module{sys} module:

\begin{itemize}
  \item \code{sys.path_hooks} is a list of callable objects; most
  often they'll be classes. Each callable takes a string containing a
  path and either returns an importer object that will handle imports
  from this path or raises an \exception{ImportError} exception if it
  can't handle this path.

  \item \code{sys.path_importer_cache} caches importer objects for
  each path, so \code{sys.path_hooks} will only need to be traversed
  once for each path.

  \item \code{sys.meta_path} is a list of importer objects that will
  be traversed before \code{sys.path} is checked. This list is
  initially empty, but user code can add objects to it. Additional
  built-in and frozen modules can be imported by an object added to
  this list.

\end{itemize}

Importer objects must have a single method,
\method{find_module(\var{fullname}, \var{path}=None)}. \var{fullname}
will be a module or package name, e.g. \samp{string} or
\samp{distutils.core}. \method{find_module()} must return a loader object
that has a single method, \method{load_module(\var{fullname})}, that
creates and returns the corresponding module object.

814 Pseudo-code for Python's new import logic, therefore, looks something
815 like this (simplified a bit; see \pep{302} for the full details):
\begin{verbatim}
for mp in sys.meta_path:
    # Each entry is an importer object, so ask it for a loader
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            if loader is not None:
                <module> = loader.load_module(fullname)

# Not found!
raise ImportError
\end{verbatim}
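
The \code{sys.meta_path} half of this loop can be tried out without
touching the real import machinery. The following is a minimal sketch
rather than the interpreter's actual implementation:
\class{DictImporter} and \function{toy_import()} are hypothetical names
invented for this illustration, one object plays both the importer and
the loader role, and the sketch skips bookkeeping such as registering the
module in \code{sys.modules}. It is written to run on a current Python
interpreter.

```python
import types


class DictImporter:
    """Hypothetical importer that serves modules from an in-memory dict."""

    def __init__(self, sources):
        # Maps a full module name to a dict of attributes for that module.
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader (here: self) if we can handle the name, else None.
        if fullname in self.sources:
            return self
        return None

    def load_module(self, fullname):
        # Create the module object and populate its namespace.
        module = types.ModuleType(fullname)
        module.__dict__.update(self.sources[fullname])
        module.__loader__ = self
        return module


def toy_import(fullname, meta_path):
    """Mimic only the meta_path part of the pseudo-code above."""
    for importer in meta_path:
        loader = importer.find_module(fullname)
        if loader is not None:
            return loader.load_module(fullname)
    raise ImportError(fullname)


importer = DictImporter({'greeting': {'message': 'hello'}})
module = toy_import('greeting', [importer])
print(module.message)
```

Returning \code{None} from \method{find_module()} is what lets the loop
fall through to the next importer, so several importers can coexist on
the same list.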
838 \begin{seealso}
\seepep{302}{New Import Hooks}{Written by Just van~Rossum and Paul Moore.
Implemented by Just van~Rossum.}
844 \end{seealso}
847 %======================================================================
848 \section{PEP 305: Comma-separated Files \label{section-pep305}}
850 Comma-separated files are a format frequently used for exporting data
851 from databases and spreadsheets. Python 2.3 adds a parser for
852 comma-separated files.
854 Comma-separated format is deceptively simple at first glance:
\begin{verbatim}
Costs,150,200,3.95
\end{verbatim}
860 Read a line and call \code{line.split(',')}: what could be simpler?
861 But toss in string data that can contain commas, and things get more
862 complicated:
\begin{verbatim}
"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
\end{verbatim}
868 A big ugly regular expression can parse this, but using the new
869 \module{csv} package is much simpler:
\begin{verbatim}
import csv

input = open('datafile', 'rb')
reader = csv.reader(input)
for line in reader:
    print line
\end{verbatim}
880 The \function{reader} function takes a number of different options.
881 The field separator isn't limited to the comma and can be changed to
882 any character, and so can the quoting and line-ending characters.
884 Different dialects of comma-separated files can be defined and
885 registered; currently there are two dialects, both used by Microsoft Excel.
886 A separate \class{csv.writer} class will generate comma-separated files
887 from a succession of tuples or lists, quoting strings that contain the
888 delimiter.
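
A short round trip shows the writer's quoting and the reader working
together. This is a sketch written for a current Python~3 interpreter,
where an in-memory \class{io.StringIO} buffer stands in for a file; under
Python 2.3 the \module{StringIO} module would play that part. The sample
rows are made up for the illustration.

```python
import csv
import io

rows = [
    ['Costs', 150, 200, 3.95, 'Includes taxes, shipping, and sundry items'],
    ['Revenue', 300, 410, 7.5, 'No commas here'],
]

# Write the rows to an in-memory buffer; only fields that contain the
# delimiter are wrapped in quotes.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Read the text back.  The quoted field survives as a single item, and
# every field comes back as a string.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[0][4])
```

Note that the numbers come back as strings; converting them is left to
the caller.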
890 \begin{seealso}
\seepep{305}{CSV File API}{Written and implemented
by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.}
896 \end{seealso}
898 %======================================================================
\section{PEP 307: Pickle Enhancements \label{section-pep307}}
901 The \module{pickle} and \module{cPickle} modules received some
902 attention during the 2.3 development cycle. In 2.2, new-style classes
903 could be pickled without difficulty, but they weren't pickled very
904 compactly; \pep{307} quotes a trivial example where a new-style class
905 results in a pickled string three times longer than that for a classic
906 class.
908 The solution was to invent a new pickle protocol. The
909 \function{pickle.dumps()} function has supported a text-or-binary flag
910 for a long time. In 2.3, this flag is redefined from a Boolean to an
911 integer: 0 is the old text-mode pickle format, 1 is the old binary
912 format, and now 2 is a new 2.3-specific format. A new constant,
913 \constant{pickle.HIGHEST_PROTOCOL}, can be used to select the fanciest
914 protocol available.
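
The integer protocol argument can be exercised as follows. This is a
small sketch for a current interpreter, using a made-up \code{record}
dictionary; the pickle sizes it prints will vary between Python versions,
so none are quoted here.

```python
import pickle

record = {'name': 'Costs', 'values': [150, 200, 3.95]}

# 0 selects the old text format, 1 the old binary format,
# and 2 the format introduced in Python 2.3.
pickles = {}
for protocol in (0, 1, 2):
    pickles[protocol] = pickle.dumps(record, protocol)
    print(len(pickles[protocol]))

# HIGHEST_PROTOCOL picks the newest format the running interpreter knows.
newest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol by itself and restores an equal object.
restored = [pickle.loads(data) for data in pickles.values()]
restored.append(pickle.loads(newest))
```

Because \function{pickle.loads()} needs no protocol argument, the choice
of protocol only matters to the code that writes the pickle.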
916 Unpickling is no longer considered a safe operation. 2.2's
917 \module{pickle} provided hooks for trying to prevent unsafe classes
918 from being unpickled (specifically, a
919 \member{__safe_for_unpickling__} attribute), but none of this code
920 was ever audited and therefore it's all been ripped out in 2.3. You
921 should not unpickle untrusted data in any version of Python.
923 To reduce the pickling overhead for new-style classes, a new interface
924 for customizing pickling was added using three special methods:
925 \method{__getstate__}, \method{__setstate__}, and
926 \method{__getnewargs__}. Consult \pep{307} for the full semantics
927 of these methods.
929 As a way to compress pickles yet further, it's now possible to use
930 integer codes instead of long strings to identify pickled classes.
931 The Python Software Foundation will maintain a list of standardized
932 codes; there's also a range of codes for private use. Currently no
933 codes have been specified.
935 \begin{seealso}
937 \seepep{307}{Extensions to the pickle protocol}{Written and implemented
938 by Guido van Rossum and Tim Peters.}
940 \end{seealso}
942 %======================================================================
943 \section{Extended Slices\label{section-slices}}
945 Ever since Python 1.4, the slicing syntax has supported an optional
946 third ``step'' or ``stride'' argument. For example, these are all
947 legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
948 \code{L[::-1]}. This was added to Python at the request of
949 the developers of Numerical Python, which uses the third argument
950 extensively. However, Python's built-in list, tuple, and string
951 sequence types have never supported this feature, raising a
952 \exception{TypeError} if you tried it. Michael Hudson contributed a
953 patch to fix this shortcoming.
955 For example, you can now easily extract the elements of a list that
956 have even indexes:
\begin{verbatim}
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
\end{verbatim}

Negative values also work to make a copy of the same list in reverse
order:

\begin{verbatim}
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
\end{verbatim}

This also works for tuples, arrays, and strings:

\begin{verbatim}
>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
\end{verbatim}
982 If you have a mutable sequence such as a list or an array you can
983 assign to or delete an extended slice, but there are some differences
984 between assignment to extended and regular slices. Assignment to a
985 regular slice can be used to change the length of the sequence:
\begin{verbatim}
>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]
\end{verbatim}

Extended slices aren't this flexible. When assigning to an extended
slice, the list on the right hand side of the statement must contain
the same number of items as the slice it is replacing:

\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0,1,2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
\end{verbatim}
1015 Deletion is more straightforward:
\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]
\end{verbatim}

One can also now pass slice objects to the
\method{__getitem__} methods of the built-in sequences:

\begin{verbatim}
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
\end{verbatim}

Or use slice objects directly in subscripts:

\begin{verbatim}
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
\end{verbatim}
1043 To simplify implementing sequences that support extended slicing,
1044 slice objects now have a method \method{indices(\var{length})} which,
1045 given the length of a sequence, returns a \code{(\var{start},
1046 \var{stop}, \var{step})} tuple that can be passed directly to
1047 \function{range()}.
1048 \method{indices()} handles omitted and out-of-bounds indices in a
1049 manner consistent with regular slices (and this innocuous phrase hides
1050 a welter of confusing details!). The method is intended to be used
1051 like this:
\begin{verbatim}
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            # Turn the slice into concrete (start, stop, step) values
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            # A plain integer index
            return self.calc_item(item)
\end{verbatim}
1066 From this example you can also see that the built-in \class{slice}
1067 object is now the type object for the slice type, and is no longer a
1068 function. This is consistent with Python 2.2, where \class{int},
1069 \class{str}, etc., underwent the same change.
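
A complete class built on the same pattern may make the role of
\method{indices()} clearer. This is only a sketch: \class{Squares} is a
hypothetical sequence invented for the illustration, and unlike the
outline above it returns a plain list for slices instead of another
instance of itself.

```python
class Squares:
    """Hypothetical read-only sequence whose item i is i squared."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def calc_item(self, i):
        # Handle negative indexes and reject out-of-range ones.
        if i < 0:
            i += self.length
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * i

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() fills in omitted values and clips start/stop to
            # the sequence length, so the result can go straight to range().
            start, stop, step = item.indices(len(self))
            return [self.calc_item(i) for i in range(start, stop, step)]
        else:
            return self.calc_item(item)


sq = Squares(10)
print(sq[3])                                # 9
print(sq[::2])                              # [0, 4, 16, 36, 64]
print(sq[::-3])                             # [81, 36, 9, 0]
print(sq[5:100])                            # [25, 36, 49, 64, 81]
print(slice(None, None, -3).indices(10))    # (9, -1, -3)
```

The out-of-range stop in \code{sq[5:100]} is clipped to the length of the
sequence, just as it would be for a list.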
1072 %======================================================================
1073 \section{Other Language Changes}
1075 Here are all of the changes that Python 2.3 makes to the core Python
1076 language.
1078 \begin{itemize}
1079 \item The \keyword{yield} statement is now always a keyword, as
1080 described in section~\ref{section-generators} of this document.
1082 \item A new built-in function \function{enumerate()}
1083 was added, as described in section~\ref{section-enumerate} of this
1084 document.
1086 \item Two new constants, \constant{True} and \constant{False} were
1087 added along with the built-in \class{bool} type, as described in
1088 section~\ref{section-bool} of this document.
1090 \item The \function{int()} type constructor will now return a long
1091 integer instead of raising an \exception{OverflowError} when a string
1092 or floating-point number is too large to fit into an integer. This
1093 can lead to the paradoxical result that
1094 \code{isinstance(int(\var{expression}), int)} is false, but that seems
1095 unlikely to cause problems in practice.
1097 \item Built-in types now support the extended slicing syntax,
1098 as described in section~\ref{section-slices} of this document.
1100 \item A new built-in function, \function{sum(\var{iterable}, \var{start}=0)},
1101 adds up the numeric items in the iterable object and returns their sum.
1102 \function{sum()} only accepts numbers, meaning that you can't use it
1103 to concatenate a bunch of strings. (Contributed by Alex
1104 Martelli.)
1106 \item \code{list.insert(\var{pos}, \var{value})} used to
1107 insert \var{value} at the front of the list when \var{pos} was
1108 negative. The behaviour has now been changed to be consistent with
1109 slice indexing, so when \var{pos} is -1 the value will be inserted
1110 before the last element, and so forth.
1112 \item \code{list.index(\var{value})}, which searches for \var{value}
1113 within the list and returns its index, now takes optional
1114 \var{start} and \var{stop} arguments to limit the search to
1115 only part of the list.
1117 \item Dictionaries have a new method, \method{pop(\var{key}\optional{,
1118 \var{default}})}, that returns the value corresponding to \var{key}
1119 and removes that key/value pair from the dictionary. If the requested
1120 key isn't present in the dictionary, \var{default} is returned if it's
1121 specified and \exception{KeyError} raised if it isn't.
\begin{verbatim}
>>> d = {1:2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> d
{}
\end{verbatim}
1142 There's also a new class method,
1143 \method{dict.fromkeys(\var{iterable}, \var{value})}, that
1144 creates a dictionary with keys taken from the supplied iterator
1145 \var{iterable} and all values set to \var{value}, defaulting to
1146 \code{None}.
1148 (Patches contributed by Raymond Hettinger.)
1150 Also, the \function{dict()} constructor now accepts keyword arguments to
1151 simplify creating small dictionaries:
\begin{verbatim}
>>> dict(red=1, blue=2, green=3, black=4)
{'blue': 2, 'black': 4, 'green': 3, 'red': 1}
\end{verbatim}
1158 (Contributed by Just van~Rossum.)
1160 \item The \keyword{assert} statement no longer checks the \code{__debug__}
1161 flag, so you can no longer disable assertions by assigning to \code{__debug__}.
1162 Running Python with the \programopt{-O} switch will still generate
1163 code that doesn't execute any assertions.
1165 \item Most type objects are now callable, so you can use them
1166 to create new objects such as functions, classes, and modules. (This
1167 means that the \module{new} module can be deprecated in a future
1168 Python version, because you can now use the type objects available in
1169 the \module{types} module.)
1170 % XXX should new.py use PendingDeprecationWarning?
1171 For example, you can create a new module object with the following code:
\begin{verbatim}
>>> import types
>>> m = types.ModuleType('abc','docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'
\end{verbatim}
1182 \item
1183 A new warning, \exception{PendingDeprecationWarning} was added to
1184 indicate features which are in the process of being
1185 deprecated. The warning will \emph{not} be printed by default. To
1186 check for use of features that will be deprecated in the future,
1187 supply \programopt{-Walways::PendingDeprecationWarning::} on the
1188 command line or use \function{warnings.filterwarnings()}.
1190 \item The process of deprecating string-based exceptions, as
1191 in \code{raise "Error occurred"}, has begun. Raising a string will
1192 now trigger \exception{PendingDeprecationWarning}.
1194 \item Using \code{None} as a variable name will now result in a
1195 \exception{SyntaxWarning} warning. In a future version of Python,
1196 \code{None} may finally become a keyword.
1198 \item The \method{xreadlines()} method of file objects, introduced in
1199 Python 2.1, is no longer necessary because files now behave as their
1200 own iterator. \method{xreadlines()} was originally introduced as a
1201 faster way to loop over all the lines in a file, but now you can
1202 simply write \code{for line in file_obj}. File objects also have a
1203 new read-only \member{encoding} attribute that gives the encoding used
1204 by the file; Unicode strings written to the file will be automatically
1205 converted to bytes using the given encoding.
1207 \item The method resolution order used by new-style classes has
1208 changed, though you'll only notice the difference if you have a really
1209 complicated inheritance hierarchy. Classic classes are unaffected by
1210 this change. Python 2.2 originally used a topological sort of a
1211 class's ancestors, but 2.3 now uses the C3 algorithm as described in
1212 the paper \ulink{``A Monotonic Superclass Linearization for
1213 Dylan''}{http://www.webcom.com/haahr/dylan/linearization-oopsla96.html}.
1214 To understand the motivation for this change,
1215 read Michele Simionato's article
1216 \ulink{``Python 2.3 Method Resolution Order''}
1217 {http://www.python.org/2.3/mro.html}, or
1218 read the thread on python-dev starting with the message at
1219 \url{http://mail.python.org/pipermail/python-dev/2002-October/029035.html}.
1220 Samuele Pedroni first pointed out the problem and also implemented the
1221 fix by coding the C3 algorithm.
1223 \item Python runs multithreaded programs by switching between threads
1224 after executing N bytecodes. The default value for N has been
1225 increased from 10 to 100 bytecodes, speeding up single-threaded
1226 applications by reducing the switching overhead. Some multithreaded
1227 applications may suffer slower response time, but that's easily fixed
1228 by setting the limit back to a lower number using
1229 \function{sys.setcheckinterval(\var{N})}.
1230 The limit can be retrieved with the new
1231 \function{sys.getcheckinterval()} function.
1233 \item One minor but far-reaching change is that the names of extension
1234 types defined by the modules included with Python now contain the
1235 module and a \character{.} in front of the type name. For example, in
1236 Python 2.2, if you created a socket and printed its
1237 \member{__class__}, you'd get this output:
\begin{verbatim}
>>> s = socket.socket()
>>> s.__class__
<type 'socket'>
\end{verbatim}

In 2.3, you get this:

\begin{verbatim}
>>> s.__class__
<type '_socket.socket'>
\end{verbatim}
1251 \item One of the noted incompatibilities between old- and new-style
1252 classes has been removed: you can now assign to the
1253 \member{__name__} and \member{__bases__} attributes of new-style
1254 classes. There are some restrictions on what can be assigned to
1255 \member{__bases__} along the lines of those relating to assigning to
1256 an instance's \member{__class__} attribute.
1258 \end{itemize}
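
Several of the smaller changes in this list are easiest to see in action.
The following sketch uses made-up values and is written so that it runs
on a current interpreter:

```python
# sum() takes an optional start value that is added to the total.
print(sum([1, 2, 3]))         # 6
print(sum([1, 2, 3], 10))     # 16

# A negative position in list.insert() now counts from the end.
letters = ['a', 'b', 'c']
letters.insert(-1, 'x')
print(letters)                # ['a', 'b', 'x', 'c']

# list.index() can restrict the search to the range [start, stop).
values = [5, 7, 5, 9, 5]
print(values.index(5, 1))     # 2
print(values.index(5, 3, 5))  # 4

# dict.fromkeys() gives every key the same initial value.
counts = dict.fromkeys(['red', 'green'], 0)
print(counts['red'])          # 0

# dict.pop() with a default never raises KeyError.
print(counts.pop('blue', -1)) # -1
```
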
1261 %======================================================================
1262 \subsection{String Changes}
1264 \begin{itemize}
1266 \item The \keyword{in} operator now works differently for strings.
1267 Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
1268 and \var{Y} are strings, \var{X} could only be a single character.
1269 That's now changed; \var{X} can be a string of any length, and
1270 \code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
1271 substring of \var{Y}. If \var{X} is the empty string, the result is
1272 always \constant{True}.
\begin{verbatim}
>>> 'ab' in 'abcd'
True
>>> 'ad' in 'abcd'
False
>>> '' in 'abcd'
True
\end{verbatim}
1283 Note that this doesn't tell you where the substring starts; if you
1284 need that information, use the \method{find()} string method.
1286 \item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
1287 string methods now have an optional argument for specifying the
1288 characters to strip. The default is still to remove all whitespace
1289 characters:
\begin{verbatim}
>>> '   abc '.strip()
'abc'
>>> '><><abc<><><>'.strip('<>')
'abc'
>>> '><><abc<><><>\n'.strip('<>')
'abc<><><>\n'
>>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
u'\u4001abc'
\end{verbatim}
1303 (Suggested by Simon Brunning and implemented by Walter D\"orwald.)
1305 \item The \method{startswith()} and \method{endswith()}
1306 string methods now accept negative numbers for the \var{start} and \var{end}
1307 parameters.
1309 \item Another new string method is \method{zfill()}, originally a
1310 function in the \module{string} module. \method{zfill()} pads a
1311 numeric string with zeros on the left until it's the specified width.
1312 Note that the \code{\%} operator is still more flexible and powerful
1313 than \method{zfill()}.
\begin{verbatim}
>>> '45'.zfill(4)
'0045'
>>> '12345'.zfill(4)
'12345'
>>> 'goofy'.zfill(6)
'0goofy'
\end{verbatim}
1324 (Contributed by Walter D\"orwald.)
1326 \item A new type object, \class{basestring}, has been added.
1327 Both 8-bit strings and Unicode strings inherit from this type, so
1328 \code{isinstance(obj, basestring)} will return \constant{True} for
1329 either kind of string. It's a completely abstract type, so you
1330 can't create \class{basestring} instances.
1332 \item Interned strings are no longer immortal and will now be
1333 garbage-collected in the usual way when the only reference to them is
1334 from the internal dictionary of interned strings. (Implemented by
1335 Oren Tirosh.)
1337 \end{itemize}
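
The list above gives no example for the negative \var{start} and
\var{end} arguments of \method{startswith()} and \method{endswith()}.
They are interpreted the same way as slice indexes, as this short sketch
with a made-up file name shows:

```python
name = 'whatsnew23.tex'

# A negative start counts from the end of the string, as in slicing.
print(name.startswith('.tex', -4))   # True: name[-4:] is '.tex'
print(name.startswith('.tex', -3))   # False: name[-3:] is only 'tex'

# A negative end stops the comparison before the final characters.
print(name.endswith('23', 0, -4))    # True: name[0:-4] is 'whatsnew23'
```
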
1340 %======================================================================
1341 \subsection{Optimizations}
1343 \begin{itemize}
1345 \item The creation of new-style class instances has been made much
1346 faster; they're now faster than classic classes!
1348 \item The \method{sort()} method of list objects has been extensively
1349 rewritten by Tim Peters, and the implementation is significantly
1350 faster.
1352 \item Multiplication of large long integers is now much faster thanks
1353 to an implementation of Karatsuba multiplication, an algorithm that
1354 scales better than the O(n*n) required for the grade-school
1355 multiplication algorithm. (Original patch by Christopher A. Craig,
1356 and significantly reworked by Tim Peters.)
1358 \item The \code{SET_LINENO} opcode is now gone. This may provide a
1359 small speed increase, depending on your compiler's idiosyncrasies.
1360 See section~\ref{section-other} for a longer explanation.
1361 (Removed by Michael Hudson.)
1363 \item \function{xrange()} objects now have their own iterator, making
1364 \code{for i in xrange(n)} slightly faster than
1365 \code{for i in range(n)}. (Patch by Raymond Hettinger.)
1367 \item A number of small rearrangements have been made in various
1368 hotspots to improve performance, such as inlining a function or removing
1369 some code. (Implemented mostly by GvR, but lots of people have
1370 contributed single changes.)
1372 \end{itemize}
1374 The net result of the 2.3 optimizations is that Python 2.3 runs the
1375 pystone benchmark around 25\% faster than Python 2.2.
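
To give an idea of why Karatsuba multiplication scales better than the
grade-school method, here is a pure-Python sketch of the algorithm. It is
only an illustration and not the interpreter's C implementation: the
function name \function{karatsuba()} and the cutoff of 1024 are arbitrary
choices made for this example, and \method{bit_length()} requires a
much newer Python than 2.3. Each level of recursion performs three
half-size multiplications instead of four, which gives a running time of
roughly O(n**1.585) instead of O(n*n).

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using Karatsuba's method."""
    # Small operands: fall back to the built-in multiplication.
    if x < 1024 or y < 1024:
        return x * y

    # Split both numbers around the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_high, x_low = x >> half, x & mask
    y_high, y_low = y >> half, y & mask

    # Three recursive multiplications instead of the four that the
    # grade-school method would need for the same split.
    high = karatsuba(x_high, y_high)
    low = karatsuba(x_low, y_low)
    cross = karatsuba(x_high + x_low, y_high + y_low) - high - low

    # Recombine: x*y = high*2**(2*half) + cross*2**half + low
    return (high << (2 * half)) + (cross << half) + low


a = 3 ** 200 + 7
b = 7 ** 150 + 11
print(karatsuba(a, b) == a * b)
```
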
1378 %======================================================================
1379 \section{New, Improved, and Deprecated Modules}
1381 As usual, Python's standard library received a number of enhancements and
1382 bug fixes. Here's a partial list of the most notable changes, sorted
1383 alphabetically by module name. Consult the
1384 \file{Misc/NEWS} file in the source tree for a more
1385 complete list of changes, or look through the CVS logs for all the
1386 details.
1388 \begin{itemize}
1390 \item The \module{array} module now supports arrays of Unicode
1391 characters using the \character{u} format character. Arrays also now
1392 support using the \code{+=} assignment operator to add another array's
1393 contents, and the \code{*=} assignment operator to repeat an array.
1394 (Contributed by Jason Orendorff.)
1396 \item The \module{bsddb} module has been replaced by version 4.1.6
1397 of the \ulink{PyBSDDB}{http://pybsddb.sourceforge.net} package,
1398 providing a more complete interface to the transactional features of
1399 the BerkeleyDB library.
1401 The old version of the module has been renamed to
1402 \module{bsddb185} and is no longer built automatically; you'll
1403 have to edit \file{Modules/Setup} to enable it. Note that the new
1404 \module{bsddb} package is intended to be compatible with the
1405 old module, so be sure to file bugs if you discover any
1406 incompatibilities. When upgrading to Python 2.3, if the new interpreter is compiled
1407 with a new version of
1408 the underlying BerkeleyDB library, you will almost certainly have to
1409 convert your database files to the new version. You can do this
1410 fairly easily with the new scripts \file{db2pickle.py} and
1411 \file{pickle2db.py} which you will find in the distribution's
1412 \file{Tools/scripts} directory. If you've already been using the PyBSDDB
1413 package and importing it as \module{bsddb3}, you will have to change your
1414 \code{import} statements to import it as \module{bsddb}.
1416 \item The new \module{bz2} module is an interface to the bz2 data
1417 compression library. bz2-compressed data is usually smaller than
1418 corresponding \module{zlib}-compressed data. (Contributed by Gustavo Niemeyer.)
1420 \item A set of standard date/time types has been added in the new \module{datetime}
1421 module. See the following section for more details.
1423 \item The Distutils \class{Extension} class now supports
1424 an extra constructor argument named \var{depends} for listing
1425 additional source files that an extension depends on. This lets
1426 Distutils recompile the module if any of the dependency files are
1427 modified. For example, if \file{sampmodule.c} includes the header
1428 file \file{sample.h}, you would create the \class{Extension} object like
1429 this:
\begin{verbatim}
ext = Extension("samp",
                sources=["sampmodule.c"],
                depends=["sample.h"])
\end{verbatim}
1437 Modifying \file{sample.h} would then cause the module to be recompiled.
1438 (Contributed by Jeremy Hylton.)
1440 \item Other minor changes to Distutils:
1441 it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
1442 \envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
1443 them to override the settings in Python's configuration (contributed
1444 by Robert Weber).
\item Previously the \module{doctest} module would only search the
docstrings of public methods and functions for test cases, but it now
examines private ones as well. The \function{DocTestSuite()}
function creates a \class{unittest.TestSuite} object from a set of
\module{doctest} tests.
1452 \item The new \function{gc.get_referents(\var{object})} function returns a
1453 list of all the objects referenced by \var{object}.
1455 \item The \module{getopt} module gained a new function,
1456 \function{gnu_getopt()}, that supports the same arguments as the existing
1457 \function{getopt()} function but uses GNU-style scanning mode.
1458 The existing \function{getopt()} stops processing options as soon as a
1459 non-option argument is encountered, but in GNU-style mode processing
1460 continues, meaning that options and arguments can be mixed. For
1461 example:
\begin{verbatim}
>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename')], ['output', '-v'])
>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename'), ('-v', '')], ['output'])
\end{verbatim}
1470 (Contributed by Peter \AA{strand}.)
1472 \item The \module{grp}, \module{pwd}, and \module{resource} modules
1473 now return enhanced tuples:
\begin{verbatim}
>>> import grp
>>> g = grp.getgrnam('amk')
>>> g.gr_name, g.gr_gid
('amk', 500)
\end{verbatim}
\item The \module{gzip} module can now handle files exceeding 2~GB.
1484 \item The new \module{heapq} module contains an implementation of a
1485 heap queue algorithm. A heap is an array-like data structure that
1486 keeps items in a partially sorted order such that, for every index
1487 \var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and
1488 \code{heap[\var{k}] <= heap[2*\var{k}+2]}. This makes it quick to
1489 remove the smallest item, and inserting a new item while maintaining
1490 the heap property is O(lg~n). (See
1491 \url{http://www.nist.gov/dads/HTML/priorityque.html} for more
1492 information about the priority queue data structure.)
1494 The \module{heapq} module provides \function{heappush()} and
1495 \function{heappop()} functions for adding and removing items while
1496 maintaining the heap property on top of some other mutable Python
1497 sequence type. Here's an example that uses a Python list:
\begin{verbatim}
>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...     heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]
\end{verbatim}
1515 (Contributed by Kevin O'Connor.)
1517 \item The IDLE integrated development environment has been updated
1518 using the code from the IDLEfork project
1519 (\url{http://idlefork.sf.net}). The most notable feature is that the
1520 code being developed is now executed in a subprocess, meaning that
1521 there's no longer any need for manual \code{reload()} operations.
1522 IDLE's core code has been incorporated into the standard library as the
1523 \module{idlelib} package.
1525 \item The \module{imaplib} module now supports IMAP over SSL.
1526 (Contributed by Piers Lauder and Tino Lange.)
\item The \module{itertools} module contains a number of useful functions for
1529 use with iterators, inspired by various functions provided by the ML
1530 and Haskell languages. For example,
1531 \code{itertools.ifilter(predicate, iterator)} returns all elements in
1532 the iterator for which the function \function{predicate()} returns
1533 \constant{True}, and \code{itertools.repeat(obj, \var{N})} returns
1534 \code{obj} \var{N} times. There are a number of other functions in
1535 the module; see the \ulink{package's reference
1536 documentation}{../lib/module-itertools.html} for details.
1537 (Contributed by Raymond Hettinger.)
1539 \item Two new functions in the \module{math} module,
1540 \function{degrees(\var{rads})} and \function{radians(\var{degs})},
1541 convert between radians and degrees. Other functions in the
1542 \module{math} module such as \function{math.sin()} and
1543 \function{math.cos()} have always required input values measured in
1544 radians. Also, an optional \var{base} argument was added to
1545 \function{math.log()} to make it easier to compute logarithms for
1546 bases other than \code{e} and \code{10}. (Contributed by Raymond
1547 Hettinger.)
\item Several new POSIX functions (\function{getpgid()}, \function{killpg()},
\function{lchown()}, \function{getloadavg()}, \function{major()}, \function{makedev()},
\function{minor()}, and \function{mknod()}) were added to the
1552 \module{posix} module that underlies the \module{os} module.
1553 (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)
1555 \item In the \module{os} module, the \function{*stat()} family of
1556 functions can now report fractions of a second in a timestamp. Such
1557 time stamps are represented as floats, similar to
1558 the value returned by \function{time.time()}.
During testing, it was found that some applications will break if time
stamps are floats. For compatibility, the tuple interface of
\class{stat_result} always represents time stamps as integers. The
named fields (a feature first introduced in Python 2.2) also return
integers by default; they return floats only after
\function{os.stat_float_times()} has been called with a true argument:
\begin{verbatim}
>>> os.stat("/tmp").st_mtime
1034791200
>>> os.stat_float_times(True)
>>> os.stat("/tmp").st_mtime
1034791200.6335014
\end{verbatim}
1576 In Python 2.4, the default will change to always returning floats.
1578 Application developers should enable this feature only if all their
1579 libraries work properly when confronted with floating point time
1580 stamps, or if they use the tuple API. If used, the feature should be
1581 activated on an application level instead of trying to enable it on a
1582 per-use basis.
1584 \item The \module{optparse} module contains a new parser for command-line arguments
1585 that can convert option values to a particular Python type
1586 and will automatically generate a usage message. See the following section for
1587 more details.
1589 \item The old and never-documented \module{linuxaudiodev} module has
1590 been deprecated, and a new version named \module{ossaudiodev} has been
1591 added. The module was renamed because the OSS sound drivers can be
1592 used on platforms other than Linux, and the interface has also been
1593 tidied and brought up to date in various ways. (Contributed by Greg
1594 Ward and Nicholas FitzRoy-Dale.)
1596 \item The new \module{platform} module contains a number of functions
1597 that try to determine various properties of the platform you're
1598 running on. There are functions for getting the architecture, CPU
1599 type, the Windows OS version, and even the Linux distribution version.
1600 (Contributed by Marc-Andr\'e Lemburg.)
1602 \item The parser objects provided by the \module{pyexpat} module
1603 can now optionally buffer character data, resulting in fewer calls to
1604 your character data handler and therefore faster performance. Setting
1605 the parser object's \member{buffer_text} attribute to \constant{True}
1606 will enable buffering.
1608 \item The \function{sample(\var{population}, \var{k})} function was
1609 added to the \module{random} module. \var{population} is a sequence or
1610 \class{xrange} object containing the elements of a population, and
1611 \function{sample()} chooses \var{k} elements from the population without
1612 replacing chosen elements. \var{k} can be any value up to
1613 \code{len(\var{population})}. For example:
\begin{verbatim}
>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
>>> random.sample(days, 3)      # Choose 3 elements
['St', 'Sn', 'Th']
>>> random.sample(days, 7)      # Choose 7 elements
['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
>>> random.sample(days, 7)      # Choose 7 again
['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
>>> random.sample(days, 8)      # Can't choose eight
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "random.py", line 414, in sample
    raise ValueError, "sample larger than population"
ValueError: sample larger than population
>>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
\end{verbatim}
1633 The \module{random} module now uses a new algorithm, the Mersenne
1634 Twister, implemented in C. It's faster and more extensively studied
1635 than the previous algorithm.
1637 (All changes contributed by Raymond Hettinger.)
1639 \item The \module{readline} module also gained a number of new
1640 functions: \function{get_history_item()},
1641 \function{get_current_history_length()}, and \function{redisplay()}.
1643 \item The \module{rexec} and \module{Bastion} modules have been
1644 declared dead, and attempts to import them will fail with a
1645 \exception{RuntimeError}. New-style classes provide new ways to break
1646 out of the restricted execution environment provided by
1647 \module{rexec}, and no one has interest in fixing them or time to do
1648 so. If you have applications using \module{rexec}, rewrite them to
1649 use something else.
1651 (Sticking with Python 2.2 or 2.1 will not make your applications any
1652 safer because there are known bugs in the \module{rexec} module in
1653 those versions. To repeat: if you're using \module{rexec}, stop using
1654 it immediately.)
1656 \item The \module{rotor} module has been deprecated because the
1657 algorithm it uses for encryption is not believed to be secure. If
1658 you need encryption, use one of the several AES Python modules
1659 that are available separately.
1661 \item The \module{shutil} module gained a \function{move(\var{src},
1662 \var{dest})} function that recursively moves a file or directory to a new
1663 location.
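A minimal sketch of \function{shutil.move()} applied to a directory
tree follows. It assumes a current Python~3 interpreter; the scratch
directory comes from \module{tempfile}, and the helper name
\function{move_directory_demo()} and the file names are invented for
the example.

```python
import os
import shutil
import tempfile


def move_directory_demo():
    """Move a small directory tree and report what exists afterwards."""
    base = tempfile.mkdtemp()
    try:
        src = os.path.join(base, "src")
        dest = os.path.join(base, "dest")
        os.mkdir(src)
        with open(os.path.join(src, "note.txt"), "w") as handle:
            handle.write("hello")

        shutil.move(src, dest)       # the whole tree is relocated

        return (os.path.exists(src),
                os.path.exists(os.path.join(dest, "note.txt")))
    finally:
        shutil.rmtree(base)          # remove the scratch directory


source_still_there, file_arrived = move_directory_demo()
print(source_still_there, file_arrived)
```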
\item Support for more advanced POSIX signal handling was added
to the \module{signal} module but then removed again as it proved
impossible to make it work reliably across platforms.
1669 \item The \module{socket} module now supports timeouts. You
1670 can call the \method{settimeout(\var{t})} method on a socket object to
1671 set a timeout of \var{t} seconds. Subsequent socket operations that
1672 take longer than \var{t} seconds to complete will abort and raise a
1673 \exception{socket.timeout} exception.
1675 The original timeout implementation was by Tim O'Malley. Michael
1676 Gilfix integrated it into the Python \module{socket} module and
1677 shepherded it through a lengthy review. After the code was checked
1678 in, Guido van~Rossum rewrote parts of it. (This is a good example of
1679 a collaborative development process in action.)
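The timeout behaviour can be sketched without touching the network.
The example assumes a current Python~3 interpreter and uses
\function{socket.socketpair()}, which is not part of the 2.3 feature
described above, purely as a convenient way to obtain a connected
socket on which nothing is ever sent. The helper name
\function{wait_for_data()} is hypothetical.

```python
import socket


def wait_for_data(seconds):
    """Return 'timed out' if nothing arrives within the given time."""
    reader, writer = socket.socketpair()   # connected pair; writer stays silent
    try:
        reader.settimeout(seconds)
        try:
            reader.recv(1)                 # blocks until data or the timeout
        except socket.timeout:
            return "timed out"
        return "received data"
    finally:
        reader.close()
        writer.close()


outcome = wait_for_data(0.2)
print(outcome)
```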
1681 \item On Windows, the \module{socket} module now ships with Secure
1682 Sockets Layer (SSL) support.
1684 \item The value of the C \constant{PYTHON_API_VERSION} macro is now
1685 exposed at the Python level as \code{sys.api_version}. The current
1686 exception can be cleared by calling the new \function{sys.exc_clear()}
1687 function.
1689 \item The new \module{tarfile} module
1690 allows reading from and writing to \program{tar}-format archive files.
1691 (Contributed by Lars Gust\"abel.)
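A short round trip shows both directions: writing one member into an
archive and reading it back. This is a sketch under the assumption of
a current Python~3 interpreter; the archive is kept in memory with
\module{io}, and the helper \function{roundtrip()} and the member name
are made up for illustration.

```python
import io
import tarfile


def roundtrip(name, payload):
    """Write payload into an in-memory tar archive, then read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)           # size must be set before adding
        archive.addfile(info, io.BytesIO(payload))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as archive:
        member = archive.getmember(name)
        return archive.extractfile(member).read()


recovered = roundtrip("hello.txt", b"hello tar")
print(recovered)
```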
1693 \item The new \module{textwrap} module contains functions for wrapping
1694 strings containing paragraphs of text. The \function{wrap(\var{text},
1695 \var{width})} function takes a string and returns a list containing
1696 the text split into lines of no more than the chosen width. The
1697 \function{fill(\var{text}, \var{width})} function returns a single
1698 string, reformatted to fit into lines no longer than the chosen width.
(As you can guess, \function{fill()} is built on top of
\function{wrap()}.) For example:
\begin{verbatim}
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
\end{verbatim}
1719 The module also contains a \class{TextWrapper} class that actually
1720 implements the text wrapping strategy. Both the
1721 \class{TextWrapper} class and the \function{wrap()} and
1722 \function{fill()} functions support a number of additional keyword
1723 arguments for fine-tuning the formatting; consult the \ulink{module's
1724 documentation}{../lib/module-textwrap.html} for details.
1725 (Contributed by Greg Ward.)
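As an illustration of the keyword arguments, the sketch below builds a
\class{TextWrapper} that formats a paragraph as a bulleted entry using
\var{initial_indent} and \var{subsequent_indent}. It assumes a current
Python~3 interpreter, and the width and sample text are arbitrary.

```python
import textwrap

# Prefix the first line with a bullet and align the continuation lines.
wrapper = textwrap.TextWrapper(width=30,
                               initial_indent="* ",
                               subsequent_indent="  ")

text = "The readiness is all. If it be now, 'tis not to come."
lines = wrapper.wrap(text)

for line in lines:
    print(line)
```

The indent strings count towards the line width, so every line,
prefix included, stays within 30 characters.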
1727 \item The \module{thread} and \module{threading} modules now have
1728 companion modules, \module{dummy_thread} and \module{dummy_threading},
1729 that provide a do-nothing implementation of the \module{thread}
1730 module's interface for platforms where threads are not supported. The
1731 intention is to simplify thread-aware modules (ones that \emph{don't}
1732 rely on threads to run) by putting the following code at the top:
\begin{verbatim}
try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading
\end{verbatim}
1741 In this example, \module{_threading} is used as the module name to make
1742 it clear that the module being used is not necessarily the actual
1743 \module{threading} module. Code can call functions and use classes in
1744 \module{_threading} whether or not threads are supported, avoiding an
1745 \keyword{if} statement and making the code slightly clearer. This
1746 module will not magically make multithreaded code run without threads;
1747 code that waits for another thread to return or to do something will
1748 simply hang forever.
1750 \item The \module{time} module's \function{strptime()} function has
1751 long been an annoyance because it uses the platform C library's
1752 \function{strptime()} implementation, and different platforms
1753 sometimes have odd bugs. Brett Cannon contributed a portable
1754 implementation that's written in pure Python and should behave
1755 identically on all platforms.
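For reference, a call to \function{strptime()} looks like the
following. The sketch assumes a current Python~3 interpreter and
sticks to numeric format codes so that the result does not depend on
the locale; the timestamp string is arbitrary.

```python
import time

# Parse a timestamp into a time tuple using an explicit format string.
parsed = time.strptime("2003-07-29 14:30", "%Y-%m-%d %H:%M")

print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)
print(time.strftime("%Y-%m-%d", parsed))
```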
1757 \item The new \module{timeit} module helps measure how long snippets
1758 of Python code take to execute. The \file{timeit.py} file can be run
1759 directly from the command line, or the module's \class{Timer} class
1760 can be imported and used directly. Here's a short example that
1761 figures out whether it's faster to convert an 8-bit string to Unicode
1762 by appending an empty Unicode string to it or by using the
1763 \function{unicode()} function:
\begin{verbatim}
import timeit

timer1 = timeit.Timer('unicode("abc")')
timer2 = timeit.Timer('"abc" + u""')

# Run three trials
print timer1.repeat(repeat=3, number=100000)
print timer2.repeat(repeat=3, number=100000)

# On my laptop this outputs:
# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
\end{verbatim}
1780 \item The \module{Tix} module has received various bug fixes and
1781 updates for the current version of the Tix package.
1783 \item The \module{Tkinter} module now works with a thread-enabled
1784 version of Tcl. Tcl's threading model requires that widgets only be
1785 accessed from the thread in which they're created; accesses from
1786 another thread can cause Tcl to panic. For certain Tcl interfaces,
1787 \module{Tkinter} will now automatically avoid this
1788 when a widget is accessed from a different thread by marshalling a
1789 command, passing it to the correct thread, and waiting for the
1790 results. Other interfaces can't be handled automatically but
1791 \module{Tkinter} will now raise an exception on such an access so that
1792 you can at least find out about the problem. See
1793 \url{http://mail.python.org/pipermail/python-dev/2002-December/031107.html} %
1794 for a more detailed explanation of this change. (Implemented by
1795 Martin von~L\"owis.)
1797 \item Calling Tcl methods through \module{_tkinter} no longer
1798 returns only strings. Instead, if Tcl returns other objects those
1799 objects are converted to their Python equivalent, if one exists, or
1800 wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
1801 exists. This behavior can be controlled through the
1802 \method{wantobjects()} method of \class{tkapp} objects.
1804 When using \module{_tkinter} through the \module{Tkinter} module (as
1805 most Tkinter applications will), this feature is always activated. It
1806 should not cause compatibility problems, since Tkinter would always
1807 convert string results to Python types where possible.
1809 If any incompatibilities are found, the old behavior can be restored
1810 by setting the \member{wantobjects} variable in the \module{Tkinter}
1811 module to false before creating the first \class{tkapp} object.
\begin{verbatim}
import Tkinter
Tkinter.wantobjects = 0
\end{verbatim}
1818 Any breakage caused by this change should be reported as a bug.
1820 \item The \module{UserDict} module has a new \class{DictMixin} class which
1821 defines all dictionary methods for classes that already have a minimum
1822 mapping interface. This greatly simplifies writing classes that need
1823 to be substitutable for dictionaries, such as the classes in
1824 the \module{shelve} module.
1826 Adding the mix-in as a superclass provides the full dictionary
1827 interface whenever the class defines \method{__getitem__},
1828 \method{__setitem__}, \method{__delitem__}, and \method{keys}.
1829 For example:
\begin{verbatim}
>>> import UserDict
>>> class SeqDict(UserDict.DictMixin):
...     """Dictionary lookalike implemented with lists."""
...     def __init__(self):
...         self.keylist = []
...         self.valuelist = []
...     def __getitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         return self.valuelist[i]
...     def __setitem__(self, key, value):
...         try:
...             i = self.keylist.index(key)
...             self.valuelist[i] = value
...         except ValueError:
...             self.keylist.append(key)
...             self.valuelist.append(value)
...     def __delitem__(self, key):
...         try:
...             i = self.keylist.index(key)
...         except ValueError:
...             raise KeyError
...         self.keylist.pop(i)
...         self.valuelist.pop(i)
...     def keys(self):
...         return list(self.keylist)
...
>>> s = SeqDict()
>>> dir(s)      # See that other dictionary methods are implemented
['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
 '__init__', '__iter__', '__len__', '__module__', '__repr__',
 '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
 'setdefault', 'update', 'valuelist', 'values']
\end{verbatim}
1870 (Contributed by Raymond Hettinger.)
1872 \item The DOM implementation
1873 in \module{xml.dom.minidom} can now generate XML output in a
1874 particular encoding by providing an optional encoding argument to
1875 the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.
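A small sketch of the encoding argument follows. It assumes a current
Python~3 interpreter, where \method{toxml()} returns a byte string
once an encoding is given; the sample document and the choice of
ISO-8859-1 are arbitrary.

```python
from xml.dom import minidom

# A document containing one non-ASCII character (e with acute accent).
doc = minidom.parseString("<greeting>caf\u00e9</greeting>")

# Ask for the serialized form in a specific encoding.
latin1 = doc.toxml(encoding="iso-8859-1")

print(latin1)
```

The XML declaration names the requested encoding, and the accented
character is written as the single byte that ISO-8859-1 assigns to it.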
1877 \item The \module{xmlrpclib} module now supports an XML-RPC extension
1878 for handling nil data values such as Python's \code{None}. Nil values
1879 are always supported on unmarshalling an XML-RPC response. To
1880 generate requests containing \code{None}, you must supply a true value
1881 for the \var{allow_none} parameter when creating a \class{Marshaller}
1882 instance.
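The difference the \var{allow_none} flag makes can be sketched as
follows. The example assumes a current Python~3 interpreter, where the
module described here is available as \module{xmlrpc.client}; the
helper name \function{marshal()} is invented for the illustration.

```python
import xmlrpc.client


def marshal(value, allow_none):
    """Marshal one parameter, reporting a refusal instead of raising."""
    marshaller = xmlrpc.client.Marshaller(allow_none=allow_none)
    try:
        return marshaller.dumps((value,))
    except TypeError as exc:
        return "refused: %s" % exc


accepted = marshal(None, True)     # contains a <nil/> element
rejected = marshal(None, False)    # None is not marshalled by default

# Unmarshalling understands nil values without any extra flag.
params, method = xmlrpc.client.loads(accepted)

print(rejected)
print(params)
```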
1884 \item The new \module{DocXMLRPCServer} module allows writing
1885 self-documenting XML-RPC servers. Run it in demo mode (as a program)
to see it in action. Pointing a Web browser to the RPC server
1887 produces pydoc-style documentation; pointing xmlrpclib to the
1888 server allows invoking the actual methods.
1889 (Contributed by Brian Quinlan.)
1891 \item Support for internationalized domain names (RFCs 3454, 3490,
1892 3491, and 3492) has been added. The ``idna'' encoding can be used
1893 to convert between a Unicode domain name and the ASCII-compatible
1894 encoding (ACE) of that name.
\begin{alltt}
>{}>{}> u"www.Alliancefran\c caise.nu".encode("idna")
'www.xn--alliancefranaise-npb.nu'
\end{alltt}
1901 The \module{socket} module has also been extended to transparently
1902 convert Unicode hostnames to the ACE version before passing them to
the C library. Modules that deal with hostnames (such as
\module{httplib} and \module{ftplib}) also support Unicode host names;
1905 \module{httplib} also sends HTTP \samp{Host} headers using the ACE
1906 version of the domain name. \module{urllib} supports Unicode URLs
1907 with non-ASCII host names as long as the \code{path} part of the URL
1908 is ASCII only.
1910 To implement this change, the \module{stringprep} module, the
1911 \code{mkstringprep} tool and the \code{punycode} encoding have been added.
1913 \end{itemize}
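The ``idna'' encoding mentioned in the last item above works in both
directions. The sketch below assumes a current Python~3 interpreter,
where encoding produces a byte string and decoding returns text; the
host name is the one used in the example above, written in lower case.

```python
# Unicode host name; \u00e7 is a lower-case c with cedilla.
name = "www.alliancefran\u00e7aise.nu"

ace = name.encode("idna")        # ASCII-compatible encoding (ACE)
restored = ace.decode("idna")    # back to the Unicode form

print(ace)
print(restored == name)
```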
1916 %======================================================================
1917 \subsection{Date/Time Type}
1919 Date and time types suitable for expressing timestamps were added as
1920 the \module{datetime} module. The types don't support different
1921 calendars or many fancy features, and just stick to the basics of
1922 representing time.
1924 The three primary types are: \class{date}, representing a day, month,
1925 and year; \class{time}, consisting of hour, minute, and second; and
1926 \class{datetime}, which contains all the attributes of both
1927 \class{date} and \class{time}. There's also a
1928 \class{timedelta} class representing differences between two points
1929 in time, and time zone logic is implemented by classes inheriting from
1930 the abstract \class{tzinfo} class.
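To make the role of \class{tzinfo} concrete, here is a minimal sketch
of a subclass representing a fixed offset from UTC. The class name
\class{FixedOffset} and its constructor arguments are hypothetical, not
something the module provides under that name, and the code assumes a
current Python~3 interpreter.

```python
import datetime


class FixedOffset(datetime.tzinfo):
    """Hypothetical zone that is always a fixed number of minutes from UTC."""

    def __init__(self, minutes, name):
        self._offset = datetime.timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return datetime.timedelta(0)   # no daylight saving adjustment


eastern = FixedOffset(-5 * 60, "EST")
stamp = datetime.datetime(2003, 7, 29, 12, 0, tzinfo=eastern)

print(stamp.isoformat())
print(stamp.tzname())
```

A subclass only has to answer three questions, the offset from UTC,
the zone's name, and the daylight saving adjustment, and the
\class{datetime} instance consults it when formatting.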
1932 You can create instances of \class{date} and \class{time} by either
1933 supplying keyword arguments to the appropriate constructor,
1934 e.g. \code{datetime.date(year=1972, month=10, day=15)}, or by using
1935 one of a number of class methods. For example, the \method{date.today()}
1936 class method returns the current local date.
1938 Once created, instances of the date/time classes are all immutable.
1939 There are a number of methods for producing formatted strings from
1940 objects:
\begin{verbatim}
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
\end{verbatim}
1953 The \method{replace()} method allows modifying one or more fields
1954 of a \class{date} or \class{datetime} instance, returning a new instance:
\begin{verbatim}
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour = 12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
>>>
\end{verbatim}
1965 Instances can be compared, hashed, and converted to strings (the
1966 result is the same as that of \method{isoformat()}). \class{date} and
1967 \class{datetime} instances can be subtracted from each other, and
1968 added to \class{timedelta} instances. The largest missing feature is
1969 that there's no standard library support for parsing strings and getting back a
1970 \class{date} or \class{datetime}.
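The arithmetic described above can be sketched as follows, assuming a
current Python~3 interpreter; the two dates are arbitrary.

```python
import datetime

start = datetime.date(2002, 12, 30)
end = datetime.date(2003, 7, 29)

gap = end - start                              # a timedelta
later = start + datetime.timedelta(days=7)     # a new date

print(gap.days)
print(later)
print(start < end, str(end) == end.isoformat())
```

Subtracting two dates yields a \class{timedelta}, and adding a
\class{timedelta} to a date yields a new date; the original objects
are unchanged because the instances are immutable.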
1972 For more information, refer to the \ulink{module's reference
1973 documentation}{../lib/module-datetime.html}.
1974 (Contributed by Tim Peters.)
1977 %======================================================================
1978 \subsection{The optparse Module}
1980 The \module{getopt} module provides simple parsing of command-line
1981 arguments. The new \module{optparse} module (originally named Optik)
1982 provides more elaborate command-line parsing that follows the Unix
1983 conventions, automatically creates the output for \longprogramopt{help},
1984 and can perform different actions for different options.
1986 You start by creating an instance of \class{OptionParser} and telling
1987 it what your program's options are.
\begin{verbatim}
import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
\end{verbatim}
2002 Parsing a command line is then done by calling the \method{parse_args()}
2003 method.
\begin{verbatim}
options, args = op.parse_args(sys.argv[1:])
print options
print args
\end{verbatim}
2011 This returns an object containing all of the option values,
2012 and a list of strings containing the remaining arguments.
2014 Invoking the script with the various arguments now works as you'd
2015 expect it to. Note that the length argument is automatically
2016 converted to an integer.
\begin{verbatim}
$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
\end{verbatim}
2028 The help message is automatically generated for you:
\begin{verbatim}
$ ./python opt.py --help
usage: opt.py [options]

options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
\end{verbatim}
2042 % $ prevent Emacs tex-mode from getting confused
2044 See the \ulink{module's documentation}{../lib/module-optparse.html}
2045 for more details.
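Because \method{parse_args()} accepts any list of strings, the parser
can also be exercised without a real command line. The sketch below
reuses the two options defined earlier and assumes a current Python~3
interpreter, where \module{optparse} is still available; the helper
name \function{build_parser()} is made up for the example.

```python
from optparse import OptionParser


def build_parser():
    """Create a parser with the -i/--input and -l/--length options."""
    parser = OptionParser()
    parser.add_option('-i', '--input',
                      action='store', type='string', dest='input',
                      help='set input filename')
    parser.add_option('-l', '--length',
                      action='store', type='int', dest='length',
                      help='set maximum length of output')
    return parser


# Parse an explicit argument list instead of sys.argv[1:].
options, args = build_parser().parse_args(['-i', 'data', '--length=4', 'arg1'])

print(options.input, options.length, args)
```

The option values are available as attributes named after each
option's \var{dest}, and the length arrives as an integer rather than
a string.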
2047 Optik was written by Greg Ward, with suggestions from the readers of
2048 the Getopt SIG.
2051 %======================================================================
2052 \section{Pymalloc: A Specialized Object Allocator\label{section-pymalloc}}
2054 Pymalloc, a specialized object allocator written by Vladimir
2055 Marangozov, was a feature added to Python 2.1. Pymalloc is intended
2056 to be faster than the system \cfunction{malloc()} and to have less
2057 memory overhead for allocation patterns typical of Python programs.
2058 The allocator uses C's \cfunction{malloc()} function to get large
2059 pools of memory and then fulfills smaller memory requests from these
2060 pools.
2062 In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
2063 enabled by default; you had to explicitly enable it when compiling
2064 Python by providing the
2065 \longprogramopt{with-pymalloc} option to the \program{configure}
2066 script. In 2.3, pymalloc has had further enhancements and is now
2067 enabled by default; you'll have to supply
2068 \longprogramopt{without-pymalloc} to disable it.
2070 This change is transparent to code written in Python; however,
2071 pymalloc may expose bugs in C extensions. Authors of C extension
2072 modules should test their code with pymalloc enabled,
2073 because some incorrect code may cause core dumps at runtime.
2075 There's one particularly common error that causes problems. There are
2076 a number of memory allocation functions in Python's C API that have
2077 previously just been aliases for the C library's \cfunction{malloc()}
2078 and \cfunction{free()}, meaning that if you accidentally called
2079 mismatched functions the error wouldn't be noticeable. When the
2080 object allocator is enabled, these functions aren't aliases of
2081 \cfunction{malloc()} and \cfunction{free()} any more, and calling the
2082 wrong function to free memory may get you a core dump. For example,
2083 if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
2084 be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A
2085 few modules included with Python fell afoul of this and had to be
2086 fixed; doubtless there are more third-party modules that will have the
2087 same problem.
2089 As part of this change, the confusing multiple interfaces for
2090 allocating memory have been consolidated down into two API families.
2091 Memory allocated with one family must not be manipulated with
2092 functions from the other family. There is one family for allocating
2093 chunks of memory and another family of functions specifically for
2094 allocating Python objects.
2096 \begin{itemize}
2097 \item To allocate and free an undistinguished chunk of memory use
2098 the ``raw memory'' family: \cfunction{PyMem_Malloc()},
2099 \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.
2101 \item The ``object memory'' family is the interface to the pymalloc
2102 facility described above and is biased towards a large number of
``small'' allocations: \cfunction{PyObject_Malloc()},
\cfunction{PyObject_Realloc()}, and \cfunction{PyObject_Free()}.
\item To allocate and free Python objects, use the ``object'' family:
\cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
\cfunction{PyObject_Del()}.
2109 \end{itemize}
2111 Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
2112 debugging features to catch memory overwrites and doubled frees in
2113 both extension modules and in the interpreter itself. To enable this
2114 support, compile a debugging version of the Python interpreter by
2115 running \program{configure} with \longprogramopt{with-pydebug}.
2117 To aid extension writers, a header file \file{Misc/pymemcompat.h} is
2118 distributed with the source to Python 2.3 that allows Python
2119 extensions to use the 2.3 interfaces to memory allocation while
2120 compiling against any version of Python since 1.5.2. You would copy
2121 the file from Python's source distribution and bundle it with the
2122 source of your extension.
2124 \begin{seealso}
2126 \seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
2127 {For the full details of the pymalloc implementation, see
2128 the comments at the top of the file \file{Objects/obmalloc.c} in the
2129 Python source code. The above link points to the file within the
2130 SourceForge CVS browser.}
2132 \end{seealso}
2135 % ======================================================================
2136 \section{Build and C API Changes}
2138 Changes to Python's build process and to the C API include:
2140 \begin{itemize}
\item The cycle detection implementation used by the garbage collector
2143 has proven to be stable, so it's now been made mandatory. You can no
2144 longer compile Python without it, and the
2145 \longprogramopt{with-cycle-gc} switch to \program{configure} has been removed.
2147 \item Python can now optionally be built as a shared library
2148 (\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
2149 when running Python's \program{configure} script. (Contributed by Ondrej
2150 Palkovsky.)
2152 \item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
2153 are now deprecated. Initialization functions for Python extension
2154 modules should now be declared using the new macro
2155 \csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
2156 use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
2157 macros.
2159 \item The interpreter can be compiled without any docstrings for
2160 the built-in functions and modules by supplying
2161 \longprogramopt{without-doc-strings} to the \program{configure} script.
2162 This makes the Python executable about 10\% smaller, but will also
2163 mean that you can't get help for Python's built-ins. (Contributed by
2164 Gustavo Niemeyer.)
2166 \item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
2167 that uses it should be changed. For Python 2.2 and later, the method
2168 definition table can specify the
2169 \constant{METH_NOARGS} flag, signalling that there are no arguments, and
2170 the argument checking can then be removed. If compatibility with
2171 pre-2.2 versions of Python is important, the code could use
2172 \code{PyArg_ParseTuple(\var{args}, "")} instead, but this will be slower
2173 than using \constant{METH_NOARGS}.
2175 \item \cfunction{PyArg_ParseTuple()} accepts new format characters for various sizes of unsigned integers: \samp{B} for \ctype{unsigned char},
2176 \samp{H} for \ctype{unsigned short int},
2177 \samp{I} for \ctype{unsigned int},
2178 and \samp{K} for \ctype{unsigned long long}.
\item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
char *\var{key})}, was added as shorthand for
\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.
2184 \item File objects now manage their internal string buffer
2185 differently, increasing it exponentially when needed. This results in
2186 the benchmark tests in \file{Lib/test/test_bufio.py} speeding up
2187 considerably (from 57 seconds to 1.7 seconds, according to one
2188 measurement).
2190 \item It's now possible to define class and static methods for a C
2191 extension type by setting either the \constant{METH_CLASS} or
2192 \constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
2193 structure.
2195 \item Python now includes a copy of the Expat XML parser's source code,
2196 removing any dependence on a system version or local installation of
2197 Expat.
2199 \item If you dynamically allocate type objects in your extension, you
2200 should be aware of a change in the rules relating to the
2201 \member{__module__} and \member{__name__} attributes. In summary,
2202 you will want to ensure the type's dictionary contains a
2203 \code{'__module__'} key; making the module name the part of the type
2204 name leading up to the final period will no longer have the desired
2205 effect. For more detail, read the API reference documentation or the
2206 source.
2208 \end{itemize}
2211 %======================================================================
2212 \subsection{Port-Specific Changes}
2214 Support for a port to IBM's OS/2 using the EMX runtime environment was
2215 merged into the main Python source tree. EMX is a POSIX emulation
2216 layer over the OS/2 system APIs. The Python port for EMX tries to
2217 support all the POSIX-like capability exposed by the EMX runtime, and
2218 mostly succeeds; \function{fork()} and \function{fcntl()} are
2219 restricted by the limitations of the underlying emulation layer. The
2220 standard OS/2 port, which uses IBM's Visual Age compiler, also gained
2221 support for case-sensitive import semantics as part of the integration
2222 of the EMX port into CVS. (Contributed by Andrew MacIntyre.)
2224 On MacOS, most toolbox modules have been weaklinked to improve
2225 backward compatibility. This means that modules will no longer fail
2226 to load if a single routine is missing on the current OS version.
Instead, calling the missing routine will raise an exception.
2228 (Contributed by Jack Jansen.)
2230 The RPM spec files, found in the \file{Misc/RPM/} directory in the
2231 Python source distribution, were updated for 2.3. (Contributed by
2232 Sean Reifschneider.)
2234 Other new platforms now supported by Python include AtheOS
2235 (\url{http://www.atheos.cx/}), GNU/Hurd, and OpenVMS.
2238 %======================================================================
2239 \section{Other Changes and Fixes \label{section-other}}
2241 As usual, there were a bunch of other improvements and bugfixes
2242 scattered throughout the source tree. A search through the CVS change
2243 logs finds there were 523 patches applied and 514 bugs fixed between
2244 Python 2.2 and 2.3. Both figures are likely to be underestimates.
2246 Some of the more notable changes are:
2248 \begin{itemize}
2250 \item If the \envvar{PYTHONINSPECT} environment variable is set, the
2251 Python interpreter will enter the interactive prompt after running a
2252 Python program, as if Python had been invoked with the \programopt{-i}
2253 option. The environment variable can be set before running the Python
2254 interpreter, or it can be set by the Python program as part of its
2255 execution.
2257 \item The \file{regrtest.py} script now provides a way to allow ``all
2258 resources except \var{foo}.'' A resource name passed to the
2259 \programopt{-u} option can now be prefixed with a hyphen
2260 (\character{-}) to mean ``remove this resource.'' For example, the
2261 option `\code{\programopt{-u}all,-bsddb}' could be used to enable the
2262 use of all resources except \code{bsddb}.
2264 \item The tools used to build the documentation now work under Cygwin
2265 as well as \UNIX.
2267 \item The \code{SET_LINENO} opcode has been removed. Back in the
2268 mists of time, this opcode was needed to produce line numbers in
2269 tracebacks and support trace functions (for, e.g., \module{pdb}).
2270 Since Python 1.5, the line numbers in tracebacks have been computed
2271 using a different mechanism that works with ``python -O''. For Python
2272 2.3 Michael Hudson implemented a similar scheme to determine when to
2273 call the trace function, removing the need for \code{SET_LINENO}
2274 entirely.
2276 It would be difficult to detect any resulting difference from Python
code, apart from a slight speedup when Python is run without
2278 \programopt{-O}.
2280 C extensions that access the \member{f_lineno} field of frame objects
2281 should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
2282 This will have the added effect of making the code work as desired
2283 under ``python -O'' in earlier versions of Python.
2285 A nifty new feature is that trace functions can now assign to the
2286 \member{f_lineno} attribute of frame objects, changing the line that
2287 will be executed next. A \samp{jump} command has been added to the
2288 \module{pdb} debugger taking advantage of this new feature.
2289 (Implemented by Richie Hindle.)
2291 \end{itemize}
2294 %======================================================================
2295 \section{Porting to Python 2.3}
2297 This section lists previously described changes that may require
2298 changes to your code:
2300 \begin{itemize}
2302 \item \keyword{yield} is now always a keyword; if it's used as a
2303 variable name in your code, a different name must be chosen.
2305 \item For strings \var{X} and \var{Y}, \code{\var{X} in \var{Y}} now works
2306 if \var{X} is more than one character long.
2308 \item The \function{int()} type constructor will now return a long
2309 integer instead of raising an \exception{OverflowError} when a string
2310 or floating-point number is too large to fit into an integer.
2312 \item If you have Unicode strings that contain 8-bit characters, you
2313 must declare the file's encoding (UTF-8, Latin-1, or whatever) by
2314 adding a comment to the top of the file. See
2315 section~\ref{section-encodings} for more information.
2317 \item Calling Tcl methods through \module{_tkinter} no longer
2318 returns only strings. Instead, if Tcl returns other objects those
2319 objects are converted to their Python equivalent, if one exists, or
2320 wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
2321 exists.

\item Large octal and hex literals such as
\code{0xffffffff} now trigger a \exception{FutureWarning}. Currently
they're stored as 32-bit numbers and result in a negative value, but
in Python 2.4 they'll become positive long integers.

% The empty groups below prevent conversion to guillemets.
There are a few ways to fix this warning. If you really need a
positive number, just add an \samp{L} to the end of the literal. If
you're trying to get a 32-bit integer with low bits set and have
previously used an expression such as \code{\textasciitilde(1 <{}< 31)},
it's probably clearest to start with all bits set and clear the
desired upper bits. For example, to clear just the top bit (bit 31),
you could write \code{0xffffffffL {\&}{\textasciitilde}(1L<{}<31)}.

\item You can no longer disable assertions by assigning to \code{__debug__}.

\item The Distutils \function{setup()} function has gained various new
keyword arguments such as \var{depends}. Old versions of the
Distutils will abort if passed unknown keywords. A solution is to
check in your \file{setup.py} for the presence of the new
\function{get_distutil_options()} function and only use the new
keywords with a version of the Distutils that supports them:

\begin{verbatim}
from distutils import core

kw = {'sources': ['foo.c'], ...}
if hasattr(core, 'get_distutil_options'):
    # Only pass 'depends' to a Distutils new enough to accept it.
    kw['depends'] = ['foo.h']
ext = core.Extension(**kw)
\end{verbatim}

\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning}.

\item Names of extension types defined by the modules included with
Python now contain the module and a \character{.} in front of the type
name.

\end{itemize}
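
Several of the items above can be checked directly in the interpreter.
The following is a short sketch and not part of the porting list
itself. It avoids the \samp{L} suffix and uses powers of two in place
of literals and shifts, so it should also run unchanged on later
versions of Python; the variable names are chosen purely for
illustration.

\begin{verbatim}
# Substring tests: the left operand may be longer than one character.
assert 'ab' in 'abc'
assert 'ac' not in 'abc'

# int() returns the full value instead of raising OverflowError.
big = int('12345678901234567890')
assert big == 12345678901234567890
assert int(1e30) > 10 ** 29

# Start with all 32 bits set, then clear the top bit (bit 31).
all_bits = 2 ** 32 - 1
top_bit = 2 ** 31
mask = all_bits & ~top_bit
assert mask == 0x7fffffff
\end{verbatim}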

%======================================================================
\section{Acknowledgements \label{acks}}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside,
Andrew Dalke, Scott David Daniels, Fred~L. Drake, Jr., David Fraser,
Kelly Gerber,
Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert,
Martin von~L\"owis, Andrew MacIntyre, Lalo Martins, Chad Netzer,
Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco
Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler,
Just van~Rossum.

\end{document}