Doc/whatsnew/whatsnew23.tex

   1 \documentclass{howto}
   2 \usepackage{distutils}
   3 % $Id$
   4
   5 \title{What's New in Python 2.3}
   6 \release{1.01}
   7 \author{A.M.\ Kuchling}
   8 \authoraddress{
   9         \strong{Python Software Foundation}\\
  10         Email: \email{amk@amk.ca}
  11 }
  12
  13 \begin{document}
  14 \maketitle
  15 \tableofcontents
  16
  17 This article explains the new features in Python 2.3.  Python 2.3 was
  18 released on July 29, 2003.
  19
  20 The main themes for Python 2.3 are polishing some of the features
  21 added in 2.2, adding various small but useful enhancements to the core
  22 language, and expanding the standard library.  The new object model
  23 introduced in the previous version has benefited from 18 months of
  24 bugfixes and from optimization efforts that have improved the
  25 performance of new-style classes.  A few new built-in functions have
  26 been added such as \function{sum()} and \function{enumerate()}.  The
  27 \keyword{in} operator can now be used for substring searches (e.g.
  28 \code{"ab" in "abc"} returns \constant{True}).
  29
  30 Some of the many new library features include Boolean, set, heap, and
  31 date/time data types, the ability to import modules from ZIP-format
  32 archives, metadata support for the long-awaited Python catalog, an
  33 updated version of IDLE, and modules for logging messages, wrapping
  34 text, parsing CSV files, processing command-line options, using BerkeleyDB
  35 databases...  the list of new and enhanced modules is lengthy.
  36
  37 This article doesn't attempt to provide a complete specification of
  38 the new features, but instead provides a convenient overview.  For
  39 full details, you should refer to the documentation for Python 2.3,
  40 such as the \citetitle[../lib/lib.html]{Python Library Reference} and
  41 the \citetitle[../ref/ref.html]{Python Reference Manual}.  If you want
  42 to understand the complete implementation and design rationale,
  43 refer to the PEP for a particular new feature.
  44
  45
  46 %======================================================================
  47 \section{PEP 218: A Standard Set Datatype}
  48
  49 The new \module{sets} module contains an implementation of a set
  50 datatype.  The \class{Set} class is for mutable sets, sets that can
  51 have members added and removed.  The \class{ImmutableSet} class is for
  52 sets that can't be modified, and instances of \class{ImmutableSet} can
  53 therefore be used as dictionary keys.  Sets are built on top of
  54 dictionaries, so the elements within a set must be hashable.
  55
  56 Here's a simple example:
  57
  58 \begin{verbatim}
  59 >>> import sets
  60 >>> S = sets.Set([1,2,3])
  61 >>> S
  62 Set([1, 2, 3])
  63 >>> 1 in S
  64 True
  65 >>> 0 in S
  66 False
  67 >>> S.add(5)
  68 >>> S.remove(3)
  69 >>> S
  70 Set([1, 2, 5])
  71 >>>
  72 \end{verbatim}
  73
  74 The union and intersection of sets can be computed with the
  75 \method{union()} and \method{intersection()} methods; an alternative
   76 notation uses the bitwise operators \code{|} and \code{\&}, respectively.
  77 Mutable sets also have in-place versions of these methods,
  78 \method{union_update()} and \method{intersection_update()}.
  79
  80 \begin{verbatim}
  81 >>> S1 = sets.Set([1,2,3])
  82 >>> S2 = sets.Set([4,5,6])
  83 >>> S1.union(S2)
  84 Set([1, 2, 3, 4, 5, 6])
  85 >>> S1 | S2                  # Alternative notation
  86 Set([1, 2, 3, 4, 5, 6])
  87 >>> S1.intersection(S2)
  88 Set([])
  89 >>> S1 & S2                  # Alternative notation
  90 Set([])
  91 >>> S1.union_update(S2)
  92 >>> S1
  93 Set([1, 2, 3, 4, 5, 6])
  94 >>>
  95 \end{verbatim}
  96
  97 It's also possible to take the symmetric difference of two sets.  This
  98 is the set of all elements in the union that aren't in the
  99 intersection.  Another way of putting it is that the symmetric
 100 difference contains all elements that are in exactly one
 101 set.  Again, there's an alternative notation (\code{\^}), and an
 102 in-place version with the ungainly name
 103 \method{symmetric_difference_update()}.
 104
 105 \begin{verbatim}
 106 >>> S1 = sets.Set([1,2,3,4])
 107 >>> S2 = sets.Set([3,4,5,6])
 108 >>> S1.symmetric_difference(S2)
 109 Set([1, 2, 5, 6])
 110 >>> S1 ^ S2
 111 Set([1, 2, 5, 6])
 112 >>>
 113 \end{verbatim}
 114
 115 There are also \method{issubset()} and \method{issuperset()} methods
 116 for checking whether one set is a subset or superset of another:
 117
 118 \begin{verbatim}
 119 >>> S1 = sets.Set([1,2,3])
 120 >>> S2 = sets.Set([2,3])
 121 >>> S2.issubset(S1)
 122 True
 123 >>> S1.issubset(S2)
 124 False
 125 >>> S1.issuperset(S2)
 126 True
 127 >>>
 128 \end{verbatim}
 129
 130
 131 \begin{seealso}
 132
 133 \seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
 134 Implemented by Greg V. Wilson, Alex Martelli, and GvR.}
 135
 136 \end{seealso}
 137
 138
 139
 140 %======================================================================
 141 \section{PEP 255: Simple Generators\label{section-generators}}
 142
 143 In Python 2.2, generators were added as an optional feature, to be
 144 enabled by a \code{from __future__ import generators} directive.  In
 145 2.3 generators no longer need to be specially enabled, and are now
 146 always present; this means that \keyword{yield} is now always a
 147 keyword.  The rest of this section is a copy of the description of
 148 generators from the ``What's New in Python 2.2'' document; if you read
 149 it back when Python 2.2 came out, you can skip the rest of this section.
 150
 151 You're doubtless familiar with how function calls work in Python or C.
 152 When you call a function, it gets a private namespace where its local
 153 variables are created.  When the function reaches a \keyword{return}
 154 statement, the local variables are destroyed and the resulting value
 155 is returned to the caller.  A later call to the same function will get
  156 a fresh new set of local variables.  But what if the local variables
 157 weren't thrown away on exiting a function?  What if you could later
 158 resume the function where it left off?  This is what generators
 159 provide; they can be thought of as resumable functions.
 160
 161 Here's the simplest example of a generator function:
 162
 163 \begin{verbatim}
 164 def generate_ints(N):
 165     for i in range(N):
 166         yield i
 167 \end{verbatim}
 168
 169 A new keyword, \keyword{yield}, was introduced for generators.  Any
 170 function containing a \keyword{yield} statement is a generator
  171 function; this is detected by Python's bytecode compiler, which
 172 compiles the function specially as a result.
 173
 174 When you call a generator function, it doesn't return a single value;
 175 instead it returns a generator object that supports the iterator
 176 protocol.  On executing the \keyword{yield} statement, the generator
 177 outputs the value of \code{i}, similar to a \keyword{return}
 178 statement.  The big difference between \keyword{yield} and a
 179 \keyword{return} statement is that on reaching a \keyword{yield} the
 180 generator's state of execution is suspended and local variables are
 181 preserved.  On the next call to the generator's \code{.next()} method,
 182 the function will resume executing immediately after the
 183 \keyword{yield} statement.  (For complicated reasons, the
 184 \keyword{yield} statement isn't allowed inside the \keyword{try} block
 185 of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full
 186 explanation of the interaction between \keyword{yield} and
 187 exceptions.)
 188
 189 Here's a sample usage of the \function{generate_ints()} generator:
 190
 191 \begin{verbatim}
 192 >>> gen = generate_ints(3)
 193 >>> gen
 194 <generator object at 0x8117f90>
 195 >>> gen.next()
 196 0
 197 >>> gen.next()
 198 1
 199 >>> gen.next()
 200 2
 201 >>> gen.next()
 202 Traceback (most recent call last):
 203   File "stdin", line 1, in ?
 204   File "stdin", line 2, in generate_ints
 205 StopIteration
 206 \end{verbatim}
 207
 208 You could equally write \code{for i in generate_ints(5)}, or
 209 \code{a,b,c = generate_ints(3)}.
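
As a minimal sketch of those two forms (the names \code{total},
\code{a}, \code{b} and \code{c} are purely illustrative), the generator
can feed a \keyword{for} loop or a tuple unpacking directly:

```python
def generate_ints(N):
    for i in range(N):
        yield i

# Drive the generator with a for loop; next() is called implicitly.
total = 0
for i in generate_ints(5):
    total += i

# Tuple unpacking consumes exactly three values from the generator.
a, b, c = generate_ints(3)
```

Here \code{total} ends up as 10 and \code{(a, b, c)} as \code{(0, 1, 2)}.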
 210
 211 Inside a generator function, the \keyword{return} statement can only
 212 be used without a value, and signals the end of the procession of
 213 values; afterwards the generator cannot return any further values.
 214 \keyword{return} with a value, such as \code{return 5}, is a syntax
 215 error inside a generator function.  The end of the generator's results
 216 can also be indicated by raising \exception{StopIteration} manually,
 217 or by just letting the flow of execution fall off the bottom of the
 218 function.
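
A short sketch of ending a generator early with a bare
\keyword{return}; the function name \function{take_until_negative()}
is hypothetical and chosen only for this illustration:

```python
def take_until_negative(values):
    # Yield items until the first negative number is seen.
    for v in values:
        if v < 0:
            return          # bare return: no further values are produced
        yield v

collected = list(take_until_negative([3, 1, -1, 4]))
```

The value \code{4} is never produced because the generator has already
finished by the time it would be reached.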
 219
 220 You could achieve the effect of generators manually by writing your
 221 own class and storing all the local variables of the generator as
 222 instance variables.  For example, returning a list of integers could
 223 be done by setting \code{self.count} to 0, and having the
 224 \method{next()} method increment \code{self.count} and return it.
 225 However, for a moderately complicated generator, writing a
 226 corresponding class would be much messier.
 227 \file{Lib/test/test_generators.py} contains a number of more
 228 interesting examples.  The simplest one implements an in-order
 229 traversal of a tree using generators recursively.
 230
 231 \begin{verbatim}
 232 # A recursive generator that generates Tree leaves in in-order.
 233 def inorder(t):
 234     if t:
 235         for x in inorder(t.left):
 236             yield x
 237         yield t.label
 238         for x in inorder(t.right):
 239             yield x
 240 \end{verbatim}
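
To try the traversal out, something has to supply the tree.  The
\class{Tree} class below is a hypothetical minimal node type written for
this sketch, not the one used in the test suite; it only assumes the
\member{label}, \member{left} and \member{right} attributes that
\function{inorder()} reads.

```python
class Tree:
    # Hypothetical minimal node: a label plus optional left/right subtrees.
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

root = Tree('b', Tree('a'), Tree('d', Tree('c'), Tree('e')))
labels = list(inorder(root))   # ['a', 'b', 'c', 'd', 'e']
```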
 241
 242 Two other examples in \file{Lib/test/test_generators.py} produce
  243 solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
  244 chessboard so that no queen threatens another) and the Knight's Tour
  245 (a route that takes a knight to every square of an $N\times N$ chessboard
 246 without visiting any square twice).
 247
 248 The idea of generators comes from other programming languages,
 249 especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
 250 idea of generators is central.  In Icon, every
 251 expression and function call behaves like a generator.  One example
 252 from ``An Overview of the Icon Programming Language'' at
 253 \url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
 254 what this looks like:
 255
 256 \begin{verbatim}
 257 sentence := "Store it in the neighboring harbor"
 258 if (i := find("or", sentence)) > 5 then write(i)
 259 \end{verbatim}
 260
 261 In Icon the \function{find()} function returns the indexes at which the
 262 substring ``or'' is found: 3, 23, 33.  In the \keyword{if} statement,
 263 \code{i} is first assigned a value of 3, but 3 is less than 5, so the
 264 comparison fails, and Icon retries it with the second value of 23.  23
 265 is greater than 5, so the comparison now succeeds, and the code prints
 266 the value 23 to the screen.
 267
 268 Python doesn't go nearly as far as Icon in adopting generators as a
 269 central concept.  Generators are considered part of the core
 270 Python language, but learning or using them isn't compulsory; if they
 271 don't solve any problems that you have, feel free to ignore them.
 272 One novel feature of Python's interface as compared to
 273 Icon's is that a generator's state is represented as a concrete object
 274 (the iterator) that can be passed around to other functions or stored
 275 in a data structure.
 276
 277 \begin{seealso}
 278
 279 \seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
 280 Peters, Magnus Lie Hetland.  Implemented mostly by Neil Schemenauer
 281 and Tim Peters, with other fixes from the Python Labs crew.}
 282
 283 \end{seealso}
 284
 285
 286 %======================================================================
 287 \section{PEP 263: Source Code Encodings \label{section-encodings}}
 288
 289 Python source files can now be declared as being in different
 290 character set encodings.  Encodings are declared by including a
 291 specially formatted comment in the first or second line of the source
 292 file.  For example, a UTF-8 file can be declared with:
 293
 294 \begin{verbatim}
 295 #!/usr/bin/env python
 296 # -*- coding: UTF-8 -*-
 297 \end{verbatim}
 298
 299 Without such an encoding declaration, the default encoding used is
 300 7-bit ASCII.  Executing or importing modules that contain string
 301 literals with 8-bit characters and have no encoding declaration will result
 302 in a \exception{DeprecationWarning} being signalled by Python 2.3; in
 303 2.4 this will be a syntax error.
 304
 305 The encoding declaration only affects Unicode string literals, which
 306 will be converted to Unicode using the specified encoding.  Note that
 307 Python identifiers are still restricted to ASCII characters, so you
 308 can't have variable names that use characters outside of the usual
 309 alphanumerics.
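
As a small illustration, assuming the file really is saved as UTF-8 so
that it matches its declaration, a Unicode literal containing non-ASCII
characters is decoded to the same string as one spelled with escapes
(the variable names are illustrative):

```python
# -*- coding: UTF-8 -*-
# Assumes this file is stored as UTF-8, matching the declaration above.
word = u'Grüße'

# The same five characters written with escapes only.
same_word = u'Gr\xfc\xdfe'
```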
 310
 311 \begin{seealso}
 312
 313 \seepep{263}{Defining Python Source Code Encodings}{Written by
 314 Marc-Andr\'e Lemburg and Martin von~L\"owis; implemented by Suzuki
 315 Hisao and Martin von~L\"owis.}
 316
 317 \end{seealso}
 318
 319
 320 %======================================================================
 321 \section{PEP 273: Importing Modules from Zip Archives}
 322
 323 The new \module{zipimport} module adds support for importing
 324 modules from a ZIP-format archive.  You don't need to import the
 325 module explicitly; it will be automatically imported if a ZIP
 326 archive's filename is added to \code{sys.path}.  For example:
 327
 328 \begin{verbatim}
 329 amk@nyman:~/src/python$ unzip -l /tmp/example.zip
 330 Archive:  /tmp/example.zip
 331   Length     Date   Time    Name
 332  --------    ----   ----    ----
 333      8467  11-26-02 22:30   jwzthreading.py
 334  --------                   -------
 335      8467                   1 file
 336 amk@nyman:~/src/python$ ./python
 337 Python 2.3 (#1, Aug 1 2003, 19:54:32)
 338 >>> import sys
 339 >>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
 340 >>> import jwzthreading
 341 >>> jwzthreading.__file__
 342 '/tmp/example.zip/jwzthreading.py'
 343 >>>
 344 \end{verbatim}
 345
 346 An entry in \code{sys.path} can now be the filename of a ZIP archive.
 347 The ZIP archive can contain any kind of files, but only files named
 348 \file{*.py}, \file{*.pyc}, or \file{*.pyo} can be imported.  If an
 349 archive only contains \file{*.py} files, Python will not attempt to
 350 modify the archive by adding the corresponding \file{*.pyc} file, meaning
 351 that if a ZIP archive doesn't contain \file{*.pyc} files, importing may be
 352 rather slow.
 353
 354 A path within the archive can also be specified to only import from a
 355 subdirectory; for example, the path \file{/tmp/example.zip/lib/}
 356 would only import from the \file{lib/} subdirectory within the
 357 archive.
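
The subdirectory form can be tried with a throwaway archive.  This is
only a sketch: the module name \module{zipdemo_hello} and its contents
are invented for the example, and the archive is built in a temporary
directory with the standard \module{zipfile} module.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive holding one module inside a lib/ subdirectory.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, 'example.zip')
zf = zipfile.ZipFile(archive, 'w')
zf.writestr('lib/zipdemo_hello.py', 'GREETING = "hello from the archive"\n')
zf.close()

# Point sys.path at the subdirectory inside the archive, then import.
sys.path.insert(0, os.path.join(archive, 'lib'))
import zipdemo_hello
```

Afterwards \code{zipdemo_hello.__file__} names a location inside
\file{example.zip}.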
 358
 359 \begin{seealso}
 360
 361 \seepep{273}{Import Modules from Zip Archives}{Written by James C. Ahlstrom,
 362 who also provided an implementation.
 363 Python 2.3 follows the specification in \pep{273},
 364 but uses an implementation written by Just van~Rossum
 365 that uses the import hooks described in \pep{302}.
 366 See section~\ref{section-pep302} for a description of the new import hooks.
 367 }
 368
 369 \end{seealso}
 370
 371 %======================================================================
 372 \section{PEP 277: Unicode file name support for Windows NT}
 373
 374 On Windows NT, 2000, and XP, the system stores file names as Unicode
 375 strings. Traditionally, Python has represented file names as byte
 376 strings, which is inadequate because it renders some file names
 377 inaccessible.
 378
 379 Python now allows using arbitrary Unicode strings (within the
 380 limitations of the file system) for all functions that expect file
 381 names, most notably the \function{open()} built-in function. If a Unicode
 382 string is passed to \function{os.listdir()}, Python now returns a list
 383 of Unicode strings.  A new function, \function{os.getcwdu()}, returns
 384 the current directory as a Unicode string.
 385
 386 Byte strings still work as file names, and on Windows Python will
 387 transparently convert them to Unicode using the \code{mbcs} encoding.
 388
 389 Other systems also allow Unicode strings as file names but convert
 390 them to byte strings before passing them to the system, which can
 391 cause a \exception{UnicodeError} to be raised. Applications can test
 392 whether arbitrary Unicode strings are supported as file names by
 393 checking \member{os.path.supports_unicode_filenames}, a Boolean value.
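
A minimal sketch of both checks; the variable names are illustrative,
and the result of the listing naturally depends on the directory in
which it is run:

```python
import os

text_type = type(u'')

# True only where arbitrary Unicode file names are supported.
unicode_names_ok = os.path.supports_unicode_filenames

# Passing a Unicode string asks for Unicode names in the result.
names = os.listdir(u'.')
non_text = [n for n in names if not isinstance(n, text_type)]
```

On a platform without full Unicode file name support, \code{non_text}
may be non-empty if some names cannot be decoded.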
 394
 395 Under MacOS, \function{os.listdir()} may now return Unicode filenames.
 396
 397 \begin{seealso}
 398
 399 \seepep{277}{Unicode file name support for Windows NT}{Written by Neil
 400 Hodgson; implemented by Neil Hodgson, Martin von~L\"owis, and Mark
 401 Hammond.}
 402
 403 \end{seealso}
 404
 405
 406 %======================================================================
 407 \section{PEP 278: Universal Newline Support}
 408
 409 The three major operating systems used today are Microsoft Windows,
 410 Apple's Macintosh OS, and the various \UNIX\ derivatives.  A minor
 411 irritation of cross-platform work
 412 is that these three platforms all use different characters
 413 to mark the ends of lines in text files.  \UNIX\ uses the linefeed
 414 (ASCII character 10), MacOS uses the carriage return (ASCII
 415 character 13), and Windows uses a two-character sequence of a
  416 carriage return plus a linefeed.
 417
 418 Python's file objects can now support end of line conventions other
 419 than the one followed by the platform on which Python is running.
 420 Opening a file with the mode \code{'U'} or \code{'rU'} will open a file
 421 for reading in universal newline mode.  All three line ending
 422 conventions will be translated to a \character{\e n} in the strings
 423 returned by the various file methods such as \method{read()} and
 424 \method{readline()}.
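
Conceptually, the translation amounts to the following.  This is an
illustration of the effect only, not the implementation used by the
file object, and \function{translate_newlines()} is a hypothetical
helper name:

```python
def translate_newlines(data):
    # Handle the two-character Windows ending first, so that its
    # carriage return is not translated a second time below.
    data = data.replace('\r\n', '\n')
    # Any carriage return that remains is a MacOS line ending.
    return data.replace('\r', '\n')

mixed = 'one\r\ntwo\rthree\n'
translated = translate_newlines(mixed)
```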
 425
 426 Universal newline support is also used when importing modules and when
 427 executing a file with the \function{execfile()} function.  This means
 428 that Python modules can be shared between all three operating systems
 429 without needing to convert the line-endings.
 430
 431 This feature can be disabled when compiling Python by specifying
 432 the \longprogramopt{without-universal-newlines} switch when running Python's
 433 \program{configure} script.
 434
 435 \begin{seealso}
 436
 437 \seepep{278}{Universal Newline Support}{Written
 438 and implemented by Jack Jansen.}
 439
 440 \end{seealso}
 441
 442
 443 %======================================================================
 444 \section{PEP 279: enumerate()\label{section-enumerate}}
 445
 446 A new built-in function, \function{enumerate()}, will make
 447 certain loops a bit clearer.  \code{enumerate(thing)}, where
  448 \var{thing} is either an iterator or a sequence, returns an iterator
 449 that will return \code{(0, \var{thing}[0])}, \code{(1,
 450 \var{thing}[1])}, \code{(2, \var{thing}[2])}, and so forth.
 451
 452 A common idiom to change every element of a list looks like this:
 453
 454 \begin{verbatim}
 455 for i in range(len(L)):
 456     item = L[i]
 457     # ... compute some result based on item ...
 458     L[i] = result
 459 \end{verbatim}
 460
 461 This can be rewritten using \function{enumerate()} as:
 462
 463 \begin{verbatim}
 464 for i, item in enumerate(L):
 465     # ... compute some result based on item ...
 466     L[i] = result
 467 \end{verbatim}
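
A concrete, runnable instance of the same loop, where the computation
(multiplying by ten) is chosen arbitrarily for the example:

```python
L = [1, 2, 3]
for i, item in enumerate(L):
    result = item * 10      # stand-in for a real computation
    L[i] = result

pairs = list(enumerate(['a', 'b']))
```

After the loop \code{L} is \code{[10, 20, 30]}, and \code{pairs} is
\code{[(0, 'a'), (1, 'b')]}.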
 468
 469
 470 \begin{seealso}
 471
 472 \seepep{279}{The enumerate() built-in function}{Written
 473 and implemented by Raymond D. Hettinger.}
 474
 475 \end{seealso}
 476
 477
 478 %======================================================================
 479 \section{PEP 282: The logging Package}
 480
 481 A standard package for writing logs, \module{logging}, has been added
 482 to Python 2.3.  It provides a powerful and flexible mechanism for
 483 generating logging output which can then be filtered and processed in
 484 various ways.  A configuration file written in a standard format can
 485 be used to control the logging behavior of a program.  Python
 486 includes handlers that will write log records to
 487 standard error or to a file or socket, send them to the system log, or
 488 even e-mail them to a particular address; of course, it's also
 489 possible to write your own handler classes.
 490
 491 The \class{Logger} class is the primary class.
 492 Most application code will deal with one or more \class{Logger}
 493 objects, each one used by a particular subsystem of the application.
 494 Each \class{Logger} is identified by a name, and names are organized
 495 into a hierarchy using \samp{.}  as the component separator.  For
 496 example, you might have \class{Logger} instances named \samp{server},
 497 \samp{server.auth} and \samp{server.network}.  The latter two
 498 instances are below \samp{server} in the hierarchy.  This means that
 499 if you turn up the verbosity for \samp{server} or direct \samp{server}
 500 messages to a different handler, the changes will also apply to
 501 records logged to \samp{server.auth} and \samp{server.network}.
 502 There's also a root \class{Logger} that's the parent of all other
 503 loggers.
 504
 505 For simple uses, the \module{logging} package contains some
  506 convenience functions that always use the root logger:
 507
 508 \begin{verbatim}
 509 import logging
 510
 511 logging.debug('Debugging information')
 512 logging.info('Informational message')
 513 logging.warning('Warning:config file %s not found', 'server.conf')
 514 logging.error('Error occurred')
 515 logging.critical('Critical error -- shutting down')
 516 \end{verbatim}
 517
 518 This produces the following output:
 519
 520 \begin{verbatim}
 521 WARNING:root:Warning:config file server.conf not found
 522 ERROR:root:Error occurred
 523 CRITICAL:root:Critical error -- shutting down
 524 \end{verbatim}
 525
 526 In the default configuration, informational and debugging messages are
 527 suppressed and the output is sent to standard error.  You can enable
 528 the display of informational and debugging messages by calling the
 529 \method{setLevel()} method on the root logger.
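
A minimal sketch of lowering the threshold on the root logger, after
which debugging messages are no longer suppressed:

```python
import logging

root = logging.getLogger()          # no name: the root logger
root.setLevel(logging.DEBUG)

debug_enabled = root.isEnabledFor(logging.DEBUG)
logging.debug('Debugging information')   # now written to standard error
```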
 530
 531 Notice the \function{warning()} call's use of string formatting
 532 operators; all of the functions for logging messages take the
 533 arguments \code{(\var{msg}, \var{arg1}, \var{arg2}, ...)} and log the
 534 string resulting from \code{\var{msg} \% (\var{arg1}, \var{arg2},
 535 ...)}.
 536
 537 There's also an \function{exception()} function that records the most
 538 recent traceback.  Any of the other functions will also record the
 539 traceback if you specify a true value for the keyword argument
 540 \var{exc_info}.
 541
 542 \begin{verbatim}
 543 def f():
 544     try:    1/0
 545     except: logging.exception('Problem recorded')
 546
 547 f()
 548 \end{verbatim}
 549
 550 This produces the following output:
 551
 552 \begin{verbatim}
 553 ERROR:root:Problem recorded
 554 Traceback (most recent call last):
 555   File "t.py", line 6, in f
 556     1/0
 557 ZeroDivisionError: integer division or modulo by zero
 558 \end{verbatim}
 559
 560 Slightly more advanced programs will use a logger other than the root
 561 logger.  The \function{getLogger(\var{name})} function is used to get
  562 a particular logger, creating it if it doesn't exist yet.
 563 \function{getLogger(None)} returns the root logger.
 564
 565
 566 \begin{verbatim}
 567 log = logging.getLogger('server')
 568  ...
 569 log.info('Listening on port %i', port)
 570  ...
 571 log.critical('Disk full')
 572  ...
 573 \end{verbatim}
 574
 575 Log records are usually propagated up the hierarchy, so a message
 576 logged to \samp{server.auth} is also seen by \samp{server} and
 577 \samp{root}, but a \class{Logger} can prevent this by setting its
 578 \member{propagate} attribute to \constant{False}.
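
Propagation can be observed with a small handler that simply collects
messages.  \class{ListHandler} is a hypothetical class written for this
sketch, not part of the \module{logging} package:

```python
import logging

class ListHandler(logging.Handler):
    # Hypothetical handler that stores formatted messages in a list.
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

handler = ListHandler()
logging.getLogger('server').addHandler(handler)

auth_log = logging.getLogger('server.auth')
auth_log.warning('login failed for %s', 'guest')   # reaches 'server'

auth_log.propagate = False
auth_log.warning('this record stays in server.auth')
```

Only the first message ends up in \code{handler.messages}; the second
never travels up to the handler attached to \samp{server}.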
 579
 580 There are more classes provided by the \module{logging} package that
 581 can be customized.  When a \class{Logger} instance is told to log a
 582 message, it creates a \class{LogRecord} instance that is sent to any
 583 number of different \class{Handler} instances.  Loggers and handlers
 584 can also have an attached list of filters, and each filter can cause
 585 the \class{LogRecord} to be ignored or can modify the record before
 586 passing it along.  When they're finally output, \class{LogRecord}
 587 instances are converted to text by a \class{Formatter} class.  All of
 588 these classes can be replaced by your own specially-written classes.
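
As a sketch of how these pieces fit together, the following attaches a
filter and a formatter to a handler.  The \class{DropHeartbeats} and
\class{FormattingListHandler} classes are hypothetical, invented for
this example, as is the format string:

```python
import logging

class DropHeartbeats(logging.Filter):
    # Hypothetical filter: reject any record mentioning 'heartbeat'.
    def filter(self, record):
        return 'heartbeat' not in record.getMessage()

class FormattingListHandler(logging.Handler):
    # Hypothetical handler: keep each record as formatted text.
    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

handler = FormattingListHandler()
handler.setFormatter(
    logging.Formatter('%(name)s [%(levelname)s] %(message)s'))
handler.addFilter(DropHeartbeats())

log = logging.getLogger('server.network')
log.addHandler(handler)
log.error('connection reset by %s', 'peer')
log.error('heartbeat ok')            # dropped by the filter
```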
 589
 590 With all of these features the \module{logging} package should provide
 591 enough flexibility for even the most complicated applications.  This
 592 is only an incomplete overview of its features, so please see the
 593 \ulink{package's reference documentation}{../lib/module-logging.html}
 594 for all of the details.  Reading \pep{282} will also be helpful.
 595
 596
 597 \begin{seealso}
 598
 599 \seepep{282}{A Logging System}{Written by Vinay Sajip and Trent Mick;
 600 implemented by Vinay Sajip.}
 601
 602 \end{seealso}
 603
 604
 605 %======================================================================
 606 \section{PEP 285: A Boolean Type\label{section-bool}}
 607
 608 A Boolean type was added to Python 2.3.  Two new constants were added
 609 to the \module{__builtin__} module, \constant{True} and
 610 \constant{False}.  (\constant{True} and
 611 \constant{False} constants were added to the built-ins
 612 in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of
 613 1 and 0 and aren't a different type.)
 614
 615 The type object for this new type is named
 616 \class{bool}; the constructor for it takes any Python value and
 617 converts it to \constant{True} or \constant{False}.
 618
 619 \begin{verbatim}
 620 >>> bool(1)
 621 True
 622 >>> bool(0)
 623 False
 624 >>> bool([])
 625 False
 626 >>> bool( (1,) )
 627 True
 628 \end{verbatim}
 629
 630 Most of the standard library modules and built-in functions have been
 631 changed to return Booleans.
 632
 633 \begin{verbatim}
 634 >>> obj = []
 635 >>> hasattr(obj, 'append')
 636 True
 637 >>> isinstance(obj, list)
 638 True
 639 >>> isinstance(obj, tuple)
 640 False
 641 \end{verbatim}
 642
 643 Python's Booleans were added with the primary goal of making code
 644 clearer.  For example, if you're reading a function and encounter the
 645 statement \code{return 1}, you might wonder whether the \code{1}
 646 represents a Boolean truth value, an index, or a
 647 coefficient that multiplies some other quantity.  If the statement is
 648 \code{return True}, however, the meaning of the return value is quite
 649 clear.
 650
 651 Python's Booleans were \emph{not} added for the sake of strict
 652 type-checking.  A very strict language such as Pascal would also
 653 prevent you performing arithmetic with Booleans, and would require
 654 that the expression in an \keyword{if} statement always evaluate to a
 655 Boolean result.  Python is not this strict and never will be, as
 656 \pep{285} explicitly says.  This means you can still use any
 657 expression in an \keyword{if} statement, even ones that evaluate to a
 658 list or tuple or some random object.  The Boolean type is a
 659 subclass of the \class{int} class so that arithmetic using a Boolean
 660 still works.
 661
 662 \begin{verbatim}
 663 >>> True + 1
 664 2
 665 >>> False + 1
 666 1
 667 >>> False * 75
 668 0
 669 >>> True * 75
 670 75
 671 \end{verbatim}
 672
 673 To sum up \constant{True} and \constant{False} in a sentence: they're
 674 alternative ways to spell the integer values 1 and 0, with the single
 675 difference that \function{str()} and \function{repr()} return the
 676 strings \code{'True'} and \code{'False'} instead of \code{'1'} and
 677 \code{'0'}.
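
That summary can be checked directly; the variable names below are
illustrative only:

```python
same_value = (True == 1) and (False == 0)   # Booleans compare equal to 1 and 0
is_int = isinstance(True, int)              # bool is a subclass of int
as_text = (str(True), repr(False))          # ('True', 'False')
count_true = sum([True, False, True])       # arithmetic still works: 2
```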
 678
 679 \begin{seealso}
 680
 681 \seepep{285}{Adding a bool type}{Written and implemented by GvR.}
 682
 683 \end{seealso}
 684
 685
 686 %======================================================================
 687 \section{PEP 293: Codec Error Handling Callbacks}
 688
 689 When encoding a Unicode string into a byte string, unencodable
 690 characters may be encountered.  So far, Python has allowed specifying
 691 the error processing as either ``strict'' (raising
 692 \exception{UnicodeError}), ``ignore'' (skipping the character), or
 693 ``replace'' (using a question mark in the output string), with
 694 ``strict'' being the default behavior. It may be desirable to specify
 695 alternative processing of such errors, such as inserting an XML
 696 character reference or HTML entity reference into the converted
 697 string.
 698
 699 Python now has a flexible framework to add different processing
 700 strategies.  New error handlers can be added with
  701 \function{codecs.register_error()}, and codecs can then access the error
  702 handler with \function{codecs.lookup_error()}. An equivalent C API has
 703 been added for codecs written in C. The error handler gets the
 704 necessary state information such as the string being converted, the
 705 position in the string where the error was detected, and the target
 706 encoding.  The handler can then either raise an exception or return a
 707 replacement string.
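
A minimal sketch of registering a handler.  The handler name
\code{'demo_codepoint'} and the function
\function{codepoint_fallback()} are invented for this example; the
state information arrives as attributes of the exception instance
passed to the handler, and the return value is a tuple of the
replacement string and the position at which encoding should resume.

```python
import codecs

def codepoint_fallback(exc):
    # Hypothetical handler: only deal with encoding errors.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.object is the string being converted; start/end mark the
    # unencodable stretch within it.
    bad = exc.object[exc.start:exc.end]
    replacement = u''.join([u'[U+%04X]' % ord(ch) for ch in bad])
    return (replacement, exc.end)

codecs.register_error('demo_codepoint', codepoint_fallback)

encoded = u'caf\xe9'.encode('ascii', 'demo_codepoint')
```

The encoded result is the ASCII text \code{caf[U+00E9]}.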
 708
 709 Two additional error handlers have been implemented using this
 710 framework: ``backslashreplace'' uses Python backslash quoting to
 711 represent unencodable characters and ``xmlcharrefreplace'' emits
 712 XML character references.
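
A short sketch of the two new handlers applied to the same string (the
variable names are illustrative):

```python
text = u'caf\xe9'

# Python backslash quoting for the unencodable character.
escaped = text.encode('ascii', 'backslashreplace')

# An XML character reference for the same character.
as_xml = text.encode('ascii', 'xmlcharrefreplace')
```

The first result is the ASCII text \code{caf\e xe9} and the second is
\code{caf\&\#233;}.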
 713
 714 \begin{seealso}
 715
 716 \seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
 717 Walter D\"orwald.}
 718
 719 \end{seealso}
 720
 721
 722 %======================================================================
 723 \section{PEP 301: Package Index and Metadata for
 724 Distutils\label{section-pep301}}
 725
 726 Support for the long-requested Python catalog makes its first
 727 appearance in 2.3.
 728
 729 The heart of the catalog is the new Distutils \command{register} command.
 730 Running \code{python setup.py register} will collect the metadata
 731 describing a package, such as its name, version, maintainer,
 732 description, \&c., and send it to a central catalog server.  The
 733 resulting catalog is available from \url{http://www.python.org/pypi}.
 734
 735 To make the catalog a bit more useful, a new optional
 736 \var{classifiers} keyword argument has been added to the Distutils
 737 \function{setup()} function.  A list of
 738 \ulink{Trove}{http://catb.org/\textasciitilde esr/trove/}-style
 739 strings can be supplied to help classify the software.
 740
 741 Here's an example \file{setup.py} with classifiers, written to be compatible
 742 with older versions of the Distutils:
 743
 744 \begin{verbatim}
 745 from distutils import core
 746 kw = {'name': "Quixote",
 747       'version': "0.5.1",
 748       'description': "A highly Pythonic Web application framework",
 749       # ...
 750       }
 751
 752 if (hasattr(core, 'setup_keywords') and
 753     'classifiers' in core.setup_keywords):
 754     kw['classifiers'] = \
 755         ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
 756          'Environment :: No Input/Output (Daemon)',
  757          'Intended Audience :: Developers']
 758
 759 core.setup(**kw)
 760 \end{verbatim}
 761
 762 The full list of classifiers can be obtained by running
 763 \verb|python setup.py register --list-classifiers|.
 764
 765 \begin{seealso}
 766
 767 \seepep{301}{Package Index and Metadata for Distutils}{Written and
 768 implemented by Richard Jones.}
 769
 770 \end{seealso}
 771
 772
 773 %======================================================================
 774 \section{PEP 302: New Import Hooks \label{section-pep302}}
 775
While it's been possible to write custom import hooks ever since the
\module{ihooks} module was introduced in Python 1.3, no one has ever
been really happy with it because writing new import hooks is
difficult and messy.  There have been various proposed alternatives
such as the \module{imputil} and \module{iu} modules, but none of them
has ever gained much acceptance, and none of them was easily usable
from \C{} code.
 783
 784 \pep{302} borrows ideas from its predecessors, especially from
 785 Gordon McMillan's \module{iu} module.  Three new items
 786 are added to the \module{sys} module:
 787
 788 \begin{itemize}
 789   \item \code{sys.path_hooks} is a list of callable objects; most
 790   often they'll be classes.  Each callable takes a string containing a
 791   path and either returns an importer object that will handle imports
 792   from this path or raises an \exception{ImportError} exception if it
 793   can't handle this path.
 794
 795   \item \code{sys.path_importer_cache} caches importer objects for
 796   each path, so \code{sys.path_hooks} will only need to be traversed
 797   once for each path.
 798
 799   \item \code{sys.meta_path} is a list of importer objects that will
 800   be traversed before \code{sys.path} is checked.  This list is
 801   initially empty, but user code can add objects to it.  Additional
 802   built-in and frozen modules can be imported by an object added to
 803   this list.
 804
 805 \end{itemize}
 806
Importer objects must have a single method,
\method{find_module(\var{fullname}, \var{path}=None)}.  \var{fullname}
will be a module or package name, e.g. \samp{string} or
\samp{distutils.core}.  \method{find_module()} must return either
\code{None}, if the importer can't find the module, or a loader object
that has a single method, \method{load_module(\var{fullname})}, that
creates and returns the corresponding module object.
 813
 814 Pseudo-code for Python's new import logic, therefore, looks something
 815 like this (simplified a bit; see \pep{302} for the full details):
 816
 817 \begin{verbatim}
for mp in sys.meta_path:
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            if loader is not None:
                <module> = loader.load_module(fullname)

# Not found!
raise ImportError
 836 \end{verbatim}
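
As a concrete illustration of the importer and loader protocol, here
is a minimal sketch of an importer that serves modules from an
in-memory dictionary.  The \class{DictImporter} class and the
\code{hello_virtual} module name are invented for this example and are
not part of \pep{302}.  The sketch calls \method{find_module()} and
\method{load_module()} by hand instead of registering the object on
\code{sys.meta_path}, so it does not depend on the details of any
particular interpreter's import machinery:

\begin{verbatim}
import sys
import types

class DictImporter:
    """Hypothetical importer that finds modules in a dictionary."""

    def __init__(self, sources):
        # Maps a module name to the source text of that module.
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader (this same object) or None to decline.
        if fullname in self.sources:
            return self
        return None

    def load_module(self, fullname):
        # Reuse the module object if it has already been loaded.
        if fullname in sys.modules:
            return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod
        exec(self.sources[fullname], mod.__dict__)
        return mod

importer = DictImporter({'hello_virtual': 'greeting = "hi"\n'})
loader = importer.find_module('hello_virtual')
module = loader.load_module('hello_virtual')
print(module.greeting)
\end{verbatim}

In Python 2.3, appending such an object to \code{sys.meta_path} should
be enough to make \code{import hello_virtual} succeed.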
 837
 838 \begin{seealso}
 839
 840 \seepep{302}{New Import Hooks}{Written by Just van~Rossum and Paul Moore.
 841 Implemented by Just van~Rossum.
 842 }
 843
 844 \end{seealso}
 845
 846
 847 %======================================================================
 848 \section{PEP 305: Comma-separated Files \label{section-pep305}}
 849
 850 Comma-separated files are a format frequently used for exporting data
 851 from databases and spreadsheets.  Python 2.3 adds a parser for
 852 comma-separated files.
 853
 854 Comma-separated format is deceptively simple at first glance:
 855
 856 \begin{verbatim}
 857 Costs,150,200,3.95
 858 \end{verbatim}
 859
 860 Read a line and call \code{line.split(',')}: what could be simpler?
 861 But toss in string data that can contain commas, and things get more
 862 complicated:
 863
 864 \begin{verbatim}
 865 "Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
 866 \end{verbatim}
 867
 868 A big ugly regular expression can parse this, but using the new
 869 \module{csv} package is much simpler:
 870
 871 \begin{verbatim}
 872 import csv
 873
 874 input = open('datafile', 'rb')
 875 reader = csv.reader(input)
 876 for line in reader:
 877     print line
 878 \end{verbatim}
 879
The \function{reader()} function takes a number of different options.
 881 The field separator isn't limited to the comma and can be changed to
 882 any character, and so can the quoting and line-ending characters.
 883
 884 Different dialects of comma-separated files can be defined and
 885 registered; currently there are two dialects, both used by Microsoft Excel.
 886 A separate \class{csv.writer} class will generate comma-separated files
 887 from a succession of tuples or lists, quoting strings that contain the
 888 delimiter.
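
A small round trip may make the writer and the reader easier to
picture.  This is only a sketch: the sample row is made up, and the
code is written for current Python releases, where an in-memory text
buffer comes from \class{io.StringIO}; on Python 2.3 you would use a
file opened in binary mode instead, as in the example above.

\begin{verbatim}
import csv
import io

rows = [['Costs', 150, 200, 3.95,
         'Includes taxes, shipping, and sundry items']]

# Write the rows; only the field containing commas gets quoted.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# Read the text back; every field comes back as a string.
parsed = list(csv.reader(io.StringIO(text)))

# The delimiter option changes the field separator.
semi = io.StringIO()
csv.writer(semi, delimiter=';').writerow(['a;b', 'c'])
semi_text = semi.getvalue()
\end{verbatim}

Note that the reader does not convert numeric fields back into
numbers; that is left to the application.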
 889
 890 \begin{seealso}
 891
 892 \seepep{305}{CSV File API}{Written and implemented
 893 by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.
 894 }
 895
 896 \end{seealso}
 897
 898 %======================================================================
\section{PEP 307: Pickle Enhancements \label{section-pep307}}
 900
 901 The \module{pickle} and \module{cPickle} modules received some
 902 attention during the 2.3 development cycle.  In 2.2, new-style classes
 903 could be pickled without difficulty, but they weren't pickled very
 904 compactly; \pep{307} quotes a trivial example where a new-style class
 905 results in a pickled string three times longer than that for a classic
 906 class.
 907
 908 The solution was to invent a new pickle protocol.  The
 909 \function{pickle.dumps()} function has supported a text-or-binary flag
 910 for a long time.  In 2.3, this flag is redefined from a Boolean to an
 911 integer: 0 is the old text-mode pickle format, 1 is the old binary
 912 format, and now 2 is a new 2.3-specific format.  A new constant,
 913 \constant{pickle.HIGHEST_PROTOCOL}, can be used to select the fanciest
 914 protocol available.
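
The protocol numbers can be tried out with a few lines of code.  This
is a minimal sketch using made-up data; it pickles a plain dictionary,
not a new-style class instance, to keep the example self-contained.

\begin{verbatim}
import pickle

data = {'name': 'Quixote', 'version': (0, 5, 1)}

text_pickle = pickle.dumps(data, 0)      # old text-mode format
binary_pickle = pickle.dumps(data, 1)    # old binary format
proto2_pickle = pickle.dumps(data, 2)    # format introduced in 2.3
newest_pickle = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

# The same loads() call reads every protocol back.
for blob in (text_pickle, binary_pickle, proto2_pickle, newest_pickle):
    assert pickle.loads(blob) == data
\end{verbatim}

The reader never has to be told which protocol was used, because
\function{pickle.loads()} works it out from the data itself.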
 915
 916 Unpickling is no longer considered a safe operation.  2.2's
 917 \module{pickle} provided hooks for trying to prevent unsafe classes
 918 from being unpickled (specifically, a
 919 \member{__safe_for_unpickling__} attribute), but none of this code
 920 was ever audited and therefore it's all been ripped out in 2.3.  You
 921 should not unpickle untrusted data in any version of Python.
 922
 923 To reduce the pickling overhead for new-style classes, a new interface
 924 for customizing pickling was added using three special methods:
 925 \method{__getstate__}, \method{__setstate__}, and
 926 \method{__getnewargs__}.  Consult \pep{307} for the full semantics
 927 of these methods.
 928
 929 As a way to compress pickles yet further, it's now possible to use
 930 integer codes instead of long strings to identify pickled classes.
 931 The Python Software Foundation will maintain a list of standardized
 932 codes; there's also a range of codes for private use.  Currently no
 933 codes have been specified.
 934
 935 \begin{seealso}
 936
 937 \seepep{307}{Extensions to the pickle protocol}{Written and implemented
 938 by Guido van Rossum and Tim Peters.}
 939
 940 \end{seealso}
 941
 942 %======================================================================
 943 \section{Extended Slices\label{section-slices}}
 944
 945 Ever since Python 1.4, the slicing syntax has supported an optional
 946 third ``step'' or ``stride'' argument.  For example, these are all
 947 legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
 948 \code{L[::-1]}.  This was added to Python at the request of
 949 the developers of Numerical Python, which uses the third argument
 950 extensively.  However, Python's built-in list, tuple, and string
 951 sequence types have never supported this feature, raising a
 952 \exception{TypeError} if you tried it.  Michael Hudson contributed a
 953 patch to fix this shortcoming.
 954
 955 For example, you can now easily extract the elements of a list that
 956 have even indexes:
 957
 958 \begin{verbatim}
 959 >>> L = range(10)
 960 >>> L[::2]
 961 [0, 2, 4, 6, 8]
 962 \end{verbatim}
 963
 964 Negative values also work to make a copy of the same list in reverse
 965 order:
 966
 967 \begin{verbatim}
 968 >>> L[::-1]
 969 [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
 970 \end{verbatim}
 971
 972 This also works for tuples, arrays, and strings:
 973
 974 \begin{verbatim}
 975 >>> s='abcd'
 976 >>> s[::2]
 977 'ac'
 978 >>> s[::-1]
 979 'dcba'
 980 \end{verbatim}
 981
 982 If you have a mutable sequence such as a list or an array you can
 983 assign to or delete an extended slice, but there are some differences
 984 between assignment to extended and regular slices.  Assignment to a
 985 regular slice can be used to change the length of the sequence:
 986
 987 \begin{verbatim}
 988 >>> a = range(3)
 989 >>> a
 990 [0, 1, 2]
 991 >>> a[1:3] = [4, 5, 6]
 992 >>> a
 993 [0, 4, 5, 6]
 994 \end{verbatim}
 995
 996 Extended slices aren't this flexible.  When assigning to an extended
 997 slice, the list on the right hand side of the statement must contain
 998 the same number of items as the slice it is replacing:
 999
1000 \begin{verbatim}
1001 >>> a = range(4)
1002 >>> a
1003 [0, 1, 2, 3]
1004 >>> a[::2]
1005 [0, 2]
1006 >>> a[::2] = [0, -1]
1007 >>> a
1008 [0, 1, -1, 3]
1009 >>> a[::2] = [0,1,2]
1010 Traceback (most recent call last):
1011   File "<stdin>", line 1, in ?
1012 ValueError: attempt to assign sequence of size 3 to extended slice of size 2
1013 \end{verbatim}
1014
1015 Deletion is more straightforward:
1016
1017 \begin{verbatim}
1018 >>> a = range(4)
1019 >>> a
1020 [0, 1, 2, 3]
1021 >>> a[::2]
1022 [0, 2]
1023 >>> del a[::2]
1024 >>> a
1025 [1, 3]
1026 \end{verbatim}
1027
1028 One can also now pass slice objects to the
1029 \method{__getitem__} methods of the built-in sequences:
1030
1031 \begin{verbatim}
1032 >>> range(10).__getitem__(slice(0, 5, 2))
1033 [0, 2, 4]
1034 \end{verbatim}
1035
1036 Or use slice objects directly in subscripts:
1037
1038 \begin{verbatim}
1039 >>> range(10)[slice(0, 5, 2)]
1040 [0, 2, 4]
1041 \end{verbatim}
1042
1043 To simplify implementing sequences that support extended slicing,
1044 slice objects now have a method \method{indices(\var{length})} which,
1045 given the length of a sequence, returns a \code{(\var{start},
1046 \var{stop}, \var{step})} tuple that can be passed directly to
1047 \function{range()}.
1048 \method{indices()} handles omitted and out-of-bounds indices in a
1049 manner consistent with regular slices (and this innocuous phrase hides
1050 a welter of confusing details!).  The method is intended to be used
1051 like this:
1052
1053 \begin{verbatim}
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            # Turn the slice into concrete (start, stop, step) values.
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)
1064 \end{verbatim}
1065
1066 From this example you can also see that the built-in \class{slice}
1067 object is now the type object for the slice type, and is no longer a
1068 function.  This is consistent with Python 2.2, where \class{int},
1069 \class{str}, etc., underwent the same change.
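
To show \method{indices()} at work in a complete class, here is a
minimal runnable sketch.  The \class{Squares} class is hypothetical
and invented for this example; for simplicity its slices return plain
lists, not new \class{Squares} instances.

\begin{verbatim}
class Squares:
    """Hypothetical read-only sequence whose i-th item is i*i."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def calc_item(self, i):
        # Support negative indexes the way built-in sequences do.
        if i < 0:
            i += self.length
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * i

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clips the slice to the sequence's length.
            start, stop, step = item.indices(len(self))
            return [self.calc_item(i) for i in range(start, stop, step)]
        return self.calc_item(item)

sq = Squares(10)
every_third = sq[::3]      # [0, 9, 36, 81]
clipped = sq[7:100]        # [49, 64, 81]
backwards = sq[::-4]       # [81, 25, 1]
\end{verbatim}

Because \method{indices()} does the clipping, the class needs no
special cases for omitted, negative, or out-of-range slice bounds.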
1070
1071
1072 %======================================================================
1073 \section{Other Language Changes}
1074
1075 Here are all of the changes that Python 2.3 makes to the core Python
1076 language.
1077
1078 \begin{itemize}
1079 \item The \keyword{yield} statement is now always a keyword, as
1080 described in section~\ref{section-generators} of this document.
1081
1082 \item A new built-in function \function{enumerate()}
1083 was added, as described in section~\ref{section-enumerate} of this
1084 document.
1085
\item Two new constants, \constant{True} and \constant{False}, were
added along with the built-in \class{bool} type, as described in
section~\ref{section-bool} of this document.
1089
1090 \item The \function{int()} type constructor will now return a long
1091 integer instead of raising an \exception{OverflowError} when a string
1092 or floating-point number is too large to fit into an integer.  This
1093 can lead to the paradoxical result that
1094 \code{isinstance(int(\var{expression}), int)} is false, but that seems
1095 unlikely to cause problems in practice.
1096
1097 \item Built-in types now support the extended slicing syntax,
1098 as described in section~\ref{section-slices} of this document.
1099
1100 \item A new built-in function, \function{sum(\var{iterable}, \var{start}=0)},
1101 adds up the numeric items in the iterable object and returns their sum.
1102 \function{sum()} only accepts numbers, meaning that you can't use it
1103 to concatenate a bunch of strings.   (Contributed by Alex
1104 Martelli.)
1105
1106 \item \code{list.insert(\var{pos}, \var{value})} used to
1107 insert \var{value} at the front of the list when \var{pos} was
1108 negative.  The behaviour has now been changed to be consistent with
1109 slice indexing, so when \var{pos} is -1 the value will be inserted
1110 before the last element, and so forth.
1111
1112 \item \code{list.index(\var{value})}, which searches for \var{value}
1113 within the list and returns its index, now takes optional
1114 \var{start} and \var{stop} arguments to limit the search to
1115 only part of the list.
1116
1117 \item Dictionaries have a new method, \method{pop(\var{key}\optional{,
1118 \var{default}})}, that returns the value corresponding to \var{key}
1119 and removes that key/value pair from the dictionary.  If the requested
1120 key isn't present in the dictionary, \var{default} is returned if it's
1121 specified and \exception{KeyError} raised if it isn't.
1122
1123 \begin{verbatim}
>>> d = {1:2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> d
{}
>>>
1141
1142 There's also a new class method,
1143 \method{dict.fromkeys(\var{iterable}, \var{value})}, that
1144 creates a dictionary with keys taken from the supplied iterator
1145 \var{iterable} and all values set to \var{value}, defaulting to
1146 \code{None}.
1147
1148 (Patches contributed by Raymond Hettinger.)
1149
1150 Also, the \function{dict()} constructor now accepts keyword arguments to
1151 simplify creating small dictionaries:
1152
1153 \begin{verbatim}
1154 >>> dict(red=1, blue=2, green=3, black=4)
1155 {'blue': 2, 'black': 4, 'green': 3, 'red': 1}
1156 \end{verbatim}
1157
1158 (Contributed by Just van~Rossum.)
1159
1160 \item The \keyword{assert} statement no longer checks the \code{__debug__}
1161 flag, so you can no longer disable assertions by assigning to \code{__debug__}.
1162 Running Python with the \programopt{-O} switch will still generate
1163 code that doesn't execute any assertions.
1164
1165 \item Most type objects are now callable, so you can use them
1166 to create new objects such as functions, classes, and modules.  (This
1167 means that the \module{new} module can be deprecated in a future
1168 Python version, because you can now use the type objects available in
1169 the \module{types} module.)
1170 % XXX should new.py use PendingDeprecationWarning?
1171 For example, you can create a new module object with the following code:
1172
1173 \begin{verbatim}
1174 >>> import types
1175 >>> m = types.ModuleType('abc','docstring')
1176 >>> m
1177 <module 'abc' (built-in)>
1178 >>> m.__doc__
1179 'docstring'
1180 \end{verbatim}
1181
1182 \item
A new warning, \exception{PendingDeprecationWarning}, was added to
indicate features which are in the process of being
1185 deprecated.  The warning will \emph{not} be printed by default.  To
1186 check for use of features that will be deprecated in the future,
1187 supply \programopt{-Walways::PendingDeprecationWarning::} on the
1188 command line or use \function{warnings.filterwarnings()}.
1189
1190 \item The process of deprecating string-based exceptions, as
1191 in \code{raise "Error occurred"}, has begun.  Raising a string will
1192 now trigger \exception{PendingDeprecationWarning}.
1193
1194 \item Using \code{None} as a variable name will now result in a
1195 \exception{SyntaxWarning} warning.  In a future version of Python,
1196 \code{None} may finally become a keyword.
1197
1198 \item The \method{xreadlines()} method of file objects, introduced in
1199 Python 2.1, is no longer necessary because files now behave as their
1200 own iterator.  \method{xreadlines()} was originally introduced as a
1201 faster way to loop over all the lines in a file, but now you can
1202 simply write \code{for line in file_obj}.  File objects also have a
1203 new read-only \member{encoding} attribute that gives the encoding used
1204 by the file; Unicode strings written to the file will be automatically
1205 converted to bytes using the given encoding.
1206
1207 \item The method resolution order used by new-style classes has
1208 changed, though you'll only notice the difference if you have a really
1209 complicated inheritance hierarchy.  Classic classes are unaffected by
1210 this change.  Python 2.2 originally used a topological sort of a
1211 class's ancestors, but 2.3 now uses the C3 algorithm as described in
1212 the paper \ulink{``A Monotonic Superclass Linearization for
1213 Dylan''}{http://www.webcom.com/haahr/dylan/linearization-oopsla96.html}.
1214 To understand the motivation for this change,
1215 read Michele Simionato's article
1216 \ulink{``Python 2.3 Method Resolution Order''}
1217       {http://www.python.org/2.3/mro.html}, or
1218 read the thread on python-dev starting with the message at
1219 \url{http://mail.python.org/pipermail/python-dev/2002-October/029035.html}.
1220 Samuele Pedroni first pointed out the problem and also implemented the
1221 fix by coding the C3 algorithm.
1222
1223 \item Python runs multithreaded programs by switching between threads
1224 after executing N bytecodes.  The default value for N has been
1225 increased from 10 to 100 bytecodes, speeding up single-threaded
1226 applications by reducing the switching overhead.  Some multithreaded
1227 applications may suffer slower response time, but that's easily fixed
1228 by setting the limit back to a lower number using
1229 \function{sys.setcheckinterval(\var{N})}.
1230 The limit can be retrieved with the new
1231 \function{sys.getcheckinterval()} function.
1232
1233 \item One minor but far-reaching change is that the names of extension
1234 types defined by the modules included with Python now contain the
1235 module and a \character{.} in front of the type name.  For example, in
1236 Python 2.2, if you created a socket and printed its
1237 \member{__class__}, you'd get this output:
1238
1239 \begin{verbatim}
1240 >>> s = socket.socket()
1241 >>> s.__class__
1242 <type 'socket'>
1243 \end{verbatim}
1244
1245 In 2.3, you get this:
1246 \begin{verbatim}
1247 >>> s.__class__
1248 <type '_socket.socket'>
1249 \end{verbatim}
1250
1251 \item One of the noted incompatibilities between old- and new-style
1252   classes has been removed: you can now assign to the
1253   \member{__name__} and \member{__bases__} attributes of new-style
1254   classes.  There are some restrictions on what can be assigned to
1255   \member{__bases__} along the lines of those relating to assigning to
1256   an instance's \member{__class__} attribute.
1257
1258 \end{itemize}
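
Several of the smaller built-in changes in the list above can be
tried out in a few lines.  The following is only a sketch with made-up
values; the comments show the results the descriptions above lead one
to expect.

\begin{verbatim}
# sum() takes an optional starting value.
total = sum([1, 2, 3], 10)                      # 16

# A negative position now counts from the end, as in slicing.
letters = ['a', 'b', 'c']
letters.insert(-1, 'x')                         # ['a', 'b', 'x', 'c']

# Limit an index() search to part of the list.
second_a = ['a', 'b', 'a', 'c'].index('a', 1)   # 2

# pop() with a default doesn't raise KeyError for a missing key.
d = {'red': 1}
missing = d.pop('blue', 0)                      # 0; d is unchanged
red = d.pop('red')                              # 1; d is now empty

# fromkeys() gives every key the same value.
flags = dict.fromkeys(['debug', 'verbose'], False)
\end{verbatim}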
1259
1260
1261 %======================================================================
1262 \subsection{String Changes}
1263
1264 \begin{itemize}
1265
1266 \item The \keyword{in} operator now works differently for strings.
1267 Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
1268 and \var{Y} are strings, \var{X} could only be a single character.
1269 That's now changed; \var{X} can be a string of any length, and
1270 \code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
1271 substring of \var{Y}.  If \var{X} is the empty string, the result is
1272 always \constant{True}.
1273
1274 \begin{verbatim}
1275 >>> 'ab' in 'abcd'
1276 True
1277 >>> 'ad' in 'abcd'
1278 False
1279 >>> '' in 'abcd'
1280 True
1281 \end{verbatim}
1282
1283 Note that this doesn't tell you where the substring starts; if you
1284 need that information, use the \method{find()} string method.
1285
1286 \item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
1287 string methods now have an optional argument for specifying the
1288 characters to strip.  The default is still to remove all whitespace
1289 characters:
1290
1291 \begin{verbatim}
1292 >>> '   abc '.strip()
1293 'abc'
1294 >>> '><><abc<><><>'.strip('<>')
1295 'abc'
1296 >>> '><><abc<><><>\n'.strip('<>')
1297 'abc<><><>\n'
1298 >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
1299 u'\u4001abc'
1300 >>>
1301 \end{verbatim}
1302
1303 (Suggested by Simon Brunning and implemented by Walter D\"orwald.)
1304
1305 \item The \method{startswith()} and \method{endswith()}
1306 string methods now accept negative numbers for the \var{start} and \var{end}
1307 parameters.
1308
1309 \item Another new string method is \method{zfill()}, originally a
1310 function in the \module{string} module.  \method{zfill()} pads a
1311 numeric string with zeros on the left until it's the specified width.
1312 Note that the \code{\%} operator is still more flexible and powerful
1313 than \method{zfill()}.
1314
1315 \begin{verbatim}
1316 >>> '45'.zfill(4)
1317 '0045'
1318 >>> '12345'.zfill(4)
1319 '12345'
1320 >>> 'goofy'.zfill(6)
1321 '0goofy'
1322 \end{verbatim}
1323
1324 (Contributed by Walter D\"orwald.)
1325
1326 \item A new type object, \class{basestring}, has been added.
1327    Both 8-bit strings and Unicode strings inherit from this type, so
1328    \code{isinstance(obj, basestring)} will return \constant{True} for
1329    either kind of string.  It's a completely abstract type, so you
1330    can't create \class{basestring} instances.
1331
1332 \item Interned strings are no longer immortal and will now be
1333 garbage-collected in the usual way when the only reference to them is
1334 from the internal dictionary of interned strings.  (Implemented by
1335 Oren Tirosh.)
1336
1337 \end{itemize}
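
The negative offsets accepted by \method{startswith()} and
\method{endswith()}, the \method{find()} method, and the comparison
between \method{zfill()} and the \code{\%} operator have no examples
in the list above, so here is a brief sketch.  The sample strings are
invented for illustration.

\begin{verbatim}
name = 'report.txt'

# Negative start values count from the end of the string.
is_text = name.endswith('.txt', -4)      # True
tail = name.startswith('txt', -3)        # True

# find() reports where a substring starts, or -1 if it is absent.
where = 'abcd'.find('cd')                # 2
absent = 'abcd'.find('ad')               # -1

# zfill() and the % operator can both pad with zeros...
padded = '45'.zfill(4)                   # '0045'
same = '%04d' % 45                       # '0045'
# ...but % also handles other formats, such as fixed decimals.
price = '%08.2f' % 3.14159               # '00003.14'
\end{verbatim}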
1338
1339
1340 %======================================================================
1341 \subsection{Optimizations}
1342
1343 \begin{itemize}
1344
1345 \item The creation of new-style class instances has been made much
1346 faster; they're now faster than classic classes!
1347
1348 \item The \method{sort()} method of list objects has been extensively
1349 rewritten by Tim Peters, and the implementation is significantly
1350 faster.
1351
1352 \item Multiplication of large long integers is now much faster thanks
1353 to an implementation of Karatsuba multiplication, an algorithm that
1354 scales better than the O(n*n) required for the grade-school
1355 multiplication algorithm.  (Original patch by Christopher A. Craig,
1356 and significantly reworked by Tim Peters.)
1357
1358 \item The \code{SET_LINENO} opcode is now gone.  This may provide a
1359 small speed increase, depending on your compiler's idiosyncrasies.
1360 See section~\ref{section-other} for a longer explanation.
1361 (Removed by Michael Hudson.)
1362
1363 \item \function{xrange()} objects now have their own iterator, making
1364 \code{for i in xrange(n)} slightly faster than
1365 \code{for i in range(n)}.  (Patch by Raymond Hettinger.)
1366
1367 \item A number of small rearrangements have been made in various
1368 hotspots to improve performance, such as inlining a function or removing
1369 some code.  (Implemented mostly by GvR, but lots of people have
1370 contributed single changes.)
1371
1372 \end{itemize}
1373
1374 The net result of the 2.3 optimizations is that Python 2.3 runs the
1375 pystone benchmark around 25\% faster than Python 2.2.
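
The Karatsuba idea mentioned in the list above can be sketched in pure
Python.  This is only an illustration of the algorithm, not the \C{}
implementation used by the interpreter.  The function name, the cutoff
of 1024, and the use of \method{bit_length()} (which does not exist in
Python 2.3) are assumptions made for this example, and the sketch
handles only non-negative integers.

\begin{verbatim}
def karatsuba(x, y):
    """Multiply two non-negative integers (illustrative sketch)."""
    # Small operands: fall back on the built-in multiplication.
    if x < 1024 or y < 1024:
        return x * y
    # Split both numbers at half the bit length of the larger one.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    # One further multiplication yields both cross terms.
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    return (high << (2 * half)) + (mid << half) + low

product = karatsuba(3 ** 500, 7 ** 400)
\end{verbatim}

Each call makes three recursive multiplications of roughly half-sized
numbers where the grade-school method would need four, which is where
the better scaling comes from.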
1376
1377
1378 %======================================================================
1379 \section{New, Improved, and Deprecated Modules}
1380
1381 As usual, Python's standard library received a number of enhancements and
1382 bug fixes.  Here's a partial list of the most notable changes, sorted
1383 alphabetically by module name. Consult the
1384 \file{Misc/NEWS} file in the source tree for a more
1385 complete list of changes, or look through the CVS logs for all the
1386 details.
1387
1388 \begin{itemize}
1389
1390 \item The \module{array} module now supports arrays of Unicode
1391 characters using the \character{u} format character.  Arrays also now
1392 support using the \code{+=} assignment operator to add another array's
1393 contents, and the \code{*=} assignment operator to repeat an array.
1394 (Contributed by Jason Orendorff.)
1395
1396 \item The \module{bsddb} module has been replaced by version 4.1.6
1397 of the \ulink{PyBSDDB}{http://pybsddb.sourceforge.net} package,
1398 providing a more complete interface to the transactional features of
1399 the BerkeleyDB library.
1400
1401 The old version of the module has been renamed to
1402 \module{bsddb185} and is no longer built automatically; you'll
1403 have to edit \file{Modules/Setup} to enable it.  Note that the new
1404 \module{bsddb} package is intended to be compatible with the
1405 old module, so be sure to file bugs if you discover any
1406 incompatibilities.  When upgrading to Python 2.3, if the new interpreter is compiled
1407 with a new version of
1408 the underlying BerkeleyDB library, you will almost certainly have to
1409 convert your database files to the new version.  You can do this
1410 fairly easily with the new scripts \file{db2pickle.py} and
1411 \file{pickle2db.py} which you will find in the distribution's
1412 \file{Tools/scripts} directory.  If you've already been using the PyBSDDB
1413 package and importing it as \module{bsddb3}, you will have to change your
1414 \code{import} statements to import it as \module{bsddb}.
1415
1416 \item The new \module{bz2} module is an interface to the bz2 data
1417 compression library.  bz2-compressed data is usually smaller than
1418 corresponding \module{zlib}-compressed data. (Contributed by Gustavo Niemeyer.)
1419
1420 \item A set of standard date/time types has been added in the new \module{datetime}
1421 module.  See the following section for more details.
1422
1423 \item The Distutils \class{Extension} class now supports
1424 an extra constructor argument named \var{depends} for listing
1425 additional source files that an extension depends on.  This lets
1426 Distutils recompile the module if any of the dependency files are
1427 modified.  For example, if \file{sampmodule.c} includes the header
1428 file \file{sample.h}, you would create the \class{Extension} object like
1429 this:
1430
1431 \begin{verbatim}
1432 ext = Extension("samp",
1433                 sources=["sampmodule.c"],
1434                 depends=["sample.h"])
1435 \end{verbatim}
1436
1437 Modifying \file{sample.h} would then cause the module to be recompiled.
1438 (Contributed by Jeremy Hylton.)
1439
1440 \item Other minor changes to Distutils:
1441 it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
1442 \envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
1443 them to override the settings in Python's configuration (contributed
1444 by Robert Weber).
1445
\item Previously the \module{doctest} module would only search the
docstrings of public methods and functions for test cases, but it now
examines private ones as well.  The \function{DocTestSuite()}
function creates a \class{unittest.TestSuite} object from a set of
\module{doctest} tests.
1451
1452 \item The new \function{gc.get_referents(\var{object})} function returns a
1453 list of all the objects referenced by \var{object}.
1454
1455 \item The \module{getopt} module gained a new function,
1456 \function{gnu_getopt()}, that supports the same arguments as the existing
1457 \function{getopt()} function but uses GNU-style scanning mode.
1458 The existing \function{getopt()} stops processing options as soon as a
1459 non-option argument is encountered, but in GNU-style mode processing
1460 continues, meaning that options and arguments can be mixed.  For
1461 example:
1462
1463 \begin{verbatim}
1464 >>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1465 ([('-f', 'filename')], ['output', '-v'])
1466 >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1467 ([('-f', 'filename'), ('-v', '')], ['output'])
1468 \end{verbatim}
1469
1470 (Contributed by Peter \AA{strand}.)
1471
1472 \item The \module{grp}, \module{pwd}, and \module{resource} modules
1473 now return enhanced tuples:
1474
1475 \begin{verbatim}
1476 >>> import grp
1477 >>> g = grp.getgrnam('amk')
1478 >>> g.gr_name, g.gr_gid
1479 ('amk', 500)
1480 \end{verbatim}
1481
\item The \module{gzip} module can now handle files exceeding 2~GB.
1483
1484 \item The new \module{heapq} module contains an implementation of a
1485 heap queue algorithm.  A heap is an array-like data structure that
1486 keeps items in a partially sorted order such that, for every index
1487 \var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and
1488 \code{heap[\var{k}] <= heap[2*\var{k}+2]}.  This makes it quick to
1489 remove the smallest item, and inserting a new item while maintaining
1490 the heap property is O(lg~n).  (See
1491 \url{http://www.nist.gov/dads/HTML/priorityque.html} for more
1492 information about the priority queue data structure.)
1493
1494 The \module{heapq} module provides \function{heappush()} and
1495 \function{heappop()} functions for adding and removing items while
1496 maintaining the heap property on top of some other mutable Python
1497 sequence type.  Here's an example that uses a Python list:
1498
1499 \begin{verbatim}
1500 >>> import heapq
1501 >>> heap = []
1502 >>> for item in [3, 7, 5, 11, 1]:
1503 ...    heapq.heappush(heap, item)
1504 ...
1505 >>> heap
1506 [1, 3, 5, 11, 7]
1507 >>> heapq.heappop(heap)
1508 1
1509 >>> heapq.heappop(heap)
1510 3
1511 >>> heap
1512 [5, 7, 11]
1513 \end{verbatim}
1514
1515 (Contributed by Kevin O'Connor.)
1516
1517 \item The IDLE integrated development environment has been updated
1518 using the code from the IDLEfork project
1519 (\url{http://idlefork.sf.net}).  The most notable feature is that the
1520 code being developed is now executed in a subprocess, meaning that
1521 there's no longer any need for manual \code{reload()} operations.
1522 IDLE's core code has been incorporated into the standard library as the
1523 \module{idlelib} package.
1524
1525 \item The \module{imaplib} module now supports IMAP over SSL.
1526 (Contributed by Piers Lauder and Tino Lange.)
1527
\item The new \module{itertools} module contains a number of useful
functions for use with iterators, inspired by various functions
provided by the ML and Haskell languages.  For example,
\code{itertools.ifilter(predicate, iterator)} returns all elements in
the iterator for which the function \function{predicate()} returns
\constant{True}, and \code{itertools.repeat(obj, \var{N})} returns
\code{obj} \var{N} times.  There are a number of other functions in
the module; see the \ulink{module's reference
documentation}{../lib/module-itertools.html} for details.
1537 (Contributed by Raymond Hettinger.)
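
Here is a brief sketch of a few of these functions.  It sticks to
\function{repeat()}, \function{count()}, \function{islice()},
\function{takewhile()}, and \function{chain()}, and the sample values
are made up for illustration.

\begin{verbatim}
import itertools

# repeat() yields the same object a fixed number of times.
dashes = list(itertools.repeat('-', 3))        # ['-', '-', '-']

# count() is endless, so islice() takes only the first few values.
first = list(itertools.islice(itertools.count(10), 3))   # [10, 11, 12]

# takewhile() stops at the first item that fails the predicate.
small = list(itertools.takewhile(lambda x: x < 3, [1, 2, 5, 1]))

# chain() runs through several iterables one after another.
joined = list(itertools.chain('ab', 'cd'))     # ['a', 'b', 'c', 'd']
\end{verbatim}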
1538
1539 \item Two new functions in the \module{math} module,
1540 \function{degrees(\var{rads})} and \function{radians(\var{degs})},
1541 convert between radians and degrees.  Other functions in the
1542 \module{math} module such as \function{math.sin()} and
1543 \function{math.cos()} have always required input values measured in
1544 radians.  Also, an optional \var{base} argument was added to
1545 \function{math.log()} to make it easier to compute logarithms for
1546 bases other than \code{e} and \code{10}.  (Contributed by Raymond
1547 Hettinger.)
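
A short sketch of the new conversions and of the \var{base} argument
follows.  Floating-point results are only approximate, so the comments
describe them loosely and give no exact digits.

\begin{verbatim}
import math

right_angle = math.radians(90)       # pi/2 radians
half_turn = math.degrees(math.pi)    # 180 degrees

# sin() and cos() expect radians, so convert degree values first.
sine = math.sin(math.radians(30))    # close to 0.5

# The optional second argument selects the base of the logarithm.
bits = math.log(1024, 2)             # close to 10
\end{verbatim}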
1548
1549 \item Several new POSIX functions (\function{getpgid()}, \function{killpg()},
1550 \function{lchown()}, \function{loadavg()}, \function{major()}, \function{makedev()},
1551 \function{minor()}, and \function{mknod()}) were added to the
1552 \module{posix} module that underlies the \module{os} module.
1553 (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)
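
The following sketch exercises a few of these functions through the
\module{os} module.  They are only present where the platform
provides them, so the \function{hasattr()} checks are a precaution
added for this example, and the device numbers 8 and 1 are arbitrary.

\begin{verbatim}
import os

# Combine a major and minor device number, then split them again.
if hasattr(os, 'makedev'):
    device = os.makedev(8, 1)          # 8 and 1 are arbitrary numbers
    device_parts = (os.major(device), os.minor(device))
else:
    device_parts = None

# getloadavg() reports the 1, 5 and 15 minute load averages.
if hasattr(os, 'getloadavg'):
    try:
        load_averages = os.getloadavg()
    except OSError:
        load_averages = None           # load average unobtainable
else:
    load_averages = None
\end{verbatim}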
1554
1555 \item In the \module{os} module, the \function{*stat()} family of
1556 functions can now report fractions of a second in a timestamp.  Such
1557 time stamps are represented as floats, similar to
1558 the value returned by \function{time.time()}.
1559
1560 During testing, it was found that some applications will break if time
1561 stamps are floats.  For compatibility, when using the tuple interface
of the \class{stat_result}, time stamps will be represented as integers.
1563 When using named fields (a feature first introduced in Python 2.2),
1564 time stamps are still represented as integers, unless
1565 \function{os.stat_float_times()} is invoked to enable float return
1566 values:
1567
1568 \begin{verbatim}
1569 >>> os.stat("/tmp").st_mtime
1570 1034791200
1571 >>> os.stat_float_times(True)
1572 >>> os.stat("/tmp").st_mtime
1573 1034791200.6335014
1574 \end{verbatim}
1575
1576 In Python 2.4, the default will change to always returning floats.
1577
1578 Application developers should enable this feature only if all their
1579 libraries work properly when confronted with floating point time
1580 stamps, or if they use the tuple API. If used, the feature should be
1581 activated on an application level instead of trying to enable it on a
1582 per-use basis.
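
The difference between the two interfaces can be sketched as follows.
The use of the temporary directory and the variable names are
illustrative choices, and the \function{hasattr()} check is an
assumption added so the snippet can also run on Python versions that
do not provide \function{os.stat_float_times()}.

\begin{verbatim}
import os
import stat
import tempfile

# Enable float time stamps once, at application start-up.
if hasattr(os, 'stat_float_times'):
    os.stat_float_times(True)

info = os.stat(tempfile.gettempdir())

# Tuple interface: always a whole number of seconds.
tuple_mtime = info[stat.ST_MTIME]

# Named field: may carry a fractional part once floats are enabled.
named_mtime = info.st_mtime
whole_seconds_agree = (int(named_mtime) == tuple_mtime)
\end{verbatim}

Code that only needs whole seconds can keep using the tuple interface
and is unaffected by the setting.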
1583
1584 \item The \module{optparse} module contains a new parser for command-line arguments
1585 that can convert option values to a particular Python type
and will automatically generate a usage message.  See the section on
the \module{optparse} module below for more details.
1588
1589 \item The old and never-documented \module{linuxaudiodev} module has
1590 been deprecated, and a new version named \module{ossaudiodev} has been
1591 added.  The module was renamed because the OSS sound drivers can be
1592 used on platforms other than Linux, and the interface has also been
1593 tidied and brought up to date in various ways. (Contributed by Greg
1594 Ward and Nicholas FitzRoy-Dale.)
1595
1596 \item The new \module{platform} module contains a number of functions
1597 that try to determine various properties of the platform you're
1598 running on.  There are functions for getting the architecture, CPU
1599 type, the Windows OS version, and even the Linux distribution version.
1600 (Contributed by Marc-Andr\'e Lemburg.)
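
A few of the functions are sketched below.  The values in the
comments are only examples of the kind of result to expect; the real
values depend on the machine running the code.

\begin{verbatim}
import platform

system_name = platform.system()          # e.g. 'Linux' or 'Windows'
cpu_type = platform.machine()            # e.g. 'i386'; may be empty
bits, linkage = platform.architecture()  # e.g. ('32bit', 'ELF')
summary = platform.platform()            # one-line description
\end{verbatim}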
1601
1602 \item The parser objects provided by the \module{pyexpat} module
1603 can now optionally buffer character data, resulting in fewer calls to
1604 your character data handler and therefore faster performance.  Setting
1605 the parser object's \member{buffer_text} attribute to \constant{True}
1606 will enable buffering.
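
The effect can be observed with a small sketch that parses the same
document with and without buffering.  It reaches the parser through
the \module{xml.parsers.expat} interface; the helper
\function{collect_text()} and the sample document are made up for
this illustration.

\begin{verbatim}
import xml.parsers.expat

def collect_text(document, buffered):
    """Return the chunks passed to the character data handler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffered
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks

document = '<doc>fish &amp; chips</doc>'
unbuffered = collect_text(document, False)
buffered = collect_text(document, True)
\end{verbatim}

Both lists join to the same text.  With buffering enabled the handler
should be called no more often than without it, because adjacent
pieces of character data are merged before being delivered.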
1607
1608 \item The \function{sample(\var{population}, \var{k})} function was
1609 added to the \module{random} module.  \var{population} is a sequence or
1610 \class{xrange} object containing the elements of a population, and
1611 \function{sample()} chooses \var{k} elements from the population without
1612 replacing chosen elements.  \var{k} can be any value up to
1613 \code{len(\var{population})}. For example:
1614
1615 \begin{verbatim}
1616 >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
1617 >>> random.sample(days, 3)      # Choose 3 elements
1618 ['St', 'Sn', 'Th']
1619 >>> random.sample(days, 7)      # Choose 7 elements
1620 ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
1621 >>> random.sample(days, 7)      # Choose 7 again
1622 ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
1623 >>> random.sample(days, 8)      # Can't choose eight
1624 Traceback (most recent call last):
1625   File "<stdin>", line 1, in ?
1626   File "random.py", line 414, in sample
1627       raise ValueError, "sample larger than population"
1628 ValueError: sample larger than population
1629 >>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
1630 [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
1631 \end{verbatim}
1632
1633 The \module{random} module now uses a new algorithm, the Mersenne
1634 Twister, implemented in C.  It's faster and more extensively studied
1635 than the previous algorithm.
1636
1637 (All changes contributed by Raymond Hettinger.)
1638
1639 \item The \module{readline} module also gained a number of new
1640 functions: \function{get_history_item()},
1641 \function{get_current_history_length()}, and \function{redisplay()}.
1642
1643 \item The \module{rexec} and \module{Bastion} modules have been
1644 declared dead, and attempts to import them will fail with a
1645 \exception{RuntimeError}.  New-style classes provide new ways to break
1646 out of the restricted execution environment provided by
1647 \module{rexec}, and no one has interest in fixing them or time to do
1648 so.  If you have applications using \module{rexec}, rewrite them to
1649 use something else.
1650
1651 (Sticking with Python 2.2 or 2.1 will not make your applications any
1652 safer because there are known bugs in the \module{rexec} module in
1653 those versions.  To repeat: if you're using \module{rexec}, stop using
1654 it immediately.)
1655
1656 \item The \module{rotor} module has been deprecated because the
1657   algorithm it uses for encryption is not believed to be secure.  If
1658   you need encryption, use one of the several AES Python modules
1659   that are available separately.
1660
1661 \item The \module{shutil} module gained a \function{move(\var{src},
1662 \var{dest})} function that recursively moves a file or directory to a new
1663 location.
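
As a minimal sketch, the following moves a directory and its contents
within a scratch area created by \function{tempfile.mkdtemp()}.  The
directory and file names are arbitrary.

\begin{verbatim}
import os
import shutil
import tempfile

# Build a throwaway directory tree: base/src/note.txt
base = tempfile.mkdtemp()
src = os.path.join(base, 'src')
os.mkdir(src)
f = open(os.path.join(src, 'note.txt'), 'w')
f.write('hello')
f.close()

# Move the whole directory, contents included, to a new name.
dest = os.path.join(base, 'dest')
shutil.move(src, dest)

source_gone = not os.path.exists(src)
file_moved = os.path.exists(os.path.join(dest, 'note.txt'))

shutil.rmtree(base)     # clean up the scratch directory
\end{verbatim}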
1664
\item Support for more advanced POSIX signal handling was added
to the \module{signal} module but then removed again, as it proved
impossible to make it work reliably across platforms.
1668
1669 \item The \module{socket} module now supports timeouts.  You
1670 can call the \method{settimeout(\var{t})} method on a socket object to
1671 set a timeout of \var{t} seconds.  Subsequent socket operations that
1672 take longer than \var{t} seconds to complete will abort and raise a
1673 \exception{socket.timeout} exception.
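
The behaviour can be sketched with a listening socket that nobody
connects to, assuming the loopback interface is available.  The
half-second timeout and the variable names are arbitrary choices for
this example.

\begin{verbatim}
import socket

# Listen on an arbitrary free port of the loopback interface.
# Nobody connects, so accept() has nothing to return.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
server.settimeout(0.5)      # give up after half a second

try:
    server.accept()
    timed_out = False
except socket.timeout:
    timed_out = True
server.close()
\end{verbatim}

Without the call to \method{settimeout()}, the \method{accept()} call
would block until a client connected.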
1674
1675 The original timeout implementation was by Tim O'Malley.  Michael
1676 Gilfix integrated it into the Python \module{socket} module and
1677 shepherded it through a lengthy review.  After the code was checked
1678 in, Guido van~Rossum rewrote parts of it.  (This is a good example of
1679 a collaborative development process in action.)
1680
1681 \item On Windows, the \module{socket} module now ships with Secure
1682 Sockets Layer (SSL) support.
1683
1684 \item The value of the C \constant{PYTHON_API_VERSION} macro is now
1685 exposed at the Python level as \code{sys.api_version}.  The current
1686 exception can be cleared by calling the new \function{sys.exc_clear()}
1687 function.
1688
1689 \item The new \module{tarfile} module
1690 allows reading from and writing to \program{tar}-format archive files.
1691 (Contributed by Lars Gust\"abel.)
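
A minimal sketch of writing an archive and reading it back follows.
The file names and the scratch directory are arbitrary; only the
uncompressed \code{'w'} and \code{'r'} modes are shown.

\begin{verbatim}
import os
import shutil
import tarfile
import tempfile

base = tempfile.mkdtemp()
member_path = os.path.join(base, 'hello.txt')
f = open(member_path, 'w')
f.write('hello, tar\n')
f.close()

# Write an uncompressed archive containing the one file.
archive_path = os.path.join(base, 'demo.tar')
archive = tarfile.open(archive_path, 'w')
archive.add(member_path, 'hello.txt')   # second argument: name in archive
archive.close()

# Reopen the archive for reading and list its members.
archive = tarfile.open(archive_path, 'r')
member_names = archive.getnames()
archive.close()

shutil.rmtree(base)     # clean up the scratch directory
\end{verbatim}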
1692
1693 \item The new \module{textwrap} module contains functions for wrapping
1694 strings containing paragraphs of text.  The \function{wrap(\var{text},
1695 \var{width})} function takes a string and returns a list containing
1696 the text split into lines of no more than the chosen width.  The
1697 \function{fill(\var{text}, \var{width})} function returns a single
1698 string, reformatted to fit into lines no longer than the chosen width.
1699 (As you can guess, \function{fill()} is built on top of
\function{wrap()}.)  For example:
1701
1702 \begin{verbatim}
1703 >>> import textwrap
1704 >>> paragraph = "Not a whit, we defy augury: ... more text ..."
1705 >>> textwrap.wrap(paragraph, 60)
1706 ["Not a whit, we defy augury: there's a special providence in",
1707  "the fall of a sparrow. If it be now, 'tis not to come; if it",
1708  ...]
1709 >>> print textwrap.fill(paragraph, 35)
1710 Not a whit, we defy augury: there's
1711 a special providence in the fall of
1712 a sparrow. If it be now, 'tis not
1713 to come; if it be not to come, it
1714 will be now; if it be not now, yet
1715 it will come: the readiness is all.
1716 >>>
1717 \end{verbatim}
1718
1719 The module also contains a \class{TextWrapper} class that actually
1720 implements the text wrapping strategy.   Both the
1721 \class{TextWrapper} class and the \function{wrap()} and
1722 \function{fill()} functions support a number of additional keyword
1723 arguments for fine-tuning the formatting; consult the \ulink{module's
1724 documentation}{../lib/module-textwrap.html} for details.
1725 (Contributed by Greg Ward.)
1726
1727 \item The \module{thread} and \module{threading} modules now have
1728 companion modules, \module{dummy_thread} and \module{dummy_threading},
1729 that provide a do-nothing implementation of the \module{thread}
1730 module's interface for platforms where threads are not supported.  The
1731 intention is to simplify thread-aware modules (ones that \emph{don't}
1732 rely on threads to run) by putting the following code at the top:
1733
1734 \begin{verbatim}
1735 try:
1736     import threading as _threading
1737 except ImportError:
1738     import dummy_threading as _threading
1739 \end{verbatim}
1740
1741 In this example, \module{_threading} is used as the module name to make
1742 it clear that the module being used is not necessarily the actual
1743 \module{threading} module. Code can call functions and use classes in
1744 \module{_threading} whether or not threads are supported, avoiding an
1745 \keyword{if} statement and making the code slightly clearer.  This
1746 module will not magically make multithreaded code run without threads;
1747 code that waits for another thread to return or to do something will
1748 simply hang forever.
1749
1750 \item The \module{time} module's \function{strptime()} function has
1751 long been an annoyance because it uses the platform C library's
1752 \function{strptime()} implementation, and different platforms
1753 sometimes have odd bugs.  Brett Cannon contributed a portable
1754 implementation that's written in pure Python and should behave
1755 identically on all platforms.
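
A short sketch of parsing a date string follows.  A purely numeric
format is used so that the result does not depend on locale-specific
month names; the variable names are illustrative.

\begin{verbatim}
import time

parsed = time.strptime('2003-07-29', '%Y-%m-%d')
year, month, day = parsed[0], parsed[1], parsed[2]

# Formatting with the same directives gives back the original text.
round_trip = time.strftime('%Y-%m-%d', parsed)
\end{verbatim}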
1756
1757 \item The new \module{timeit} module helps measure how long snippets
1758 of Python code take to execute.  The \file{timeit.py} file can be run
1759 directly from the command line, or the module's \class{Timer} class
1760 can be imported and used directly.  Here's a short example that
1761 figures out whether it's faster to convert an 8-bit string to Unicode
1762 by appending an empty Unicode string to it or by using the
1763 \function{unicode()} function:
1764
1765 \begin{verbatim}
1766 import timeit
1767
1768 timer1 = timeit.Timer('unicode("abc")')
1769 timer2 = timeit.Timer('"abc" + u""')
1770
1771 # Run three trials
1772 print timer1.repeat(repeat=3, number=100000)
1773 print timer2.repeat(repeat=3, number=100000)
1774
1775 # On my laptop this outputs:
1776 # [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
1777 # [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
1778 \end{verbatim}
1779
1780 \item The \module{Tix} module has received various bug fixes and
1781 updates for the current version of the Tix package.
1782
1783 \item The \module{Tkinter} module now works with a thread-enabled
1784 version of Tcl.  Tcl's threading model requires that widgets only be
1785 accessed from the thread in which they're created; accesses from
1786 another thread can cause Tcl to panic.  For certain Tcl interfaces,
1787 \module{Tkinter} will now automatically avoid this
1788 when a widget is accessed from a different thread by marshalling a
1789 command, passing it to the correct thread, and waiting for the
1790 results.  Other interfaces can't be handled automatically but
1791 \module{Tkinter} will now raise an exception on such an access so that
1792 you can at least find out about the problem.  See
1793 \url{http://mail.python.org/pipermail/python-dev/2002-December/031107.html} %
1794 for a more detailed explanation of this change.  (Implemented by
1795 Martin von~L\"owis.)
1796
1797 \item Calling Tcl methods through \module{_tkinter} no longer
1798 returns only strings. Instead, if Tcl returns other objects those
1799 objects are converted to their Python equivalent, if one exists, or
1800 wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
1801 exists. This behavior can be controlled through the
1802 \method{wantobjects()} method of \class{tkapp} objects.
1803
1804 When using \module{_tkinter} through the \module{Tkinter} module (as
1805 most Tkinter applications will), this feature is always activated. It
1806 should not cause compatibility problems, since Tkinter would always
1807 convert string results to Python types where possible.
1808
1809 If any incompatibilities are found, the old behavior can be restored
1810 by setting the \member{wantobjects} variable in the \module{Tkinter}
1811 module to false before creating the first \class{tkapp} object.
1812
1813 \begin{verbatim}
1814 import Tkinter
1815 Tkinter.wantobjects = 0
1816 \end{verbatim}
1817
1818 Any breakage caused by this change should be reported as a bug.
1819
1820 \item The \module{UserDict} module has a new \class{DictMixin} class which
1821 defines all dictionary methods for classes that already have a minimum
1822 mapping interface.  This greatly simplifies writing classes that need
1823 to be substitutable for dictionaries, such as the classes in
1824 the \module{shelve} module.
1825
1826 Adding the mix-in as a superclass provides the full dictionary
1827 interface whenever the class defines \method{__getitem__},
1828 \method{__setitem__}, \method{__delitem__}, and \method{keys}.
1829 For example:
1830
1831 \begin{verbatim}
1832 >>> import UserDict
1833 >>> class SeqDict(UserDict.DictMixin):
1834 ...     """Dictionary lookalike implemented with lists."""
1835 ...     def __init__(self):
1836 ...         self.keylist = []
1837 ...         self.valuelist = []
1838 ...     def __getitem__(self, key):
1839 ...         try:
1840 ...             i = self.keylist.index(key)
1841 ...         except ValueError:
1842 ...             raise KeyError
1843 ...         return self.valuelist[i]
1844 ...     def __setitem__(self, key, value):
1845 ...         try:
1846 ...             i = self.keylist.index(key)
1847 ...             self.valuelist[i] = value
1848 ...         except ValueError:
1849 ...             self.keylist.append(key)
1850 ...             self.valuelist.append(value)
1851 ...     def __delitem__(self, key):
1852 ...         try:
1853 ...             i = self.keylist.index(key)
1854 ...         except ValueError:
1855 ...             raise KeyError
1856 ...         self.keylist.pop(i)
1857 ...         self.valuelist.pop(i)
1858 ...     def keys(self):
1859 ...         return list(self.keylist)
1860 ...
1861 >>> s = SeqDict()
1862 >>> dir(s)      # See that other dictionary methods are implemented
1863 ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
1864  '__init__', '__iter__', '__len__', '__module__', '__repr__',
1865  '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
1866  'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
1867  'setdefault', 'update', 'valuelist', 'values']
1868 \end{verbatim}
1869
1870 (Contributed by Raymond Hettinger.)
1871
1872 \item The DOM implementation
1873 in \module{xml.dom.minidom} can now generate XML output in a
1874 particular encoding by providing an optional encoding argument to
1875 the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.
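
The following sketch compares the output with and without the
argument.  The sample document is made up, and the membership test is
written so that it works whether the encoded result is an ordinary
string or a byte string.

\begin{verbatim}
from xml.dom import minidom

document = minidom.parseString('<greeting>hello</greeting>')

# Without an argument no encoding is named in the XML declaration.
plain = document.toxml()

# With an encoding the declaration names it and the output is encoded.
encoded = document.toxml('utf-8')
names_encoding = 'encoding="utf-8"'.encode('ascii') in encoded
\end{verbatim}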
1876
1877 \item The \module{xmlrpclib} module now supports an XML-RPC extension
1878 for handling nil data values such as Python's \code{None}.  Nil values
1879 are always supported on unmarshalling an XML-RPC response.  To
1880 generate requests containing \code{None}, you must supply a true value
1881 for the \var{allow_none} parameter when creating a \class{Marshaller}
1882 instance.
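
As a sketch, the following marshals a parameter tuple containing
\code{None} with and without the flag.  The fallback import is an
assumption added so the snippet can also run on later Python versions
where the module lives under a different name.

\begin{verbatim}
try:
    import xmlrpclib                     # the name used by Python 2.3
except ImportError:
    import xmlrpc.client as xmlrpclib    # assumed home in later versions

# By default None cannot be marshalled.
try:
    xmlrpclib.Marshaller().dumps((None,))
    refused = False
except TypeError:
    refused = True

# With allow_none set, None is sent as the <nil/> extension element.
marshaller = xmlrpclib.Marshaller(allow_none=True)
params_xml = marshaller.dumps((None,))
contains_nil = '<nil/>' in params_xml
\end{verbatim}

Since \code{<nil/>} is an extension to XML-RPC, it is worth
confirming that the server at the other end understands it before
enabling the flag.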
1883
1884 \item The new \module{DocXMLRPCServer} module allows writing
1885 self-documenting XML-RPC servers. Run it in demo mode (as a program)
to see it in action.   Pointing a Web browser to the RPC server
produces pydoc-style documentation; pointing \module{xmlrpclib} to the
1888 server allows invoking the actual methods.
1889 (Contributed by Brian Quinlan.)
1890
1891 \item Support for internationalized domain names (RFCs 3454, 3490,
1892 3491, and 3492) has been added. The ``idna'' encoding can be used
1893 to convert between a Unicode domain name and the ASCII-compatible
1894 encoding (ACE) of that name.
1895
1896 \begin{alltt}
1897 >{}>{}> u"www.Alliancefran\c caise.nu".encode("idna")
1898 'www.xn--alliancefranaise-npb.nu'
1899 \end{alltt}
1900
1901 The \module{socket} module has also been extended to transparently
1902 convert Unicode hostnames to the ACE version before passing them to
the C library.  Modules that deal with hostnames (such as
\module{httplib} and \module{ftplib}) also support Unicode host names;
1905 \module{httplib} also sends HTTP \samp{Host} headers using the ACE
1906 version of the domain name.  \module{urllib} supports Unicode URLs
1907 with non-ASCII host names as long as the \code{path} part of the URL
1908 is ASCII only.
1909
1910 To implement this change, the \module{stringprep} module, the
1911 \code{mkstringprep} tool and the \code{punycode} encoding have been added.
1912
1913 \end{itemize}
1914
1915
1916 %======================================================================
1917 \subsection{Date/Time Type}
1918
1919 Date and time types suitable for expressing timestamps were added as
1920 the \module{datetime} module.  The types don't support different
1921 calendars or many fancy features, and just stick to the basics of
1922 representing time.
1923
1924 The three primary types are: \class{date}, representing a day, month,
1925 and year; \class{time}, consisting of hour, minute, and second; and
1926 \class{datetime}, which contains all the attributes of both
1927 \class{date} and \class{time}.  There's also a
1928 \class{timedelta} class representing differences between two points
1929 in time, and time zone logic is implemented by classes inheriting from
1930 the abstract \class{tzinfo} class.
1931
You can create instances of \class{date} and \class{time} either by
supplying keyword arguments to the appropriate constructor,
1934 e.g. \code{datetime.date(year=1972, month=10, day=15)}, or by using
1935 one of a number of class methods.  For example, the \method{date.today()}
1936 class method returns the current local date.
1937
1938 Once created, instances of the date/time classes are all immutable.
1939 There are a number of methods for producing formatted strings from
1940 objects:
1941
1942 \begin{verbatim}
1943 >>> import datetime
1944 >>> now = datetime.datetime.now()
1945 >>> now.isoformat()
1946 '2002-12-30T21:27:03.994956'
1947 >>> now.ctime()  # Only available on date, datetime
1948 'Mon Dec 30 21:27:03 2002'
1949 >>> now.strftime('%Y %d %b')
1950 '2002 30 Dec'
1951 \end{verbatim}
1952
1953 The \method{replace()} method allows modifying one or more fields
1954 of a \class{date} or \class{datetime} instance, returning a new instance:
1955
1956 \begin{verbatim}
1957 >>> d = datetime.datetime.now()
1958 >>> d
1959 datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
1960 >>> d.replace(year=2001, hour = 12)
1961 datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
1962 >>>
1963 \end{verbatim}
1964
1965 Instances can be compared, hashed, and converted to strings (the
1966 result is the same as that of \method{isoformat()}).  \class{date} and
1967 \class{datetime} instances can be subtracted from each other, and
1968 added to \class{timedelta} instances.  The largest missing feature is
1969 that there's no standard library support for parsing strings and getting back a
1970 \class{date} or \class{datetime}.
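
The arithmetic and comparison operations can be sketched as follows;
the dates and variable names are chosen only for illustration.

\begin{verbatim}
import datetime

release = datetime.date(2003, 7, 29)
new_year = datetime.date(2003, 1, 1)

# Subtracting two dates gives a timedelta.
elapsed = release - new_year
days_elapsed = elapsed.days

# Adding a timedelta to a date gives another date.
week_later = release + datetime.timedelta(days=7)

# str() matches isoformat(), and instances compare chronologically.
same_text = (str(release) == release.isoformat())
in_order = new_year < release < week_later
\end{verbatim}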
1971
1972 For more information, refer to the \ulink{module's reference
1973 documentation}{../lib/module-datetime.html}.
1974 (Contributed by Tim Peters.)
1975
1976
1977 %======================================================================
1978 \subsection{The optparse Module}
1979
1980 The \module{getopt} module provides simple parsing of command-line
1981 arguments.  The new \module{optparse} module (originally named Optik)
1982 provides more elaborate command-line parsing that follows the Unix
1983 conventions, automatically creates the output for \longprogramopt{help},
1984 and can perform different actions for different options.
1985
1986 You start by creating an instance of \class{OptionParser} and telling
1987 it what your program's options are.
1988
1989 \begin{verbatim}
1990 import sys
1991 from optparse import OptionParser
1992
1993 op = OptionParser()
1994 op.add_option('-i', '--input',
1995               action='store', type='string', dest='input',
1996               help='set input filename')
1997 op.add_option('-l', '--length',
1998               action='store', type='int', dest='length',
1999               help='set maximum length of output')
2000 \end{verbatim}
2001
2002 Parsing a command line is then done by calling the \method{parse_args()}
2003 method.
2004
2005 \begin{verbatim}
2006 options, args = op.parse_args(sys.argv[1:])
2007 print options
2008 print args
2009 \end{verbatim}
2010
2011 This returns an object containing all of the option values,
2012 and a list of strings containing the remaining arguments.
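
The same parser can be tried without a command line by passing an
explicit argument list to \method{parse_args()}.  The sketch below
repeats the two option definitions so that it is self-contained; the
argument values and the variable names on the last three lines are
invented for the example.

\begin{verbatim}
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')

# Pass the argument list explicitly instead of reading sys.argv.
options, args = op.parse_args(['--input=data', '-l', '4', 'arg1'])

input_name = options.input      # taken from --input
max_length = options.length     # converted to an integer
leftover = args                 # positional arguments
\end{verbatim}

Each option value is available as an attribute named after the
\code{dest} argument given to \method{add_option()}.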
2013
2014 Invoking the script with the various arguments now works as you'd
2015 expect it to.  Note that the length argument is automatically
2016 converted to an integer.
2017
2018 \begin{verbatim}
2019 $ ./python opt.py -i data arg1
2020 <Values at 0x400cad4c: {'input': 'data', 'length': None}>
2021 ['arg1']
2022 $ ./python opt.py --input=data --length=4
2023 <Values at 0x400cad2c: {'input': 'data', 'length': 4}>
2024 []
2025 $
2026 \end{verbatim}
2027
2028 The help message is automatically generated for you:
2029
2030 \begin{verbatim}
2031 $ ./python opt.py --help
2032 usage: opt.py [options]
2033
2034 options:
2035   -h, --help            show this help message and exit
2036   -iINPUT, --input=INPUT
2037                         set input filename
2038   -lLENGTH, --length=LENGTH
2039                         set maximum length of output
2040 $
2041 \end{verbatim}
2042 % $ prevent Emacs tex-mode from getting confused
2043
2044 See the \ulink{module's documentation}{../lib/module-optparse.html}
2045 for more details.
2046
2047 Optik was written by Greg Ward, with suggestions from the readers of
2048 the Getopt SIG.
2049
2050
2051 %======================================================================
2052 \section{Pymalloc: A Specialized Object Allocator\label{section-pymalloc}}
2053
2054 Pymalloc, a specialized object allocator written by Vladimir
2055 Marangozov, was a feature added to Python 2.1.  Pymalloc is intended
2056 to be faster than the system \cfunction{malloc()} and to have less
2057 memory overhead for allocation patterns typical of Python programs.
2058 The allocator uses C's \cfunction{malloc()} function to get large
2059 pools of memory and then fulfills smaller memory requests from these
2060 pools.
2061
2062 In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
2063 enabled by default; you had to explicitly enable it when compiling
2064 Python by providing the
2065 \longprogramopt{with-pymalloc} option to the \program{configure}
2066 script.  In 2.3, pymalloc has had further enhancements and is now
2067 enabled by default; you'll have to supply
2068 \longprogramopt{without-pymalloc} to disable it.
2069
2070 This change is transparent to code written in Python; however,
2071 pymalloc may expose bugs in C extensions.  Authors of C extension
2072 modules should test their code with pymalloc enabled,
2073 because some incorrect code may cause core dumps at runtime.
2074
2075 There's one particularly common error that causes problems.  There are
2076 a number of memory allocation functions in Python's C API that have
2077 previously just been aliases for the C library's \cfunction{malloc()}
2078 and \cfunction{free()}, meaning that if you accidentally called
2079 mismatched functions the error wouldn't be noticeable.  When the
2080 object allocator is enabled, these functions aren't aliases of
2081 \cfunction{malloc()} and \cfunction{free()} any more, and calling the
2082 wrong function to free memory may get you a core dump.  For example,
2083 if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
2084 be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}.  A
2085 few modules included with Python fell afoul of this and had to be
2086 fixed; doubtless there are more third-party modules that will have the
2087 same problem.
2088
2089 As part of this change, the confusing multiple interfaces for
2090 allocating memory have been consolidated down into two API families.
2091 Memory allocated with one family must not be manipulated with
2092 functions from the other family.  There is one family for allocating
2093 chunks of memory and another family of functions specifically for
2094 allocating Python objects.
2095
2096 \begin{itemize}
2097   \item To allocate and free an undistinguished chunk of memory use
2098   the ``raw memory'' family: \cfunction{PyMem_Malloc()},
2099   \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.
2100
2101   \item The ``object memory'' family is the interface to the pymalloc
2102   facility described above and is biased towards a large number of
  ``small'' allocations: \cfunction{PyObject_Malloc()},
  \cfunction{PyObject_Realloc()}, and \cfunction{PyObject_Free()}.

  \item To allocate and free Python objects, use the ``object'' family:
2107   \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
2108   \cfunction{PyObject_Del()}.
2109 \end{itemize}
2110
2111 Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
debugging features to catch memory overwrites and double frees, both
in extension modules and in the interpreter itself.  To enable this
2114 support, compile a debugging version of the Python interpreter by
2115 running \program{configure} with \longprogramopt{with-pydebug}.
2116
2117 To aid extension writers, a header file \file{Misc/pymemcompat.h} is
2118 distributed with the source to Python 2.3 that allows Python
2119 extensions to use the 2.3 interfaces to memory allocation while
2120 compiling against any version of Python since 1.5.2.  You would copy
2121 the file from Python's source distribution and bundle it with the
2122 source of your extension.
2123
2124 \begin{seealso}
2125
2126 \seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
2127 {For the full details of the pymalloc implementation, see
2128 the comments at the top of the file \file{Objects/obmalloc.c} in the
2129 Python source code.  The above link points to the file within the
2130 SourceForge CVS browser.}
2131
2132 \end{seealso}
2133
2134
2135 % ======================================================================
2136 \section{Build and C API Changes}
2137
2138 Changes to Python's build process and to the C API include:
2139
2140 \begin{itemize}
2141
\item The cycle detection implementation used by the garbage collector
2143 has proven to be stable, so it's now been made mandatory.  You can no
2144 longer compile Python without it, and the
2145 \longprogramopt{with-cycle-gc} switch to \program{configure} has been removed.
2146
2147 \item Python can now optionally be built as a shared library
2148 (\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
2149 when running Python's \program{configure} script.  (Contributed by Ondrej
2150 Palkovsky.)
2151
2152 \item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
2153 are now deprecated.  Initialization functions for Python extension
2154 modules should now be declared using the new macro
2155 \csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
2156 use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
2157 macros.
2158
2159 \item The interpreter can be compiled without any docstrings for
2160 the built-in functions and modules by supplying
2161 \longprogramopt{without-doc-strings} to the \program{configure} script.
2162 This makes the Python executable about 10\% smaller, but will also
2163 mean that you can't get help for Python's built-ins.  (Contributed by
2164 Gustavo Niemeyer.)
2165
2166 \item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
2167 that uses it should be changed.  For Python 2.2 and later, the method
2168 definition table can specify the
2169 \constant{METH_NOARGS} flag, signalling that there are no arguments, and
2170 the argument checking can then be removed.  If compatibility with
2171 pre-2.2 versions of Python is important, the code could use
2172 \code{PyArg_ParseTuple(\var{args}, "")} instead, but this will be slower
2173 than using \constant{METH_NOARGS}.
2174
\item \cfunction{PyArg_ParseTuple()} accepts new format characters for
various sizes of unsigned integers: \samp{B} for \ctype{unsigned char},
2176 \samp{H} for \ctype{unsigned short int},
2177 \samp{I} for \ctype{unsigned int},
2178 and \samp{K} for \ctype{unsigned long long}.
2179
2180 \item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
2181 char *\var{key})} was added as shorthand for
\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.
2183
2184 \item File objects now manage their internal string buffer
2185 differently, increasing it exponentially when needed.  This results in
2186 the benchmark tests in \file{Lib/test/test_bufio.py} speeding up
2187 considerably (from 57 seconds to 1.7 seconds, according to one
2188 measurement).
2189
2190 \item It's now possible to define class and static methods for a C
2191 extension type by setting either the \constant{METH_CLASS} or
2192 \constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
2193 structure.
2194
2195 \item Python now includes a copy of the Expat XML parser's source code,
2196 removing any dependence on a system version or local installation of
2197 Expat.
2198
2199 \item If you dynamically allocate type objects in your extension, you
2200 should be aware of a change in the rules relating to the
2201 \member{__module__} and \member{__name__} attributes.  In summary,
2202 you will want to ensure the type's dictionary contains a
2203 \code{'__module__'} key; making the module name the part of the type
2204 name leading up to the final period will no longer have the desired
2205 effect.  For more detail, read the API reference documentation or the
2206 source.
2207
2208 \end{itemize}
2209
2210
2211 %======================================================================
2212 \subsection{Port-Specific Changes}
2213
2214 Support for a port to IBM's OS/2 using the EMX runtime environment was
2215 merged into the main Python source tree.  EMX is a POSIX emulation
2216 layer over the OS/2 system APIs.  The Python port for EMX tries to
2217 support all the POSIX-like capability exposed by the EMX runtime, and
2218 mostly succeeds; \function{fork()} and \function{fcntl()} are
2219 restricted by the limitations of the underlying emulation layer.  The
2220 standard OS/2 port, which uses IBM's Visual Age compiler, also gained
2221 support for case-sensitive import semantics as part of the integration
2222 of the EMX port into CVS.  (Contributed by Andrew MacIntyre.)
2223
2224 On MacOS, most toolbox modules have been weaklinked to improve
2225 backward compatibility.  This means that modules will no longer fail
2226 to load if a single routine is missing on the current OS version.
Instead, calling the missing routine will raise an exception.
2228 (Contributed by Jack Jansen.)
2229
2230 The RPM spec files, found in the \file{Misc/RPM/} directory in the
2231 Python source distribution, were updated for 2.3.  (Contributed by
2232 Sean Reifschneider.)
2233
2234 Other new platforms now supported by Python include AtheOS
2235 (\url{http://www.atheos.cx/}), GNU/Hurd, and OpenVMS.
2236
2237
2238 %======================================================================
2239 \section{Other Changes and Fixes \label{section-other}}
2240
2241 As usual, there were a bunch of other improvements and bugfixes
2242 scattered throughout the source tree.  A search through the CVS change
logs finds that 523 patches were applied and 514 bugs fixed between
2244 Python 2.2 and 2.3.  Both figures are likely to be underestimates.
2245
2246 Some of the more notable changes are:
2247
2248 \begin{itemize}
2249
2250 \item If the \envvar{PYTHONINSPECT} environment variable is set, the
2251 Python interpreter will enter the interactive prompt after running a
2252 Python program, as if Python had been invoked with the \programopt{-i}
2253 option. The environment variable can be set before running the Python
2254 interpreter, or it can be set by the Python program as part of its
2255 execution.
2256
2257 \item The \file{regrtest.py} script now provides a way to allow ``all
2258 resources except \var{foo}.''  A resource name passed to the
2259 \programopt{-u} option can now be prefixed with a hyphen
2260 (\character{-}) to mean ``remove this resource.''  For example, the
2261 option `\code{\programopt{-u}all,-bsddb}' could be used to enable the
2262 use of all resources except \code{bsddb}.
2263
2264 \item The tools used to build the documentation now work under Cygwin
2265 as well as \UNIX.
2266
2267 \item The \code{SET_LINENO} opcode has been removed.  Back in the
2268 mists of time, this opcode was needed to produce line numbers in
2269 tracebacks and support trace functions (for, e.g., \module{pdb}).
2270 Since Python 1.5, the line numbers in tracebacks have been computed
2271 using a different mechanism that works with ``python -O''.  For Python
2272 2.3 Michael Hudson implemented a similar scheme to determine when to
2273 call the trace function, removing the need for \code{SET_LINENO}
2274 entirely.
2275
2276 It would be difficult to detect any resulting difference from Python
code, apart from a slight speedup when Python is run without
2278 \programopt{-O}.
2279
2280 C extensions that access the \member{f_lineno} field of frame objects
2281 should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
2282 This will have the added effect of making the code work as desired
2283 under ``python -O'' in earlier versions of Python.
2284
2285 A nifty new feature is that trace functions can now assign to the
2286 \member{f_lineno} attribute of frame objects, changing the line that
2287 will be executed next.  A \samp{jump} command has been added to the
2288 \module{pdb} debugger taking advantage of this new feature.
2289 (Implemented by Richie Hindle.)
2290
2291 \end{itemize}
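
As a rough illustration of the \member{f_lineno} assignment described
in the last item, the following sketch installs a trace function with
\function{sys.settrace()} that skips one line of a function.  The
names \function{demo()}, \function{local_trace()},
\function{global_trace()} and \function{run_with_jump()} are invented
for this example, and the code is written for a current Python (it
relies on \member{__code__} and \function{sys.gettrace()}, which
Python 2.3 does not provide).

```python
import sys

def demo():
    steps = []
    steps.append('first')
    steps.append('skipped')   # the tracer jumps over this line
    steps.append('last')
    return steps

# Line numbers are computed relative to the "def" line, so the sketch
# does not depend on where it is placed in a file.
FIRST = demo.__code__.co_firstlineno
SKIP_FROM = FIRST + 3    # the line appending 'skipped'
JUMP_TO = FIRST + 4      # the line appending 'last'

def local_trace(frame, event, arg):
    # Called for events inside demo(); redirect execution just before
    # the unwanted line would run.
    if event == 'line' and frame.f_lineno == SKIP_FROM:
        frame.f_lineno = JUMP_TO
    return local_trace

def global_trace(frame, event, arg):
    # Only trace calls to demo(); every other frame runs untraced.
    if event == 'call' and frame.f_code is demo.__code__:
        return local_trace
    return None

def run_with_jump():
    # Install the tracer for one call, then restore the previous one.
    old = sys.gettrace()
    sys.settrace(global_trace)
    try:
        return demo()
    finally:
        sys.settrace(old)
```

Calling \code{run_with_jump()} should return the list without the
\code{'skipped'} entry, while calling \code{demo()} directly still
executes every line.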
2292
2293
2294 %======================================================================
2295 \section{Porting to Python 2.3}
2296
2297 This section lists previously described changes that may require
2298 changes to your code:
2299
2300 \begin{itemize}
2301
2302 \item \keyword{yield} is now always a keyword; if it's used as a
2303 variable name in your code, a different name must be chosen.
2304
2305 \item For strings \var{X} and \var{Y}, \code{\var{X} in \var{Y}} now works
2306 if \var{X} is more than one character long.
2307
2308 \item The \function{int()} type constructor will now return a long
2309 integer instead of raising an \exception{OverflowError} when a string
2310 or floating-point number is too large to fit into an integer.
2311
2312 \item If you have Unicode strings that contain 8-bit characters, you
2313 must declare the file's encoding (UTF-8, Latin-1, or whatever) by
2314 adding a comment to the top of the file.  See
2315 section~\ref{section-encodings} for more information.
2316
2317 \item Calling Tcl methods through \module{_tkinter} no longer
2318 returns only strings. Instead, if Tcl returns other objects those
2319 objects are converted to their Python equivalent, if one exists, or
2320 wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
2321 exists.
2322
2323 \item Large octal and hex literals such as
2324 \code{0xffffffff} now trigger a \exception{FutureWarning}. Currently
2325 they're stored as 32-bit numbers and result in a negative value, but
2326 in Python 2.4 they'll become positive long integers.
2327
2328 % The empty groups below prevent conversion to guillemets.
2329 There are a few ways to fix this warning.  If you really need a
2330 positive number, just add an \samp{L} to the end of the literal.  If
2331 you're trying to get a 32-bit integer with low bits set and have
2332 previously used an expression such as \code{\textasciitilde(1 <{}< 31)},
2333 it's probably
2334 clearest to start with all bits set and clear the desired upper bits.
2335 For example, to clear just the top bit (bit 31), you could write
2336 \code{0xffffffffL {\&}{\textasciitilde}(1L<{}<31)}.
2337
2338 \item You can no longer disable assertions by assigning to \code{__debug__}.
2339
2340 \item The Distutils \function{setup()} function has gained various new
2341 keyword arguments such as \var{depends}.  Old versions of the
Distutils will abort if passed unknown keywords.  A solution is to check
for the presence of the new \function{get_distutil_options()} function
in your \file{setup.py} and only use the new keywords
with a version of the Distutils that supports them:
2346
2347 \begin{verbatim}
from distutils import core

# Keyword arguments accepted by every version of the Distutils.
kw = {'sources': ['foo.c'], ...}
# Only pass 'depends' to a Distutils that understands it.
if hasattr(core, 'get_distutil_options'):
    kw['depends'] = ['foo.h']
ext = core.Extension(**kw)
2354 \end{verbatim}
2355
\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning}.
2358
2359 \item Names of extension types defined by the modules included with
2360 Python now contain the module and a \character{.} in front of the type
2361 name.
2362
2363 \end{itemize}
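
Several of the items above can be checked directly at the interpreter
prompt.  The following sketch is written for a current Python, where
all integers are unbounded, so the \samp{L} suffix recommended above is
neither needed nor accepted; the variable names are chosen only for
this illustration.

```python
# Substring search: the left operand may be longer than one character.
has_substring = 'ab' in 'abc'

# int() accepts values too large for a machine word instead of
# raising OverflowError.
big = int('123456789012345678901234567890')

# Start with all 32 bits set, then clear the top bit (bit 31).
all_bits = 0xffffffff
top_cleared = all_bits & ~(1 << 31)
```

Here \code{top_cleared} equals \code{0x7fffffff}, the largest positive
value that fits in a signed 32-bit integer.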
2364
2365
2366 %======================================================================
2367 \section{Acknowledgements \label{acks}}
2368
2369 The author would like to thank the following people for offering
2370 suggestions, corrections and assistance with various drafts of this
2371 article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside,
2372 Andrew Dalke, Scott David Daniels, Fred~L. Drake, Jr., David Fraser,
2373 Kelly Gerber,
2374 Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert,
2375 Martin von~L\"owis, Andrew MacIntyre, Lalo Martins, Chad Netzer,
2376 Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco
2377 Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler,
and Just van~Rossum.
2379
2380 \end{document}