Doc/whatsnew/whatsnew25.tex

   1 \documentclass{howto}
   2 \usepackage{distutils}
   3 % $Id$
   4
   5 % Fix XXX comments
   6 % The easy_install stuff
   7 % Stateful codec changes
   8 % Count up the patches and bugs
   9
  10 \title{What's New in Python 2.5}
  11 \release{0.1}
  12 \author{A.M. Kuchling}
  13 \authoraddress{\email{amk@amk.ca}}
  14
  15 \begin{document}
  16 \maketitle
  17 \tableofcontents
  18
  19 This article explains the new features in Python 2.5.  No release date
  20 for Python 2.5 has been set; it will probably be released in the
  21 autumn of 2006.  \pep{356} describes the planned release schedule.
  22
  23 (This is still an early draft, and some sections are still skeletal or
  24 completely missing.  Comments on the present material will still be
  25 welcomed.)
  26
  27 % XXX Compare with previous release in 2 - 3 sentences here.
  28
  29 This article doesn't attempt to provide a complete specification of
  30 the new features, but instead provides a convenient overview.  For
  31 full details, you should refer to the documentation for Python 2.5.
  32 % XXX add hyperlink when the documentation becomes available online.
  33 If you want to understand the complete implementation and design
  34 rationale, refer to the PEP for a particular new feature.
  35
  36
  37 %======================================================================
  38 \section{PEP 243: Uploading Modules to PyPI}
  39
  40 PEP 243 describes an HTTP-based protocol for submitting software
  41 packages to a central archive.  The Python package index at
  42 \url{http://cheeseshop.python.org} now supports package uploads, and
  43 the new \command{upload} Distutils command will upload a package to the
  44 repository.
  45
  46 Before a package can be uploaded, you must be able to build a
  47 distribution using the \command{sdist} Distutils command.  Once that
  48 works, you can run \code{python setup.py upload} to add your package
  49 to the PyPI archive.  Optionally you can GPG-sign the package by
  50 supplying the \programopt{--sign} and
  51 \programopt{--identity} options.
  52
  53 \begin{seealso}
  54
  55 \seepep{243}{Module Repository Upload Mechanism}{PEP written by
  56 Sean Reifschneider; implemented by Martin von~L\"owis
  57 and Richard Jones.  Note that the PEP doesn't exactly
  58 describe what's implemented in PyPI.}
  59
  60 \end{seealso}
  61
  62
  63 %======================================================================
  64 \section{PEP 308: Conditional Expressions}
  65
  66 For a long time, people have been requesting a way to write
  67 conditional expressions, expressions that return value A or value B
  68 depending on whether a Boolean value is true or false.  A conditional
  69 expression lets you write a single assignment statement that has the
  70 same effect as the following:
  71
  72 \begin{verbatim}
  73 if condition:
  74     x = true_value
  75 else:
  76     x = false_value
  77 \end{verbatim}
  78
  79 There have been endless tedious discussions of syntax on both
  80 python-dev and comp.lang.python.  A vote was even held that found the
  81 majority of voters wanted conditional expressions in some form,
  82 but there was no syntax that was preferred by a clear majority.
  83 Candidates included C's \code{cond ? true_v : false_v},
  84 \code{if cond then true_v else false_v}, and 16 other variations.
  85
  86 GvR eventually chose a surprising syntax:
  87
  88 \begin{verbatim}
  89 x = true_value if condition else false_value
  90 \end{verbatim}
  91
  92 Evaluation is still lazy as in existing Boolean expressions, so the
  93 order of evaluation jumps around a bit.  The \var{condition}
  94 expression in the middle is evaluated first, and the \var{true_value}
  95 expression is evaluated only if the condition was true.  Similarly,
  96 the \var{false_value} expression is only evaluated when the condition
  97 is false.
  98
  99 This syntax may seem strange and backwards; why does the condition go
 100 in the \emph{middle} of the expression, and not in the front as in C's
 101 \code{c ? x : y}?  The decision was checked by applying the new syntax
 102 to the modules in the standard library and seeing how the resulting
 103 code read.  In many cases where a conditional expression is used, one
 104 value seems to be the 'common case' and one value is an 'exceptional
 105 case', used only on rarer occasions when the condition isn't met.  The
 106 conditional syntax makes this pattern a bit more obvious:
 107
 108 \begin{verbatim}
 109 contents = ((doc + '\n') if doc else '')
 110 \end{verbatim}
 111
 112 I read the above statement as meaning ``here \var{contents} is
 113 usually assigned a value of \code{doc+'\e n'}; sometimes
 114 \var{doc} is empty, in which special case an empty string is returned.''
 115 I doubt I will use conditional expressions very often where there
 116 isn't a clear common and uncommon case.
 117
 118 There was some discussion of whether the language should require
 119 surrounding conditional expressions with parentheses.  The decision
 120 was made to \emph{not} require parentheses in the Python language's
 121 grammar, but as a matter of style I think you should always use them.
 122 Consider these two statements:
 123
 124 \begin{verbatim}
 125 # First version -- no parens
 126 level = 1 if logging else 0
 127
 128 # Second version -- with parens
 129 level = (1 if logging else 0)
 130 \end{verbatim}
 131
 132 In the first version, I think a reader's eye might group the statement
 133 into 'level = 1', 'if logging', 'else 0', and think that the condition
 134 decides whether the assignment to \var{level} is performed.  The
 135 second version reads better, in my opinion, because it makes it clear
 136 that the assignment is always performed and the choice is being made
 137 between two values.
 138
 139 Another reason for including the brackets: a few odd combinations of
 140 list comprehensions and lambdas could look like incorrect conditional
 141 expressions. See \pep{308} for some examples.  If you put parentheses
 142 around your conditional expressions, you won't run into this case.
 143
 144
 145 \begin{seealso}
 146
 147 \seepep{308}{Conditional Expressions}{PEP written by
 148 Guido van Rossum and Raymond D. Hettinger; implemented by Thomas
 149 Wouters.}
 150
 151 \end{seealso}
 152
 153
 154 %======================================================================
 155 \section{PEP 309: Partial Function Application}
 156
 157 The \module{functional} module is intended to contain tools for
 158 functional-style programming.  Currently it only contains a
 159 \class{partial()} function, but new functions will probably be added
 160 in future versions of Python.
 161
 162 For programs written in a functional style, it can be useful to
 163 construct variants of existing functions that have some of the
 164 parameters filled in.  Consider a Python function \code{f(a, b, c)};
 165 you could create a new function \code{g(b, c)} that was equivalent to
 166 \code{f(1, b, c)}.  This is called ``partial function application'',
 167 and is provided by the \class{partial} class in the new
 168 \module{functional} module.
 169
 170 The constructor for \class{partial} takes the arguments
 171 \code{(\var{function}, \var{arg1}, \var{arg2}, ...
 172 \var{kwarg1}=\var{value1}, \var{kwarg2}=\var{value2})}.  The resulting
 173 object is callable, so you can just call it to invoke \var{function}
 174 with the filled-in arguments.
 175
 176 Here's a small but realistic example:
 177
 178 \begin{verbatim}
 179 import functional
 180
 181 def log (message, subsystem):
 182     "Write the contents of 'message' to the specified subsystem."
 183     print '%s: %s' % (subsystem, message)
 184     ...
 185
 186 server_log = functional.partial(log, subsystem='server')
 187 server_log('Unable to open socket')
 188 \end{verbatim}
 189
 190 Here's another example, from a program that uses PyGTk.  Here a
 191 context-sensitive pop-up menu is being constructed dynamically.  The
 192 callback provided for the menu option is a partially applied version
 193 of the \method{open_item()} method, where the first argument has been
 194 provided.
 195
 196 \begin{verbatim}
 197 ...
 198 class Application:
 199     def open_item(self, path):
 200        ...
 201     def init (self):
 202         open_func = functional.partial(self.open_item, item_path)
 203         popup_menu.append( ("Open", open_func, 1) )
 204 \end{verbatim}
 205
 206
 207 \begin{seealso}
 208
 209 \seepep{309}{Partial Function Application}{PEP proposed and written by
 210 Peter Harris; implemented by Hye-Shik Chang, with adaptations by
 211 Raymond Hettinger.}
 212
 213 \end{seealso}
 214
 215
 216 %======================================================================
 217 \section{PEP 314: Metadata for Python Software Packages v1.1}
 218
 219 Some simple dependency support was added to Distutils.  The
 220 \function{setup()} function now has \code{requires}, \code{provides},
 221 and \code{obsoletes} keyword parameters.  When you build a source
 222 distribution using the \code{sdist} command, the dependency
 223 information will be recorded in the \file{PKG-INFO} file.
 224
 225 Another new keyword parameter is \code{download_url}, which should be
 226 set to a URL for the package's source code.  This means it's now
 227 possible to look up an entry in the package index, determine the
 228 dependencies for a package, and download the required packages.
 229
 230 % XXX put example here
 231
 232 \begin{seealso}
 233
 234 \seepep{314}{Metadata for Python Software Packages v1.1}{PEP proposed
 235 and written by A.M. Kuchling, Richard Jones, and Fred Drake;
 236 implemented by Richard Jones and Fred Drake.}
 237
 238 \end{seealso}
 239
 240
 241 %======================================================================
 242 \section{PEP 328: Absolute and Relative Imports}
 243
 244 The simpler part of PEP 328 was implemented in Python 2.4: parentheses
 245 could now be used to enclose the names imported from a module using
 246 the \code{from ... import ...} statement, making it easier to import
 247 many different names.
 248
 249 The more complicated part has been implemented in Python 2.5:
 250 importing a module can be specified to use absolute or
 251 package-relative imports.  The plan is to move toward making absolute
 252 imports the default in future versions of Python.
 253
 254 Let's say you have a package directory like this:
 255 \begin{verbatim}
 256 pkg/
 257 pkg/__init__.py
 258 pkg/main.py
 259 pkg/string.py
 260 \end{verbatim}
 261
 262 This defines a package named \module{pkg} containing the
 263 \module{pkg.main} and \module{pkg.string} submodules.
 264
 265 Consider the code in the \file{main.py} module.  What happens if it
 266 executes the statement \code{import string}?  In Python 2.4 and
 267 earlier, it will first look in the package's directory to perform a
 268 relative import, finds \file{pkg/string.py}, imports the contents of
 269 that file as the \module{pkg.string} module, and that module is bound
 270 to the name \samp{string} in the \module{pkg.main} module's namespace.
 271
 272 That's fine if \module{pkg.string} was what you wanted.  But what if
 273 you wanted Python's standard \module{string} module?  There's no clean
 274 way to ignore \module{pkg.string} and look for the standard module;
 275 generally you had to look at the contents of \code{sys.modules}, which
 276 is slightly unclean.
 277 Holger Krekel's \module{py.std} package provides a tidier way to perform
 278 imports from the standard library, \code{import py ; py.std.string.join()},
 279 but that package isn't available on all Python installations.
 280
 281 Reading code which relies on relative imports is also less clear,
 282 because a reader may be confused about which module, \module{string}
 283 or \module{pkg.string}, is intended to be used.  Python users soon
 284 learned not to duplicate the names of standard library modules in the
 285 names of their packages' submodules, but you can't protect against
 286 having your submodule's name being used for a new module added in a
 287 future version of Python.
 288
 289 In Python 2.5, you can switch \keyword{import}'s behaviour to
 290 absolute imports using a \code{from __future__ import absolute_import}
 291 directive.  This absolute-import behaviour will become the default in
 292 a future version (probably Python 2.7).  Once absolute imports
 293 are the default, \code{import string} will
 294 always find the standard library's version.
 295 It's suggested that users should begin using absolute imports as much
 296 as possible, so it's preferable to begin writing \code{from pkg import
 297 string} in your code.
 298
 299 Relative imports are still possible by adding a leading period
 300 to the module name when using the \code{from ... import} form:
 301
 302 \begin{verbatim}
 303 # Import names from pkg.string
 304 from .string import name1, name2
 305 # Import pkg.string
 306 from . import string
 307 \end{verbatim}
 308
 309 This imports the \module{string} module relative to the current
 310 package, so in \module{pkg.main} this will import \var{name1} and
 311 \var{name2} from \module{pkg.string}.  Additional leading periods
 312 perform the relative import starting from the parent of the current
 313 package.  For example, code in the \module{A.B.C} module can do:
 314
 315 \begin{verbatim}
 316 from . import D                 # Imports A.B.D
 317 from .. import E                # Imports A.E
 318 from ..F import G               # Imports A.F.G
 319 \end{verbatim}
 320
 321 Leading periods cannot be used with the \code{import \var{modname}}
 322 form of the import statement, only the \code{from ... import} form.
 323
 324 \begin{seealso}
 325
 326 \seepep{328}{Imports: Multi-Line and Absolute/Relative}
 327 {PEP written by Aahz; implemented by Thomas Wouters.}
 328
 329 \seeurl{http://codespeak.net/py/current/doc/index.html}
 330 {The py library by Holger Krekel, which contains the \module{py.std} package.}
 331
 332 \end{seealso}
 333
 334
 335 %======================================================================
 336 \section{PEP 338: Executing Modules as Scripts}
 337
 338 The \programopt{-m} switch added in Python 2.4 to execute a module as
 339 a script gained a few more abilities.  Instead of being implemented in
 340 C code inside the Python interpreter, the switch now uses an
 341 implementation in a new module, \module{runpy}.
 342
 343 The \module{runpy} module implements a more sophisticated import
 344 mechanism so that it's now possible to run modules in a package such
 345 as \module{pychecker.checker}.  The module also supports alternative
 346 import mechanisms such as the \module{zipimport} module.  (This means
 347 you can add a .zip archive's path to \code{sys.path} and then use the
 348 \programopt{-m} switch to execute code from the archive.
 349
 350
 351 \begin{seealso}
 352
 353 \seepep{338}{Executing modules as scripts}{PEP written and
 354 implemented by Nick Coghlan.}
 355
 356 \end{seealso}
 357
 358
 359 %======================================================================
 360 \section{PEP 341: Unified try/except/finally}
 361
 362 Until Python 2.5, the \keyword{try} statement came in two
 363 flavours. You could use a \keyword{finally} block to ensure that code
 364 is always executed, or a number of \keyword{except} blocks to catch an
 365 exception.  You couldn't combine both \keyword{except} blocks and a
 366 \keyword{finally} block, because generating the right bytecode for the
 367 combined version was complicated and it wasn't clear what the
 368 semantics of the combined should be.
 369
 370 GvR spent some time working with Java, which does support the
 371 equivalent of combining \keyword{except} blocks and a
 372 \keyword{finally} block, and this clarified what the statement should
 373 mean.  In Python 2.5, you can now write:
 374
 375 \begin{verbatim}
 376 try:
 377     block-1 ...
 378 except Exception1:
 379     handler-1 ...
 380 except Exception2:
 381     handler-2 ...
 382 else:
 383     else-block
 384 finally:
 385     final-block
 386 \end{verbatim}
 387
 388 The code in \var{block-1} is executed.  If the code raises an
 389 exception, the handlers are tried in order: \var{handler-1},
 390 \var{handler-2}, ...  If no exception is raised, the \var{else-block}
 391 is executed.  No matter what happened previously, the
 392 \var{final-block} is executed once the code block is complete and any
 393 raised exceptions handled.  Even if there's an error in an exception
 394 handler or the \var{else-block} and a new exception is raised, the
 395 \var{final-block} is still executed.
 396
 397 \begin{seealso}
 398
 399 \seepep{341}{Unifying try-except and try-finally}{PEP written by Georg Brandl;
 400 implementation by Thomas Lee.}
 401
 402 \end{seealso}
 403
 404
 405 %======================================================================
 406 \section{PEP 342: New Generator Features}
 407
 408 Python 2.5 adds a simple way to pass values \emph{into} a generator.
 409 As introduced in Python 2.3, generators only produce output; once a
 410 generator's code is invoked to create an iterator, there's no way to
 411 pass any new information into the function when its execution is
 412 resumed.  Sometimes the ability to pass in some information would be
 413 useful.  Hackish solutions to this include making the generator's code
 414 look at a global variable and then changing the global variable's
 415 value, or passing in some mutable object that callers then modify.
 416
 417 To refresh your memory of basic generators, here's a simple example:
 418
 419 \begin{verbatim}
 420 def counter (maximum):
 421     i = 0
 422     while i < maximum:
 423         yield i
 424         i += 1
 425 \end{verbatim}
 426
 427 When you call \code{counter(10)}, the result is an iterator that
 428 returns the values from 0 up to 9.  On encountering the
 429 \keyword{yield} statement, the iterator returns the provided value and
 430 suspends the function's execution, preserving the local variables.
 431 Execution resumes on the following call to the iterator's
 432 \method{next()} method, picking up after the \keyword{yield} statement.
 433
 434 In Python 2.3, \keyword{yield} was a statement; it didn't return any
 435 value.  In 2.5, \keyword{yield} is now an expression, returning a
 436 value that can be assigned to a variable or otherwise operated on:
 437
 438 \begin{verbatim}
 439 val = (yield i)
 440 \end{verbatim}
 441
 442 I recommend that you always put parentheses around a \keyword{yield}
 443 expression when you're doing something with the returned value, as in
 444 the above example.  The parentheses aren't always necessary, but it's
 445 easier to always add them instead of having to remember when they're
 446 needed.\footnote{The exact rules are that a \keyword{yield}-expression must
 447 always be parenthesized except when it occurs at the top-level
 448 expression on the right-hand side of an assignment, meaning you can
 449 write \code{val = yield i} but have to use parentheses when there's an
 450 operation, as in \code{val = (yield i) + 12}.}
 451
 452 Values are sent into a generator by calling its
 453 \method{send(\var{value})} method.  The generator's code is then
 454 resumed and the \keyword{yield} expression returns the specified
 455 \var{value}.  If the regular \method{next()} method is called, the
 456 \keyword{yield} returns \constant{None}.
 457
 458 Here's the previous example, modified to allow changing the value of
 459 the internal counter.
 460
 461 \begin{verbatim}
 462 def counter (maximum):
 463     i = 0
 464     while i < maximum:
 465         val = (yield i)
 466         # If value provided, change counter
 467         if val is not None:
 468             i = val
 469         else:
 470             i += 1
 471 \end{verbatim}
 472
 473 And here's an example of changing the counter:
 474
 475 \begin{verbatim}
 476 >>> it = counter(10)
 477 >>> print it.next()
 478 0
 479 >>> print it.next()
 480 1
 481 >>> print it.send(8)
 482 8
 483 >>> print it.next()
 484 9
 485 >>> print it.next()
 486 Traceback (most recent call last):
 487   File ``t.py'', line 15, in ?
 488     print it.next()
 489 StopIteration
 490 \end{verbatim}
 491
 492 Because \keyword{yield} will often be returning \constant{None}, you
 493 should always check for this case.  Don't just use its value in
 494 expressions unless you're sure that the \method{send()} method
 495 will be the only method used resume your generator function.
 496
 497 In addition to \method{send()}, there are two other new methods on
 498 generators:
 499
 500 \begin{itemize}
 501
 502   \item \method{throw(\var{type}, \var{value}=None,
 503   \var{traceback}=None)} is used to raise an exception inside the
 504   generator; the exception is raised by the \keyword{yield} expression
 505   where the generator's execution is paused.
 506
 507   \item \method{close()} raises a new \exception{GeneratorExit}
 508   exception inside the generator to terminate the iteration.
 509   On receiving this
 510   exception, the generator's code must either raise
 511   \exception{GeneratorExit} or \exception{StopIteration}; catching the
 512   exception and doing anything else is illegal and will trigger
 513   a \exception{RuntimeError}.  \method{close()} will also be called by
 514   Python's garbage collection when the generator is garbage-collected.
 515
 516   If you need to run cleanup code in case of a \exception{GeneratorExit},
 517   I suggest using a \code{try: ... finally:} suite instead of
 518   catching \exception{GeneratorExit}.
 519
 520 \end{itemize}
 521
 522 The cumulative effect of these changes is to turn generators from
 523 one-way producers of information into both producers and consumers.
 524
 525 Generators also become \emph{coroutines}, a more generalized form of
 526 subroutines.  Subroutines are entered at one point and exited at
 527 another point (the top of the function, and a \keyword{return
 528 statement}), but coroutines can be entered, exited, and resumed at
 529 many different points (the \keyword{yield} statements).  We'll have to
 530 figure out patterns for using coroutines effectively in Python.
 531
 532 The addition of the \method{close()} method has one side effect that
 533 isn't obvious.  \method{close()} is called when a generator is
 534 garbage-collected, so this means the generator's code gets one last
 535 chance to run before the generator is destroyed, and this last chance
 536 means that \code{try...finally} statements in generators can now be
 537 guaranteed to work; the \keyword{finally} clause will now always get a
 538 chance to run.  The syntactic restriction that you couldn't mix
 539 \keyword{yield} statements with a \code{try...finally} suite has
 540 therefore been removed.  This seems like a minor bit of language
 541 trivia, but using generators and \code{try...finally} is actually
 542 necessary in order to implement the  \keyword{with} statement
 543 described by PEP 343.  We'll look at this new statement in the following
 544 section.
 545
 546 \begin{seealso}
 547
 548 \seepep{342}{Coroutines via Enhanced Generators}{PEP written by
 549 Guido van Rossum and Phillip J. Eby;
 550 implemented by Phillip J. Eby.  Includes examples of
 551 some fancier uses of generators as coroutines.}
 552
 553 \seeurl{http://en.wikipedia.org/wiki/Coroutine}{The Wikipedia entry for
 554 coroutines.}
 555
 556 \seeurl{http://www.sidhe.org/\~{}dan/blog/archives/000178.html}{An
 557 explanation of coroutines from a Perl point of view, written by Dan
 558 Sugalski.}
 559
 560 \end{seealso}
 561
 562
 563 %======================================================================
 564 \section{PEP 343: The 'with' statement}
 565
 566 The \keyword{with} statement allows a clearer
 567 version of code that uses \code{try...finally} blocks
 568
 569 First, I'll discuss the statement as it will commonly be used, and
 570 then I'll discuss the detailed implementation and how to write objects
 571 (called ``context managers'') that can be used with this statement.
 572 Most people, who will only use \keyword{with} in company with an
 573 existing object, don't need to know these details and can
 574 just use objects that are documented to work as context managers.
 575 Authors of new context managers will need to understand the details of
 576 the underlying implementation.
 577
 578 The \keyword{with} statement is a new control-flow structure whose
 579 basic structure is:
 580
 581 \begin{verbatim}
 582 with expression as variable:
 583     with-block
 584 \end{verbatim}
 585
 586 The expression is evaluated, and it should result in a type of object
 587 that's called a context manager.  The context manager can return a
 588 value that will be bound to the name \var{variable}.  (Note carefully:
 589 \var{variable} is \emph{not} assigned the result of \var{expression}.
 590 One method of the context manager is run before \var{with-block} is
 591 executed, and another method is run after the block is done, even if
 592 the block raised an exception.
 593
 594 To enable the statement in Python 2.5, you need
 595 to add the following directive to your module:
 596
 597 \begin{verbatim}
 598 from __future__ import with_statement
 599 \end{verbatim}
 600
 601 Some standard Python objects can now behave as context managers.  For
 602 example, file objects:
 603
 604 \begin{verbatim}
 605 with open('/etc/passwd', 'r') as f:
 606     for line in f:
 607         print line
 608
 609 # f has been automatically closed at this point.
 610 \end{verbatim}
 611
 612 The \module{threading} module's locks and condition variables
 613 also support the \keyword{with} statement:
 614
 615 \begin{verbatim}
 616 lock = threading.Lock()
 617 with lock:
 618     # Critical section of code
 619     ...
 620 \end{verbatim}
 621
 622 The lock is acquired before the block is executed, and released once
 623 the block is complete.
 624
 625 The \module{decimal} module's contexts, which encapsulate the desired
 626 precision and rounding characteristics for computations, can also be
 627 used as context managers.
 628
 629 \begin{verbatim}
 630 import decimal
 631
 632 v1 = decimal.Decimal('578')
 633
 634 # Displays with default precision of 28 digits
 635 print v1.sqrt()
 636
 637 with decimal.Context(prec=16):
 638     # All code in this block uses a precision of 16 digits.
 639     # The original context is restored on exiting the block.
 640     print v1.sqrt()
 641 \end{verbatim}
 642
 643 \subsection{Writing Context Managers}
 644
 645 % XXX write this
 646
 647 This section still needs to be written.
 648
 649 The new \module{contextlib} module provides some functions and a
 650 decorator that are useful for writing context managers.
 651 Future versions will go into more detail.
 652
 653 % XXX describe further
 654
 655 \begin{seealso}
 656
 657 \seepep{343}{The ``with'' statement}{PEP written by
 658 Guido van Rossum and Nick Coghlan. }
 659
 660 \end{seealso}
 661
 662
 663 %======================================================================
 664 \section{PEP 352: Exceptions as New-Style Classes}
 665
 666 Exception classes can now be new-style classes, not just classic
 667 classes, and the built-in \exception{Exception} class and all the
 668 standard built-in exceptions (\exception{NameError},
 669 \exception{ValueError}, etc.) are now new-style classes.
 670
 671 The inheritance hierarchy for exceptions has been rearranged a bit.
 672 In 2.5, the inheritance relationships are:
 673
 674 \begin{verbatim}
 675 BaseException       # New in Python 2.5
 676 |- KeyboardInterrupt
 677 |- SystemExit
 678 |- Exception
 679    |- (all other current built-in exceptions)
 680 \end{verbatim}
 681
 682 This rearrangement was done because people often want to catch all
 683 exceptions that indicate program errors.  \exception{KeyboardInterrupt} and
 684 \exception{SystemExit} aren't errors, though, and usually represent an explicit
 685 action such as the user hitting Control-C or code calling
 686 \function{sys.exit()}.  A bare \code{except:} will catch all exceptions,
 687 so you commonly need to list \exception{KeyboardInterrupt} and
 688 \exception{SystemExit} in order to re-raise them.  The usual pattern is:
 689
 690 \begin{verbatim}
 691 try:
 692     ...
 693 except (KeyboardInterrupt, SystemExit):
 694     raise
 695 except:
 696     # Log error...
 697     # Continue running program...
 698 \end{verbatim}
 699
 700 In Python 2.5, you can now write \code{except Exception} to achieve
 701 the same result, catching all the exceptions that usually indicate errors
 702 but leaving \exception{KeyboardInterrupt} and
 703 \exception{SystemExit} alone.  As in previous versions,
 704 a bare \code{except:} still catches all exceptions.
 705
 706 The goal for Python 3.0 is to require any class raised as an exception
 707 to derive from \exception{BaseException} or some descendant of
 708 \exception{BaseException}, and future releases in the
 709 Python 2.x series may begin to enforce this constraint.  Therefore, I
 710 suggest you begin making all your exception classes derive from
 711 \exception{Exception} now.  It's been suggested that the bare
 712 \code{except:} form should be removed in Python 3.0, but Guido van~Rossum
 713 hasn't decided whether to do this or not.
 714
 715 Raising of strings as exceptions, as in the statement \code{raise
 716 "Error occurred"}, is deprecated in Python 2.5 and will trigger a
 717 warning.  The aim is to be able to remove the string-exception feature
 718 in a few releases.
 719
 720
 721 \begin{seealso}
 722
 723 \seepep{352}{Required Superclass for Exceptions}{PEP written by
 724 Brett Cannon and Guido van Rossum; implemented by Brett Cannon.}
 725
 726 \end{seealso}
 727
 728
 729 %======================================================================
 730 \section{PEP 353: Using ssize_t as the index type\label{section-353}}
 731
 732 A wide-ranging change to Python's C API, using a new
 733 \ctype{Py_ssize_t} type definition instead of \ctype{int},
 734 will permit the interpreter to handle more data on 64-bit platforms.
 735 This change doesn't affect Python's capacity on 32-bit platforms.
 736
 737 Various pieces of the Python interpreter used C's \ctype{int} type to
 738 store sizes or counts; for example, the number of items in a list or
 739 tuple were stored in an \ctype{int}.  The C compilers for most 64-bit
 740 platforms still define \ctype{int} as a 32-bit type, so that meant
 741 that lists could only hold up to \code{2**31 - 1} = 2147483647 items.
 742 (There are actually a few different programming models that 64-bit C
 743 compilers can use -- see
 744 \url{http://www.unix.org/version2/whatsnew/lp64_wp.html} for a
 745 discussion -- but the most commonly available model leaves \ctype{int}
 746 as 32 bits.)
 747
 748 A limit of 2147483647 items doesn't really matter on a 32-bit platform
 749 because you'll run out of memory before hitting the length limit.
 750 Each list item requires space for a pointer, which is 4 bytes, plus
 751 space for a \ctype{PyObject} representing the item.  2147483647*4 is
 752 already more bytes than a 32-bit address space can contain.
 753
 754 It's possible to address that much memory on a 64-bit platform,
 755 however.  The pointers for a list that size would only require 16GiB
 756 of space, so it's not unreasonable that Python programmers might
 757 construct lists that large.  Therefore, the Python interpreter had to
 758 be changed to use some type other than \ctype{int}, and this will be a
 759 64-bit type on 64-bit platforms.  The change will cause
 760 incompatibilities on 64-bit machines, so it was deemed worth making
 761 the transition now, while the number of 64-bit users is still
 762 relatively small.  (In 5 or 10 years, we may \emph{all} be on 64-bit
 763 machines, and the transition would be more painful then.)
 764
 765 This change most strongly affects authors of C extension modules.
 766 Python strings and container types such as lists and tuples
 767 now use \ctype{Py_ssize_t} to store their size.
 768 Functions such as \cfunction{PyList_Size()}
 769 now return \ctype{Py_ssize_t}.  Code in extension modules
 770 may therefore need to have some variables changed to
 771 \ctype{Py_ssize_t}.
 772
 773 The \cfunction{PyArg_ParseTuple()} and \cfunction{Py_BuildValue()} functions
 774 have a new conversion code, \samp{n}, for \ctype{Py_ssize_t}.
 775 \cfunction{PyArg_ParseTuple()}'s \samp{s\#} and \samp{t\#} still output
 776 \ctype{int} by default, but you can define the macro
 777 \csimplemacro{PY_SSIZE_T_CLEAN} before including \file{Python.h}
 778 to make them return \ctype{Py_ssize_t}.
 779
 780 \pep{353} has a section on conversion guidelines that
 781 extension authors should read to learn about supporting 64-bit
 782 platforms.
 783
 784 \begin{seealso}
 785
 786 \seepep{353}{Using ssize_t as the index type}{PEP written and implemented by Martin von~L\"owis.}
 787
 788 \end{seealso}
 789
 790
 791 %======================================================================
 792 \section{PEP 357: The '__index__' method}
 793
 794 The NumPy developers had a problem that could only be solved by adding
 795 a new special method, \method{__index__}.  When using slice notation,
 796 as in \code{[\var{start}:\var{stop}:\var{step}]}, the values of the
 797 \var{start}, \var{stop}, and \var{step} indexes must all be either
 798 integers or long integers.  NumPy defines a variety of specialized
 799 integer types corresponding to unsigned and signed integers of 8, 16,
 800 32, and 64 bits, but there was no way to signal that these types could
 801 be used as slice indexes.
 802
 803 Slicing can't just use the existing \method{__int__} method because
 804 that method is also used to implement coercion to integers.  If
 805 slicing used \method{__int__}, floating-point numbers would also
 806 become legal slice indexes and that's clearly an undesirable
 807 behaviour.
 808
 809 Instead, a new special method called \method{__index__} was added.  It
 810 takes no arguments and returns an integer giving the slice index to
 811 use.  For example:
 812
 813 \begin{verbatim}
 814 class C:
 815     def __index__ (self):
 816         return self.value
 817 \end{verbatim}
 818
 819 The return value must be either a Python integer or long integer.
 820 The interpreter will check that the type returned is correct, and
 821 raises a \exception{TypeError} if this requirement isn't met.
 822
 823 A corresponding \member{nb_index} slot was added to the C-level
 824 \ctype{PyNumberMethods} structure to let C extensions implement this
 825 protocol.  \cfunction{PyNumber_Index(\var{obj})} can be used in
 826 extension code to call the \method{__index__} function and retrieve
 827 its result.
 828
 829 \begin{seealso}
 830
 831 \seepep{357}{Allowing Any Object to be Used for Slicing}{PEP written
 832 and implemented by Travis Oliphant.}
 833
 834 \end{seealso}
 835
 836
 837 %======================================================================
 838 \section{Other Language Changes}
 839
 840 Here are all of the changes that Python 2.5 makes to the core Python
 841 language.
 842
 843 \begin{itemize}
 844
 845 \item The \function{min()} and \function{max()} built-in functions
 846 gained a \code{key} keyword argument analogous to the \code{key}
 847 argument for \method{sort()}.  This argument supplies a function
 848 that takes a single argument and is called for every value in the list;
 849 \function{min()}/\function{max()} will return the element with the
 850 smallest/largest return value from this function.
 851 For example, to find the longest string in a list, you can do:
 852
 853 \begin{verbatim}
 854 L = ['medium', 'longest', 'short']
 855 # Prints 'longest'
 856 print max(L, key=len)
 857 # Prints 'short', because lexicographically 'short' has the largest value
 858 print max(L)
 859 \end{verbatim}
 860
 861 (Contributed by Steven Bethard and Raymond Hettinger.)
 862
 863 \item Two new built-in functions, \function{any()} and
 864 \function{all()}, evaluate whether an iterator contains any true or
 865 false values.  \function{any()} returns \constant{True} if any value
 866 returned by the iterator is true; otherwise it will return
 867 \constant{False}.  \function{all()} returns \constant{True} only if
 868 all of the values returned by the iterator evaluate as being true.
 869 (Suggested by GvR, and implemented by Raymond Hettinger.)
 870
 871 \item ASCII is now the default encoding for modules.  It's now
 872 a syntax error if a module contains string literals with 8-bit
 873 characters but doesn't have an encoding declaration.  In Python 2.4
 874 this triggered a warning, not a syntax error.  See \pep{263}
 875 for how to declare a module's encoding; for example, you might add
 876 a line like this near the top of the source file:
 877
 878 \begin{verbatim}
 879 # -*- coding: latin1 -*-
 880 \end{verbatim}
 881
 882 \item The list of base classes in a class definition can now be empty.
 883 As an example, this is now legal:
 884
 885 \begin{verbatim}
 886 class C():
 887     pass
 888 \end{verbatim}
 889 (Implemented by Brett Cannon.)
 890
 891 % XXX __missing__ hook in dictionaries
 892
 893 \end{itemize}
 894
 895
 896 %======================================================================
 897 \subsection{Interactive Interpreter Changes}
 898
 899 In the interactive interpreter, \code{quit} and \code{exit}
 900 have long been strings so that new users get a somewhat helpful message
 901 when they try to quit:
 902
 903 \begin{verbatim}
 904 >>> quit
 905 'Use Ctrl-D (i.e. EOF) to exit.'
 906 \end{verbatim}
 907
 908 In Python 2.5, \code{quit} and \code{exit} are now objects that still
 909 produce string representations of themselves, but are also callable.
 910 Newbies who try \code{quit()} or \code{exit()} will now exit the
 911 interpreter as they expect.  (Implemented by Georg Brandl.)
 912
 913
 914 %======================================================================
 915 \subsection{Optimizations}
 916
 917 \begin{itemize}
 918
 919 \item When they were introduced
 920 in Python 2.4, the built-in \class{set} and \class{frozenset} types
 921 were built on top of Python's dictionary type.
 922 In 2.5 the internal data structure has been customized for implementing sets,
 923 and as a result sets will use a third less memory and are somewhat faster.
 924 (Implemented by Raymond Hettinger.)
 925
 926 \item The performance of some Unicode operations has been improved.
 927 % XXX provide details?
 928
 929 \item The code generator's peephole optimizer now performs
 930 simple constant folding in expressions.  If you write something like
 931 \code{a = 2+3}, the code generator will do the arithmetic and produce
 932 code corresponding to \code{a = 5}.
 933
 934 \end{itemize}
 935
 936 The net result of the 2.5 optimizations is that Python 2.5 runs the
 937 pystone benchmark around XXX\% faster than Python 2.4.
 938
 939
 940 %======================================================================
 941 \section{New, Improved, and Deprecated Modules}
 942
 943 As usual, Python's standard library received a number of enhancements and
 944 bug fixes.  Here's a partial list of the most notable changes, sorted
 945 alphabetically by module name. Consult the
 946 \file{Misc/NEWS} file in the source tree for a more
 947 complete list of changes, or look through the SVN logs for all the
 948 details.
 949
 950 \begin{itemize}
 951
 952 % collections.deque now has .remove()
 953 % collections.defaultdict
 954
 955 % the cPickle module no longer accepts the deprecated None option in the
 956 % args tuple returned by __reduce__().
 957
 958 % csv module improvements
 959
 960 % datetime.datetime() now has a strptime class method which can be used to
 961 % create datetime object using a string and format.
 962
 963 % fileinput: opening hook used to control how files are opened.
 964 % .input() now has a mode parameter
 965 % now has a fileno() function
 966 % accepts Unicode filenames
 967
 968 \item In the \module{gc} module, the new \function{get_count()} function
 969 returns a 3-tuple containing the current collection counts for the
 970 three GC generations.  This is accounting information for the garbage
 971 collector; when these counts reach a specified threshold, a garbage
 972 collection sweep will be made.  The existing \function{gc.collect()}
 973 function now takes an optional \var{generation} argument of 0, 1, or 2
 974 to specify which generation to collect.
 975
 976 \item The \function{nsmallest()} and
 977 \function{nlargest()} functions in the \module{heapq} module
 978 now support a \code{key} keyword argument similar to the one
 979 provided by the \function{min()}/\function{max()} functions
 980 and the \method{sort()} methods.  For example:
 981 Example:
 982
 983 \begin{verbatim}
 984 >>> import heapq
 985 >>> L = ["short", 'medium', 'longest', 'longer still']
 986 >>> heapq.nsmallest(2, L)  # Return two lowest elements, lexicographically
 987 ['longer still', 'longest']
 988 >>> heapq.nsmallest(2, L, key=len)   # Return two shortest elements
 989 ['short', 'medium']
 990 \end{verbatim}
 991
 992 (Contributed by Raymond Hettinger.)
 993
 994 \item The \function{itertools.islice()} function now accepts
 995 \code{None} for the start and step arguments.  This makes it more
 996 compatible with the attributes of slice objects, so that you can now write
 997 the following:
 998
 999 \begin{verbatim}
1000 s = slice(5)     # Create slice object
1001 itertools.islice(iterable, s.start, s.stop, s.step)
1002 \end{verbatim}
1003
1004 (Contributed by Raymond Hettinger.)
1005
1006 \item The \module{operator} module's \function{itemgetter()}
1007 and \function{attrgetter()} functions now support multiple fields.
1008 A call such as \code{operator.attrgetter('a', 'b')}
1009 will return a function
1010 that retrieves the \member{a} and \member{b} attributes.  Combining
1011 this new feature with the \method{sort()} method's \code{key} parameter
1012 lets you easily sort lists using multiple fields.
1013 (Contributed by Raymond Hettinger.)
1014
1015
1016 \item The \module{os} module underwent a number of changes.  The
1017 \member{stat_float_times} variable now defaults to true, meaning that
1018 \function{os.stat()} will now return time values as floats.  (This
1019 doesn't necessarily mean that \function{os.stat()} will return times
1020 that are precise to fractions of a second; not all systems support
1021 such precision.)
1022
1023 Constants named \member{os.SEEK_SET}, \member{os.SEEK_CUR}, and
1024 \member{os.SEEK_END} have been added; these are the parameters to the
1025 \function{os.lseek()} function.  Two new constants for locking are
1026 \member{os.O_SHLOCK} and \member{os.O_EXLOCK}.
1027
1028 Two new functions, \function{wait3()} and \function{wait4()}, were
1029 added.  They're similar the \function{waitpid()} function which waits
1030 for a child process to exit and returns a tuple of the process ID and
1031 its exit status, but \function{wait3()} and \function{wait4()} return
1032 additional information.  \function{wait3()} doesn't take a process ID
1033 as input, so it waits for any child process to exit and returns a
1034 3-tuple of \var{process-id}, \var{exit-status}, \var{resource-usage}
1035 as returned from the \function{resource.getrusage()} function.
1036 \function{wait4(\var{pid})} does take a process ID.
1037 (Contributed by Chad J. Schroeder.)
1038
1039 On FreeBSD, the \function{os.stat()} function now returns
1040 times with nanosecond resolution, and the returned object
1041 now has \member{st_gen} and \member{st_birthtime}.
1042 The \member{st_flags} member is also available, if the platform supports it.
1043 (Contributed by Antti Louko and  Diego Petten\`o.)
1044 % (Patch 1180695, 1212117)
1045
1046 \item The old \module{regex} and \module{regsub} modules, which have been
1047 deprecated ever since Python 2.0, have finally been deleted.
1048 Other deleted modules: \module{statcache}, \module{tzparse},
1049 \module{whrandom}.
1050
1051 \item The \file{lib-old} directory,
1052 which includes ancient modules such as \module{dircmp} and
1053 \module{ni}, was also deleted.  \file{lib-old} wasn't on the default
1054 \code{sys.path}, so unless your programs explicitly added the directory to
1055 \code{sys.path}, this removal shouldn't affect your code.
1056
1057 \item The \module{socket} module now supports \constant{AF_NETLINK}
1058 sockets on Linux, thanks to a patch from Philippe Biondi.
1059 Netlink sockets are a Linux-specific mechanism for communications
1060 between a user-space process and kernel code; an introductory
1061 article about them is at \url{http://www.linuxjournal.com/article/7356}.
1062 In Python code, netlink addresses are represented as a tuple of 2 integers,
1063 \code{(\var{pid}, \var{group_mask})}.
1064
1065 Socket objects also gained accessor methods \method{getfamily()},
1066 \method{gettype()}, and \method{getproto()} methods to retrieve the
1067 family, type, and protocol values for the socket.
1068
1069 \item New module: \module{spwd} provides functions for accessing the
1070 shadow password database on systems that support it.
1071 % XXX give example
1072
1073 % XXX patch #1382163: sys.subversion,  Py_GetBuildNumber()
1074
1075 \item The \class{TarFile} class in the \module{tarfile} module now has
1076 an \method{extractall()} method that extracts all members from the
1077 archive into the current working directory.  It's also possible to set
1078 a different directory as the extraction target, and to unpack only a
1079 subset of the archive's members.
1080
1081 A tarfile's compression can be autodetected by
1082 using the mode \code{'r|*'}.
1083 % patch 918101
1084 (Contributed by Lars Gust\"abel.)
1085
1086 \item The \module{unicodedata} module has been updated to use version 4.1.0
1087 of the Unicode character database.  Version 3.2.0 is required
1088 by some specifications, so it's still available as
1089 \member{unicodedata.db_3_2_0}.
1090
1091 % patch #754022: Greatly enhanced webbrowser.py (by Oleg Broytmann).
1092
1093
1094 \item The \module{xmlrpclib} module now supports returning
1095       \class{datetime} objects for the XML-RPC date type.  Supply
1096       \code{use_datetime=True} to the \function{loads()} function
1097       or the \class{Unmarshaller} class to enable this feature.
1098       (Contributed by Skip Montanaro.)
1099 % Patch 1120353
1100
1101
1102 \end{itemize}
1103
1104
1105
1106 %======================================================================
1107 % whole new modules get described in subsections here
1108
1109 \subsection{The ctypes package}
1110
1111 The \module{ctypes} package, written by Thomas Heller, has been added
1112 to the standard library.  \module{ctypes} lets you call arbitrary functions
1113 in shared libraries or DLLs.  Long-time users may remember the \module{dl} module, which
1114 provides functions for loading shared libraries and calling functions in them.  The \module{ctypes} package is much fancier.
1115
1116 To load a shared library or DLL, you must create an instance of the
1117 \class{CDLL} class and provide the name or path of the shared library
1118 or DLL.  Once that's done, you can call arbitrary functions
1119 by accessing them as attributes of the \class{CDLL} object.
1120
1121 \begin{verbatim}
1122 import ctypes
1123
1124 libc = ctypes.CDLL('libc.so.6')
1125 result = libc.printf("Line of output\n")
1126 \end{verbatim}
1127
1128 Type constructors for the various C types are provided: \function{c_int},
1129 \function{c_float}, \function{c_double}, \function{c_char_p} (equivalent to \ctype{char *}), and so forth.  Unlike Python's types, the C versions are all mutable; you can assign to their \member{value} attribute
1130 to change the wrapped value.  Python integers and strings will be automatically
1131 converted to the corresponding C types, but for other types you
1132 must call the correct type constructor.  (And I mean \emph{must};
1133 getting it wrong will often result in the interpreter crashing
1134 with a segmentation fault.)
1135
1136 You shouldn't use \function{c_char_p} with a Python string when the C function will be modifying the memory area, because Python strings are
1137 supposed to be immutable; breaking this rule will cause puzzling bugs.  When you need a modifiable memory area,
1138 use \function{create_string_buffer()}:
1139
1140 \begin{verbatim}
1141 s = "this is a string"
1142 buf = ctypes.create_string_buffer(s)
1143 libc.strfry(buf)
1144 \end{verbatim}
1145
1146 C functions are assumed to return integers, but you can set
1147 the \member{restype} attribute of the function object to
1148 change this:
1149
1150 \begin{verbatim}
1151 >>> libc.atof('2.71828')
1152 -1783957616
1153 >>> libc.atof.restype = ctypes.c_double
1154 >>> libc.atof('2.71828')
1155 2.71828
1156 \end{verbatim}
1157
1158 \module{ctypes} also provides a wrapper for Python's C API
1159 as the \code{ctypes.pythonapi} object.  This object does \emph{not}
1160 release the global interpreter lock before calling a function, because the lock must be held when calling into the interpreter's code.
1161 There's a \class{py_object()} type constructor that will create a
1162 \ctype{PyObject *} pointer.  A simple usage:
1163
1164 \begin{verbatim}
1165 import ctypes
1166
1167 d = {}
1168 ctypes.pythonapi.PyObject_SetItem(ctypes.py_object(d),
1169           ctypes.py_object("abc"),  ctypes.py_object(1))
1170 # d is now {'abc', 1}.
1171 \end{verbatim}
1172
1173 Don't forget to use \class{py_object()}; if it's omitted you end
1174 up with a segmentation fault.
1175
1176 \module{ctypes} has been around for a while, but people still write
1177 and distribution hand-coded extension modules because you can't rely on \module{ctypes} being present.
1178 Perhaps developers will begin to write
1179 Python wrappers atop a library accessed through \module{ctypes} instead
1180 of extension modules, now that \module{ctypes} is included with core Python.
1181
1182 % XXX write introduction
1183
1184 \begin{seealso}
1185
1186 \seeurl{http://starship.python.net/crew/theller/ctypes/}
1187 {The ctypes web page, with a tutorial, reference, and FAQ.}
1188
1189 \end{seealso}
1190
1191 \subsection{The ElementTree package}
1192
1193 A subset of Fredrik Lundh's ElementTree library for processing XML has
1194 been added to the standard library as \module{xmlcore.etree}.  The
1195 available modules are
1196 \module{ElementTree}, \module{ElementPath}, and
1197 \module{ElementInclude} from ElementTree 1.2.6.
1198 The \module{cElementTree} accelerator module is also included.
1199
1200 The rest of this section will provide a brief overview of using
1201 ElementTree.  Full documentation for ElementTree is available at
1202 \url{http://effbot.org/zone/element-index.htm}.
1203
1204 ElementTree represents an XML document as a tree of element nodes.
1205 The text content of the document is stored as the \member{.text}
1206 and \member{.tail} attributes of
1207 (This is one of the major differences between ElementTree and
1208 the Document Object Model; in the DOM there are many different
1209 types of node, including \class{TextNode}.)
1210
1211 The most commonly used parsing function is \function{parse()}, that
1212 takes either a string (assumed to contain a filename) or a file-like
1213 object and returns an \class{ElementTree} instance:
1214
1215 \begin{verbatim}
1216 from xmlcore.etree import ElementTree as ET
1217
1218 tree = ET.parse('ex-1.xml')
1219
1220 feed = urllib.urlopen(
1221           'http://planet.python.org/rss10.xml')
1222 tree = ET.parse(feed)
1223 \end{verbatim}
1224
1225 Once you have an \class{ElementTree} instance, you
1226 can call its \method{getroot()} method to get the root \class{Element} node.
1227
1228 There's also an \function{XML()} function that takes a string literal
1229 and returns an \class{Element} node (not an \class{ElementTree}).
1230 This function provides a tidy way to incorporate XML fragments,
1231 approaching the convenience of an XML literal:
1232
1233 \begin{verbatim}
1234 svg = et.XML("""<svg width="10px" version="1.0">
1235              </svg>""")
1236 svg.set('height', '320px')
1237 svg.append(elem1)
1238 \end{verbatim}
1239
1240 Each XML element supports some dictionary-like and some list-like
1241 access methods.  Dictionary-like operations are used to access attribute
1242 values, and list-like operations are used to access child nodes.
1243
1244 \begin{tableii}{c|l}{code}{Operation}{Result}
1245   \lineii{elem[n]}{Returns n'th child element.}
1246   \lineii{elem[m:n]}{Returns list of m'th through n'th child elements.}
1247   \lineii{len(elem)}{Returns number of child elements.}
1248   \lineii{elem.getchildren()}{Returns list of child elements.}
1249   \lineii{elem.append(elem2)}{Adds \var{elem2} as a child.}
1250   \lineii{elem.insert(index, elem2)}{Inserts \var{elem2} at the specified location.}
1251   \lineii{del elem[n]}{Deletes n'th child element.}
1252   \lineii{elem.keys()}{Returns list of attribute names.}
1253   \lineii{elem.get(name)}{Returns value of attribute \var{name}.}
1254   \lineii{elem.set(name, value)}{Sets new value for attribute \var{name}.}
1255   \lineii{elem.attrib}{Retrieves the dictionary containing attributes.}
1256   \lineii{del elem.attrib[name]}{Deletes attribute \var{name}.}
1257 \end{tableii}
1258
1259 Comments and processing instructions are also represented as
1260 \class{Element} nodes.  To check if a node is a comment or processing
1261 instructions:
1262
1263 \begin{verbatim}
1264 if elem.tag is ET.Comment:
1265     ...
1266 elif elem.tag is ET.ProcessingInstruction:
1267     ...
1268 \end{verbatim}
1269
1270 To generate XML output, you should call the
1271 \method{ElementTree.write()} method.  Like \function{parse()},
1272 it can take either a string or a file-like object:
1273
1274 \begin{verbatim}
1275 # Encoding is US-ASCII
1276 tree.write('output.xml')
1277
1278 # Encoding is UTF-8
1279 f = open('output.xml', 'w')
1280 tree.write(f, 'utf-8')
1281 \end{verbatim}
1282
1283 (Caution: the default encoding used for output is ASCII, which isn't
1284 very useful for general XML work, raising an exception if there are
1285 any characters with values greater than 127.  You should always
1286 specify a different encoding such as UTF-8 that can handle any Unicode
1287 character.)
1288
1289 This section is only a partial description of the ElementTree interfaces.
1290 Please read the package's official documentation for more details.
1291
1292 \begin{seealso}
1293
1294 \seeurl{http://effbot.org/zone/element-index.htm}
1295 {Official documentation for ElementTree.}
1296
1297
1298 \end{seealso}
1299
1300
1301 \subsection{The hashlib package}
1302
1303 A new \module{hashlib} module has been added to replace the
1304 \module{md5} and \module{sha} modules.  \module{hashlib} adds support
1305 for additional secure hashes (SHA-224, SHA-256, SHA-384, and SHA-512).
1306 When available, the module uses OpenSSL for fast platform optimized
1307 implementations of algorithms.
1308
1309 The old \module{md5} and \module{sha} modules still exist as wrappers
1310 around hashlib to preserve backwards compatibility.  The new module's
1311 interface is very close to that of the old modules, but not identical.
1312 The most significant difference is that the constructor functions
1313 for creating new hashing objects are named differently.
1314
1315 \begin{verbatim}
1316 # Old versions
1317 h = md5.md5()
1318 h = md5.new()
1319
1320 # New version
1321 h = hashlib.md5()
1322
1323 # Old versions
1324 h = sha.sha()
1325 h = sha.new()
1326
1327 # New version
1328 h = hashlib.sha1()
1329
1330 # Hash that weren't previously available
1331 h = hashlib.sha224()
1332 h = hashlib.sha256()
1333 h = hashlib.sha384()
1334 h = hashlib.sha512()
1335
1336 # Alternative form
1337 h = hashlib.new('md5')          # Provide algorithm as a string
1338 \end{verbatim}
1339
1340 Once a hash object has been created, its methods are the same as before:
1341 \method{update(\var{string})} hashes the specified string into the
1342 current digest state, \method{digest()} and \method{hexdigest()}
1343 return the digest value as a binary string or a string of hex digits,
1344 and \method{copy()} returns a new hashing object with the same digest state.
1345
1346 This module was contributed by Gregory P. Smith.
1347
1348
1349 \subsection{The sqlite3 package}
1350
1351 The pysqlite module (\url{http://www.pysqlite.org}), a wrapper for the
1352 SQLite embedded database, has been added to the standard library under
1353 the package name \module{sqlite3}.  SQLite is a C library that
1354 provides a SQL-language database that stores data in disk files
1355 without requiring a separate server process.  pysqlite was written by
1356 Gerhard H\"aring, and provides a SQL interface that complies with the
1357 DB-API 2.0 specification described by \pep{249}. This means that it
1358 should be possible to write the first version of your applications
1359 using SQLite for data storage and, if switching to a larger database
1360 such as PostgreSQL or Oracle is necessary, the switch should be
1361 relatively easy.
1362
1363 If you're compiling the Python source yourself, note that the source
1364 tree doesn't include the SQLite code itself, only the wrapper module.
1365 You'll need to have the SQLite libraries and headers installed before
1366 compiling Python, and the build process will compile the module when
1367 the necessary headers are available.
1368
1369 To use the module, you must first create a \class{Connection} object
1370 that represents the database.  Here the data will be stored in the
1371 \file{/tmp/example} file:
1372
1373 \begin{verbatim}
1374 conn = sqlite3.connect('/tmp/example')
1375 \end{verbatim}
1376
1377 You can also supply the special name \samp{:memory:} to create
1378 a database in RAM.
1379
1380 Once you have a \class{Connection}, you can create a \class{Cursor}
1381 object and call its \method{execute()} method to perform SQL commands:
1382
1383 \begin{verbatim}
1384 c = conn.cursor()
1385
1386 # Create table
1387 c.execute('''create table stocks
1388 (date timestamp, trans varchar, symbol varchar,
1389  qty decimal, price decimal)''')
1390
1391 # Insert a row of data
1392 c.execute("""insert into stocks
1393           values ('2006-01-05','BUY','RHAT',100, 35.14)""")
1394 \end{verbatim}
1395
1396 Usually your SQL queries will need to reflect the value of Python
1397 variables.  You shouldn't assemble your query using Python's string
1398 operations because doing so is insecure; it makes your program
1399 vulnerable to what's called an SQL injection attack.  Instead, use
1400 SQLite's parameter substitution, putting \samp{?} as a placeholder
1401 wherever you want to use a value, and then provide a tuple of values
1402 as the second argument to the cursor's \method{execute()} method.  For
1403 example:
1404
1405 \begin{verbatim}
1406 # Never do this -- insecure!
1407 symbol = 'IBM'
1408 c.execute("... where symbol = '%s'" % symbol)
1409
1410 # Do this instead
1411 t = (symbol,)
1412 c.execute("... where symbol = '?'", t)
1413
1414 # Larger example
1415 for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
1416           ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
1417           ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
1418          ):
1419     c.execute('insert into stocks values (?,?,?,?,?)', t)
1420 \end{verbatim}
1421
1422 To retrieve data after executing a SELECT statement, you can either
1423 treat the cursor as an iterator, call the cursor's \method{fetchone()}
1424 method to retrieve a single matching row,
1425 or call \method{fetchall()} to get a list of the matching rows.
1426
1427 This example uses the iterator form:
1428
1429 \begin{verbatim}
1430 >>> c = conn.cursor()
1431 >>> c.execute('select * from stocks order by price')
1432 >>> for row in c:
1433 ...    print row
1434 ...
1435 (u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001)
1436 (u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
1437 (u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
1438 (u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
1439 >>>
1440 \end{verbatim}
1441
1442 You should also use parameter substitution with SELECT statements:
1443
1444 \begin{verbatim}
1445 >>> c.execute('select * from stocks where symbol=?', ('IBM',))
1446 >>> print c.fetchall()
1447 [(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0),
1448  (u'2006-04-06', u'SELL', u'IBM', 500, 53.0)]
1449 \end{verbatim}
1450
1451 For more information about the SQL dialect supported by SQLite, see
1452 \url{http://www.sqlite.org}.
1453
1454 \begin{seealso}
1455
1456 \seeurl{http://www.pysqlite.org}
1457 {The pysqlite web page.}
1458
1459 \seeurl{http://www.sqlite.org}
1460 {The SQLite web page; the documentation describes the syntax and the
1461 available data types for the supported SQL dialect.}
1462
1463 \seepep{249}{Database API Specification 2.0}{PEP written by
1464 Marc-Andr\'e Lemburg.}
1465
1466 \end{seealso}
1467
1468
1469 % ======================================================================
1470 \section{Build and C API Changes}
1471
1472 Changes to Python's build process and to the C API include:
1473
1474 \begin{itemize}
1475
1476 \item The largest change to the C API came from \pep{353},
1477 which modifies the interpreter to use a \ctype{Py_ssize_t} type
1478 definition instead of \ctype{int}.  See the earlier
1479 section~ref{section-353} for a discussion of this change.
1480
1481 \item The design of the bytecode compiler has changed a great deal, to
1482 no longer generate bytecode by traversing the parse tree.  Instead
1483 the parse tree is converted to an abstract syntax tree (or AST), and it is
1484 the abstract syntax tree that's traversed to produce the bytecode.
1485
1486 It's possible for Python code to obtain AST objects by using the
1487 \function{compile()} built-in and specifying \code{_ast.PyCF_ONLY_AST}
1488 as the value of the
1489 \var{flags} parameter:
1490
1491 \begin{verbatim}
1492 from _ast import PyCF_ONLY_AST
1493 ast = compile("""a=0
1494 for i in range(10):
1495     a += i
1496 """, "<string>", 'exec', PyCF_ONLY_AST)
1497
1498 assignment = ast.body[0]
1499 for_loop = ast.body[1]
1500 \end{verbatim}
1501
1502 No documentation has been written for the AST code yet.  To start
1503 learning about it, read the definition of the various AST nodes in
1504 \file{Parser/Python.asdl}.  A Python script reads this file and
1505 generates a set of C structure definitions in
1506 \file{Include/Python-ast.h}.  The \cfunction{PyParser_ASTFromString()}
1507 and \cfunction{PyParser_ASTFromFile()}, defined in
1508 \file{Include/pythonrun.h}, take Python source as input and return the
1509 root of an AST representing the contents.  This AST can then be turned
1510 into a code object by \cfunction{PyAST_Compile()}.  For more
1511 information, read the source code, and then ask questions on
1512 python-dev.
1513
1514 % List of names taken from Jeremy's python-dev post at
1515 % http://mail.python.org/pipermail/python-dev/2005-October/057500.html
1516 The AST code was developed under Jeremy Hylton's management, and
1517 implemented by (in alphabetical order) Brett Cannon, Nick Coghlan,
1518 Grant Edwards, John Ehresman, Kurt Kaiser, Neal Norwitz, Tim Peters,
1519 Armin Rigo, and Neil Schemenauer, plus the participants in a number of
1520 AST sprints at conferences such as PyCon.
1521
1522 \item The built-in set types now have an official C API.  Call
1523 \cfunction{PySet_New()} and \cfunction{PyFrozenSet_New()} to create a
1524 new set, \cfunction{PySet_Add()} and \cfunction{PySet_Discard()} to
1525 add and remove elements, and \cfunction{PySet_Contains} and
1526 \cfunction{PySet_Size} to examine the set's state.
1527
1528 \item The \cfunction{PyRange_New()} function was removed.  It was
1529 never documented, never used in the core code, and had dangerously lax
1530 error checking.
1531
1532 \end{itemize}
1533
1534
1535 %======================================================================
1536 %\subsection{Port-Specific Changes}
1537
1538 %Platform-specific changes go here.
1539
1540
1541 %======================================================================
1542 \section{Other Changes and Fixes \label{section-other}}
1543
1544 As usual, there were a bunch of other improvements and bugfixes
1545 scattered throughout the source tree.  A search through the SVN change
1546 logs finds there were XXX patches applied and YYY bugs fixed between
1547 Python 2.4 and 2.5.  Both figures are likely to be underestimates.
1548
1549 Some of the more notable changes are:
1550
1551 \begin{itemize}
1552
1553 \item Evan Jones's patch to obmalloc, first described in a talk
1554 at PyCon DC 2005, was applied.  Python 2.4 allocated small objects in
1555 256K-sized arenas, but never freed arenas.  With this patch, Python
1556 will free arenas when they're empty.  The net effect is that on some
1557 platforms, when you allocate many objects, Python's memory usage may
1558 actually drop when you delete them, and the memory may be returned to
1559 the operating system.  (Implemented by Evan Jones, and reworked by Tim
1560 Peters.)
1561
1562 Note that this change means extension modules need to be more careful
1563 with how they allocate memory.  Python's API has a number of different
1564 functions for allocating memory that are grouped into families.  For
1565 example, \cfunction{PyMem_Malloc()}, \cfunction{PyMem_Realloc()}, and
1566 \cfunction{PyMem_Free()} are one family that allocates raw memory,
1567 while \cfunction{PyObject_Malloc()}, \cfunction{PyObject_Realloc()},
1568 and \cfunction{PyObject_Free()} are another family that's supposed to
1569 be used for creating Python objects.
1570
1571 Previously these different families all reduced to the platform's
1572 \cfunction{malloc()} and \cfunction{free()} functions.  This meant
1573 it didn't matter if you got things wrong and allocated memory with the
1574 \cfunction{PyMem} function but freed it with the \cfunction{PyObject}
1575 function.  With the obmalloc change, these families now do different
1576 things, and mismatches will probably result in a segfault.  You should
1577 carefully test your C extension modules with Python 2.5.
1578
1579 \item Coverity, a company that markets a source code analysis tool
1580   called Prevent, provided the results of their examination of the Python
1581   source code.  The analysis found a number of refcounting bugs, often
1582   in error-handling code.  These bugs have been fixed.
1583   % XXX provide reference?
1584
1585 \end{itemize}
1586
1587
1588 %======================================================================
1589 \section{Porting to Python 2.5}
1590
1591 This section lists previously described changes that may require
1592 changes to your code:
1593
1594 \begin{itemize}
1595
1596 \item ASCII is now the default encoding for modules.  It's now
1597 a syntax error if a module contains string literals with 8-bit
1598 characters but doesn't have an encoding declaration.  In Python 2.4
1599 this triggered a warning, not a syntax error.
1600
1601 \item The \module{pickle} module no longer uses the deprecated \var{bin} parameter.
1602
1603 \item C API: Many functions now use \ctype{Py_ssize_t}
1604 instead of \ctype{int} to allow processing more data
1605 on 64-bit machines.  Extension code may need to make
1606 the same change to avoid warnings and to support 64-bit machines.
1607 See the earlier
1608 section~ref{section-353} for a discussion of this change.
1609
1610 \item C API:
1611 The obmalloc changes mean that
1612 you must be careful to not mix usage
1613 of the \cfunction{PyMem_*()} and \cfunction{PyObject_*()}
1614 families of functions. Memory allocated with
1615 one family's \cfunction{*_Malloc()} must be
1616 freed with the corresponding family's \cfunction{*_Free()} function.
1617
1618 \end{itemize}
1619
1620
1621 %======================================================================
1622 \section{Acknowledgements \label{acks}}
1623
1624 The author would like to thank the following people for offering
1625 suggestions, corrections and assistance with various drafts of this
1626 article: Martin von~L\"owis, Mike Rovner, Thomas Wouters.
1627
1628 \end{document}