Added a test for the ability to specify a class attribute in Formatter configuration...
[python.git] / Doc / lib / libpickle.tex
blob99dd330e6324aeef4d36a82c8f1b9b0fecae9afd
1 \section{\module{pickle} --- Python object serialization}
3 \declaremodule{standard}{pickle}
4 \modulesynopsis{Convert Python objects to streams of bytes and back.}
5 % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
6 % Rewritten by Barry Warsaw <barry@zope.com>
8 \index{persistence}
9 \indexii{persistent}{objects}
10 \indexii{serializing}{objects}
11 \indexii{marshalling}{objects}
12 \indexii{flattening}{objects}
13 \indexii{pickling}{objects}
15 The \module{pickle} module implements a fundamental, but powerful
16 algorithm for serializing and de-serializing a Python object
17 structure. ``Pickling'' is the process whereby a Python object
18 hierarchy is converted into a byte stream, and ``unpickling'' is the
19 inverse operation, whereby a byte stream is converted back into an
20 object hierarchy. Pickling (and unpickling) is alternatively known as
21 ``serialization'', ``marshalling,''\footnote{Don't confuse this with
22 the \refmodule{marshal} module} or ``flattening'',
23 however, to avoid confusion, the terms used here are ``pickling'' and
24 ``unpickling''.
26 This documentation describes both the \module{pickle} module and the
27 \refmodule{cPickle} module.
29 \subsection{Relationship to other Python modules}
31 The \module{pickle} module has an optimized cousin called the
32 \module{cPickle} module. As its name implies, \module{cPickle} is
33 written in C, so it can be up to 1000 times faster than
34 \module{pickle}. However it does not support subclassing of the
35 \function{Pickler()} and \function{Unpickler()} classes, because in
36 \module{cPickle} these are functions, not classes. Most applications
37 have no need for this functionality, and can benefit from the improved
38 performance of \module{cPickle}. Other than that, the interfaces of
39 the two modules are nearly identical; the common interface is
40 described in this manual and differences are pointed out where
41 necessary. In the following discussions, we use the term ``pickle''
42 to collectively describe the \module{pickle} and
43 \module{cPickle} modules.
45 The data streams the two modules produce are guaranteed to be
46 interchangeable.
48 Python has a more primitive serialization module called
49 \refmodule{marshal}, but in general
50 \module{pickle} should always be the preferred way to serialize Python
51 objects. \module{marshal} exists primarily to support Python's
52 \file{.pyc} files.
54 The \module{pickle} module differs from \refmodule{marshal} several
55 significant ways:
57 \begin{itemize}
59 \item The \module{pickle} module keeps track of the objects it has
60 already serialized, so that later references to the same object
61 won't be serialized again. \module{marshal} doesn't do this.
63 This has implications both for recursive objects and object
64 sharing. Recursive objects are objects that contain references
65 to themselves. These are not handled by marshal, and in fact,
66 attempting to marshal recursive objects will crash your Python
67 interpreter. Object sharing happens when there are multiple
68 references to the same object in different places in the object
69 hierarchy being serialized. \module{pickle} stores such objects
70 only once, and ensures that all other references point to the
71 master copy. Shared objects remain shared, which can be very
72 important for mutable objects.
74 \item \module{marshal} cannot be used to serialize user-defined
75 classes and their instances. \module{pickle} can save and
76 restore class instances transparently, however the class
77 definition must be importable and live in the same module as
78 when the object was stored.
80 \item The \module{marshal} serialization format is not guaranteed to
81 be portable across Python versions. Because its primary job in
82 life is to support \file{.pyc} files, the Python implementers
83 reserve the right to change the serialization format in
84 non-backwards compatible ways should the need arise. The
85 \module{pickle} serialization format is guaranteed to be
86 backwards compatible across Python releases.
88 \end{itemize}
90 \begin{notice}[warning]
91 The \module{pickle} module is not intended to be secure against
92 erroneous or maliciously constructed data. Never unpickle data
93 received from an untrusted or unauthenticated source.
94 \end{notice}
96 Note that serialization is a more primitive notion than persistence;
97 although
98 \module{pickle} reads and writes file objects, it does not handle the
99 issue of naming persistent objects, nor the (even more complicated)
100 issue of concurrent access to persistent objects. The \module{pickle}
101 module can transform a complex object into a byte stream and it can
102 transform the byte stream into an object with the same internal
103 structure. Perhaps the most obvious thing to do with these byte
104 streams is to write them onto a file, but it is also conceivable to
105 send them across a network or store them in a database. The module
106 \refmodule{shelve} provides a simple interface
107 to pickle and unpickle objects on DBM-style database files.
109 \subsection{Data stream format}
111 The data format used by \module{pickle} is Python-specific. This has
112 the advantage that there are no restrictions imposed by external
113 standards such as XDR\index{XDR}\index{External Data Representation}
114 (which can't represent pointer sharing); however it means that
115 non-Python programs may not be able to reconstruct pickled Python
116 objects.
118 By default, the \module{pickle} data format uses a printable \ASCII{}
119 representation. This is slightly more voluminous than a binary
120 representation. The big advantage of using printable \ASCII{} (and of
121 some other characteristics of \module{pickle}'s representation) is that
122 for debugging or recovery purposes it is possible for a human to read
123 the pickled file with a standard text editor.
125 There are currently 3 different protocols which can be used for pickling.
127 \begin{itemize}
129 \item Protocol version 0 is the original ASCII protocol and is backwards
130 compatible with earlier versions of Python.
132 \item Protocol version 1 is the old binary format which is also compatible
133 with earlier versions of Python.
135 \item Protocol version 2 was introduced in Python 2.3. It provides
136 much more efficient pickling of new-style classes.
138 \end{itemize}
140 Refer to PEP 307 for more information.
142 If a \var{protocol} is not specified, protocol 0 is used.
143 If \var{protocol} is specified as a negative value
144 or \constant{HIGHEST_PROTOCOL},
145 the highest protocol version available will be used.
147 \versionchanged[Introduced the \var{protocol} parameter]{2.3}
149 A binary format, which is slightly more efficient, can be chosen by
150 specifying a \var{protocol} version >= 1.
152 \subsection{Usage}
154 To serialize an object hierarchy, you first create a pickler, then you
155 call the pickler's \method{dump()} method. To de-serialize a data
156 stream, you first create an unpickler, then you call the unpickler's
157 \method{load()} method. The \module{pickle} module provides the
158 following constant:
160 \begin{datadesc}{HIGHEST_PROTOCOL}
161 The highest protocol version available. This value can be passed
162 as a \var{protocol} value.
163 \versionadded{2.3}
164 \end{datadesc}
166 The \module{pickle} module provides the
167 following functions to make this process more convenient:
169 \begin{funcdesc}{dump}{obj, file\optional{, protocol}}
170 Write a pickled representation of \var{obj} to the open file object
171 \var{file}. This is equivalent to
172 \code{Pickler(\var{file}, \var{protocol}).dump(\var{obj})}.
174 If the \var{protocol} parameter is omitted, protocol 0 is used.
175 If \var{protocol} is specified as a negative value
176 or \constant{HIGHEST_PROTOCOL},
177 the highest protocol version will be used.
179 \versionchanged[Introduced the \var{protocol} parameter]{2.3}
181 \var{file} must have a \method{write()} method that accepts a single
182 string argument. It can thus be a file object opened for writing, a
183 \refmodule{StringIO} object, or any other custom
184 object that meets this interface.
185 \end{funcdesc}
187 \begin{funcdesc}{load}{file}
188 Read a string from the open file object \var{file} and interpret it as
189 a pickle data stream, reconstructing and returning the original object
190 hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.
192 \var{file} must have two methods, a \method{read()} method that takes
193 an integer argument, and a \method{readline()} method that requires no
194 arguments. Both methods should return a string. Thus \var{file} can
195 be a file object opened for reading, a
196 \module{StringIO} object, or any other custom
197 object that meets this interface.
199 This function automatically determines whether the data stream was
200 written in binary mode or not.
201 \end{funcdesc}
203 \begin{funcdesc}{dumps}{obj\optional{, protocol}}
204 Return the pickled representation of the object as a string, instead
205 of writing it to a file.
207 If the \var{protocol} parameter is omitted, protocol 0 is used.
208 If \var{protocol} is specified as a negative value
209 or \constant{HIGHEST_PROTOCOL},
210 the highest protocol version will be used.
212 \versionchanged[The \var{protocol} parameter was added]{2.3}
214 \end{funcdesc}
216 \begin{funcdesc}{loads}{string}
217 Read a pickled object hierarchy from a string. Characters in the
218 string past the pickled object's representation are ignored.
219 \end{funcdesc}
221 The \module{pickle} module also defines three exceptions:
223 \begin{excdesc}{PickleError}
224 A common base class for the other exceptions defined below. This
225 inherits from \exception{Exception}.
226 \end{excdesc}
228 \begin{excdesc}{PicklingError}
229 This exception is raised when an unpicklable object is passed to
230 the \method{dump()} method.
231 \end{excdesc}
233 \begin{excdesc}{UnpicklingError}
234 This exception is raised when there is a problem unpickling an object.
235 Note that other exceptions may also be raised during unpickling,
236 including (but not necessarily limited to) \exception{AttributeError},
237 \exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
238 \end{excdesc}
240 The \module{pickle} module also exports two callables\footnote{In the
241 \module{pickle} module these callables are classes, which you could
242 subclass to customize the behavior. However, in the \refmodule{cPickle}
243 module these callables are factory functions and so cannot be
244 subclassed. One common reason to subclass is to control what
245 objects can actually be unpickled. See section~\ref{pickle-sub} for
246 more details.}, \class{Pickler} and \class{Unpickler}:
248 \begin{classdesc}{Pickler}{file\optional{, protocol}}
249 This takes a file-like object to which it will write a pickle data
250 stream.
252 If the \var{protocol} parameter is omitted, protocol 0 is used.
253 If \var{protocol} is specified as a negative value,
254 the highest protocol version will be used.
256 \versionchanged[Introduced the \var{protocol} parameter]{2.3}
258 \var{file} must have a \method{write()} method that accepts a single
259 string argument. It can thus be an open file object, a
260 \module{StringIO} object, or any other custom
261 object that meets this interface.
262 \end{classdesc}
264 \class{Pickler} objects define one (or two) public methods:
266 \begin{methoddesc}[Pickler]{dump}{obj}
267 Write a pickled representation of \var{obj} to the open file object
268 given in the constructor. Either the binary or \ASCII{} format will
269 be used, depending on the value of the \var{protocol} argument passed to the
270 constructor.
271 \end{methoddesc}
273 \begin{methoddesc}[Pickler]{clear_memo}{}
274 Clears the pickler's ``memo''. The memo is the data structure that
275 remembers which objects the pickler has already seen, so that shared
276 or recursive objects pickled by reference and not by value. This
277 method is useful when re-using picklers.
279 \begin{notice}
280 Prior to Python 2.3, \method{clear_memo()} was only available on the
281 picklers created by \refmodule{cPickle}. In the \module{pickle} module,
282 picklers have an instance variable called \member{memo} which is a
283 Python dictionary. So to clear the memo for a \module{pickle} module
284 pickler, you could do the following:
286 \begin{verbatim}
287 mypickler.memo.clear()
288 \end{verbatim}
290 Code that does not need to support older versions of Python should
291 simply use \method{clear_memo()}.
292 \end{notice}
293 \end{methoddesc}
295 It is possible to make multiple calls to the \method{dump()} method of
296 the same \class{Pickler} instance. These must then be matched to the
297 same number of calls to the \method{load()} method of the
298 corresponding \class{Unpickler} instance. If the same object is
299 pickled by multiple \method{dump()} calls, the \method{load()} will
300 all yield references to the same object.\footnote{\emph{Warning}: this
301 is intended for pickling multiple objects without intervening
302 modifications to the objects or their parts. If you modify an object
303 and then pickle it again using the same \class{Pickler} instance, the
304 object is not pickled again --- a reference to it is pickled and the
305 \class{Unpickler} will return the old value, not the modified one.
306 There are two problems here: (1) detecting changes, and (2)
307 marshalling a minimal set of changes. Garbage Collection may also
308 become a problem here.}
310 \class{Unpickler} objects are defined as:
312 \begin{classdesc}{Unpickler}{file}
313 This takes a file-like object from which it will read a pickle data
314 stream. This class automatically determines whether the data stream
315 was written in binary mode or not, so it does not need a flag as in
316 the \class{Pickler} factory.
318 \var{file} must have two methods, a \method{read()} method that takes
319 an integer argument, and a \method{readline()} method that requires no
320 arguments. Both methods should return a string. Thus \var{file} can
321 be a file object opened for reading, a
322 \module{StringIO} object, or any other custom
323 object that meets this interface.
324 \end{classdesc}
326 \class{Unpickler} objects have one (or two) public methods:
328 \begin{methoddesc}[Unpickler]{load}{}
329 Read a pickled object representation from the open file object given
330 in the constructor, and return the reconstituted object hierarchy
331 specified therein.
332 \end{methoddesc}
334 \begin{methoddesc}[Unpickler]{noload}{}
335 This is just like \method{load()} except that it doesn't actually
336 create any objects. This is useful primarily for finding what's
337 called ``persistent ids'' that may be referenced in a pickle data
338 stream. See section~\ref{pickle-protocol} below for more details.
340 \strong{Note:} the \method{noload()} method is currently only
341 available on \class{Unpickler} objects created with the
342 \module{cPickle} module. \module{pickle} module \class{Unpickler}s do
343 not have the \method{noload()} method.
344 \end{methoddesc}
346 \subsection{What can be pickled and unpickled?}
348 The following types can be pickled:
350 \begin{itemize}
352 \item \code{None}, \code{True}, and \code{False}
354 \item integers, long integers, floating point numbers, complex numbers
356 \item normal and Unicode strings
358 \item tuples, lists, sets, and dictionaries containing only picklable objects
360 \item functions defined at the top level of a module
362 \item built-in functions defined at the top level of a module
364 \item classes that are defined at the top level of a module
366 \item instances of such classes whose \member{__dict__} or
367 \method{__setstate__()} is picklable (see
368 section~\ref{pickle-protocol} for details)
370 \end{itemize}
372 Attempts to pickle unpicklable objects will raise the
373 \exception{PicklingError} exception; when this happens, an unspecified
374 number of bytes may have already been written to the underlying file.
375 Trying to pickle a highly recursive data structure may exceed the
376 maximum recursion depth, a \exception{RuntimeError} will be raised
377 in this case. You can carefully raise this limit with
378 \function{sys.setrecursionlimit()}.
380 Note that functions (built-in and user-defined) are pickled by ``fully
381 qualified'' name reference, not by value. This means that only the
382 function name is pickled, along with the name of module the function
383 is defined in. Neither the function's code, nor any of its function
384 attributes are pickled. Thus the defining module must be importable
385 in the unpickling environment, and the module must contain the named
386 object, otherwise an exception will be raised.\footnote{The exception
387 raised will likely be an \exception{ImportError} or an
388 \exception{AttributeError} but it could be something else.}
390 Similarly, classes are pickled by named reference, so the same
391 restrictions in the unpickling environment apply. Note that none of
392 the class's code or data is pickled, so in the following example the
393 class attribute \code{attr} is not restored in the unpickling
394 environment:
396 \begin{verbatim}
397 class Foo:
398 attr = 'a class attr'
400 picklestring = pickle.dumps(Foo)
401 \end{verbatim}
403 These restrictions are why picklable functions and classes must be
404 defined in the top level of a module.
406 Similarly, when class instances are pickled, their class's code and
407 data are not pickled along with them. Only the instance data are
408 pickled. This is done on purpose, so you can fix bugs in a class or
409 add methods to the class and still load objects that were created with
410 an earlier version of the class. If you plan to have long-lived
411 objects that will see many versions of a class, it may be worthwhile
412 to put a version number in the objects so that suitable conversions
413 can be made by the class's \method{__setstate__()} method.
415 \subsection{The pickle protocol
416 \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
418 This section describes the ``pickling protocol'' that defines the
419 interface between the pickler/unpickler and the objects that are being
420 serialized. This protocol provides a standard way for you to define,
421 customize, and control how your objects are serialized and
422 de-serialized. The description in this section doesn't cover specific
423 customizations that you can employ to make the unpickling environment
424 slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
425 for more details.
427 \subsubsection{Pickling and unpickling normal class
428 instances\label{pickle-inst}}
430 When a pickled class instance is unpickled, its \method{__init__()}
431 method is normally \emph{not} invoked. If it is desirable that the
432 \method{__init__()} method be called on unpickling, an old-style class
433 can define a method \method{__getinitargs__()}, which should return a
434 \emph{tuple} containing the arguments to be passed to the class
435 constructor (\method{__init__()} for example). The
436 \method{__getinitargs__()} method is called at
437 pickle time; the tuple it returns is incorporated in the pickle for
438 the instance.
439 \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
440 \withsubitem{(instance constructor)}{\ttindex{__init__()}}
442 \withsubitem{(copy protocol)}{\ttindex{__getnewargs__()}}
444 New-style types can provide a \method{__getnewargs__()} method that is
445 used for protocol 2. Implementing this method is needed if the type
446 establishes some internal invariants when the instance is created, or
447 if the memory allocation is affected by the values passed to the
448 \method{__new__()} method for the type (as it is for tuples and
449 strings). Instances of a new-style type \class{C} are created using
451 \begin{alltt}
452 obj = C.__new__(C, *\var{args})
453 \end{alltt}
455 where \var{args} is the result of calling \method{__getnewargs__()} on
456 the original object; if there is no \method{__getnewargs__()}, an
457 empty tuple is assumed.
459 \withsubitem{(copy protocol)}{
460 \ttindex{__getstate__()}\ttindex{__setstate__()}}
461 \withsubitem{(instance attribute)}{
462 \ttindex{__dict__}}
464 Classes can further influence how their instances are pickled; if the
465 class defines the method \method{__getstate__()}, it is called and the
466 return state is pickled as the contents for the instance, instead of
467 the contents of the instance's dictionary. If there is no
468 \method{__getstate__()} method, the instance's \member{__dict__} is
469 pickled.
471 Upon unpickling, if the class also defines the method
472 \method{__setstate__()}, it is called with the unpickled
473 state.\footnote{These methods can also be used to implement copying
474 class instances.} If there is no \method{__setstate__()} method, the
475 pickled state must be a dictionary and its items are assigned to the
476 new instance's dictionary. If a class defines both
477 \method{__getstate__()} and \method{__setstate__()}, the state object
478 needn't be a dictionary and these methods can do what they
479 want.\footnote{This protocol is also used by the shallow and deep
480 copying operations defined in the
481 \refmodule{copy} module.}
483 \begin{notice}[warning]
484 For new-style classes, if \method{__getstate__()} returns a false
485 value, the \method{__setstate__()} method will not be called.
486 \end{notice}
489 \subsubsection{Pickling and unpickling extension types}
491 When the \class{Pickler} encounters an object of a type it knows
492 nothing about --- such as an extension type --- it looks in two places
493 for a hint of how to pickle it. One alternative is for the object to
494 implement a \method{__reduce__()} method. If provided, at pickling
495 time \method{__reduce__()} will be called with no arguments, and it
496 must return either a string or a tuple.
498 If a string is returned, it names a global variable whose contents are
499 pickled as normal. The string returned by \method{__reduce__} should
500 be the object's local name relative to its module; the pickle module
501 searches the module namespace to determine the object's module.
503 When a tuple is returned, it must be between two and five elements
504 long. Optional elements can either be omitted, or \code{None} can be provided
505 as their value. The semantics of each element are:
507 \begin{itemize}
509 \item A callable object that will be called to create the initial
510 version of the object. The next element of the tuple will provide
511 arguments for this callable, and later elements provide additional
512 state information that will subsequently be used to fully reconstruct
513 the pickled date.
515 In the unpickling environment this object must be either a class, a
516 callable registered as a ``safe constructor'' (see below), or it must
517 have an attribute \member{__safe_for_unpickling__} with a true value.
518 Otherwise, an \exception{UnpicklingError} will be raised in the
519 unpickling environment. Note that as usual, the callable itself is
520 pickled by name.
522 \item A tuple of arguments for the callable object.
523 \versionchanged[Formerly, this argument could also be \code{None}]{2.5}
525 \item Optionally, the object's state, which will be passed to
526 the object's \method{__setstate__()} method as described in
527 section~\ref{pickle-inst}. If the object has no
528 \method{__setstate__()} method, then, as above, the value must
529 be a dictionary and it will be added to the object's
530 \member{__dict__}.
532 \item Optionally, an iterator (and not a sequence) yielding successive
533 list items. These list items will be pickled, and appended to the
534 object using either \code{obj.append(\var{item})} or
535 \code{obj.extend(\var{list_of_items})}. This is primarily used for
536 list subclasses, but may be used by other classes as long as they have
537 \method{append()} and \method{extend()} methods with the appropriate
538 signature. (Whether \method{append()} or \method{extend()} is used
539 depends on which pickle protocol version is used as well as the number
540 of items to append, so both must be supported.)
542 \item Optionally, an iterator (not a sequence)
543 yielding successive dictionary items, which should be tuples of the
544 form \code{(\var{key}, \var{value})}. These items will be pickled
545 and stored to the object using \code{obj[\var{key}] = \var{value}}.
546 This is primarily used for dictionary subclasses, but may be used by
547 other classes as long as they implement \method{__setitem__}.
549 \end{itemize}
551 It is sometimes useful to know the protocol version when implementing
552 \method{__reduce__}. This can be done by implementing a method named
553 \method{__reduce_ex__} instead of \method{__reduce__}.
554 \method{__reduce_ex__}, when it exists, is called in preference over
555 \method{__reduce__} (you may still provide \method{__reduce__} for
556 backwards compatibility). The \method{__reduce_ex__} method will be
557 called with a single integer argument, the protocol version.
559 The \class{object} class implements both \method{__reduce__} and
560 \method{__reduce_ex__}; however, if a subclass overrides
561 \method{__reduce__} but not \method{__reduce_ex__}, the
562 \method{__reduce_ex__} implementation detects this and calls
563 \method{__reduce__}.
565 An alternative to implementing a \method{__reduce__()} method on the
566 object to be pickled, is to register the callable with the
567 \refmodule[copyreg]{copy_reg} module. This module provides a way
568 for programs to register ``reduction functions'' and constructors for
569 user-defined types. Reduction functions have the same semantics and
570 interface as the \method{__reduce__()} method described above, except
571 that they are called with a single argument, the object to be pickled.
573 The registered constructor is deemed a ``safe constructor'' for purposes
574 of unpickling as described above.
577 \subsubsection{Pickling and unpickling external objects}
579 For the benefit of object persistence, the \module{pickle} module
580 supports the notion of a reference to an object outside the pickled
581 data stream. Such objects are referenced by a ``persistent id'',
582 which is just an arbitrary string of printable \ASCII{} characters.
583 The resolution of such names is not defined by the \module{pickle}
584 module; it will delegate this resolution to user defined functions on
585 the pickler and unpickler.\footnote{The actual mechanism for
586 associating these user defined functions is slightly different for
587 \module{pickle} and \module{cPickle}. The description given here
588 works the same for both implementations. Users of the \module{pickle}
589 module could also use subclassing to effect the same results,
590 overriding the \method{persistent_id()} and \method{persistent_load()}
591 methods in the derived classes.}
593 To define external persistent id resolution, you need to set the
594 \member{persistent_id} attribute of the pickler object and the
595 \member{persistent_load} attribute of the unpickler object.
597 To pickle objects that have an external persistent id, the pickler
598 must have a custom \function{persistent_id()} method that takes an
599 object as an argument and returns either \code{None} or the persistent
600 id for that object. When \code{None} is returned, the pickler simply
601 pickles the object as normal. When a persistent id string is
602 returned, the pickler will pickle that string, along with a marker
603 so that the unpickler will recognize the string as a persistent id.
605 To unpickle external objects, the unpickler must have a custom
606 \function{persistent_load()} function that takes a persistent id
607 string and returns the referenced object.
609 Here's a silly example that \emph{might} shed more light:
611 \begin{verbatim}
612 import pickle
613 from cStringIO import StringIO
615 src = StringIO()
616 p = pickle.Pickler(src)
618 def persistent_id(obj):
619 if hasattr(obj, 'x'):
620 return 'the value %d' % obj.x
621 else:
622 return None
624 p.persistent_id = persistent_id
626 class Integer:
627 def __init__(self, x):
628 self.x = x
629 def __str__(self):
630 return 'My name is integer %d' % self.x
632 i = Integer(7)
633 print i
634 p.dump(i)
636 datastream = src.getvalue()
637 print repr(datastream)
638 dst = StringIO(datastream)
640 up = pickle.Unpickler(dst)
642 class FancyInteger(Integer):
643 def __str__(self):
644 return 'I am the integer %d' % self.x
646 def persistent_load(persid):
647 if persid.startswith('the value '):
648 value = int(persid.split()[2])
649 return FancyInteger(value)
650 else:
651 raise pickle.UnpicklingError, 'Invalid persistent id'
653 up.persistent_load = persistent_load
655 j = up.load()
656 print j
657 \end{verbatim}
659 In the \module{cPickle} module, the unpickler's
660 \member{persistent_load} attribute can also be set to a Python
661 list, in which case, when the unpickler reaches a persistent id, the
662 persistent id string will simply be appended to this list. This
663 functionality exists so that a pickle data stream can be ``sniffed''
664 for object references without actually instantiating all the objects
665 in a pickle.\footnote{We'll leave you with the image of Guido and Jim
666 sitting around sniffing pickles in their living rooms.} Setting
667 \member{persistent_load} to a list is usually used in conjunction with
668 the \method{noload()} method on the Unpickler.
670 % BAW: Both pickle and cPickle support something called
671 % inst_persistent_id() which appears to give unknown types a second
672 % shot at producing a persistent id. Since Jim Fulton can't remember
673 % why it was added or what it's for, I'm leaving it undocumented.
675 \subsection{Subclassing Unpicklers \label{pickle-sub}}
677 By default, unpickling will import any class that it finds in the
678 pickle data. You can control exactly what gets unpickled and what
679 gets called by customizing your unpickler. Unfortunately, exactly how
680 you do this is different depending on whether you're using
681 \module{pickle} or \module{cPickle}.\footnote{A word of caution: the
682 mechanisms described here use internal attributes and methods, which
683 are subject to change in future versions of Python. We intend to
684 someday provide a common interface for controlling this behavior,
685 which will work in either \module{pickle} or \module{cPickle}.}
687 In the \module{pickle} module, you need to derive a subclass from
688 \class{Unpickler}, overriding the \method{load_global()}
689 method. \method{load_global()} should read two lines from the pickle
690 data stream where the first line will the name of the module
691 containing the class and the second line will be the name of the
692 instance's class. It then looks up the class, possibly importing the
693 module and digging out the attribute, then it appends what it finds to
694 the unpickler's stack. Later on, this class will be assigned to the
695 \member{__class__} attribute of an empty class, as a way of magically
696 creating an instance without calling its class's \method{__init__()}.
697 Your job (should you choose to accept it), would be to have
698 \method{load_global()} push onto the unpickler's stack, a known safe
699 version of any class you deem safe to unpickle. It is up to you to
700 produce such a class. Or you could raise an error if you want to
701 disallow all unpickling of instances. If this sounds like a hack,
702 you're right. Refer to the source code to make this work.
704 Things are a little cleaner with \module{cPickle}, but not by much.
705 To control what gets unpickled, you can set the unpickler's
706 \member{find_global} attribute to a function or \code{None}. If it is
707 \code{None} then any attempts to unpickle instances will raise an
708 \exception{UnpicklingError}. If it is a function,
709 then it should accept a module name and a class name, and return the
710 corresponding class object. It is responsible for looking up the
711 class and performing any necessary imports, and it may raise an
712 error to prevent instances of the class from being unpickled.
714 The moral of the story is that you should be really careful about the
715 source of the strings your application unpickles.
717 \subsection{Example \label{pickle-example}}
719 Here's a simple example of how to modify pickling behavior for a
720 class. The \class{TextReader} class opens a text file, and returns
721 the line number and line contents each time its \method{readline()}
722 method is called. If a \class{TextReader} instance is pickled, all
723 attributes \emph{except} the file object member are saved. When the
724 instance is unpickled, the file is reopened, and reading resumes from
725 the last location. The \method{__setstate__()} and
726 \method{__getstate__()} methods are used to implement this behavior.
728 \begin{verbatim}
729 class TextReader:
730 """Print and number lines in a text file."""
731 def __init__(self, file):
732 self.file = file
733 self.fh = open(file)
734 self.lineno = 0
736 def readline(self):
737 self.lineno = self.lineno + 1
738 line = self.fh.readline()
739 if not line:
740 return None
741 if line.endswith("\n"):
742 line = line[:-1]
743 return "%d: %s" % (self.lineno, line)
745 def __getstate__(self):
746 odict = self.__dict__.copy() # copy the dict since we change it
747 del odict['fh'] # remove filehandle entry
748 return odict
750 def __setstate__(self,dict):
751 fh = open(dict['file']) # reopen file
752 count = dict['lineno'] # read from file...
753 while count: # until line count is restored
754 fh.readline()
755 count = count - 1
756 self.__dict__.update(dict) # update attributes
757 self.fh = fh # save the file object
758 \end{verbatim}
760 A sample usage might be something like this:
762 \begin{verbatim}
763 >>> import TextReader
764 >>> obj = TextReader.TextReader("TextReader.py")
765 >>> obj.readline()
766 '1: #!/usr/local/bin/python'
767 >>> # (more invocations of obj.readline() here)
768 ... obj.readline()
769 '7: class TextReader:'
770 >>> import pickle
771 >>> pickle.dump(obj,open('save.p','w'))
772 \end{verbatim}
774 If you want to see that \refmodule{pickle} works across Python
775 processes, start another Python session, before continuing. What
776 follows can happen from either the same process or a new process.
778 \begin{verbatim}
779 >>> import pickle
780 >>> reader = pickle.load(open('save.p'))
781 >>> reader.readline()
782 '8: "Print and number lines in a text file."'
783 \end{verbatim}
786 \begin{seealso}
787 \seemodule[copyreg]{copy_reg}{Pickle interface constructor
788 registration for extension types.}
790 \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
792 \seemodule{copy}{Shallow and deep object copying.}
794 \seemodule{marshal}{High-performance serialization of built-in types.}
795 \end{seealso}
798 \section{\module{cPickle} --- A faster \module{pickle}}
800 \declaremodule{builtin}{cPickle}
801 \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
802 \moduleauthor{Jim Fulton}{jim@zope.com}
803 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
805 The \module{cPickle} module supports serialization and
806 de-serialization of Python objects, providing an interface and
807 functionality nearly identical to the
808 \refmodule{pickle}\refstmodindex{pickle} module. There are several
809 differences, the most important being performance and subclassability.
811 First, \module{cPickle} can be up to 1000 times faster than
812 \module{pickle} because the former is implemented in C. Second, in
813 the \module{cPickle} module the callables \function{Pickler()} and
814 \function{Unpickler()} are functions, not classes. This means that
815 you cannot use them to derive custom pickling and unpickling
816 subclasses. Most applications have no need for this functionality and
817 should benefit from the greatly improved performance of the
818 \module{cPickle} module.
820 The pickle data stream produced by \module{pickle} and
821 \module{cPickle} are identical, so it is possible to use
822 \module{pickle} and \module{cPickle} interchangeably with existing
823 pickles.\footnote{Since the pickle data format is actually a tiny
824 stack-oriented programming language, and some freedom is taken in the
825 encodings of certain objects, it is possible that the two modules
826 produce different data streams for the same input objects. However it
827 is guaranteed that they will always be able to read each other's
828 data streams.}
830 There are additional minor differences in API between \module{cPickle}
831 and \module{pickle}, however for most applications, they are
832 interchangeable. More documentation is provided in the
833 \module{pickle} module documentation, which
834 includes a list of the documented differences.