2 :mod:`collections` --- High-performance container datatypes
3 ===========================================================
5 .. module:: collections
6 :synopsis: High-performance datatypes
7 .. moduleauthor:: Raymond Hettinger <python@rcn.com>
8 .. sectionauthor:: Raymond Hettinger <python@rcn.com>
.. testsetup:: *

   from collections import *
   import itertools
   __name__ = '<doctest>'

18 This module implements high-performance container datatypes. Currently,
19 there are two datatypes, :class:`deque` and :class:`defaultdict`, and
20 one datatype factory function, :func:`namedtuple`.
22 .. versionchanged:: 2.5
23 Added :class:`defaultdict`.
25 .. versionchanged:: 2.6
26 Added :func:`namedtuple`.
28 The specialized containers provided in this module provide alternatives
29 to Python's general purpose built-in containers, :class:`dict`,
30 :class:`list`, :class:`set`, and :class:`tuple`.
32 Besides the containers provided here, the optional :mod:`bsddb`
33 module offers the ability to create in-memory or file based ordered
34 dictionaries with string keys using the :meth:`bsddb.btopen` method.
In addition to containers, the collections module provides some ABCs
(abstract base classes) that can be used to test whether a class
provides a particular interface, for example, whether it is hashable or
a mapping.

41 .. versionchanged:: 2.6
42 Added abstract base classes.
44 ABCs - abstract base classes
45 ----------------------------
47 The collections module offers the following ABCs:
=========================  ====================  ======================  ====================================================
ABC                        Inherits              Abstract Methods        Mixin Methods
=========================  ====================  ======================  ====================================================
:class:`Container`                               ``__contains__``
:class:`Hashable`                                ``__hash__``
:class:`Iterable`                                ``__iter__``
:class:`Iterator`          :class:`Iterable`     ``next``                ``__iter__``
:class:`Sized`                                   ``__len__``

:class:`Mapping`           :class:`Sized`,       ``__getitem__``,        ``__contains__``, ``keys``, ``items``, ``values``,
                           :class:`Iterable`,    ``__len__``, and        ``get``, ``__eq__``, and ``__ne__``
                           :class:`Container`    ``__iter__``

:class:`MutableMapping`    :class:`Mapping`      ``__getitem__``,        Inherited Mapping methods and
                                                 ``__setitem__``,        ``pop``, ``popitem``, ``clear``, ``update``,
                                                 ``__delitem__``,        and ``setdefault``
                                                 ``__iter__``, and
                                                 ``__len__``

:class:`Sequence`          :class:`Sized`,       ``__getitem__``         ``__contains__``, ``__iter__``, ``__reversed__``,
                           :class:`Iterable`,    and ``__len__``         ``index``, and ``count``
                           :class:`Container`

:class:`MutableSequence`   :class:`Sequence`     ``__getitem__``,        Inherited Sequence methods and
                                                 ``__setitem__``,        ``append``, ``reverse``, ``extend``, ``pop``,
                                                 ``__delitem__``,        ``remove``, and ``__iadd__``
                                                 ``insert``, and
                                                 ``__len__``

:class:`Set`               :class:`Sized`,       ``__len__``,            ``__le__``, ``__lt__``, ``__eq__``, ``__ne__``,
                           :class:`Iterable`,    ``__iter__``, and       ``__gt__``, ``__ge__``, ``__and__``, ``__or__``,
                           :class:`Container`    ``__contains__``        ``__sub__``, ``__xor__``, and ``isdisjoint``

:class:`MutableSet`        :class:`Set`          ``add`` and             Inherited Set methods and
                                                 ``discard``             ``clear``, ``pop``, ``remove``, ``__ior__``,
                                                                         ``__iand__``, ``__ixor__``, and ``__isub__``
=========================  ====================  ======================  ====================================================

86 These ABCs allow us to ask classes or instances if they provide
87 particular functionality, for example::

    size = None
    if isinstance(myvar, collections.Sized):
        size = len(myvar)

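As a minimal sketch of such interface checks, the hypothetical helper ``describe()`` below reports which of three ABCs an object satisfies. The import fallback is an assumption added for portability: the ABCs are imported from :mod:`collections` as documented here, while newer interpreters expose them from ``collections.abc``.

```python
try:
    from collections.abc import Hashable, Iterable, Sized   # newer interpreters
except ImportError:
    from collections import Hashable, Iterable, Sized       # as documented here


def describe(obj):
    """Return the names of the ABCs (out of a fixed trio) that obj satisfies."""
    abcs = (Sized, Iterable, Hashable)
    return [abc.__name__ for abc in abcs if isinstance(obj, abc)]


list_caps = describe([1, 2, 3])     # lists are sized and iterable, not hashable
tuple_caps = describe((1, 2, 3))    # tuples satisfy all three
int_caps = describe(42)             # ints are hashable only
```

None of the built-in types inherit from these ABCs; the checks succeed because the ABCs recognise the relevant special methods.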
Several of the ABCs are also useful as mixins that make it easier to develop
classes supporting container APIs. For example, to write a class supporting
the full :class:`Set` API, it is only necessary to supply the three underlying
abstract methods: :meth:`__contains__`, :meth:`__iter__`, and :meth:`__len__`.
The ABC supplies the remaining methods such as :meth:`__and__` and
:meth:`isdisjoint`::

    class ListBasedSet(collections.Set):
        ''' Alternate set implementation favoring space over speed
            and not requiring the set elements to be hashable. '''
        def __init__(self, iterable):
            self.elements = lst = []
            for value in iterable:
                if value not in lst:
                    lst.append(value)
        def __iter__(self):
            return iter(self.elements)
        def __contains__(self, value):
            return value in self.elements
        def __len__(self):
            return len(self.elements)

    s1 = ListBasedSet('abcdef')
    s2 = ListBasedSet('defghi')
    overlap = s1 & s2            # The __and__() method is supported automatically

119 Notes on using :class:`Set` and :class:`MutableSet` as a mixin:

(1)
   Since some set operations create new sets, the default mixin methods need
   a way to create new instances from an iterable. The class constructor is
   assumed to have a signature in the form ``ClassName(iterable)``.
   That assumption is factored out to an internal classmethod called
   :meth:`_from_iterable`, which calls ``cls(iterable)`` to produce a new set.
   If the :class:`Set` mixin is being used in a class with a different
   constructor signature, you will need to override :meth:`_from_iterable`
   with a classmethod that can construct new instances from
   an iterable argument.

(2)
   To override the comparisons (presumably for speed, as the
   semantics are fixed), redefine :meth:`__le__` and
   then the other operations will automatically follow suit.

(3)
   The :class:`Set` mixin provides a :meth:`_hash` method to compute a hash value
   for the set; however, :meth:`__hash__` is not defined because not all sets
   are hashable or immutable. To add set hashability using mixins,
   inherit from both :class:`Set` and :class:`Hashable`, then define
   ``__hash__ = Set._hash``.

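The notes on :meth:`_from_iterable` and :meth:`_hash` can be sketched together. ``TaggedSet`` and its ``tag`` argument are hypothetical names invented for this illustration, and the import fallback is an assumption added so the sketch also runs on interpreters that expose the ABCs from ``collections.abc``.

```python
try:
    from collections.abc import Hashable, Set   # newer interpreters
except ImportError:
    from collections import Hashable, Set       # as documented here


class TaggedSet(Set, Hashable):
    """Hypothetical immutable set whose constructor takes an extra argument."""

    def __init__(self, tag, iterable):
        self.tag = tag
        self._items = frozenset(iterable)

    # The three abstract methods required by Set.
    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    # The constructor is not ClassName(iterable), so tell the mixin
    # methods how to build a new instance from an iterable.
    @classmethod
    def _from_iterable(cls, iterable):
        return cls('derived', iterable)

    # Opt in to hashing with the helper supplied by the Set mixin.
    __hash__ = Set._hash


a = TaggedSet('a', [1, 2, 3])
b = TaggedSet('b', [2, 3, 4])
common = a & b      # __and__() builds its result through _from_iterable()
```

Because ``_hash`` is computed from the elements only, two ``TaggedSet`` instances with equal contents hash alike regardless of their tags.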
144 (For more about ABCs, see the :mod:`abc` module and :pep:`3119`.)
150 :class:`deque` objects
151 ----------------------
154 .. class:: deque([iterable[, maxlen]])
156 Returns a new deque object initialized left-to-right (using :meth:`append`) with
157 data from *iterable*. If *iterable* is not specified, the new deque is empty.
159 Deques are a generalization of stacks and queues (the name is pronounced "deck"
160 and is short for "double-ended queue"). Deques support thread-safe, memory
161 efficient appends and pops from either side of the deque with approximately the
162 same O(1) performance in either direction.
164 Though :class:`list` objects support similar operations, they are optimized for
165 fast fixed-length operations and incur O(n) memory movement costs for
166 ``pop(0)`` and ``insert(0, v)`` operations which change both the size and
167 position of the underlying data representation.
169 .. versionadded:: 2.4
If *maxlen* is not specified or is ``None``, deques may grow to an
172 arbitrary length. Otherwise, the deque is bounded to the specified maximum
173 length. Once a bounded length deque is full, when new items are added, a
174 corresponding number of items are discarded from the opposite end. Bounded
175 length deques provide functionality similar to the ``tail`` filter in
176 Unix. They are also useful for tracking transactions and other pools of data
177 where only the most recent activity is of interest.
179 .. versionchanged:: 2.6
180 Added *maxlen* parameter.
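A small sketch of the bounded behaviour described above; the event names are made up for illustration. Items are discarded from the end opposite to the one being appended to.

```python
from collections import deque

recent = deque(maxlen=3)             # keep only the three newest items
for event in ['login', 'view', 'edit', 'save', 'logout']:
    recent.append(event)             # once full, the oldest item drops off the left

newest_first = deque(maxlen=3)
for event in ['login', 'view', 'edit', 'save']:
    newest_first.appendleft(event)   # once full, items drop off the right
```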
182 Deque objects support the following methods:
185 .. method:: append(x)
187 Add *x* to the right side of the deque.
190 .. method:: appendleft(x)
192 Add *x* to the left side of the deque.
.. method:: clear()

   Remove all elements from the deque leaving it with length 0.

.. method:: extend(iterable)

   Extend the right side of the deque by appending elements from the *iterable*
   argument.

206 .. method:: extendleft(iterable)
208 Extend the left side of the deque by appending elements from *iterable*.
209 Note, the series of left appends results in reversing the order of
210 elements in the iterable argument.
.. method:: pop()

   Remove and return an element from the right side of the deque. If no
   elements are present, raises an :exc:`IndexError`.

219 .. method:: popleft()
221 Remove and return an element from the left side of the deque. If no
222 elements are present, raises an :exc:`IndexError`.
.. method:: remove(value)

   Remove the first occurrence of *value*. If not found, raises a
   :exc:`ValueError`.

   .. versionadded:: 2.5

233 .. method:: rotate(n)
235 Rotate the deque *n* steps to the right. If *n* is negative, rotate to
236 the left. Rotating one step to the right is equivalent to:
237 ``d.appendleft(d.pop())``.
240 In addition to the above, deques support iteration, pickling, ``len(d)``,
241 ``reversed(d)``, ``copy.copy(d)``, ``copy.deepcopy(d)``, membership testing with
242 the :keyword:`in` operator, and subscript references such as ``d[-1]``.

Example:

.. doctest::

   >>> from collections import deque
   >>> d = deque('ghi')                 # make a new deque with three items
   >>> for elem in d:                   # iterate over the deque's elements
   ...     print elem.upper()
   G
   H
   I

   >>> d.append('j')                    # add a new entry to the right side
   >>> d.appendleft('f')                # add a new entry to the left side
   >>> d                                # show the representation of the deque
   deque(['f', 'g', 'h', 'i', 'j'])

   >>> d.pop()                          # return and remove the rightmost item
   'j'
   >>> d.popleft()                      # return and remove the leftmost item
   'f'
   >>> list(d)                          # list the contents of the deque
   ['g', 'h', 'i']
   >>> d[0]                             # peek at leftmost item
   'g'
   >>> d[-1]                            # peek at rightmost item
   'i'

   >>> list(reversed(d))                # list the contents of a deque in reverse
   ['i', 'h', 'g']
   >>> 'h' in d                         # search the deque
   True
   >>> d.extend('jkl')                  # add multiple elements at once
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])
   >>> d.rotate(1)                      # right rotation
   >>> d
   deque(['l', 'g', 'h', 'i', 'j', 'k'])
   >>> d.rotate(-1)                     # left rotation
   >>> d
   deque(['g', 'h', 'i', 'j', 'k', 'l'])

   >>> deque(reversed(d))               # make a new deque in reverse order
   deque(['l', 'k', 'j', 'i', 'h', 'g'])
   >>> d.clear()                        # empty the deque
   >>> d.pop()                          # cannot pop from an empty deque
   Traceback (most recent call last):
     File "<pyshell#6>", line 1, in -toplevel-
       d.pop()
   IndexError: pop from an empty deque

   >>> d.extendleft('abc')              # extendleft() reverses the input order
   >>> d
   deque(['c', 'b', 'a'])

302 :class:`deque` Recipes
303 ^^^^^^^^^^^^^^^^^^^^^^
305 This section shows various approaches to working with deques.
307 The :meth:`rotate` method provides a way to implement :class:`deque` slicing and
308 deletion. For example, a pure python implementation of ``del d[n]`` relies on
309 the :meth:`rotate` method to position elements to be popped::

    def delete_nth(d, n):
        d.rotate(-n)
        d.popleft()
        d.rotate(n)

316 To implement :class:`deque` slicing, use a similar approach applying
317 :meth:`rotate` to bring a target element to the left side of the deque. Remove
318 old entries with :meth:`popleft`, add new entries with :meth:`extend`, and then
319 reverse the rotation.
320 With minor variations on that approach, it is easy to implement Forth style
321 stack manipulations such as ``dup``, ``drop``, ``swap``, ``over``, ``pick``,
322 ``rot``, and ``roll``.
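The slicing approach and one Forth-style operation can be sketched as follows. Both helper names are hypothetical, and this is only one possible variation: new entries are added with ``extendleft()`` on the reversed input (rather than ``extend()``) so that undoing the rotation is a single ``rotate(start)``. The sketch assumes ``0 <= start <= stop <= len(d)`` and treats the right end of the deque as the top of the stack.

```python
from collections import deque


def replace_slice(d, start, stop, new_items):
    """Emulate d[start:stop] = new_items on a deque, in place."""
    d.rotate(-start)                          # bring d[start] to the left end
    for _ in range(stop - start):
        d.popleft()                           # remove the old entries
    d.extendleft(reversed(list(new_items)))   # add new entries, order preserved
    d.rotate(start)                           # reverse the rotation


def roll(d, n):
    """Forth-style roll: move the item n places below the top to the top."""
    d.rotate(n)                               # the wanted item is now rightmost
    item = d.pop()
    d.rotate(-n)                              # restore the remaining items
    d.append(item)


d = deque('abcdef')
replace_slice(d, 2, 4, 'XYZ')                 # like lst[2:4] = 'XYZ' on a list

stack = deque([1, 2, 3, 4])
roll(stack, 2)                                # the 2 moves to the top
```

With ``n=1`` the same ``roll()`` helper behaves like ``swap``.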

Multi-pass data reduction algorithms can be succinctly expressed and efficiently
coded by extracting elements with multiple calls to :meth:`popleft`, applying
a reduction function, and calling :meth:`append` to add the result back to the
deque.

329 For example, building a balanced binary tree of nested lists entails reducing
330 two adjacent nodes into one by grouping them in a list:

.. doctest::

   >>> def maketree(iterable):
   ...     d = deque(iterable)
   ...     while len(d) > 1:
   ...         pair = [d.popleft(), d.popleft()]
   ...         d.append(pair)
   ...     return list(d)
   ...
   >>> print maketree('abcdefgh')
   [[[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]]

Bounded length deques provide functionality similar to the ``tail`` filter
in Unix::

    def tail(filename, n=10):
        'Return the last n lines of a file'
        return deque(open(filename), n)

349 .. _defaultdict-objects:
351 :class:`defaultdict` objects
352 ----------------------------
355 .. class:: defaultdict([default_factory[, ...]])
357 Returns a new dictionary-like object. :class:`defaultdict` is a subclass of the
358 builtin :class:`dict` class. It overrides one method and adds one writable
359 instance variable. The remaining functionality is the same as for the
360 :class:`dict` class and is not documented here.
The first argument provides the initial value for the :attr:`default_factory`
attribute; it defaults to ``None``. All remaining arguments are treated the same
as if they were passed to the :class:`dict` constructor, including keyword
arguments.

367 .. versionadded:: 2.5
369 :class:`defaultdict` objects support the following method in addition to the
370 standard :class:`dict` operations:
373 .. method:: defaultdict.__missing__(key)
375 If the :attr:`default_factory` attribute is ``None``, this raises an
376 :exc:`KeyError` exception with the *key* as argument.
If :attr:`default_factory` is not ``None``, it is called without arguments
to provide a default value for the given *key*; this value is inserted in
the dictionary for the *key* and returned.

If calling :attr:`default_factory` raises an exception, this exception is
propagated unchanged.

385 This method is called by the :meth:`__getitem__` method of the
386 :class:`dict` class when the requested key is not found; whatever it
387 returns or raises is then returned or raised by :meth:`__getitem__`.
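The three cases above can be sketched as follows. ``failing_factory`` is a hypothetical factory written for this illustration. The sketch also shows that ``get()`` does not go through :meth:`__getitem__` and therefore never creates an entry.

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)              # missing key: __missing__ stores and returns list()
via_get = d.get('b')          # get() bypasses __getitem__, so nothing is created
has_b = 'b' in d

plain = defaultdict()         # default_factory is None
try:
    plain['x']
    raised = False
except KeyError:
    raised = True             # behaves like an ordinary dict


def failing_factory():
    raise RuntimeError('no default available')


broken = defaultdict(failing_factory)
try:
    broken['x']
    propagated = None
except RuntimeError as exc:   # the factory's exception comes through unchanged
    propagated = str(exc)
```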
390 :class:`defaultdict` objects support the following instance variable:
393 .. attribute:: defaultdict.default_factory
This attribute is used by the :meth:`__missing__` method; it is
initialized from the first argument to the constructor, if present, or to
``None``, if absent.

400 .. _defaultdict-examples:
402 :class:`defaultdict` Examples
403 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
405 Using :class:`list` as the :attr:`default_factory`, it is easy to group a
406 sequence of key-value pairs into a dictionary of lists:

   >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
   >>> d = defaultdict(list)
   >>> for k, v in s:
   ...     d[k].append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

416 When each key is encountered for the first time, it is not already in the
417 mapping; so an entry is automatically created using the :attr:`default_factory`
418 function which returns an empty :class:`list`. The :meth:`list.append`
419 operation then attaches the value to the new list. When keys are encountered
420 again, the look-up proceeds normally (returning the list for that key) and the
421 :meth:`list.append` operation adds another value to the list. This technique is
422 simpler and faster than an equivalent technique using :meth:`dict.setdefault`:

   >>> d = {}
   >>> for k, v in s:
   ...     d.setdefault(k, []).append(v)
   ...
   >>> d.items()
   [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

Setting the :attr:`default_factory` to :class:`int` makes the
:class:`defaultdict` useful for counting (like a bag or multiset in other
languages):

   >>> s = 'mississippi'
   >>> d = defaultdict(int)
   >>> for k in s:
   ...     d[k] += 1
   ...
   >>> d.items()
   [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

443 When a letter is first encountered, it is missing from the mapping, so the
444 :attr:`default_factory` function calls :func:`int` to supply a default count of
445 zero. The increment operation then builds up the count for each letter.
The function :func:`int`, which always returns zero, is just a special case of
constant functions. A faster and more flexible way to create constant functions
is to use :func:`itertools.repeat`, which can supply any constant value (not just
zero):

452 >>> def constant_factory(value):
453 ... return itertools.repeat(value).next
454 >>> d = defaultdict(constant_factory('<missing>'))
455 >>> d.update(name='John', action='ran')
456 >>> '%(name)s %(action)s to %(object)s' % d
457 'John ran to <missing>'
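The same idea can be written so that it does not depend on the iterator's ``next`` method name, which differs between Python versions. This is a sketch of an alternative spelling, not the form used elsewhere in this section: it assumes the built-in ``next()`` function is available and wraps it with :func:`functools.partial`.

```python
import functools
import itertools
from collections import defaultdict


def constant_factory(value):
    # Calling the returned object is like calling next() on an endless
    # iterator that always yields the same value.
    return functools.partial(next, itertools.repeat(value))


d = defaultdict(constant_factory('<missing>'))
d.update(name='John', action='ran')
sentence = '%(name)s %(action)s to %(object)s' % d
```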
459 Setting the :attr:`default_factory` to :class:`set` makes the
460 :class:`defaultdict` useful for building a dictionary of sets:

   >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
   >>> d = defaultdict(set)
   >>> for k, v in s:
   ...     d[k].add(v)
   ...
   >>> d.items()
   [('blue', set([2, 4])), ('red', set([1, 3]))]

471 .. _named-tuple-factory:
473 :func:`namedtuple` Factory Function for Tuples with Named Fields
474 ----------------------------------------------------------------
476 Named tuples assign meaning to each position in a tuple and allow for more readable,
477 self-documenting code. They can be used wherever regular tuples are used, and
478 they add the ability to access fields by name instead of position index.
.. function:: namedtuple(typename, fieldnames[, verbose])

482 Returns a new tuple subclass named *typename*. The new subclass is used to
483 create tuple-like objects that have fields accessible by attribute lookup as
484 well as being indexable and iterable. Instances of the subclass also have a
485 helpful docstring (with typename and fieldnames) and a helpful :meth:`__repr__`
486 method which lists the tuple contents in a ``name=value`` format.
488 The *fieldnames* are a single string with each fieldname separated by whitespace
489 and/or commas, for example ``'x y'`` or ``'x, y'``. Alternatively, *fieldnames*
490 can be a sequence of strings such as ``['x', 'y']``.
Any valid Python identifier may be used for a fieldname except for names
starting with an underscore. Valid identifiers consist of letters, digits,
and underscores but do not start with a digit or underscore and cannot be
a :mod:`keyword` such as *class*, *for*, *return*, *global*, *pass*, *print*,
or *raise*.

498 If *verbose* is true, the class definition is printed just before being built.
500 Named tuple instances do not have per-instance dictionaries, so they are
501 lightweight and require no more memory than regular tuples.
503 .. versionadded:: 2.6
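The rules for *fieldnames* can be checked with a short sketch. The helper ``is_rejected()`` and the class names ``A``, ``B``, ``C`` and ``Tmp`` are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# Three equivalent ways of spelling the same field names.
A = namedtuple('A', 'x y')
B = namedtuple('B', 'x, y')
C = namedtuple('C', ['x', 'y'])
same_fields = A._fields == B._fields == C._fields == ('x', 'y')


def is_rejected(fieldname):
    """Return True if namedtuple() refuses the given field name."""
    try:
        namedtuple('Tmp', [fieldname])
    except ValueError:
        return True
    return False


# A keyword, a leading underscore and a leading digit are all refused.
rejected = [name for name in ('class', '_hidden', '2nd', 'ok')
            if is_rejected(name)]
```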
Example:

.. doctest::
   :options: +NORMALIZE_WHITESPACE

   >>> Point = namedtuple('Point', 'x y', verbose=True)
   class Point(tuple):
           'Point(x, y)'
   <BLANKLINE>
           __slots__ = ()
   <BLANKLINE>
           _fields = ('x', 'y')
   <BLANKLINE>
           def __new__(cls, x, y):
               return tuple.__new__(cls, (x, y))
   <BLANKLINE>
           @classmethod
           def _make(cls, iterable, new=tuple.__new__, len=len):
               'Make a new Point object from a sequence or iterable'
               result = new(cls, iterable)
               if len(result) != 2:
                   raise TypeError('Expected 2 arguments, got %d' % len(result))
               return result
   <BLANKLINE>
           def __repr__(self):
               return 'Point(x=%r, y=%r)' % self
   <BLANKLINE>
           def _asdict(t):
               'Return a new dict which maps field names to their values'
               return {'x': t[0], 'y': t[1]}
   <BLANKLINE>
           def _replace(self, **kwds):
               'Return a new Point object replacing specified fields with new values'
               result = self._make(map(kwds.pop, ('x', 'y'), self))
               if kwds:
                   raise ValueError('Got unexpected field names: %r' % kwds.keys())
               return result
   <BLANKLINE>
           def __getnewargs__(self):
               return tuple(self)
   <BLANKLINE>
           x = property(itemgetter(0))
           y = property(itemgetter(1))

   >>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
   >>> p[0] + p[1]             # indexable like the plain tuple (11, 22)
   33
   >>> x, y = p                # unpack like a regular tuple
   >>> x, y
   (11, 22)
   >>> p.x + p.y               # fields also accessible by name
   33
   >>> p                       # readable __repr__ with a name=value style
   Point(x=11, y=22)

560 Named tuples are especially useful for assigning field names to result tuples returned
561 by the :mod:`csv` or :mod:`sqlite3` modules::

    EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

    import csv
    for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):
        print emp.name, emp.title

    import sqlite3
    conn = sqlite3.connect('/companydata')
    cursor = conn.cursor()
    cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
    for emp in map(EmployeeRecord._make, cursor.fetchall()):
        print emp.name, emp.title

576 In addition to the methods inherited from tuples, named tuples support
577 three additional methods and one attribute. To prevent conflicts with
578 field names, the method and attribute names start with an underscore.
580 .. method:: somenamedtuple._make(iterable)
582 Class method that makes a new instance from an existing sequence or iterable.
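A brief sketch of :meth:`_make`, using a ``Point`` type like the one defined earlier in this section. Any iterable of the right length is accepted, and a wrong length raises :exc:`TypeError`.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

p = Point._make([11, 22])                  # from a list
q = Point._make(n * n for n in (3, 4))     # from any iterable, here a generator

try:
    Point._make([1, 2, 3])                 # wrong number of items
    error = None
except TypeError as exc:
    error = exc
```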
590 .. method:: somenamedtuple._asdict()
   Return a new dict which maps field names to their corresponding values.

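For illustration, with a ``Point`` type like the one used throughout this section. The result is passed through ``dict()`` here as a precaution, on the assumption that some later Python versions return an ordered mapping rather than a plain dict.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(x=11, y=22)

fields = dict(p._asdict())     # maps each field name to its value
```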
.. method:: somenamedtuple._replace(kwargs)

   Return a new instance of the named tuple replacing specified fields with new
   values::

      >>> p = Point(x=11, y=22)
      >>> p._replace(x=33)
      Point(x=33, y=22)

      >>> for partnum, record in inventory.items():
      ...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.time())

611 .. attribute:: somenamedtuple._fields
613 Tuple of strings listing the field names. Useful for introspection
614 and for creating new named tuple types from existing named tuples.

.. doctest::

   >>> p._fields            # view the field names
   ('x', 'y')

   >>> Color = namedtuple('Color', 'red green blue')
   >>> Pixel = namedtuple('Pixel', Point._fields + Color._fields)
   >>> Pixel(11, 22, 128, 255, 0)
   Pixel(x=11, y=22, red=128, green=255, blue=0)

To retrieve a field whose name is stored in a string, use the :func:`getattr`
function.

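A minimal sketch of name-based lookup; the variable ``field`` stands in for a name obtained at run time. Combined with :attr:`_fields`, the same call can walk over every field.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
p = Point(11, 22)

field = 'x'                        # e.g. a name read from configuration
value = getattr(p, field)

# Look up every field by name.
by_name = dict((name, getattr(p, name)) for name in p._fields)
```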
632 To convert a dictionary to a named tuple, use the double-star-operator [#]_:

   >>> d = {'x': 11, 'y': 22}
   >>> Point(**d)
   Point(x=11, y=22)

638 Since a named tuple is a regular Python class, it is easy to add or change
639 functionality with a subclass. Here is how to add a calculated field and
640 a fixed-width print format:

   >>> class Point(namedtuple('Point', 'x y')):
   ...     __slots__ = ()
   ...     @property
   ...     def hypot(self):
   ...         return (self.x ** 2 + self.y ** 2) ** 0.5
   ...     def __str__(self):
   ...         return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)
   ...
   >>> for p in Point(3, 4), Point(14, 5/7.):
   ...     print p
   ...
   Point: x= 3.000 y= 4.000 hypot= 5.000
   Point: x=14.000 y= 0.714 hypot=14.018

The subclass shown above sets ``__slots__`` to an empty tuple. This keeps
memory requirements low by preventing the creation of instance dictionaries.

658 Subclassing is not useful for adding new, stored fields. Instead, simply
659 create a new named tuple type from the :attr:`_fields` attribute:
661 >>> Point3D = namedtuple('Point3D', Point._fields + ('z',))
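As a short sketch of how the extended type might be used, an existing two-field point can be widened by concatenating its values with the extra coordinate. The variable names are illustrative only.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
Point3D = namedtuple('Point3D', Point._fields + ('z',))

p = Point(1, 2)
p3 = Point3D(*(p + (3,)))      # reuse the 2-D values and append z
```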
663 Default values can be implemented by using :meth:`_replace` to
664 customize a prototype instance:
666 >>> Account = namedtuple('Account', 'owner balance transaction_count')
667 >>> default_account = Account('<owner name>', 0.0, 0)
668 >>> johns_account = default_account._replace(owner='John')
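A runnable sketch of the prototype approach. Because :meth:`_replace` returns a new instance, the prototype itself is never modified. ``marys_account`` and its balance are made-up values for illustration.

```python
from collections import namedtuple

Account = namedtuple('Account', 'owner balance transaction_count')
default_account = Account('<owner name>', 0.0, 0)

johns_account = default_account._replace(owner='John')
marys_account = default_account._replace(owner='Mary', balance=25.0)
```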
670 Enumerated constants can be implemented with named tuples, but it is simpler
671 and more efficient to use a simple class declaration:

   >>> Status = namedtuple('Status', 'open pending closed')._make(range(3))
   >>> Status.open, Status.pending, Status.closed
   (0, 1, 2)
   >>> class Status:
   ...     open, pending, closed = range(3)

679 .. rubric:: Footnotes
681 .. [#] For information on the double-star-operator see
682 :ref:`tut-unpacking-arguments` and :ref:`calls`.