2 /* Dictionary object implementation using a hash table */
4 /* The distribution includes a separate file, Objects/dictnotes.txt,
5 describing explorations into dictionary design and optimization.
6 It covers typical dictionary use patterns, the parameters for
7 tuning dictionaries, and several ideas for possible optimizations.
13 /* Set a key error with the specified argument, wrapping it in a
14 * tuple automatically so that tuple keys are not unpacked as the
15 * exception arguments. */
17 set_key_error(PyObject
*arg
)
20 tup
= PyTuple_Pack(1, arg
);
22 return; /* caller will expect error to be set anyway */
23 PyErr_SetObject(PyExc_KeyError
, tup
);
27 /* Define this out if you don't want conversion statistics on exit. */
28 #undef SHOW_CONVERSION_COUNTS
30 /* See large comment block below. This must be >= 1. */
31 #define PERTURB_SHIFT 5
34 Major subtleties ahead: Most hash schemes depend on having a "good" hash
35 function, in the sense of simulating randomness. Python doesn't: its most
36 important hash functions (for strings and ints) are very regular in common
39 >>> map(hash, (0, 1, 2, 3))
41 >>> map(hash, ("namea", "nameb", "namec", "named"))
42 [-1658398457, -1658398460, -1658398459, -1658398462]
45 This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
46 the low-order i bits as the initial table index is extremely fast, and there
47 are no collisions at all for dicts indexed by a contiguous range of ints.
48 The same is approximately true when keys are "consecutive" strings. So this
49 gives better-than-random behavior in common cases, and that's very desirable.
51 OTOH, when collisions occur, the tendency to fill contiguous slices of the
52 hash table makes a good collision resolution strategy crucial. Taking only
53 the last i bits of the hash code is also vulnerable: for example, consider
54 [i << 16 for i in range(20000)] as a set of keys. Since ints are their own
55 hash codes, and this fits in a dict of size 2**15, the last 15 bits of every
56 hash code are all 0: they *all* map to the same table index.
58 But catering to unusual cases should not slow the usual ones, so we just take
59 the last i bits anyway. It's up to collision resolution to do the rest. If
60 we *usually* find the key we're looking for on the first try (and, it turns
61 out, we usually do -- the table load factor is kept under 2/3, so the odds
62 are solidly in our favor), then it makes best sense to keep the initial index
63 computation dirt cheap.
65 The first half of collision resolution is to visit table indices via this
68 j = ((5*j) + 1) mod 2**i
70 For any initial j in range(2**i), repeating that 2**i times generates each
71 int in range(2**i) exactly once (see any text on random-number generation for
72 proof). By itself, this doesn't help much: like linear probing (setting
73 j += 1, or j -= 1, on each loop trip), it scans the table entries in a fixed
74 order. This would be bad, except that's not the only thing we do, and it's
75 actually *good* in the common cases where hash keys are consecutive. In an
76 example that's really too small to make this entirely clear, for a table of
77 size 2**3 the order of indices is:
79 0 -> 1 -> 6 -> 7 -> 4 -> 5 -> 2 -> 3 -> 0 [and here it's repeating]
81 If two things come in at index 5, the first place we look after is index 2,
82 not 6, so if another comes in at index 6 the collision at 5 didn't hurt it.
83 Linear probing is deadly in this case because there the fixed probe order
84 is the *same* as the order consecutive keys are likely to arrive. But it's
85 extremely unlikely hash codes will follow a 5*j+1 recurrence by accident,
86 and certain that consecutive hash codes do not.
88 The other half of the strategy is to get the other bits of the hash code
89 into play. This is done by initializing a (unsigned) vrbl "perturb" to the
90 full hash code, and changing the recurrence to:
92 j = (5*j) + 1 + perturb;
93 perturb >>= PERTURB_SHIFT;
94 use j % 2**i as the next table index;
96 Now the probe sequence depends (eventually) on every bit in the hash code,
97 and the pseudo-scrambling property of recurring on 5*j+1 is more valuable,
98 because it quickly magnifies small differences in the bits that didn't affect
99 the initial index. Note that because perturb is unsigned, if the recurrence
100 is executed often enough perturb eventually becomes and remains 0. At that
101 point (very rarely reached) the recurrence is on (just) 5*j+1 again, and
102 that's certain to find an empty slot eventually (since it generates every int
103 in range(2**i), and we make sure there's always at least one empty slot).
105 Selecting a good value for PERTURB_SHIFT is a balancing act. You want it
106 small so that the high bits of the hash code continue to affect the probe
107 sequence across iterations; but you want it large so that in really bad cases
108 the high-order hash bits have an effect on early iterations. 5 was "the
109 best" in minimizing total collisions across experiments Tim Peters ran (on
110 both normal and pathological cases), but 4 and 6 weren't significantly worse.
112 Historical: Reimer Behrends contributed the idea of using a polynomial-based
113 approach, using repeated multiplication by x in GF(2**n) where an irreducible
114 polynomial for each table size was chosen such that x was a primitive root.
115 Christian Tismer later extended that to use division by x instead, as an
116 efficient way to get the high bits of the hash code into play. This scheme
117 also gave excellent collision statistics, but was more expensive: two
118 if-tests were required inside the loop; computing "the next" index took about
119 the same number of operations but without as much potential parallelism
120 (e.g., computing 5*j can go on at the same time as computing 1+perturb in the
121 above, and then shifting perturb can be done while the table index is being
122 masked); and the PyDictObject struct required a member to hold the table's
123 polynomial. In Tim's experiments the current scheme ran faster, produced
124 equally good collision statistics, needed less code & used less memory.
126 Theoretical Python 2.5 headache: hash codes are only C "long", but
127 sizeof(Py_ssize_t) > sizeof(long) may be possible. In that case, and if a
128 dict is genuinely huge, then only the slots directly reachable via indexing
129 by a C long can be the first slot in a probe sequence. The probe sequence
130 will still eventually reach every slot in the table, but the collision rate
131 on initial probes may be much higher than this scheme was designed for.
132 Getting a hash code as fat as Py_ssize_t is the only real cure. But in
133 practice, this probably won't make a lick of difference for many years (at
134 which point everyone will have terabytes of RAM on 64-bit boxes).
137 /* Object used as dummy key to fill deleted entries */
138 static PyObject
*dummy
= NULL
; /* Initialized by first call to newPyDictObject() */
148 /* forward declarations */
150 lookdict_string(PyDictObject
*mp
, PyObject
*key
, long hash
);
152 #ifdef SHOW_CONVERSION_COUNTS
153 static long created
= 0L;
154 static long converted
= 0L;
159 fprintf(stderr
, "created %ld string dicts\n", created
);
160 fprintf(stderr
, "converted %ld to normal dicts\n", converted
);
161 fprintf(stderr
, "%.2f%% conversion rate\n", (100.0*converted
)/created
);
165 /* Debug statistic to compare allocations with reuse through the free list */
166 #undef SHOW_ALLOC_COUNT
167 #ifdef SHOW_ALLOC_COUNT
168 static size_t count_alloc
= 0;
169 static size_t count_reuse
= 0;
174 fprintf(stderr
, "Dict allocations: %" PY_FORMAT_SIZE_T
"d\n",
176 fprintf(stderr
, "Dict reuse through freelist: %" PY_FORMAT_SIZE_T
178 fprintf(stderr
, "%.2f%% reuse rate\n\n",
179 (100.0*count_reuse
/(count_alloc
+count_reuse
)));
183 /* Debug statistic to count GC tracking of dicts */
184 #ifdef SHOW_TRACK_COUNT
185 static Py_ssize_t count_untracked
= 0;
186 static Py_ssize_t count_tracked
= 0;
191 fprintf(stderr
, "Dicts created: %" PY_FORMAT_SIZE_T
"d\n",
192 count_tracked
+ count_untracked
);
193 fprintf(stderr
, "Dicts tracked by the GC: %" PY_FORMAT_SIZE_T
194 "d\n", count_tracked
);
195 fprintf(stderr
, "%.2f%% dict tracking rate\n\n",
196 (100.0*count_tracked
/(count_untracked
+count_tracked
)));
201 /* Initialization macros.
202 There are two ways to create a dict: PyDict_New() is the main C API
203 function, and the tp_new slot maps to dict_new(). In the latter case we
204 can save a little time over what PyDict_New does because it's guaranteed
205 that the PyDictObject struct is already zeroed out.
206 Everyone except dict_new() should use EMPTY_TO_MINSIZE (unless they have
207 an excellent reason not to).
210 #define INIT_NONZERO_DICT_SLOTS(mp) do { \
211 (mp)->ma_table = (mp)->ma_smalltable; \
212 (mp)->ma_mask = PyDict_MINSIZE - 1; \
215 #define EMPTY_TO_MINSIZE(mp) do { \
216 memset((mp)->ma_smalltable, 0, sizeof((mp)->ma_smalltable)); \
217 (mp)->ma_used = (mp)->ma_fill = 0; \
218 INIT_NONZERO_DICT_SLOTS(mp); \
221 /* Dictionary reuse scheme to save calls to malloc, free, and memset */
222 #ifndef PyDict_MAXFREELIST
223 #define PyDict_MAXFREELIST 80
225 static PyDictObject
*free_list
[PyDict_MAXFREELIST
];
226 static int numfree
= 0;
234 op
= free_list
[--numfree
];
235 assert(PyDict_CheckExact(op
));
243 register PyDictObject
*mp
;
244 if (dummy
== NULL
) { /* Auto-initialize dummy */
245 dummy
= PyString_FromString("<dummy key>");
248 #ifdef SHOW_CONVERSION_COUNTS
249 Py_AtExit(show_counts
);
251 #ifdef SHOW_ALLOC_COUNT
252 Py_AtExit(show_alloc
);
254 #ifdef SHOW_TRACK_COUNT
255 Py_AtExit(show_track
);
259 mp
= free_list
[--numfree
];
261 assert (Py_TYPE(mp
) == &PyDict_Type
);
262 _Py_NewReference((PyObject
*)mp
);
264 EMPTY_TO_MINSIZE(mp
);
266 /* At least set ma_table and ma_mask; these are wrong
267 if an empty but presized dict is added to freelist */
268 INIT_NONZERO_DICT_SLOTS(mp
);
270 assert (mp
->ma_used
== 0);
271 assert (mp
->ma_table
== mp
->ma_smalltable
);
272 assert (mp
->ma_mask
== PyDict_MINSIZE
- 1);
273 #ifdef SHOW_ALLOC_COUNT
277 mp
= PyObject_GC_New(PyDictObject
, &PyDict_Type
);
280 EMPTY_TO_MINSIZE(mp
);
281 #ifdef SHOW_ALLOC_COUNT
285 mp
->ma_lookup
= lookdict_string
;
286 #ifdef SHOW_TRACK_COUNT
289 #ifdef SHOW_CONVERSION_COUNTS
292 return (PyObject
*)mp
;
296 The basic lookup function used by all operations.
297 This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
298 Open addressing is preferred over chaining since the link overhead for
299 chaining would be substantial (100% with typical malloc overhead).
301 The initial probe index is computed as hash mod the table size. Subsequent
302 probe indices are computed as explained earlier.
304 All arithmetic on hash should ignore overflow.
306 (The details in this version are due to Tim Peters, building on many past
307 contributions by Reimer Behrends, Jyrki Alakuijala, Vladimir Marangozov and
310 lookdict() is general-purpose, and may return NULL if (and only if) a
311 comparison raises an exception (this was new in Python 2.5).
312 lookdict_string() below is specialized to string keys, comparison of which can
313 never raise an exception; that function can never return NULL. For both, when
314 the key isn't found a PyDictEntry* is returned for which the me_value field is
315 NULL; this is the slot in the dict at which the key would have been found, and
316 the caller can (if it wishes) add the <key, value> pair to the returned
320 lookdict(PyDictObject
*mp
, PyObject
*key
, register long hash
)
323 register size_t perturb
;
324 register PyDictEntry
*freeslot
;
325 register size_t mask
= (size_t)mp
->ma_mask
;
326 PyDictEntry
*ep0
= mp
->ma_table
;
327 register PyDictEntry
*ep
;
331 i
= (size_t)hash
& mask
;
333 if (ep
->me_key
== NULL
|| ep
->me_key
== key
)
336 if (ep
->me_key
== dummy
)
339 if (ep
->me_hash
== hash
) {
340 startkey
= ep
->me_key
;
342 cmp
= PyObject_RichCompareBool(startkey
, key
, Py_EQ
);
346 if (ep0
== mp
->ma_table
&& ep
->me_key
== startkey
) {
351 /* The compare did major nasty stuff to the
353 * XXX A clever adversary could prevent this
354 * XXX from terminating.
356 return lookdict(mp
, key
, hash
);
362 /* In the loop, me_key == dummy is by far (factor of 100s) the
363 least likely outcome, so test for that last. */
364 for (perturb
= hash
; ; perturb
>>= PERTURB_SHIFT
) {
365 i
= (i
<< 2) + i
+ perturb
+ 1;
367 if (ep
->me_key
== NULL
)
368 return freeslot
== NULL
? ep
: freeslot
;
369 if (ep
->me_key
== key
)
371 if (ep
->me_hash
== hash
&& ep
->me_key
!= dummy
) {
372 startkey
= ep
->me_key
;
374 cmp
= PyObject_RichCompareBool(startkey
, key
, Py_EQ
);
378 if (ep0
== mp
->ma_table
&& ep
->me_key
== startkey
) {
383 /* The compare did major nasty stuff to the
385 * XXX A clever adversary could prevent this
386 * XXX from terminating.
388 return lookdict(mp
, key
, hash
);
391 else if (ep
->me_key
== dummy
&& freeslot
== NULL
)
394 assert(0); /* NOT REACHED */
399 * Hacked up version of lookdict which can assume keys are always strings;
400 * this assumption allows testing for errors during PyObject_RichCompareBool()
401 * to be dropped; string-string comparisons never raise exceptions. This also
402 * means we don't need to go through PyObject_RichCompareBool(); we can always
403 * use _PyString_Eq() directly.
405 * This is valuable because dicts with only string keys are very common.
408 lookdict_string(PyDictObject
*mp
, PyObject
*key
, register long hash
)
411 register size_t perturb
;
412 register PyDictEntry
*freeslot
;
413 register size_t mask
= (size_t)mp
->ma_mask
;
414 PyDictEntry
*ep0
= mp
->ma_table
;
415 register PyDictEntry
*ep
;
417 /* Make sure this function doesn't have to handle non-string keys,
418 including subclasses of str; e.g., one reason to subclass
419 strings is to override __eq__, and for speed we don't cater to
421 if (!PyString_CheckExact(key
)) {
422 #ifdef SHOW_CONVERSION_COUNTS
425 mp
->ma_lookup
= lookdict
;
426 return lookdict(mp
, key
, hash
);
430 if (ep
->me_key
== NULL
|| ep
->me_key
== key
)
432 if (ep
->me_key
== dummy
)
435 if (ep
->me_hash
== hash
&& _PyString_Eq(ep
->me_key
, key
))
440 /* In the loop, me_key == dummy is by far (factor of 100s) the
441 least likely outcome, so test for that last. */
442 for (perturb
= hash
; ; perturb
>>= PERTURB_SHIFT
) {
443 i
= (i
<< 2) + i
+ perturb
+ 1;
445 if (ep
->me_key
== NULL
)
446 return freeslot
== NULL
? ep
: freeslot
;
447 if (ep
->me_key
== key
448 || (ep
->me_hash
== hash
449 && ep
->me_key
!= dummy
450 && _PyString_Eq(ep
->me_key
, key
)))
452 if (ep
->me_key
== dummy
&& freeslot
== NULL
)
455 assert(0); /* NOT REACHED */
459 #ifdef SHOW_TRACK_COUNT
460 #define INCREASE_TRACK_COUNT \
461 (count_tracked++, count_untracked--);
462 #define DECREASE_TRACK_COUNT \
463 (count_tracked--, count_untracked++);
465 #define INCREASE_TRACK_COUNT
466 #define DECREASE_TRACK_COUNT
469 #define MAINTAIN_TRACKING(mp, key, value) \
471 if (!_PyObject_GC_IS_TRACKED(mp)) { \
472 if (_PyObject_GC_MAY_BE_TRACKED(key) || \
473 _PyObject_GC_MAY_BE_TRACKED(value)) { \
474 _PyObject_GC_TRACK(mp); \
475 INCREASE_TRACK_COUNT \
481 _PyDict_MaybeUntrack(PyObject
*op
)
488 if (!PyDict_CheckExact(op
) || !_PyObject_GC_IS_TRACKED(op
))
491 mp
= (PyDictObject
*) op
;
494 for (i
= 0; i
<= mask
; i
++) {
495 if ((value
= ep
[i
].me_value
) == NULL
)
497 if (_PyObject_GC_MAY_BE_TRACKED(value
) ||
498 _PyObject_GC_MAY_BE_TRACKED(ep
[i
].me_key
))
502 _PyObject_GC_UNTRACK(op
);
507 Internal routine to insert a new item into the table.
508 Used both by the internal resize routine and by the public insert routine.
509 Eats a reference to key and one to value.
510 Returns -1 if an error occurred, or 0 on success.
513 insertdict(register PyDictObject
*mp
, PyObject
*key
, long hash
, PyObject
*value
)
516 register PyDictEntry
*ep
;
517 typedef PyDictEntry
*(*lookupfunc
)(PyDictObject
*, PyObject
*, long);
519 assert(mp
->ma_lookup
!= NULL
);
520 ep
= mp
->ma_lookup(mp
, key
, hash
);
526 MAINTAIN_TRACKING(mp
, key
, value
);
527 if (ep
->me_value
!= NULL
) {
528 old_value
= ep
->me_value
;
529 ep
->me_value
= value
;
530 Py_DECREF(old_value
); /* which **CAN** re-enter */
534 if (ep
->me_key
== NULL
)
537 assert(ep
->me_key
== dummy
);
541 ep
->me_hash
= (Py_ssize_t
)hash
;
542 ep
->me_value
= value
;
549 Internal routine used by dictresize() to insert an item which is
550 known to be absent from the dict. This routine also assumes that
551 the dict contains no deleted entries. Besides the performance benefit,
552 using insertdict() in dictresize() is dangerous (SF bug #1456209).
553 Note that no refcounts are changed by this routine; if needed, the caller
554 is responsible for incref'ing `key` and `value`.
557 insertdict_clean(register PyDictObject
*mp
, PyObject
*key
, long hash
,
561 register size_t perturb
;
562 register size_t mask
= (size_t)mp
->ma_mask
;
563 PyDictEntry
*ep0
= mp
->ma_table
;
564 register PyDictEntry
*ep
;
566 MAINTAIN_TRACKING(mp
, key
, value
);
569 for (perturb
= hash
; ep
->me_key
!= NULL
; perturb
>>= PERTURB_SHIFT
) {
570 i
= (i
<< 2) + i
+ perturb
+ 1;
573 assert(ep
->me_value
== NULL
);
576 ep
->me_hash
= (Py_ssize_t
)hash
;
577 ep
->me_value
= value
;
582 Restructure the table by allocating a new table and reinserting all
583 items again. When entries have been deleted, the new table may
584 actually be smaller than the old one.
587 dictresize(PyDictObject
*mp
, Py_ssize_t minused
)
590 PyDictEntry
*oldtable
, *newtable
, *ep
;
592 int is_oldtable_malloced
;
593 PyDictEntry small_copy
[PyDict_MINSIZE
];
595 assert(minused
>= 0);
597 /* Find the smallest table size > minused. */
598 for (newsize
= PyDict_MINSIZE
;
599 newsize
<= minused
&& newsize
> 0;
607 /* Get space for a new table. */
608 oldtable
= mp
->ma_table
;
609 assert(oldtable
!= NULL
);
610 is_oldtable_malloced
= oldtable
!= mp
->ma_smalltable
;
612 if (newsize
== PyDict_MINSIZE
) {
613 /* A large table is shrinking, or we can't get any smaller. */
614 newtable
= mp
->ma_smalltable
;
615 if (newtable
== oldtable
) {
616 if (mp
->ma_fill
== mp
->ma_used
) {
617 /* No dummies, so no point doing anything. */
620 /* We're not going to resize it, but rebuild the
621 table anyway to purge old dummy entries.
622 Subtle: This is *necessary* if fill==size,
623 as lookdict needs at least one virgin slot to
624 terminate failing searches. If fill < size, it's
625 merely desirable, as dummies slow searches. */
626 assert(mp
->ma_fill
> mp
->ma_used
);
627 memcpy(small_copy
, oldtable
, sizeof(small_copy
));
628 oldtable
= small_copy
;
632 newtable
= PyMem_NEW(PyDictEntry
, newsize
);
633 if (newtable
== NULL
) {
639 /* Make the dict empty, using the new table. */
640 assert(newtable
!= oldtable
);
641 mp
->ma_table
= newtable
;
642 mp
->ma_mask
= newsize
- 1;
643 memset(newtable
, 0, sizeof(PyDictEntry
) * newsize
);
648 /* Copy the data over; this is refcount-neutral for active entries;
649 dummy entries aren't copied over, of course */
650 for (ep
= oldtable
; i
> 0; ep
++) {
651 if (ep
->me_value
!= NULL
) { /* active entry */
653 insertdict_clean(mp
, ep
->me_key
, (long)ep
->me_hash
,
656 else if (ep
->me_key
!= NULL
) { /* dummy entry */
658 assert(ep
->me_key
== dummy
);
659 Py_DECREF(ep
->me_key
);
661 /* else key == value == NULL: nothing to do */
664 if (is_oldtable_malloced
)
669 /* Create a new dictionary pre-sized to hold an estimated number of elements.
670 Underestimates are okay because the dictionary will resize as necessary.
671 Overestimates just mean the dictionary will be more sparse than usual.
675 _PyDict_NewPresized(Py_ssize_t minused
)
677 PyObject
*op
= PyDict_New();
679 if (minused
>5 && op
!= NULL
&& dictresize((PyDictObject
*)op
, minused
) == -1) {
686 /* Note that, for historical reasons, PyDict_GetItem() suppresses all errors
687 * that may occur (originally dicts supported only string keys, and exceptions
688 * weren't possible). So, while the original intent was that a NULL return
689 * meant the key wasn't present, in reality it can mean that, or that an error
690 * (suppressed) occurred while computing the key's hash, or that some error
691 * (suppressed) occurred when comparing keys in the dict's internal probe
692 * sequence. A nasty example of the latter is when a Python-coded comparison
693 * function hits a stack-depth error, which can cause this to return NULL
694 * even if the key is present.
697 PyDict_GetItem(PyObject
*op
, PyObject
*key
)
700 PyDictObject
*mp
= (PyDictObject
*)op
;
702 PyThreadState
*tstate
;
703 if (!PyDict_Check(op
))
705 if (!PyString_CheckExact(key
) ||
706 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1)
708 hash
= PyObject_Hash(key
);
715 /* We can arrive here with a NULL tstate during initialization:
716 try running "python -Wi" for an example related to string
717 interning. Let's just hope that no exception occurs then... */
718 tstate
= _PyThreadState_Current
;
719 if (tstate
!= NULL
&& tstate
->curexc_type
!= NULL
) {
720 /* preserve the existing exception */
721 PyObject
*err_type
, *err_value
, *err_tb
;
722 PyErr_Fetch(&err_type
, &err_value
, &err_tb
);
723 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
725 PyErr_Restore(err_type
, err_value
, err_tb
);
730 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
739 /* CAUTION: PyDict_SetItem() must guarantee that it won't resize the
740 * dictionary if it's merely replacing the value for an existing key.
741 * This means that it's safe to loop over a dictionary with PyDict_Next()
742 * and occasionally replace a value -- but you can't insert new keys or
746 PyDict_SetItem(register PyObject
*op
, PyObject
*key
, PyObject
*value
)
748 register PyDictObject
*mp
;
750 register Py_ssize_t n_used
;
752 if (!PyDict_Check(op
)) {
753 PyErr_BadInternalCall();
758 mp
= (PyDictObject
*)op
;
759 if (PyString_CheckExact(key
)) {
760 hash
= ((PyStringObject
*)key
)->ob_shash
;
762 hash
= PyObject_Hash(key
);
765 hash
= PyObject_Hash(key
);
769 assert(mp
->ma_fill
<= mp
->ma_mask
); /* at least one empty slot */
770 n_used
= mp
->ma_used
;
773 if (insertdict(mp
, key
, hash
, value
) != 0)
775 /* If we added a key, we can safely resize. Otherwise just return!
776 * If fill >= 2/3 size, adjust size. Normally, this doubles or
777 * quaduples the size, but it's also possible for the dict to shrink
778 * (if ma_fill is much larger than ma_used, meaning a lot of dict
779 * keys have been * deleted).
781 * Quadrupling the size improves average dictionary sparseness
782 * (reducing collisions) at the cost of some memory and iteration
783 * speed (which loops over every possible entry). It also halves
784 * the number of expensive resize operations in a growing dictionary.
786 * Very large dictionaries (over 50K items) use doubling instead.
787 * This may help applications with severe memory constraints.
789 if (!(mp
->ma_used
> n_used
&& mp
->ma_fill
*3 >= (mp
->ma_mask
+1)*2))
791 return dictresize(mp
, (mp
->ma_used
> 50000 ? 2 : 4) * mp
->ma_used
);
795 PyDict_DelItem(PyObject
*op
, PyObject
*key
)
797 register PyDictObject
*mp
;
799 register PyDictEntry
*ep
;
800 PyObject
*old_value
, *old_key
;
802 if (!PyDict_Check(op
)) {
803 PyErr_BadInternalCall();
807 if (!PyString_CheckExact(key
) ||
808 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
809 hash
= PyObject_Hash(key
);
813 mp
= (PyDictObject
*)op
;
814 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
817 if (ep
->me_value
== NULL
) {
821 old_key
= ep
->me_key
;
824 old_value
= ep
->me_value
;
827 Py_DECREF(old_value
);
833 PyDict_Clear(PyObject
*op
)
836 PyDictEntry
*ep
, *table
;
837 int table_is_malloced
;
839 PyDictEntry small_copy
[PyDict_MINSIZE
];
844 if (!PyDict_Check(op
))
846 mp
= (PyDictObject
*)op
;
852 table
= mp
->ma_table
;
853 assert(table
!= NULL
);
854 table_is_malloced
= table
!= mp
->ma_smalltable
;
856 /* This is delicate. During the process of clearing the dict,
857 * decrefs can cause the dict to mutate. To avoid fatal confusion
858 * (voice of experience), we have to make the dict empty before
859 * clearing the slots, and never refer to anything via mp->xxx while
863 if (table_is_malloced
)
864 EMPTY_TO_MINSIZE(mp
);
867 /* It's a small table with something that needs to be cleared.
868 * Afraid the only safe way is to copy the dict entries into
869 * another small table first.
871 memcpy(small_copy
, table
, sizeof(small_copy
));
873 EMPTY_TO_MINSIZE(mp
);
875 /* else it's a small table that's already empty */
877 /* Now we can finally clear things. If C had refcounts, we could
878 * assert that the refcount on table is 1 now, i.e. that this function
879 * has unique access to it, so decref side-effects can't alter it.
881 for (ep
= table
; fill
> 0; ++ep
) {
888 Py_DECREF(ep
->me_key
);
889 Py_XDECREF(ep
->me_value
);
893 assert(ep
->me_value
== NULL
);
897 if (table_is_malloced
)
902 * Iterate over a dict. Use like so:
905 * PyObject *key, *value;
906 * i = 0; # important! i should not otherwise be changed by you
907 * while (PyDict_Next(yourdict, &i, &key, &value)) {
908 * Refer to borrowed references in key and value.
911 * CAUTION: In general, it isn't safe to use PyDict_Next in a loop that
912 * mutates the dict. One exception: it is safe if the loop merely changes
913 * the values associated with the keys (but doesn't insert new keys or
914 * delete keys), via PyDict_SetItem().
917 PyDict_Next(PyObject
*op
, Py_ssize_t
*ppos
, PyObject
**pkey
, PyObject
**pvalue
)
919 register Py_ssize_t i
;
920 register Py_ssize_t mask
;
921 register PyDictEntry
*ep
;
923 if (!PyDict_Check(op
))
928 ep
= ((PyDictObject
*)op
)->ma_table
;
929 mask
= ((PyDictObject
*)op
)->ma_mask
;
930 while (i
<= mask
&& ep
[i
].me_value
== NULL
)
936 *pkey
= ep
[i
].me_key
;
938 *pvalue
= ep
[i
].me_value
;
942 /* Internal version of PyDict_Next that returns a hash value in addition to the key and value.*/
944 _PyDict_Next(PyObject
*op
, Py_ssize_t
*ppos
, PyObject
**pkey
, PyObject
**pvalue
, long *phash
)
946 register Py_ssize_t i
;
947 register Py_ssize_t mask
;
948 register PyDictEntry
*ep
;
950 if (!PyDict_Check(op
))
955 ep
= ((PyDictObject
*)op
)->ma_table
;
956 mask
= ((PyDictObject
*)op
)->ma_mask
;
957 while (i
<= mask
&& ep
[i
].me_value
== NULL
)
962 *phash
= (long)(ep
[i
].me_hash
);
964 *pkey
= ep
[i
].me_key
;
966 *pvalue
= ep
[i
].me_value
;
973 dict_dealloc(register PyDictObject
*mp
)
975 register PyDictEntry
*ep
;
976 Py_ssize_t fill
= mp
->ma_fill
;
977 PyObject_GC_UnTrack(mp
);
978 Py_TRASHCAN_SAFE_BEGIN(mp
)
979 for (ep
= mp
->ma_table
; fill
> 0; ep
++) {
982 Py_DECREF(ep
->me_key
);
983 Py_XDECREF(ep
->me_value
);
986 if (mp
->ma_table
!= mp
->ma_smalltable
)
987 PyMem_DEL(mp
->ma_table
);
988 if (numfree
< PyDict_MAXFREELIST
&& Py_TYPE(mp
) == &PyDict_Type
)
989 free_list
[numfree
++] = mp
;
991 Py_TYPE(mp
)->tp_free((PyObject
*)mp
);
992 Py_TRASHCAN_SAFE_END(mp
)
996 dict_print(register PyDictObject
*mp
, register FILE *fp
, register int flags
)
998 register Py_ssize_t i
;
999 register Py_ssize_t any
;
1002 status
= Py_ReprEnter((PyObject
*)mp
);
1006 Py_BEGIN_ALLOW_THREADS
1007 fprintf(fp
, "{...}");
1008 Py_END_ALLOW_THREADS
1012 Py_BEGIN_ALLOW_THREADS
1014 Py_END_ALLOW_THREADS
1016 for (i
= 0; i
<= mp
->ma_mask
; i
++) {
1017 PyDictEntry
*ep
= mp
->ma_table
+ i
;
1018 PyObject
*pvalue
= ep
->me_value
;
1019 if (pvalue
!= NULL
) {
1020 /* Prevent PyObject_Repr from deleting value during
1024 Py_BEGIN_ALLOW_THREADS
1026 Py_END_ALLOW_THREADS
1028 if (PyObject_Print((PyObject
*)ep
->me_key
, fp
, 0)!=0) {
1030 Py_ReprLeave((PyObject
*)mp
);
1033 Py_BEGIN_ALLOW_THREADS
1035 Py_END_ALLOW_THREADS
1036 if (PyObject_Print(pvalue
, fp
, 0) != 0) {
1038 Py_ReprLeave((PyObject
*)mp
);
1044 Py_BEGIN_ALLOW_THREADS
1046 Py_END_ALLOW_THREADS
1047 Py_ReprLeave((PyObject
*)mp
);
1052 dict_repr(PyDictObject
*mp
)
1055 PyObject
*s
, *temp
, *colon
= NULL
;
1056 PyObject
*pieces
= NULL
, *result
= NULL
;
1057 PyObject
*key
, *value
;
1059 i
= Py_ReprEnter((PyObject
*)mp
);
1061 return i
> 0 ? PyString_FromString("{...}") : NULL
;
1064 if (mp
->ma_used
== 0) {
1065 result
= PyString_FromString("{}");
1069 pieces
= PyList_New(0);
1073 colon
= PyString_FromString(": ");
1077 /* Do repr() on each key+value pair, and insert ": " between them.
1078 Note that repr may mutate the dict. */
1080 while (PyDict_Next((PyObject
*)mp
, &i
, &key
, &value
)) {
1082 /* Prevent repr from deleting value during key format. */
1084 s
= PyObject_Repr(key
);
1085 PyString_Concat(&s
, colon
);
1086 PyString_ConcatAndDel(&s
, PyObject_Repr(value
));
1090 status
= PyList_Append(pieces
, s
);
1091 Py_DECREF(s
); /* append created a new ref */
1096 /* Add "{}" decorations to the first and last items. */
1097 assert(PyList_GET_SIZE(pieces
) > 0);
1098 s
= PyString_FromString("{");
1101 temp
= PyList_GET_ITEM(pieces
, 0);
1102 PyString_ConcatAndDel(&s
, temp
);
1103 PyList_SET_ITEM(pieces
, 0, s
);
1107 s
= PyString_FromString("}");
1110 temp
= PyList_GET_ITEM(pieces
, PyList_GET_SIZE(pieces
) - 1);
1111 PyString_ConcatAndDel(&temp
, s
);
1112 PyList_SET_ITEM(pieces
, PyList_GET_SIZE(pieces
) - 1, temp
);
1116 /* Paste them all together with ", " between. */
1117 s
= PyString_FromString(", ");
1120 result
= _PyString_Join(s
, pieces
);
1126 Py_ReprLeave((PyObject
*)mp
);
1131 dict_length(PyDictObject
*mp
)
1137 dict_subscript(PyDictObject
*mp
, register PyObject
*key
)
1142 assert(mp
->ma_table
!= NULL
);
1143 if (!PyString_CheckExact(key
) ||
1144 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
1145 hash
= PyObject_Hash(key
);
1149 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
1154 if (!PyDict_CheckExact(mp
)) {
1155 /* Look up __missing__ method if we're a subclass. */
1157 static PyObject
*missing_str
= NULL
;
1158 if (missing_str
== NULL
)
1160 PyString_InternFromString("__missing__");
1161 missing
= _PyType_Lookup(Py_TYPE(mp
), missing_str
);
1162 if (missing
!= NULL
)
1163 return PyObject_CallFunctionObjArgs(missing
,
1164 (PyObject
*)mp
, key
, NULL
);
1175 dict_ass_sub(PyDictObject
*mp
, PyObject
*v
, PyObject
*w
)
1178 return PyDict_DelItem((PyObject
*)mp
, v
);
1180 return PyDict_SetItem((PyObject
*)mp
, v
, w
);
1183 static PyMappingMethods dict_as_mapping
= {
1184 (lenfunc
)dict_length
, /*mp_length*/
1185 (binaryfunc
)dict_subscript
, /*mp_subscript*/
1186 (objobjargproc
)dict_ass_sub
, /*mp_ass_subscript*/
1190 dict_keys(register PyDictObject
*mp
)
1192 register PyObject
*v
;
1193 register Py_ssize_t i
, j
;
1202 if (n
!= mp
->ma_used
) {
1203 /* Durnit. The allocations caused the dict to resize.
1204 * Just start over, this shouldn't normally happen.
1211 for (i
= 0, j
= 0; i
<= mask
; i
++) {
1212 if (ep
[i
].me_value
!= NULL
) {
1213 PyObject
*key
= ep
[i
].me_key
;
1215 PyList_SET_ITEM(v
, j
, key
);
1224 dict_values(register PyDictObject
*mp
)
1226 register PyObject
*v
;
1227 register Py_ssize_t i
, j
;
1236 if (n
!= mp
->ma_used
) {
1237 /* Durnit. The allocations caused the dict to resize.
1238 * Just start over, this shouldn't normally happen.
1245 for (i
= 0, j
= 0; i
<= mask
; i
++) {
1246 if (ep
[i
].me_value
!= NULL
) {
1247 PyObject
*value
= ep
[i
].me_value
;
1249 PyList_SET_ITEM(v
, j
, value
);
1258 dict_items(register PyDictObject
*mp
)
1260 register PyObject
*v
;
1261 register Py_ssize_t i
, j
, n
;
1263 PyObject
*item
, *key
, *value
;
1266 /* Preallocate the list of tuples, to avoid allocations during
1267 * the loop over the items, which could trigger GC, which
1268 * could resize the dict. :-(
1275 for (i
= 0; i
< n
; i
++) {
1276 item
= PyTuple_New(2);
1281 PyList_SET_ITEM(v
, i
, item
);
1283 if (n
!= mp
->ma_used
) {
1284 /* Durnit. The allocations caused the dict to resize.
1285 * Just start over, this shouldn't normally happen.
1290 /* Nothing we do below makes any function calls. */
1293 for (i
= 0, j
= 0; i
<= mask
; i
++) {
1294 if ((value
=ep
[i
].me_value
) != NULL
) {
1296 item
= PyList_GET_ITEM(v
, j
);
1298 PyTuple_SET_ITEM(item
, 0, key
);
1300 PyTuple_SET_ITEM(item
, 1, value
);
1309 dict_fromkeys(PyObject
*cls
, PyObject
*args
)
1312 PyObject
*value
= Py_None
;
1313 PyObject
*it
; /* iter(seq) */
1318 if (!PyArg_UnpackTuple(args
, "fromkeys", 1, 2, &seq
, &value
))
1321 d
= PyObject_CallObject(cls
, NULL
);
1325 if (PyDict_CheckExact(d
) && PyDict_CheckExact(seq
)) {
1326 PyDictObject
*mp
= (PyDictObject
*)d
;
1332 if (dictresize(mp
, Py_SIZE(seq
)))
1335 while (_PyDict_Next(seq
, &pos
, &key
, &oldvalue
, &hash
)) {
1338 if (insertdict(mp
, key
, hash
, value
))
1344 if (PyDict_CheckExact(d
) && PyAnySet_CheckExact(seq
)) {
1345 PyDictObject
*mp
= (PyDictObject
*)d
;
1350 if (dictresize(mp
, PySet_GET_SIZE(seq
)))
1353 while (_PySet_NextEntry(seq
, &pos
, &key
, &hash
)) {
1356 if (insertdict(mp
, key
, hash
, value
))
1362 it
= PyObject_GetIter(seq
);
1368 if (PyDict_CheckExact(d
)) {
1369 while ((key
= PyIter_Next(it
)) != NULL
) {
1370 status
= PyDict_SetItem(d
, key
, value
);
1376 while ((key
= PyIter_Next(it
)) != NULL
) {
1377 status
= PyObject_SetItem(d
, key
, value
);
1384 if (PyErr_Occurred())
1396 dict_update_common(PyObject
*self
, PyObject
*args
, PyObject
*kwds
, char *methname
)
1398 PyObject
*arg
= NULL
;
1401 if (!PyArg_UnpackTuple(args
, methname
, 0, 1, &arg
))
1404 else if (arg
!= NULL
) {
1405 if (PyObject_HasAttrString(arg
, "keys"))
1406 result
= PyDict_Merge(self
, arg
, 1);
1408 result
= PyDict_MergeFromSeq2(self
, arg
, 1);
1410 if (result
== 0 && kwds
!= NULL
)
1411 result
= PyDict_Merge(self
, kwds
, 1);
1416 dict_update(PyObject
*self
, PyObject
*args
, PyObject
*kwds
)
1418 if (dict_update_common(self
, args
, kwds
, "update") != -1)
1423 /* Update unconditionally replaces existing items.
1424 Merge has a 3rd argument 'override'; if set, it acts like Update,
1425 otherwise it leaves existing items unchanged.
1427 PyDict_{Update,Merge} update/merge from a mapping object.
1429 PyDict_MergeFromSeq2 updates/merges from any iterable object
1430 producing iterable objects of length 2.
1434 PyDict_MergeFromSeq2(PyObject
*d
, PyObject
*seq2
, int override
)
1436 PyObject
*it
; /* iter(seq2) */
1437 Py_ssize_t i
; /* index into seq2 of current element */
1438 PyObject
*item
; /* seq2[i] */
1439 PyObject
*fast
; /* item as a 2-tuple or 2-list */
1442 assert(PyDict_Check(d
));
1443 assert(seq2
!= NULL
);
1445 it
= PyObject_GetIter(seq2
);
1449 for (i
= 0; ; ++i
) {
1450 PyObject
*key
, *value
;
1454 item
= PyIter_Next(it
);
1456 if (PyErr_Occurred())
1461 /* Convert item to sequence, and verify length 2. */
1462 fast
= PySequence_Fast(item
, "");
1464 if (PyErr_ExceptionMatches(PyExc_TypeError
))
1465 PyErr_Format(PyExc_TypeError
,
1466 "cannot convert dictionary update "
1467 "sequence element #%zd to a sequence",
1471 n
= PySequence_Fast_GET_SIZE(fast
);
1473 PyErr_Format(PyExc_ValueError
,
1474 "dictionary update sequence element #%zd "
1475 "has length %zd; 2 is required",
1480 /* Update/merge with this (key, value) pair. */
1481 key
= PySequence_Fast_GET_ITEM(fast
, 0);
1482 value
= PySequence_Fast_GET_ITEM(fast
, 1);
1483 if (override
|| PyDict_GetItem(d
, key
) == NULL
) {
1484 int status
= PyDict_SetItem(d
, key
, value
);
1500 return Py_SAFE_DOWNCAST(i
, Py_ssize_t
, int);
1504 PyDict_Update(PyObject
*a
, PyObject
*b
)
1506 return PyDict_Merge(a
, b
, 1);
1510 PyDict_Merge(PyObject
*a
, PyObject
*b
, int override
)
1512 register PyDictObject
*mp
, *other
;
1513 register Py_ssize_t i
;
1516 /* We accept for the argument either a concrete dictionary object,
1517 * or an abstract "mapping" object. For the former, we can do
1518 * things quite efficiently. For the latter, we only require that
1519 * PyMapping_Keys() and PyObject_GetItem() be supported.
1521 if (a
== NULL
|| !PyDict_Check(a
) || b
== NULL
) {
1522 PyErr_BadInternalCall();
1525 mp
= (PyDictObject
*)a
;
1526 if (PyDict_Check(b
)) {
1527 other
= (PyDictObject
*)b
;
1528 if (other
== mp
|| other
->ma_used
== 0)
1529 /* a.update(a) or a.update({}); nothing to do */
1531 if (mp
->ma_used
== 0)
1532 /* Since the target dict is empty, PyDict_GetItem()
1533 * always returns NULL. Setting override to 1
1534 * skips the unnecessary test.
1537 /* Do one big resize at the start, rather than
1538 * incrementally resizing as we insert new items. Expect
1539 * that there will be no (or few) overlapping keys.
1541 if ((mp
->ma_fill
+ other
->ma_used
)*3 >= (mp
->ma_mask
+1)*2) {
1542 if (dictresize(mp
, (mp
->ma_used
+ other
->ma_used
)*2) != 0)
1545 for (i
= 0; i
<= other
->ma_mask
; i
++) {
1546 entry
= &other
->ma_table
[i
];
1547 if (entry
->me_value
!= NULL
&&
1549 PyDict_GetItem(a
, entry
->me_key
) == NULL
)) {
1550 Py_INCREF(entry
->me_key
);
1551 Py_INCREF(entry
->me_value
);
1552 if (insertdict(mp
, entry
->me_key
,
1553 (long)entry
->me_hash
,
1554 entry
->me_value
) != 0)
1560 /* Do it the generic, slower way */
1561 PyObject
*keys
= PyMapping_Keys(b
);
1563 PyObject
*key
, *value
;
1567 /* Docstring says this is equivalent to E.keys() so
1568 * if E doesn't have a .keys() method we want
1569 * AttributeError to percolate up. Might as well
1570 * do the same for any other error.
1574 iter
= PyObject_GetIter(keys
);
1579 for (key
= PyIter_Next(iter
); key
; key
= PyIter_Next(iter
)) {
1580 if (!override
&& PyDict_GetItem(a
, key
) != NULL
) {
1584 value
= PyObject_GetItem(b
, key
);
1585 if (value
== NULL
) {
1590 status
= PyDict_SetItem(a
, key
, value
);
1599 if (PyErr_Occurred())
1600 /* Iterator completed, via error */
1607 dict_copy(register PyDictObject
*mp
)
1609 return PyDict_Copy((PyObject
*)mp
);
1613 PyDict_Copy(PyObject
*o
)
1617 if (o
== NULL
|| !PyDict_Check(o
)) {
1618 PyErr_BadInternalCall();
1621 copy
= PyDict_New();
1624 if (PyDict_Merge(copy
, o
, 1) == 0)
1631 PyDict_Size(PyObject
*mp
)
1633 if (mp
== NULL
|| !PyDict_Check(mp
)) {
1634 PyErr_BadInternalCall();
1637 return ((PyDictObject
*)mp
)->ma_used
;
1641 PyDict_Keys(PyObject
*mp
)
1643 if (mp
== NULL
|| !PyDict_Check(mp
)) {
1644 PyErr_BadInternalCall();
1647 return dict_keys((PyDictObject
*)mp
);
1651 PyDict_Values(PyObject
*mp
)
1653 if (mp
== NULL
|| !PyDict_Check(mp
)) {
1654 PyErr_BadInternalCall();
1657 return dict_values((PyDictObject
*)mp
);
1661 PyDict_Items(PyObject
*mp
)
1663 if (mp
== NULL
|| !PyDict_Check(mp
)) {
1664 PyErr_BadInternalCall();
1667 return dict_items((PyDictObject
*)mp
);
1670 /* Subroutine which returns the smallest key in a for which b's value
1671 is different or absent. The value is returned too, through the
1672 pval argument. Both are NULL if no key in a is found for which b's status
1673 differs. The refcounts on (and only on) non-NULL *pval and function return
1674 values must be decremented by the caller (characterize() increments them
1675 to ensure that mutating comparison and PyDict_GetItem calls can't delete
1676 them before the caller is done looking at them). */
1679 characterize(PyDictObject
*a
, PyDictObject
*b
, PyObject
**pval
)
1681 PyObject
*akey
= NULL
; /* smallest key in a s.t. a[akey] != b[akey] */
1682 PyObject
*aval
= NULL
; /* a[akey] */
1686 for (i
= 0; i
<= a
->ma_mask
; i
++) {
1687 PyObject
*thiskey
, *thisaval
, *thisbval
;
1688 if (a
->ma_table
[i
].me_value
== NULL
)
1690 thiskey
= a
->ma_table
[i
].me_key
;
1691 Py_INCREF(thiskey
); /* keep alive across compares */
1693 cmp
= PyObject_RichCompareBool(akey
, thiskey
, Py_LT
);
1700 a
->ma_table
[i
].me_value
== NULL
)
1702 /* Not the *smallest* a key; or maybe it is
1703 * but the compare shrunk the dict so we can't
1704 * find its associated value anymore; or
1705 * maybe it is but the compare deleted the
1713 /* Compare a[thiskey] to b[thiskey]; cmp <- true iff equal. */
1714 thisaval
= a
->ma_table
[i
].me_value
;
1716 Py_INCREF(thisaval
); /* keep alive */
1717 thisbval
= PyDict_GetItem((PyObject
*)b
, thiskey
);
1718 if (thisbval
== NULL
)
1721 /* both dicts have thiskey: same values? */
1722 cmp
= PyObject_RichCompareBool(
1723 thisaval
, thisbval
, Py_EQ
);
1726 Py_DECREF(thisaval
);
1739 Py_DECREF(thisaval
);
1753 dict_compare(PyDictObject
*a
, PyDictObject
*b
)
1755 PyObject
*adiff
, *bdiff
, *aval
, *bval
;
1758 /* Compare lengths first */
1759 if (a
->ma_used
< b
->ma_used
)
1760 return -1; /* a is shorter */
1761 else if (a
->ma_used
> b
->ma_used
)
1762 return 1; /* b is shorter */
1764 /* Same length -- check all keys */
1765 bdiff
= bval
= NULL
;
1766 adiff
= characterize(a
, b
, &aval
);
1767 if (adiff
== NULL
) {
1769 /* Either an error, or a is a subset with the same length so
1772 res
= PyErr_Occurred() ? -1 : 0;
1775 bdiff
= characterize(b
, a
, &bval
);
1776 if (bdiff
== NULL
&& PyErr_Occurred()) {
1783 /* bdiff == NULL "should be" impossible now, but perhaps
1784 * the last comparison done by the characterize() on a had
1785 * the side effect of making the dicts equal!
1787 res
= PyObject_Compare(adiff
, bdiff
);
1789 if (res
== 0 && bval
!= NULL
)
1790 res
= PyObject_Compare(aval
, bval
);
1800 /* Return 1 if dicts equal, 0 if not, -1 if error.
1801 * Gets out as soon as any difference is detected.
1802 * Uses only Py_EQ comparison.
1805 dict_equal(PyDictObject
*a
, PyDictObject
*b
)
1809 if (a
->ma_used
!= b
->ma_used
)
1810 /* can't be equal if # of entries differ */
1813 /* Same # of entries -- check all of 'em. Exit early on any diff. */
1814 for (i
= 0; i
<= a
->ma_mask
; i
++) {
1815 PyObject
*aval
= a
->ma_table
[i
].me_value
;
1819 PyObject
*key
= a
->ma_table
[i
].me_key
;
1820 /* temporarily bump aval's refcount to ensure it stays
1821 alive until we're done with it */
1825 bval
= PyDict_GetItem((PyObject
*)b
, key
);
1831 cmp
= PyObject_RichCompareBool(aval
, bval
, Py_EQ
);
1833 if (cmp
<= 0) /* error or not equal */
1841 dict_richcompare(PyObject
*v
, PyObject
*w
, int op
)
1846 if (!PyDict_Check(v
) || !PyDict_Check(w
)) {
1847 res
= Py_NotImplemented
;
1849 else if (op
== Py_EQ
|| op
== Py_NE
) {
1850 cmp
= dict_equal((PyDictObject
*)v
, (PyDictObject
*)w
);
1853 res
= (cmp
== (op
== Py_EQ
)) ? Py_True
: Py_False
;
1856 /* Py3K warning if comparison isn't == or != */
1857 if (PyErr_WarnPy3k("dict inequality comparisons not supported "
1861 res
= Py_NotImplemented
;
1868 dict_contains(register PyDictObject
*mp
, PyObject
*key
)
1873 if (!PyString_CheckExact(key
) ||
1874 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
1875 hash
= PyObject_Hash(key
);
1879 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
1882 return PyBool_FromLong(ep
->me_value
!= NULL
);
1886 dict_has_key(register PyDictObject
*mp
, PyObject
*key
)
1888 if (PyErr_WarnPy3k("dict.has_key() not supported in 3.x; "
1889 "use the in operator", 1) < 0)
1891 return dict_contains(mp
, key
);
1895 dict_get(register PyDictObject
*mp
, PyObject
*args
)
1898 PyObject
*failobj
= Py_None
;
1899 PyObject
*val
= NULL
;
1903 if (!PyArg_UnpackTuple(args
, "get", 1, 2, &key
, &failobj
))
1906 if (!PyString_CheckExact(key
) ||
1907 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
1908 hash
= PyObject_Hash(key
);
1912 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
1924 dict_setdefault(register PyDictObject
*mp
, PyObject
*args
)
1927 PyObject
*failobj
= Py_None
;
1928 PyObject
*val
= NULL
;
1932 if (!PyArg_UnpackTuple(args
, "setdefault", 1, 2, &key
, &failobj
))
1935 if (!PyString_CheckExact(key
) ||
1936 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
1937 hash
= PyObject_Hash(key
);
1941 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
1947 if (PyDict_SetItem((PyObject
*)mp
, key
, failobj
))
1956 dict_clear(register PyDictObject
*mp
)
1958 PyDict_Clear((PyObject
*)mp
);
1963 dict_pop(PyDictObject
*mp
, PyObject
*args
)
1967 PyObject
*old_value
, *old_key
;
1968 PyObject
*key
, *deflt
= NULL
;
1970 if(!PyArg_UnpackTuple(args
, "pop", 1, 2, &key
, &deflt
))
1972 if (mp
->ma_used
== 0) {
1977 PyErr_SetString(PyExc_KeyError
,
1978 "pop(): dictionary is empty");
1981 if (!PyString_CheckExact(key
) ||
1982 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
1983 hash
= PyObject_Hash(key
);
1987 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
1990 if (ep
->me_value
== NULL
) {
1998 old_key
= ep
->me_key
;
2001 old_value
= ep
->me_value
;
2002 ep
->me_value
= NULL
;
2009 dict_popitem(PyDictObject
*mp
)
2015 /* Allocate the result tuple before checking the size. Believe it
2016 * or not, this allocation could trigger a garbage collection which
2017 * could empty the dict, so if we checked the size first and that
2018 * happened, the result would be an infinite loop (searching for an
2019 * entry that no longer exists). Note that the usual popitem()
2020 * idiom is "while d: k, v = d.popitem()". so needing to throw the
2021 * tuple away if the dict *is* empty isn't a significant
2022 * inefficiency -- possible, but unlikely in practice.
2024 res
= PyTuple_New(2);
2027 if (mp
->ma_used
== 0) {
2029 PyErr_SetString(PyExc_KeyError
,
2030 "popitem(): dictionary is empty");
2033 /* Set ep to "the first" dict entry with a value. We abuse the hash
2034 * field of slot 0 to hold a search finger:
2035 * If slot 0 has a value, use slot 0.
2036 * Else slot 0 is being used to hold a search finger,
2037 * and we use its hash value as the first index to look.
2039 ep
= &mp
->ma_table
[0];
2040 if (ep
->me_value
== NULL
) {
2042 /* The hash field may be a real hash value, or it may be a
2043 * legit search finger, or it may be a once-legit search
2044 * finger that's out of bounds now because it wrapped around
2045 * or the table shrunk -- simply make sure it's in bounds now.
2047 if (i
> mp
->ma_mask
|| i
< 1)
2048 i
= 1; /* skip slot 0 */
2049 while ((ep
= &mp
->ma_table
[i
])->me_value
== NULL
) {
2051 if (i
> mp
->ma_mask
)
2055 PyTuple_SET_ITEM(res
, 0, ep
->me_key
);
2056 PyTuple_SET_ITEM(res
, 1, ep
->me_value
);
2059 ep
->me_value
= NULL
;
2061 assert(mp
->ma_table
[0].me_value
== NULL
);
2062 mp
->ma_table
[0].me_hash
= i
+ 1; /* next place to start */
2067 dict_traverse(PyObject
*op
, visitproc visit
, void *arg
)
2073 while (PyDict_Next(op
, &i
, &pk
, &pv
)) {
2081 dict_tp_clear(PyObject
*op
)
2088 extern PyTypeObject PyDictIterKey_Type
; /* Forward */
2089 extern PyTypeObject PyDictIterValue_Type
; /* Forward */
2090 extern PyTypeObject PyDictIterItem_Type
; /* Forward */
2091 static PyObject
*dictiter_new(PyDictObject
*, PyTypeObject
*);
2094 dict_iterkeys(PyDictObject
*dict
)
2096 return dictiter_new(dict
, &PyDictIterKey_Type
);
2100 dict_itervalues(PyDictObject
*dict
)
2102 return dictiter_new(dict
, &PyDictIterValue_Type
);
2106 dict_iteritems(PyDictObject
*dict
)
2108 return dictiter_new(dict
, &PyDictIterItem_Type
);
2112 dict_sizeof(PyDictObject
*mp
)
2116 res
= sizeof(PyDictObject
);
2117 if (mp
->ma_table
!= mp
->ma_smalltable
)
2118 res
= res
+ (mp
->ma_mask
+ 1) * sizeof(PyDictEntry
);
2119 return PyInt_FromSsize_t(res
);
2122 PyDoc_STRVAR(has_key__doc__
,
2123 "D.has_key(k) -> True if D has a key k, else False");
2125 PyDoc_STRVAR(contains__doc__
,
2126 "D.__contains__(k) -> True if D has a key k, else False");
2128 PyDoc_STRVAR(getitem__doc__
, "x.__getitem__(y) <==> x[y]");
2130 PyDoc_STRVAR(sizeof__doc__
,
2131 "D.__sizeof__() -> size of D in memory, in bytes");
2133 PyDoc_STRVAR(get__doc__
,
2134 "D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.");
2136 PyDoc_STRVAR(setdefault_doc__
,
2137 "D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D");
2139 PyDoc_STRVAR(pop__doc__
,
2140 "D.pop(k[,d]) -> v, remove specified key and return the corresponding value.\n\
2141 If key is not found, d is returned if given, otherwise KeyError is raised");
2143 PyDoc_STRVAR(popitem__doc__
,
2144 "D.popitem() -> (k, v), remove and return some (key, value) pair as a\n\
2145 2-tuple; but raise KeyError if D is empty.");
2147 PyDoc_STRVAR(keys__doc__
,
2148 "D.keys() -> list of D's keys");
2150 PyDoc_STRVAR(items__doc__
,
2151 "D.items() -> list of D's (key, value) pairs, as 2-tuples");
2153 PyDoc_STRVAR(values__doc__
,
2154 "D.values() -> list of D's values");
2156 PyDoc_STRVAR(update__doc__
,
2157 "D.update(E, **F) -> None. Update D from dict/iterable E and F.\n"
2158 "If E has a .keys() method, does: for k in E: D[k] = E[k]\n\
2159 If E lacks .keys() method, does: for (k, v) in E: D[k] = v\n\
2160 In either case, this is followed by: for k in F: D[k] = F[k]");
2162 PyDoc_STRVAR(fromkeys__doc__
,
2163 "dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.\n\
2164 v defaults to None.");
2166 PyDoc_STRVAR(clear__doc__
,
2167 "D.clear() -> None. Remove all items from D.");
2169 PyDoc_STRVAR(copy__doc__
,
2170 "D.copy() -> a shallow copy of D");
2172 PyDoc_STRVAR(iterkeys__doc__
,
2173 "D.iterkeys() -> an iterator over the keys of D");
2175 PyDoc_STRVAR(itervalues__doc__
,
2176 "D.itervalues() -> an iterator over the values of D");
2178 PyDoc_STRVAR(iteritems__doc__
,
2179 "D.iteritems() -> an iterator over the (key, value) items of D");
2181 static PyMethodDef mapp_methods
[] = {
2182 {"__contains__",(PyCFunction
)dict_contains
, METH_O
| METH_COEXIST
,
2184 {"__getitem__", (PyCFunction
)dict_subscript
, METH_O
| METH_COEXIST
,
2186 {"__sizeof__", (PyCFunction
)dict_sizeof
, METH_NOARGS
,
2188 {"has_key", (PyCFunction
)dict_has_key
, METH_O
,
2190 {"get", (PyCFunction
)dict_get
, METH_VARARGS
,
2192 {"setdefault", (PyCFunction
)dict_setdefault
, METH_VARARGS
,
2194 {"pop", (PyCFunction
)dict_pop
, METH_VARARGS
,
2196 {"popitem", (PyCFunction
)dict_popitem
, METH_NOARGS
,
2198 {"keys", (PyCFunction
)dict_keys
, METH_NOARGS
,
2200 {"items", (PyCFunction
)dict_items
, METH_NOARGS
,
2202 {"values", (PyCFunction
)dict_values
, METH_NOARGS
,
2204 {"update", (PyCFunction
)dict_update
, METH_VARARGS
| METH_KEYWORDS
,
2206 {"fromkeys", (PyCFunction
)dict_fromkeys
, METH_VARARGS
| METH_CLASS
,
2208 {"clear", (PyCFunction
)dict_clear
, METH_NOARGS
,
2210 {"copy", (PyCFunction
)dict_copy
, METH_NOARGS
,
2212 {"iterkeys", (PyCFunction
)dict_iterkeys
, METH_NOARGS
,
2214 {"itervalues", (PyCFunction
)dict_itervalues
, METH_NOARGS
,
2216 {"iteritems", (PyCFunction
)dict_iteritems
, METH_NOARGS
,
2218 {NULL
, NULL
} /* sentinel */
2221 /* Return 1 if `key` is in dict `op`, 0 if not, and -1 on error. */
2223 PyDict_Contains(PyObject
*op
, PyObject
*key
)
2226 PyDictObject
*mp
= (PyDictObject
*)op
;
2229 if (!PyString_CheckExact(key
) ||
2230 (hash
= ((PyStringObject
*) key
)->ob_shash
) == -1) {
2231 hash
= PyObject_Hash(key
);
2235 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
2236 return ep
== NULL
? -1 : (ep
->me_value
!= NULL
);
2239 /* Internal version of PyDict_Contains used when the hash value is already known */
2241 _PyDict_Contains(PyObject
*op
, PyObject
*key
, long hash
)
2243 PyDictObject
*mp
= (PyDictObject
*)op
;
2246 ep
= (mp
->ma_lookup
)(mp
, key
, hash
);
2247 return ep
== NULL
? -1 : (ep
->me_value
!= NULL
);
2250 /* Hack to implement "key in dict" */
2251 static PySequenceMethods dict_as_sequence
= {
2257 0, /* sq_ass_item */
2258 0, /* sq_ass_slice */
2259 PyDict_Contains
, /* sq_contains */
2260 0, /* sq_inplace_concat */
2261 0, /* sq_inplace_repeat */
2265 dict_new(PyTypeObject
*type
, PyObject
*args
, PyObject
*kwds
)
2269 assert(type
!= NULL
&& type
->tp_alloc
!= NULL
);
2270 self
= type
->tp_alloc(type
, 0);
2272 PyDictObject
*d
= (PyDictObject
*)self
;
2273 /* It's guaranteed that tp->alloc zeroed out the struct. */
2274 assert(d
->ma_table
== NULL
&& d
->ma_fill
== 0 && d
->ma_used
== 0);
2275 INIT_NONZERO_DICT_SLOTS(d
);
2276 d
->ma_lookup
= lookdict_string
;
2277 /* The object has been implicitely tracked by tp_alloc */
2278 if (type
== &PyDict_Type
)
2279 _PyObject_GC_UNTRACK(d
);
2280 #ifdef SHOW_CONVERSION_COUNTS
2283 #ifdef SHOW_TRACK_COUNT
2284 if (_PyObject_GC_IS_TRACKED(d
))
2294 dict_init(PyObject
*self
, PyObject
*args
, PyObject
*kwds
)
2296 return dict_update_common(self
, args
, kwds
, "dict");
2300 dict_iter(PyDictObject
*dict
)
2302 return dictiter_new(dict
, &PyDictIterKey_Type
);
2305 PyDoc_STRVAR(dictionary_doc
,
2306 "dict() -> new empty dictionary.\n"
2307 "dict(mapping) -> new dictionary initialized from a mapping object's\n"
2308 " (key, value) pairs.\n"
2309 "dict(seq) -> new dictionary initialized as if via:\n"
2311 " for k, v in seq:\n"
2313 "dict(**kwargs) -> new dictionary initialized with the name=value pairs\n"
2314 " in the keyword argument list. For example: dict(one=1, two=2)");
2316 PyTypeObject PyDict_Type
= {
2317 PyVarObject_HEAD_INIT(&PyType_Type
, 0)
2319 sizeof(PyDictObject
),
2321 (destructor
)dict_dealloc
, /* tp_dealloc */
2322 (printfunc
)dict_print
, /* tp_print */
2325 (cmpfunc
)dict_compare
, /* tp_compare */
2326 (reprfunc
)dict_repr
, /* tp_repr */
2327 0, /* tp_as_number */
2328 &dict_as_sequence
, /* tp_as_sequence */
2329 &dict_as_mapping
, /* tp_as_mapping */
2330 (hashfunc
)PyObject_HashNotImplemented
, /* tp_hash */
2333 PyObject_GenericGetAttr
, /* tp_getattro */
2334 0, /* tp_setattro */
2335 0, /* tp_as_buffer */
2336 Py_TPFLAGS_DEFAULT
| Py_TPFLAGS_HAVE_GC
|
2337 Py_TPFLAGS_BASETYPE
| Py_TPFLAGS_DICT_SUBCLASS
, /* tp_flags */
2338 dictionary_doc
, /* tp_doc */
2339 dict_traverse
, /* tp_traverse */
2340 dict_tp_clear
, /* tp_clear */
2341 dict_richcompare
, /* tp_richcompare */
2342 0, /* tp_weaklistoffset */
2343 (getiterfunc
)dict_iter
, /* tp_iter */
2344 0, /* tp_iternext */
2345 mapp_methods
, /* tp_methods */
2350 0, /* tp_descr_get */
2351 0, /* tp_descr_set */
2352 0, /* tp_dictoffset */
2353 dict_init
, /* tp_init */
2354 PyType_GenericAlloc
, /* tp_alloc */
2355 dict_new
, /* tp_new */
2356 PyObject_GC_Del
, /* tp_free */
2359 /* For backward compatibility with old dictionary interface */
2362 PyDict_GetItemString(PyObject
*v
, const char *key
)
2365 kv
= PyString_FromString(key
);
2368 rv
= PyDict_GetItem(v
, kv
);
2374 PyDict_SetItemString(PyObject
*v
, const char *key
, PyObject
*item
)
2378 kv
= PyString_FromString(key
);
2381 PyString_InternInPlace(&kv
); /* XXX Should we really? */
2382 err
= PyDict_SetItem(v
, kv
, item
);
2388 PyDict_DelItemString(PyObject
*v
, const char *key
)
2392 kv
= PyString_FromString(key
);
2395 err
= PyDict_DelItem(v
, kv
);
2400 /* Dictionary iterator types */
2404 PyDictObject
*di_dict
; /* Set to NULL when iterator is exhausted */
2407 PyObject
* di_result
; /* reusable result tuple for iteritems */
2412 dictiter_new(PyDictObject
*dict
, PyTypeObject
*itertype
)
2415 di
= PyObject_GC_New(dictiterobject
, itertype
);
2420 di
->di_used
= dict
->ma_used
;
2422 di
->len
= dict
->ma_used
;
2423 if (itertype
== &PyDictIterItem_Type
) {
2424 di
->di_result
= PyTuple_Pack(2, Py_None
, Py_None
);
2425 if (di
->di_result
== NULL
) {
2431 di
->di_result
= NULL
;
2432 _PyObject_GC_TRACK(di
);
2433 return (PyObject
*)di
;
2437 dictiter_dealloc(dictiterobject
*di
)
2439 Py_XDECREF(di
->di_dict
);
2440 Py_XDECREF(di
->di_result
);
2441 PyObject_GC_Del(di
);
2445 dictiter_traverse(dictiterobject
*di
, visitproc visit
, void *arg
)
2447 Py_VISIT(di
->di_dict
);
2448 Py_VISIT(di
->di_result
);
2453 dictiter_len(dictiterobject
*di
)
2456 if (di
->di_dict
!= NULL
&& di
->di_used
== di
->di_dict
->ma_used
)
2458 return PyInt_FromSize_t(len
);
2461 PyDoc_STRVAR(length_hint_doc
, "Private method returning an estimate of len(list(it)).");
2463 static PyMethodDef dictiter_methods
[] = {
2464 {"__length_hint__", (PyCFunction
)dictiter_len
, METH_NOARGS
, length_hint_doc
},
2465 {NULL
, NULL
} /* sentinel */
2468 static PyObject
*dictiter_iternextkey(dictiterobject
*di
)
2471 register Py_ssize_t i
, mask
;
2472 register PyDictEntry
*ep
;
2473 PyDictObject
*d
= di
->di_dict
;
2477 assert (PyDict_Check(d
));
2479 if (di
->di_used
!= d
->ma_used
) {
2480 PyErr_SetString(PyExc_RuntimeError
,
2481 "dictionary changed size during iteration");
2482 di
->di_used
= -1; /* Make this state sticky */
2491 while (i
<= mask
&& ep
[i
].me_value
== NULL
)
2507 PyTypeObject PyDictIterKey_Type
= {
2508 PyVarObject_HEAD_INIT(&PyType_Type
, 0)
2509 "dictionary-keyiterator", /* tp_name */
2510 sizeof(dictiterobject
), /* tp_basicsize */
2511 0, /* tp_itemsize */
2513 (destructor
)dictiter_dealloc
, /* tp_dealloc */
2519 0, /* tp_as_number */
2520 0, /* tp_as_sequence */
2521 0, /* tp_as_mapping */
2525 PyObject_GenericGetAttr
, /* tp_getattro */
2526 0, /* tp_setattro */
2527 0, /* tp_as_buffer */
2528 Py_TPFLAGS_DEFAULT
| Py_TPFLAGS_HAVE_GC
,/* tp_flags */
2530 (traverseproc
)dictiter_traverse
, /* tp_traverse */
2532 0, /* tp_richcompare */
2533 0, /* tp_weaklistoffset */
2534 PyObject_SelfIter
, /* tp_iter */
2535 (iternextfunc
)dictiter_iternextkey
, /* tp_iternext */
2536 dictiter_methods
, /* tp_methods */
2540 static PyObject
*dictiter_iternextvalue(dictiterobject
*di
)
2543 register Py_ssize_t i
, mask
;
2544 register PyDictEntry
*ep
;
2545 PyDictObject
*d
= di
->di_dict
;
2549 assert (PyDict_Check(d
));
2551 if (di
->di_used
!= d
->ma_used
) {
2552 PyErr_SetString(PyExc_RuntimeError
,
2553 "dictionary changed size during iteration");
2554 di
->di_used
= -1; /* Make this state sticky */
2560 if (i
< 0 || i
> mask
)
2563 while ((value
=ep
[i
].me_value
) == NULL
) {
2579 PyTypeObject PyDictIterValue_Type
= {
2580 PyVarObject_HEAD_INIT(&PyType_Type
, 0)
2581 "dictionary-valueiterator", /* tp_name */
2582 sizeof(dictiterobject
), /* tp_basicsize */
2583 0, /* tp_itemsize */
2585 (destructor
)dictiter_dealloc
, /* tp_dealloc */
2591 0, /* tp_as_number */
2592 0, /* tp_as_sequence */
2593 0, /* tp_as_mapping */
2597 PyObject_GenericGetAttr
, /* tp_getattro */
2598 0, /* tp_setattro */
2599 0, /* tp_as_buffer */
2600 Py_TPFLAGS_DEFAULT
| Py_TPFLAGS_HAVE_GC
,/* tp_flags */
2602 (traverseproc
)dictiter_traverse
, /* tp_traverse */
2604 0, /* tp_richcompare */
2605 0, /* tp_weaklistoffset */
2606 PyObject_SelfIter
, /* tp_iter */
2607 (iternextfunc
)dictiter_iternextvalue
, /* tp_iternext */
2608 dictiter_methods
, /* tp_methods */
2612 static PyObject
*dictiter_iternextitem(dictiterobject
*di
)
2614 PyObject
*key
, *value
, *result
= di
->di_result
;
2615 register Py_ssize_t i
, mask
;
2616 register PyDictEntry
*ep
;
2617 PyDictObject
*d
= di
->di_dict
;
2621 assert (PyDict_Check(d
));
2623 if (di
->di_used
!= d
->ma_used
) {
2624 PyErr_SetString(PyExc_RuntimeError
,
2625 "dictionary changed size during iteration");
2626 di
->di_used
= -1; /* Make this state sticky */
2635 while (i
<= mask
&& ep
[i
].me_value
== NULL
)
2641 if (result
->ob_refcnt
== 1) {
2643 Py_DECREF(PyTuple_GET_ITEM(result
, 0));
2644 Py_DECREF(PyTuple_GET_ITEM(result
, 1));
2646 result
= PyTuple_New(2);
2652 value
= ep
[i
].me_value
;
2655 PyTuple_SET_ITEM(result
, 0, key
);
2656 PyTuple_SET_ITEM(result
, 1, value
);
2665 PyTypeObject PyDictIterItem_Type
= {
2666 PyVarObject_HEAD_INIT(&PyType_Type
, 0)
2667 "dictionary-itemiterator", /* tp_name */
2668 sizeof(dictiterobject
), /* tp_basicsize */
2669 0, /* tp_itemsize */
2671 (destructor
)dictiter_dealloc
, /* tp_dealloc */
2677 0, /* tp_as_number */
2678 0, /* tp_as_sequence */
2679 0, /* tp_as_mapping */
2683 PyObject_GenericGetAttr
, /* tp_getattro */
2684 0, /* tp_setattro */
2685 0, /* tp_as_buffer */
2686 Py_TPFLAGS_DEFAULT
| Py_TPFLAGS_HAVE_GC
,/* tp_flags */
2688 (traverseproc
)dictiter_traverse
, /* tp_traverse */
2690 0, /* tp_richcompare */
2691 0, /* tp_weaklistoffset */
2692 PyObject_SelfIter
, /* tp_iter */
2693 (iternextfunc
)dictiter_iternextitem
, /* tp_iternext */
2694 dictiter_methods
, /* tp_methods */