:mod:`string` --- Common string operations
==========================================

.. module:: string
   :synopsis: Common string operations.


The :mod:`string` module contains a number of useful constants and
classes, as well as some deprecated legacy functions that are also
available as methods on strings. In addition, Python's built-in string
classes support the sequence type methods described in the
:ref:`typesseq` section, and also the string-specific methods described
in the :ref:`string-methods` section. To output formatted strings, use
template strings or the ``%`` operator described in the
:ref:`string-formatting` section. See also the :mod:`re` module for
string functions based on regular expressions.


String constants
----------------

The constants defined in this module are:


.. data:: ascii_letters

   The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
   constants described below. This value is not locale-dependent.


.. data:: ascii_lowercase

   The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``. This value is not
   locale-dependent and will not change.


.. data:: ascii_uppercase

   The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. This value is not
   locale-dependent and will not change.


.. data:: digits

   The string ``'0123456789'``.


.. data:: hexdigits

   The string ``'0123456789abcdefABCDEF'``.


.. data:: letters

   The concatenation of the strings :const:`lowercase` and :const:`uppercase`
   described below. The specific value is locale-dependent, and will be updated
   when :func:`locale.setlocale` is called.


.. data:: lowercase

   A string containing all the characters that are considered lowercase letters.
   On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``. Do not
   change its definition --- the effect on the routines :func:`upper` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: octdigits

   The string ``'01234567'``.


.. data:: punctuation

   String of ASCII characters which are considered punctuation characters in the
   ``C`` locale.


.. data:: printable

   String of characters which are considered printable. This is a combination of
   :const:`digits`, :const:`letters`, :const:`punctuation`, and
   :const:`whitespace`.


.. data:: uppercase

   A string containing all the characters that are considered uppercase letters.
   On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``. Do not
   change its definition --- the effect on the routines :func:`lower` and
   :func:`swapcase` is undefined. The specific value is locale-dependent, and will
   be updated when :func:`locale.setlocale` is called.


.. data:: whitespace

   A string containing all characters that are considered whitespace. On most
   systems this includes the characters space, tab, linefeed, return, formfeed, and
   vertical tab. Do not change its definition --- the effect on the routines
   :func:`strip` and :func:`split` is undefined.

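
The sketch below uses one of the locale-independent constants to check whether a string consists only of hexadecimal digits. It is written to run on Python 3, where only the ``ascii_*`` variants survive (``letters``, ``lowercase`` and ``uppercase`` were removed). The helper name ``is_hex_literal`` is hypothetical.

```python
import string

def is_hex_literal(text):
    """Return True if text is non-empty and consists only of hex digits."""
    # hexdigits covers both cases: '0123456789abcdefABCDEF'
    return bool(text) and all(ch in string.hexdigits for ch in text)

# ascii_letters is exactly ascii_lowercase followed by ascii_uppercase.
assert string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase

print(is_hex_literal('DeadBeef'))  # True
print(is_hex_literal('0xff'))      # False: 'x' is not a hex digit
```
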

Template strings
----------------

Templates provide simpler string substitutions as described in :pep:`292`.
Instead of the normal ``%``\ -based substitutions, Templates support ``$``\ -based
substitutions, using the following rules:

* ``$$`` is an escape; it is replaced with a single ``$``.

* ``$identifier`` names a substitution placeholder matching a mapping key of
  ``"identifier"``. By default, ``"identifier"`` must spell a Python
  identifier. The first non-identifier character after the ``$`` character
  terminates this placeholder specification.

* ``${identifier}`` is equivalent to ``$identifier``. It is required when valid
  identifier characters follow the placeholder but are not part of the
  placeholder, such as ``"${noun}ification"``.

Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
being raised.

.. versionadded:: 2.4


The :mod:`string` module provides a :class:`Template` class that implements
these rules. The methods of :class:`Template` are:


.. class:: Template(template)

   The constructor takes a single argument which is the template string.


.. method:: Template.substitute(mapping[, **kws])

   Performs the template substitution, returning a new string. *mapping* is any
   dictionary-like object with keys that match the placeholders in the template.
   Alternatively, you can provide keyword arguments, where the keywords are the
   placeholders. When both *mapping* and *kws* are given and there are duplicates,
   the placeholders from *kws* take precedence. A placeholder found in neither
   *mapping* nor *kws* raises :exc:`KeyError`.


.. method:: Template.safe_substitute(mapping[, **kws])

   Like :meth:`substitute`, except that if placeholders are missing from *mapping*
   and *kws*, instead of raising a :exc:`KeyError` exception, the original
   placeholder will appear in the resulting string intact. Also, unlike with
   :meth:`substitute`, any other appearance of ``$`` is left in the result as
   ``$`` instead of raising :exc:`ValueError`.

   While other exceptions may still occur, this method is called "safe" because
   it always tries to return a usable string instead of raising an exception. In
   another sense, :meth:`safe_substitute` may be anything other than safe, since
   it will silently ignore malformed templates containing dangling delimiters,
   unmatched braces, or placeholders that are not valid Python identifiers.


:class:`Template` instances also provide one public data attribute:


.. attribute:: Template.template

   This is the object passed to the constructor's *template* argument. In general,
   you shouldn't change it, but read-only access is not enforced.


Here is an example of how to use a Template::

   >>> from string import Template
   >>> s = Template('$who likes $what')
   >>> s.substitute(who='tim', what='kung pao')
   'tim likes kung pao'
   >>> d = dict(who='tim')
   >>> Template('Give $who $100').substitute(d)
   Traceback (most recent call last):
   [...]
   ValueError: Invalid placeholder in string: line 1, col 11
   >>> Template('$who likes $what').substitute(d)
   Traceback (most recent call last):
   [...]
   KeyError: 'what'
   >>> Template('$who likes $what').safe_substitute(d)
   'tim likes $what'


Advanced usage: you can derive subclasses of :class:`Template` to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:

* *delimiter* -- This is the literal string describing a placeholder-introducing
  delimiter. The default value is ``$``. Note that this should *not* be a regular
  expression, as the implementation will call :func:`re.escape` on this string as
  needed.

* *idpattern* -- This is the regular expression describing the pattern for
  non-braced placeholders (the braces will be added automatically as
  appropriate). The default value is the regular expression
  ``[_a-z][_a-z0-9]*``.

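
This is a minimal sketch of overriding *delimiter* and *idpattern*. It assumes you want ``%`` as the delimiter and dotted names such as ``user.name`` as placeholders. The class name ``DottedTemplate`` and both attribute values are illustrative choices, not part of the module. A dotted name is not a valid keyword argument, so the values are passed as a mapping.

```python
from string import Template

class DottedTemplate(Template):
    # Hypothetical subclass: '%' introduces placeholders instead of '$'.
    delimiter = '%'
    # Placeholder names may be dotted, e.g. user.name (illustrative pattern).
    idpattern = r'[_a-z][_a-z0-9]*(?:\.[_a-z][_a-z0-9]*)*'

t = DottedTemplate('Hello %user.name, your score is %{score}%%.')
print(t.substitute({'user.name': 'tim', 'score': 97}))
# Hello tim, your score is 97%.
```

The braced form ``%{score}`` reuses the same *idpattern*, and ``%%`` is now the escape for a literal ``%``.
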

Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute *pattern*. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:

* *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
  default pattern.

* *named* -- This group matches the unbraced placeholder name; it should not
  include the delimiter in the capturing group.

* *braced* -- This group matches the brace-enclosed placeholder name; it should
  not include either the delimiter or braces in the capturing group.

* *invalid* -- This group matches any other delimiter pattern (usually a single
  delimiter), and it should appear last in the regular expression.

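
The following sketch overrides *pattern* directly. It assumes a hypothetical ``BracedOnlyTemplate`` that accepts only the ``${name}`` form, so a bare ``$name`` falls through to the *invalid* group. The pattern is given as a string rather than a precompiled object. CPython's :class:`Template` compiles a subclass's *pattern* itself, with ``re.IGNORECASE`` and ``re.VERBOSE``, when the class is created, and the sketch assumes this is the form that works on current Python versions.

```python
from string import Template

class BracedOnlyTemplate(Template):
    # Hypothetical subclass: only ${name} placeholders are recognised.
    # All four named groups are present, as the rules above require.
    pattern = r"""
    \$(?:
        (?P<escaped>\$)
      | (?P<named>(?!))
      | \{(?P<braced>[_a-z][_a-z0-9]*)\}
      | (?P<invalid>)
    )
    """

print(BracedOnlyTemplate('Cost: $$${amount}').substitute(amount=5))  # Cost: $5

try:
    BracedOnlyTemplate('Hi $who').substitute(who='tim')
except ValueError as exc:
    # A bare $who matches only the empty *invalid* group.
    print('rejected:', exc)
```

The *named* group uses the always-failing lookahead ``(?!)``. This keeps the group in the pattern while ensuring it never matches anything.
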

String functions
----------------

The following functions are available to operate on string and Unicode objects.
They are not available as string methods.


.. function:: capwords(s)

   Split the argument into words using :func:`split`, capitalize each word using
   :func:`capitalize`, and join the capitalized words using :func:`join`. Note
   that this replaces runs of whitespace characters by a single space, and removes
   leading and trailing whitespace.

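
To make the three steps concrete, this sketch compares :func:`capwords` with the same pipeline written out using string methods. It runs on both Python 2 and Python 3.

```python
import string

messy = '  hello   wide\tworld  '
print(string.capwords(messy))  # Hello Wide World

# The same three steps: split on whitespace, capitalize, join with one space.
manual = ' '.join(word.capitalize() for word in messy.split())
assert manual == string.capwords(messy)
```
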

.. function:: maketrans(from, to)

   Return a translation table, suitable for passing to :func:`translate`, that
   maps each character in *from* to the character at the same position in *to*;
   *from* and *to* must have the same length.

   Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
   arguments; in some locales, these don't have the same length. For case
   conversions, always use :func:`lower` and :func:`upper`.

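
The locale caveat above does not apply to the ``ascii_*`` constants, which always contain 26 letters each. The sketch below uses them to build a ROT13 table. It is written for Python 3, where the module-level ``maketrans`` was replaced by the ``str.maketrans`` static method, so treat it as the modern counterpart rather than a call to :func:`maketrans` itself.

```python
import string

lower, upper = string.ascii_lowercase, string.ascii_uppercase
# Both arguments are 52 characters long, so the lengths always match.
rot13 = str.maketrans(lower + upper,
                      lower[13:] + lower[:13] + upper[13:] + upper[:13])

print('Hello, World'.translate(rot13))  # Uryyb, Jbeyq
```

Characters outside the table, such as the comma and the space, pass through unchanged.
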

Deprecated string functions
---------------------------

The following functions are also defined as methods of string and Unicode
objects; see section :ref:`string-methods` for more information on those. You
should consider these functions deprecated, although they will not be removed
until Python 3.0. The functions defined in this module are:


.. function:: atof(s)

   Deprecated; use the :func:`float` built-in function instead.

   .. index:: builtin: float

   Convert a string to a floating point number. The string must have the standard
   syntax for a floating point literal in Python, optionally preceded by a sign
   (``+`` or ``-``). Note that this behaves identically to the built-in function
   :func:`float` when passed a string.

   When passing in a string, values for NaN and Infinity may be returned, depending
   on the underlying C library. The specific set of strings accepted which cause
   these values to be returned depends entirely on the C library and is known to
   vary.


.. function:: atoi(s[, base])

   Deprecated; use the :func:`int` built-in function instead.

   .. index:: builtin: eval

   Convert string *s* to an integer in the given *base*. The string must consist
   of one or more digits, optionally preceded by a sign (``+`` or ``-``). The
   *base* defaults to 10. If it is 0, a default base is chosen depending on the
   leading characters of the string (after stripping the sign): ``0x`` or ``0X``
   means 16, ``0`` means 8, anything else means 10. If *base* is 16, a leading
   ``0x`` or ``0X`` is always accepted, though not required. This behaves
   identically to the built-in function :func:`int` when passed a string. (Also
   note: for a more flexible interpretation of numeric literals, use the built-in
   function :func:`eval`.)

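
In new code, the :func:`int` built-in replaces :func:`atoi`. The sketch below shows the base handling described above. It avoids a bare leading ``0`` with base 0, because Python 3 rejects that form (it expects ``0o`` for octal), while Python 2 reads it as octal.

```python
print(int('42'))          # 42: base 10 by default
print(int('-ff', 16))     # -255: sign, then hex digits
print(int('0xff', 16))    # 255: a 0x prefix is accepted when base is 16
print(int('0x1f', 0))     # 31: base 0 infers the base from the prefix
print(float('-1.5e3'))    # -1500.0: the float() counterpart of atof
```
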

.. function:: atol(s[, base])

   Deprecated; use the :func:`long` built-in function instead.

   .. index:: builtin: long

   Convert string *s* to a long integer in the given *base*. The string must
   consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
   The *base* argument has the same meaning as for :func:`atoi`. A trailing ``l``
   or ``L`` is not allowed, except if the base is 0. Note that when invoked
   without *base* or with *base* set to 10, this behaves identically to the
   built-in function :func:`long` when passed a string.


.. function:: capitalize(word)

   Return a copy of *word* with only its first character capitalized.


.. function:: expandtabs(s[, tabsize])

   Expand tabs in a string, replacing them with one or more spaces, depending on
   the current column and the given tab size. The column number is reset to zero
   after each newline occurring in the string. This doesn't understand other
   non-printing characters or escape sequences. The tab size defaults to 8.

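
The column arithmetic can be checked with the equivalent ``str.expandtabs`` method. The module-level functions in this section are gone in Python 3, but the methods exist in both versions.

```python
# Tab size 4: 'a' ends at column 1, so its tab adds 3 spaces (to column 4);
# 'bc' ends at column 6, so the next tab adds 2 spaces (to column 8).
assert 'a\tbc\td'.expandtabs(4) == 'a   bc  d'

# The column count restarts after a newline.
assert 'ab\n\tc'.expandtabs(4) == 'ab\n    c'

# The default tab size is 8.
assert 'a\tb'.expandtabs() == 'a' + ' ' * 7 + 'b'
```
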

.. function:: find(s, sub[, start[, end]])

   Return the lowest index in *s* where the substring *sub* is found such that
   *sub* is wholly contained in ``s[start:end]``. Return ``-1`` on failure.
   Defaults for *start* and *end* and interpretation of negative values are the
   same as for slices.


.. function:: rfind(s, sub[, start[, end]])

   Like :func:`find` but find the highest index.


.. function:: index(s, sub[, start[, end]])

   Like :func:`find` but raise :exc:`ValueError` when the substring is not found.


.. function:: rindex(s, sub[, start[, end]])

   Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.


.. function:: count(s, sub[, start[, end]])

   Return the number of (non-overlapping) occurrences of substring *sub* in string
   ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
   values are the same as for slices.

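
The search functions map directly onto string methods with the same names. This sketch uses the arbitrary word ``'banana'`` to illustrate the index conventions, a negative *start*, and the non-overlapping rule for counting.

```python
s = 'banana'
print(s.find('an'))      # 1: lowest index
print(s.rfind('an'))     # 3: highest index
print(s.find('an', 2))   # 3: search starts at index 2
print(s.find('a', -2))   # 5: start=-2 means s[4:], as with slices
print(s.find('xyz'))     # -1: not found
print(s.count('ana'))    # 1: the overlapping match at index 3 is not counted

try:
    s.index('xyz')       # index() raises instead of returning -1
except ValueError:
    print('not found')
```
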

.. function:: lower(s)

   Return a copy of *s*, but with upper case letters converted to lower case.


.. function:: split(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*. If the optional second argument
   *sep* is absent or ``None``, the words are separated by arbitrary strings of
   whitespace characters (space, tab, newline, return, formfeed). If the second
   argument *sep* is present and not ``None``, it specifies a string to be used as
   the word separator. The returned list will then have one more item than the
   number of non-overlapping occurrences of the separator in the string. The
   optional third argument *maxsplit* defaults to 0. If it is nonzero, at most
   *maxsplit* splits occur, and the remainder of the string is returned as the
   final element of the list (thus, the list will have at most ``maxsplit+1``
   elements).

   The behavior of :func:`split` on an empty string depends on the value of
   *sep*. If *sep* is not specified, or specified as ``None``, the result will be
   an empty list. If *sep* is specified as any string, the result will be a list
   containing one element which is an empty string.


.. function:: rsplit(s[, sep[, maxsplit]])

   Return a list of the words of the string *s*, scanning *s* from the end. To all
   intents and purposes, the resulting list of words is the same as returned by
   :func:`split`, except when the optional third argument *maxsplit* is explicitly
   specified and nonzero. When *maxsplit* is nonzero, at most *maxsplit* splits
   (the *rightmost* ones) occur, and the remainder of the string is returned as
   the first element of the list (thus, the list will have at most
   ``maxsplit+1`` elements).

   .. versionadded:: 2.4

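
The split rules above are easiest to see on small inputs. This includes the empty-string case and the difference between :func:`split` and :func:`rsplit` when *maxsplit* is given. The sketch uses the equivalent string methods.

```python
print(''.split())                  # []: no sep, empty string
print(''.split(','))               # ['']: explicit sep, empty string
print('a,b,,c'.split(','))         # ['a', 'b', '', 'c']: 3 separators, 4 items
print('a b c d'.split(None, 1))    # ['a', 'b c d']: leftmost split only
print('a b c d'.rsplit(None, 1))   # ['a b c', 'd']: rightmost split only
```
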

.. function:: splitfields(s[, sep[, maxsplit]])

   This function behaves identically to :func:`split`. (In the past, :func:`split`
   was only used with one argument, while :func:`splitfields` was only used with
   two arguments.)


.. function:: join(words[, sep])

   Concatenate a list or tuple of words with intervening occurrences of *sep*.
   The default value for *sep* is a single space character. For any non-empty
   string *sep*, it is always true that ``string.join(string.split(s, sep), sep)``
   equals *s*.

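
Here is a quick check of the round-trip property with a multi-character separator, using the string-method spelling ``sep.join(words)``. An empty part appears where two separators are adjacent, and that empty part is what lets the join restore them.

```python
s = 'a::b::::c'
sep = '::'
parts = s.split(sep)
print(parts)                  # ['a', 'b', '', 'c']
assert sep.join(parts) == s   # the round trip holds

# With the default single-space separator:
print(' '.join(['spam', 'eggs']))   # spam eggs
```
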

.. function:: joinfields(words[, sep])

   This function behaves identically to :func:`join`. (In the past, :func:`join`
   was only used with one argument, while :func:`joinfields` was only used with two
   arguments.) Note that there is no :meth:`joinfields` method on string objects;
   use the :meth:`join` method instead.


.. function:: lstrip(s[, chars])

   Return a copy of the string with leading characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in that string will be
   stripped from the beginning of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: rstrip(s[, chars])

   Return a copy of the string with trailing characters removed. If *chars* is
   omitted or ``None``, whitespace characters are removed. If given and not
   ``None``, *chars* must be a string; the characters in that string will be
   stripped from the end of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.


.. function:: strip(s[, chars])

   Return a copy of the string with leading and trailing characters removed. If
   *chars* is omitted or ``None``, whitespace characters are removed. If given and
   not ``None``, *chars* must be a string; the characters in that string will be
   stripped from both ends of *s*.

   .. versionchanged:: 2.2.3
      The *chars* parameter was added. The *chars* parameter cannot be passed in
      earlier 2.2 versions.

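
The *chars* argument is treated as a set of characters, not as a prefix or suffix to remove. The sketch below shows this with the equivalent string methods.

```python
print('  spam  '.strip())                 # spam
print('xxspamyy'.strip('xy'))             # spam: any mix of 'x' and 'y' goes
print('www.example.com'.lstrip('w.'))     # example.com
print('www.example.com'.rstrip('.mco'))   # www.example: stops at 'e'
```
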

.. function:: swapcase(s)

   Return a copy of *s*, but with lower case letters converted to upper case and
   vice versa.


.. function:: translate(s, table[, deletechars])

   Delete all characters from *s* that are in *deletechars* (if present), and then
   translate the characters using *table*, which must be a 256-character string
   giving the translation for each character value, indexed by its ordinal. If
   *table* is ``None``, then only the character deletion step is performed.

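
For comparison, Python 3 folds deletion into the table itself. There, ``str.maketrans`` accepts a third argument listing the characters to delete, and ``str.translate`` then performs both steps. The sketch below assumes Python 3. It does not run on Python 2, whose ``str.translate`` takes the 256-character table described above.

```python
# Map 'a'->'A' and 'b'->'B'; delete every 'x', 'y' and 'z'.
table = str.maketrans('ab', 'AB', 'xyz')
print('abxcyz'.translate(table))  # ABc
```
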

.. function:: upper(s)

   Return a copy of *s*, but with lower case letters converted to upper case.


.. function:: ljust(s, width)
              rjust(s, width)
              center(s, width)

   These functions respectively left-justify, right-justify and center a string in
   a field of given width. They return a string that is at least *width*
   characters wide, created by padding *s* with spaces on the right, left or both
   sides until it reaches the given width. The string is never truncated.

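
The following is a short illustration of the three padding directions, using the string methods with the same names. The square brackets only make the padding spaces visible.

```python
print('[' + 'ab'.ljust(6) + ']')    # [ab    ]
print('[' + 'ab'.rjust(6) + ']')    # [    ab]
print('[' + 'ab'.center(6) + ']')   # [  ab  ]
print('toolong'.center(3))          # toolong: never truncated
```
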

.. function:: zfill(s, width)

   Pad a numeric string on the left with zero digits until the given width is
   reached. Strings starting with a sign are handled correctly.

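
The sketch below uses the ``str.zfill`` method to show that the sign stays in front and the zeros go after it.

```python
print('42'.zfill(5))    # 00042
print('-42'.zfill(5))   # -0042: the sign stays in front
print('+7'.zfill(4))    # +007
```
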

.. function:: replace(str, old, new[, maxreplace])

   Return a copy of string *str* with all occurrences of substring *old* replaced
   by *new*. If the optional argument *maxreplace* is given, only the first
   *maxreplace* occurrences are replaced.

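
The *maxreplace* limit counts from the left, as this sketch with the ``str.replace`` method shows.

```python
s = 'one one one'
print(s.replace('one', 'two'))      # two two two
print(s.replace('one', 'two', 2))   # two two one: only the first two change
```
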