Doc/library/string.rst

   1
   2 :mod:`string` --- Common string operations
   3 ==========================================
   4
   5 .. module:: string
   6    :synopsis: Common string operations.
   7
   8
   9 .. index:: module: re
  10
  11 The :mod:`string` module contains a number of useful constants and
  12 classes, as well as some deprecated legacy functions that are also
  13 available as methods on strings. In addition, Python's built-in string
  14 classes support the sequence type methods described in the
  15 :ref:`typesseq` section, and also the string-specific methods described
  16 in the :ref:`string-methods` section. To output formatted strings use
  17 template strings or the ``%`` operator described in the
  18 :ref:`string-formatting` section. Also, see the :mod:`re` module for
  19 string functions based on regular expressions.
  20
  21
  22 String constants
  23 ----------------
  24
  25 The constants defined in this module are:
  26
  27
  28 .. data:: ascii_letters
  29
  30    The concatenation of the :const:`ascii_lowercase` and :const:`ascii_uppercase`
  31    constants described below.  This value is not locale-dependent.
  32
  33
  34 .. data:: ascii_lowercase
  35
  36    The lowercase letters ``'abcdefghijklmnopqrstuvwxyz'``.  This value is not
  37    locale-dependent and will not change.
  38
  39
  40 .. data:: ascii_uppercase
  41
  42    The uppercase letters ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``.  This value is not
  43    locale-dependent and will not change.
  44
  45
  46 .. data:: digits
  47
  48    The string ``'0123456789'``.
  49
  50
  51 .. data:: hexdigits
  52
  53    The string ``'0123456789abcdefABCDEF'``.
  54
  55
  56 .. data:: letters
  57
  58    The concatenation of the strings :const:`lowercase` and :const:`uppercase`
  59    described below.  The specific value is locale-dependent, and will be updated
  60    when :func:`locale.setlocale` is called.
  61
  62
  63 .. data:: lowercase
  64
  65    A string containing all the characters that are considered lowercase letters.
  66    On most systems this is the string ``'abcdefghijklmnopqrstuvwxyz'``.  Do not
  67    change its definition --- the effect on the routines :func:`upper` and
  68    :func:`swapcase` is undefined.  The specific value is locale-dependent, and will
  69    be updated when :func:`locale.setlocale` is called.
  70
  71
  72 .. data:: octdigits
  73
  74    The string ``'01234567'``.
  75
  76
  77 .. data:: punctuation
  78
  79    String of ASCII characters which are considered punctuation characters in the
  80    ``C`` locale.
  81
  82
  83 .. data:: printable
  84
  85    String of characters which are considered printable.  This is a combination of
  86    :const:`digits`, :const:`letters`, :const:`punctuation`, and
  87    :const:`whitespace`.
  88
  89
  90 .. data:: uppercase
  91
  92    A string containing all the characters that are considered uppercase letters.
  93    On most systems this is the string ``'ABCDEFGHIJKLMNOPQRSTUVWXYZ'``.  Do not
  94    change its definition --- the effect on the routines :func:`lower` and
  95    :func:`swapcase` is undefined.  The specific value is locale-dependent, and will
  96    be updated when :func:`locale.setlocale` is called.
  97
  98
  99 .. data:: whitespace
 100
 101    A string containing all characters that are considered whitespace. On most
 102    systems this includes the characters space, tab, linefeed, return, formfeed, and
 103    vertical tab.  Do not change its definition --- the effect on the routines
 104    :func:`strip` and :func:`split` is undefined.
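
     As a minimal sketch of how the locale-independent constants can be used
     (``is_hex`` is a hypothetical helper written for this illustration, not
     part of the module)::

        import string

        # The ASCII constants are fixed strings that do not depend on the locale.
        letters = string.ascii_lowercase + string.ascii_uppercase
        same_as_ascii_letters = (letters == string.ascii_letters)   # True

        def is_hex(text):
            """Hypothetical helper: True if text is non-empty and all hex digits."""
            return bool(text) and all(ch in string.hexdigits for ch in text)

        hex_ok = is_hex('DEADbeef')    # True
        hex_bad = is_hex('0x1f')       # False, 'x' is not a hex digit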
 105
 106
 107 Template strings
 108 ----------------
 109
 110 Templates provide simpler string substitutions as described in :pep:`292`.
 111 Instead of the normal ``%``\ -based substitutions, Templates support ``$``\
 112 -based substitutions, using the following rules:
 113
 114 * ``$$`` is an escape; it is replaced with a single ``$``.
 115
 116 * ``$identifier`` names a substitution placeholder matching a mapping key of
 117   ``"identifier"``.  By default, ``"identifier"`` must spell a Python
 118   identifier.  The first non-identifier character after the ``$`` character
 119   terminates this placeholder specification.
 120
 121 * ``${identifier}`` is equivalent to ``$identifier``.  It is required when valid
 122   identifier characters follow the placeholder but are not part of the
 123   placeholder, such as ``"${noun}ification"``.
 124
 125 Any other appearance of ``$`` in the string will result in a :exc:`ValueError`
 126 being raised.
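
     The rules above can be illustrated with a short sketch; the template text
     and the ``noun`` placeholder are made up for this example::

        from string import Template

        # '$$' is an escape, '${noun}' is needed because letters follow the
        # placeholder, and plain '$noun' ends at the first non-identifier character.
        t = Template('$$5 for one ${noun}ification, $$9 for two $noun checks')
        result = t.substitute(noun='cert')
        # result == '$5 for one certification, $9 for two cert checks'

        # Any other appearance of '$' is rejected by substitute().
        try:
            Template('costs $ 5').substitute()
            stray_dollar_error = None
        except ValueError as exc:
            stray_dollar_error = exc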
 127
 128 .. versionadded:: 2.4
 129
 130 The :mod:`string` module provides a :class:`Template` class that implements
 131 these rules.  The methods of :class:`Template` are:
 132
 133
 134 .. class:: Template(template)
 135
 136    The constructor takes a single argument which is the template string.
 137
 138
 139 .. method:: Template.substitute(mapping[, **kws])
 140
 141    Performs the template substitution, returning a new string.  *mapping* is any
 142    dictionary-like object with keys that match the placeholders in the template.
 143    Alternatively, you can provide keyword arguments, where the keywords are the
 144    placeholders.  When both *mapping* and *kws* are given and there are duplicates,
 145    the placeholders from *kws* take precedence.
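
        A small sketch of the precedence rule, using made-up placeholder
        values::

           from string import Template

           t = Template('$who likes $what')
           defaults = {'who': 'tim', 'what': 'kung pao'}

           # Only the mapping is given.
           from_mapping = t.substitute(defaults)
           # from_mapping == 'tim likes kung pao'

           # The keyword argument wins over the duplicate key in the mapping.
           overridden = t.substitute(defaults, what='noodles')
           # overridden == 'tim likes noodles'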
 146
 147
 148 .. method:: Template.safe_substitute(mapping[, **kws])
 149
 150    Like :meth:`substitute`, except that if placeholders are missing from *mapping*
 151    and *kws*, instead of raising a :exc:`KeyError` exception, the original
 152    placeholder will appear in the resulting string intact.  Also, unlike with
 153    :meth:`substitute`, any other appearance of ``$`` is simply left as ``$``
 154    instead of raising :exc:`ValueError`.
 155
 156    While other exceptions may still occur, this method is called "safe" because
 157    substitution always tries to return a usable string instead of raising an
 158    exception.  In another sense, :meth:`safe_substitute` may be anything other than
 159    safe, since it will silently ignore malformed templates containing dangling
 160    delimiters, unmatched braces, or placeholders that are not valid Python
 161    identifiers.
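
        The following sketch shows both sides of that trade-off on a
        deliberately malformed, made-up template::

           from string import Template

           d = {'who': 'tim'}
           # '$100' is not a valid placeholder and '${what' lacks its closing brace.
           malformed = Template('Give $who $100 and ${what')

           # safe_substitute() fills in what it can and leaves the rest untouched.
           lenient = malformed.safe_substitute(d)
           # lenient == 'Give tim $100 and ${what'

           # substitute() rejects the same template.
           try:
               malformed.substitute(d)
               strict_error = None
           except ValueError as exc:
               strict_error = exc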
 162
 163 :class:`Template` instances also provide one public data attribute:
 164
 165
 166 .. attribute:: Template.template
 167
 168    This is the object passed to the constructor's *template* argument.  In general,
 169    you shouldn't change it, but read-only access is not enforced.
 170
 171 Here is an example of how to use a Template::
 172
 173    >>> from string import Template
 174    >>> s = Template('$who likes $what')
 175    >>> s.substitute(who='tim', what='kung pao')
 176    'tim likes kung pao'
 177    >>> d = dict(who='tim')
 178    >>> Template('Give $who $100').substitute(d)
 179    Traceback (most recent call last):
 180    [...]
 181    ValueError: Invalid placeholder in string: line 1, col 10
 182    >>> Template('$who likes $what').substitute(d)
 183    Traceback (most recent call last):
 184    [...]
 185    KeyError: 'what'
 186    >>> Template('$who likes $what').safe_substitute(d)
 187    'tim likes $what'
 188
 189 Advanced usage: you can derive subclasses of :class:`Template` to customize the
 190 placeholder syntax, delimiter character, or the entire regular expression used
 191 to parse template strings.  To do this, you can override these class attributes:
 192
 193 * *delimiter* -- This is the literal string describing a placeholder-introducing
 194   delimiter.  The default value is ``$``.  Note that this should *not* be a regular
 195   expression, as the implementation will call :func:`re.escape` on this string as
 196   needed.
 197
 198 * *idpattern* -- This is the regular expression describing the pattern for
 199   non-braced placeholders (the braces will be added automatically as
 200   appropriate).  The default value is the regular expression
 201   ``[_a-z][_a-z0-9]*``.
 202
 203 Alternatively, you can provide the entire regular expression pattern by
 204 overriding the class attribute *pattern*.  If you do this, the value must be a
 205 regular expression object with four named capturing groups.  The capturing
 206 groups correspond to the rules given above, along with the invalid placeholder
 207 rule:
 208
 209 * *escaped* -- This group matches the escape sequence, e.g. ``$$``, in the
 210   default pattern.
 211
 212 * *named* -- This group matches the unbraced placeholder name; it should not
 213   include the delimiter in the capturing group.
 214
 215 * *braced* -- This group matches the brace-enclosed placeholder name; it should
 216   not include either the delimiter or braces in the capturing group.
 217
 218 * *invalid* -- This group matches any other delimiter pattern (usually a single
 219   delimiter), and it should appear last in the regular expression.
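
     As a rough sketch of the simpler form of customization, the two subclasses
     below override *delimiter* and *idpattern*.  ``PercentTemplate`` and
     ``DottedTemplate`` are hypothetical names chosen for this illustration::

        from string import Template

        class PercentTemplate(Template):
            # Hypothetical subclass: use '%' instead of '$' as the delimiter.
            delimiter = '%'

        class DottedTemplate(Template):
            # Hypothetical subclass: also allow dots inside placeholder names.
            idpattern = r'[_a-z][_a-z0-9.]*'

        # '%%' is now the escape, and '$' is an ordinary character.
        percent = PercentTemplate('%who owes $5 (100%% sure)').substitute(who='tim')
        # percent == 'tim owes $5 (100% sure)'

        # The dotted name is looked up as a single mapping key.
        dotted = DottedTemplate('Hello, ${user.name}!').substitute({'user.name': 'tim'})
        # dotted == 'Hello, tim!'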
 220
 221
 222 String functions
 223 ----------------
 224
 225 The following functions are available to operate on string and Unicode objects.
 226 They are not available as string methods.
 227
 228
 229 .. function:: capwords(s)
 230
 231    Split the argument into words using :func:`split`, capitalize each word using
 232    :func:`capitalize`, and join the capitalized words using :func:`join`.  Note
 233    that this replaces runs of whitespace characters by a single space, and removes
 234    leading and trailing whitespace.
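
        For illustration, a short sketch with made-up input strings::

           import string

           # Runs of whitespace collapse to one space; the ends are trimmed.
           messy = '  the   quick\tbrown fox  '
           tidy = string.capwords(messy)
           # tidy == 'The Quick Brown Fox'

           # Capitalizing a word also lowercases the rest of it.
           shouty = string.capwords('hello WORLD')
           # shouty == 'Hello World'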
 235
 236
 237 .. function:: maketrans(from, to)
 238
 239    Return a translation table suitable for passing to :func:`translate`, that will
 240    map each character in *from* into the character at the same position in *to*;
 241    *from* and *to* must have the same length.
 242
 243    .. warning::
 244
 245       Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
 246       arguments; in some locales, these don't have the same length.  For case
 247       conversions, always use :func:`lower` and :func:`upper`.
 248
 249
 250 Deprecated string functions
 251 ---------------------------
 252
 253 The following functions are also defined as methods of string and
 254 Unicode objects; see section :ref:`string-methods` for more information on
 255 those.  You should consider these functions as deprecated, although they will
 256 not be removed until Python 3.0.  The functions defined in this module are:
 257
 258
 259 .. function:: atof(s)
 260
 261    .. deprecated:: 2.0
 262       Use the :func:`float` built-in function.
 263
 264    .. index:: builtin: float
 265
 266    Convert a string to a floating point number.  The string must have the standard
 267    syntax for a floating point literal in Python, optionally preceded by a sign
 268    (``+`` or ``-``).  Note that this behaves identically to the built-in function
 269    :func:`float` when passed a string.
 270
 271    .. note::
 272
 273       .. index::
 274          single: NaN
 275          single: Infinity
 276
 277       When passing in a string, values for NaN and Infinity may be returned, depending
 278       on the underlying C library.  The specific set of strings accepted which cause
 279       these values to be returned depends entirely on the C library and is known to
 280       vary.
 281
 282
 283 .. function:: atoi(s[, base])
 284
 285    .. deprecated:: 2.0
 286       Use the :func:`int` built-in function.
 287
 288    .. index:: builtin: eval
 289
 290    Convert string *s* to an integer in the given *base*.  The string must consist
 291    of one or more digits, optionally preceded by a sign (``+`` or ``-``).  The
 292    *base* defaults to 10.  If it is 0, a default base is chosen depending on the
 293    leading characters of the string (after stripping the sign): ``0x`` or ``0X``
 294    means 16, ``0`` means 8, anything else means 10.  If *base* is 16, a leading
 295    ``0x`` or ``0X`` is always accepted, though not required.  This behaves
 296    identically to the built-in function :func:`int` when passed a string.  (Also
 297    note: for a more flexible interpretation of numeric literals, use the built-in
 298    function :func:`eval`.)
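
        Since :func:`int` is the recommended replacement, the sketch below
        uses it directly rather than :func:`atoi`; the input strings are
        made up::

           decimal = int('-42')            # -42, base 10 with a sign
           hex_explicit = int('1f', 16)    # 31
           hex_prefixed = int('0x1f', 16)  # 31, the prefix is accepted for base 16
           hex_guessed = int('0X1F', 0)    # 31, base 0 infers base 16 from the prefix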
 299
 300
 301 .. function:: atol(s[, base])
 302
 303    .. deprecated:: 2.0
 304       Use the :func:`long` built-in function.
 305
 306    .. index:: builtin: long
 307
 308    Convert string *s* to a long integer in the given *base*. The string must
 309    consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
 310    The *base* argument has the same meaning as for :func:`atoi`.  A trailing ``l``
 311    or ``L`` is not allowed, except if the base is 0.  Note that when invoked
 312    without *base* or with *base* set to 10, this behaves identically to the built-in
 313    function :func:`long` when passed a string.
 314
 315
 316 .. function:: capitalize(word)
 317
 318    Return a copy of *word* with only its first character capitalized.
 319
 320
 321 .. function:: expandtabs(s[, tabsize])
 322
 323    Expand tabs in a string replacing them by one or more spaces, depending on the
 324    current column and the given tab size.  The column number is reset to zero after
 325    each newline occurring in the string. This doesn't understand other non-printing
 326    characters or escape sequences.  The tab size defaults to 8.
 327
 328
 329 .. function:: find(s, sub[, start[, end]])
 330
 331    Return the lowest index in *s* where the substring *sub* is found such that
 332    *sub* is wholly contained in ``s[start:end]``.  Return ``-1`` on failure.
 333    Defaults for *start* and *end* and interpretation of negative values are the same
 334    as for slices.
 335
 336
 337 .. function:: rfind(s, sub[, start[, end]])
 338
 339    Like :func:`find` but find the highest index.
 340
 341
 342 .. function:: index(s, sub[, start[, end]])
 343
 344    Like :func:`find` but raise :exc:`ValueError` when the substring is not found.
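
        A brief sketch of the difference between the two failure modes.  It
        uses the equivalent string methods, which this section says are
        defined on string objects as well; the sample string is made up::

           s = 'spam, eggs, spam'

           first = s.find('spam')        # 0, the lowest index
           last = s.rfind('spam')        # 12, the highest index
           bounded = s.find('spam', 1)   # 12, the search starts at index 1
           missing = s.find('ham')       # -1 signals failure

           # index() reports the same failure with an exception instead.
           try:
               s.index('ham')
               index_error = None
           except ValueError as exc:
               index_error = exc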
 345
 346
 347 .. function:: rindex(s, sub[, start[, end]])
 348
 349    Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.
 350
 351
 352 .. function:: count(s, sub[, start[, end]])
 353
 354    Return the number of (non-overlapping) occurrences of substring *sub* in string
 355    ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
 356    values are the same as for slices.
 357
 358
 359 .. function:: lower(s)
 360
 361    Return a copy of *s*, but with upper case letters converted to lower case.
 362
 363
 364 .. function:: split(s[, sep[, maxsplit]])
 365
 366    Return a list of the words of the string *s*.  If the optional second argument
 367    *sep* is absent or ``None``, the words are separated by arbitrary strings of
 368    whitespace characters (space, tab, newline, return, formfeed).  If the second
 369    argument *sep* is present and not ``None``, it specifies a string to be used as
 370    the word separator.  The returned list will then have one more item than the
 371    number of non-overlapping occurrences of the separator in the string.  The
 372    optional third argument *maxsplit* defaults to 0.  If it is nonzero, at most
 373    *maxsplit* number of splits occur, and the remainder of the string is returned
 374    as the final element of the list (thus, the list will have at most
 375    ``maxsplit+1`` elements).
 376
 377    The behavior of split on an empty string depends on the value of *sep*. If *sep*
 378    is not specified, or specified as ``None``, the result will be an empty list.
 379    If *sep* is specified as any string, the result will be a list containing one
 380    element which is an empty string.
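
        The sketch below illustrates how the result depends on *sep*.  It uses
        the equivalent string method rather than the deprecated module
        function::

           # Without a separator, an empty string yields no words at all.
           no_sep = ''.split()               # []
           # With a separator, the empty string is itself the single field.
           with_sep = ''.split(',')          # ['']

           # The same distinction applies to runs of whitespace.
           words = ' a  b '.split()          # ['a', 'b']
           fields = ' a  b '.split(' ')      # ['', 'a', '', 'b', '']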
 381
 382
 383 .. function:: rsplit(s[, sep[, maxsplit]])
 384
 385    Return a list of the words of the string *s*, scanning *s* from the end.  To all
 386    intents and purposes, the resulting list of words is the same as returned by
 387    :func:`split`, except when the optional third argument *maxsplit* is explicitly
 388    specified and nonzero.  When *maxsplit* is nonzero, at most *maxsplit* number of
 389    splits -- the *rightmost* ones -- occur, and the remainder of the string is
 390    returned as the first element of the list (thus, the list will have at most
 391    ``maxsplit+1`` elements).
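
        As a quick comparison, again using the equivalent string methods and a
        made-up path string::

           path = 'usr/local/lib/python'

           # One split from the left: the remainder is the last element.
           from_left = path.split('/', 1)     # ['usr', 'local/lib/python']
           # One split from the right: the remainder is the first element.
           from_right = path.rsplit('/', 1)   # ['usr/local/lib', 'python']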
 392
 393    .. versionadded:: 2.4
 394
 395
 396 .. function:: splitfields(s[, sep[, maxsplit]])
 397
 398    This function behaves identically to :func:`split`.  (In the past, :func:`split`
 399    was only used with one argument, while :func:`splitfields` was only used with
 400    two arguments.)
 401
 402
 403 .. function:: join(words[, sep])
 404
 405    Concatenate a list or tuple of words with intervening occurrences of  *sep*.
 406    The default value for *sep* is a single space character.  It is always true that
 407    ``string.join(string.split(s, sep), sep)`` equals *s*.
 408
 409
 410 .. function:: joinfields(words[, sep])
 411
 412    This function behaves identically to :func:`join`.  (In the past, :func:`join`
 413    was only used with one argument, while :func:`joinfields` was only used with two
 414    arguments.) Note that there is no :meth:`joinfields` method on string objects;
 415    use the :meth:`join` method instead.
 416
 417
 418 .. function:: lstrip(s[, chars])
 419
 420    Return a copy of the string with leading characters removed.  If *chars* is
 421    omitted or ``None``, whitespace characters are removed.  If given and not
 422    ``None``, *chars* must be a string; the characters in *chars* will be
 423    stripped from the beginning of *s*.
 424
 425    .. versionchanged:: 2.2.3
 426       The *chars* parameter was added.  The *chars* parameter cannot be passed in
 427       earlier 2.2 versions.
 428
 429
 430 .. function:: rstrip(s[, chars])
 431
 432    Return a copy of the string with trailing characters removed.  If *chars* is
 433    omitted or ``None``, whitespace characters are removed.  If given and not
 434    ``None``, *chars* must be a string; the characters in *chars* will be
 435    stripped from the end of *s*.
 436
 437    .. versionchanged:: 2.2.3
 438       The *chars* parameter was added.  The *chars* parameter cannot be passed in
 439       earlier 2.2 versions.
 440
 441
 442 .. function:: strip(s[, chars])
 443
 444    Return a copy of the string with leading and trailing characters removed.  If
 445    *chars* is omitted or ``None``, whitespace characters are removed.  If given and
 446    not ``None``, *chars* must be a string; the characters in *chars* will be
 447    stripped from both ends of *s*.
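
        Note that *chars* is treated as a set of characters, not as a prefix
        or suffix.  A small sketch using the equivalent string methods and
        made-up inputs::

           # Every leading and trailing character found in chars is removed.
           host = 'www.example.com'.strip('cmowz.')   # 'example'

           left_only = 'xxhixx'.lstrip('x')           # 'hixx'
           right_only = 'xxhixx'.rstrip('x')          # 'xxhi'
           default = '  hi \n'.strip()                # 'hi'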
 448
 449    .. versionchanged:: 2.2.3
 450       The *chars* parameter was added.  The *chars* parameter cannot be passed in
 451       earlier 2.2 versions.
 452
 453
 454 .. function:: swapcase(s)
 455
 456    Return a copy of *s*, but with lower case letters converted to upper case and
 457    vice versa.
 458
 459
 460 .. function:: translate(s, table[, deletechars])
 461
 462    Delete all characters from *s* that are in *deletechars* (if present), and then
 463    translate the characters using *table*, which must be a 256-character string
 464    giving the translation for each character value, indexed by its ordinal.  If
 465    *table* is ``None``, then only the character deletion step is performed.
 466
 467
 468 .. function:: upper(s)
 469
 470    Return a copy of *s*, but with lower case letters converted to upper case.
 471
 472
 473 .. function:: ljust(s, width)
 474               rjust(s, width)
 475               center(s, width)
 476
 477    These functions respectively left-justify, right-justify and center a string in
 478    a field of given width.  They return a string that is at least *width*
 479    characters wide, created by padding *s* with spaces on the right, left or
 480    both sides until the given width is reached.  The string is never truncated.
 481
 482
 483 .. function:: zfill(s, width)
 484
 485    Pad a numeric string on the left with zero digits until the given width is
 486    reached.  Strings starting with a sign are handled correctly.
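
        A short sketch of the sign handling, using the equivalent string
        method and made-up inputs::

           padded = '42'.zfill(6)              # '000042'
           # The zeros are inserted after the sign, not before it.
           signed = '-42'.zfill(6)             # '-00042'
           # Strings already at least as wide as requested are left alone.
           unchanged = '123456789'.zfill(6)    # '123456789'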
 487
 488
 489 .. function:: replace(str, old, new[, maxreplace])
 490
 491    Return a copy of string *str* with all occurrences of substring *old* replaced
 492    by *new*.  If the optional argument *maxreplace* is given, only the first
 493    *maxreplace* occurrences are replaced.
 494