# A +String+ object holds an arbitrary sequence of bytes,
# typically representing text or binary data.
# A +String+ object may be created using String::new or with a literal.
#
# String objects differ from Symbol objects in that Symbol objects are
# designed to be used as identifiers, instead of text or data.
#
# You can create a +String+ object explicitly with:
#
# - A {string literal}[rdoc-ref:syntax/literals.rdoc@String+Literals].
# - A {heredoc literal}[rdoc-ref:syntax/literals.rdoc@Here+Document+Literals].
#
# You can convert certain objects to Strings with:
#
# - The method Kernel#String.
#
# Some +String+ methods modify +self+.
# Typically, a method whose name ends with <tt>!</tt> modifies +self+;
# often a similarly named method (without the <tt>!</tt>)
# returns a new string.
#
# In general, if both bang and non-bang versions of a method exist,
# the bang version mutates +self+ and the non-bang version does not.
# However, a method without a bang can also mutate, such as String#replace.
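#
# A minimal sketch of the distinction above, using String#upcase,
# String#upcase!, and String#replace; the variable names are illustrative only:
#
# ```ruby
#   s = String.new('hello')
#   t = s.upcase     # Non-bang: returns a new string.
#   s                # => "hello"
#   t                # => "HELLO"
#   s.upcase!        # Bang: modifies s in place.
#   s                # => "HELLO"
#   s.replace('bye') # No bang, but still modifies s.
#   s                # => "bye"
# ```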
#
# == Substitution Methods
#
# These methods perform substitutions:
#
# - String#sub: One substitution (or none); returns a new string.
# - String#sub!: One substitution (or none); returns +self+ if any changes,
#   +nil+ otherwise.
# - String#gsub: Zero or more substitutions; returns a new string.
# - String#gsub!: Zero or more substitutions; returns +self+ if any changes,
#   +nil+ otherwise.
#
# Each of these methods takes:
#
# - A first argument, +pattern+ (string or regexp),
#   that specifies the substring(s) to be replaced.
#
# - Either of these:
#
#   - A second argument, +replacement+ (string or hash),
#     that determines the replacing string.
#   - A block that will determine the replacing string.
#
# The examples in this section mostly use methods String#sub and String#gsub;
# the principles illustrated apply to all four substitution methods.
#
# <b>Argument +pattern+</b>
#
# Argument +pattern+ is commonly a regular expression:
#
#   s = 'hello'
#   s.sub(/[aeiou]/, '*')       # => "h*llo"
#   s.gsub(/[aeiou]/, '*')      # => "h*ll*"
#   s.gsub(/[aeiou]/, '')       # => "hll"
#   s.sub(/ell/, 'al')          # => "halo"
#   s.gsub(/xyzzy/, '*')        # => "hello"
#   'THX1138'.gsub(/\d+/, '00') # => "THX00"
#
# When +pattern+ is a string, all its characters are treated
# as ordinary characters (not as regexp special characters):
#
#   'THX1138'.gsub('\d+', '00') # => "THX1138"
#
# <b>+String+ +replacement+</b>
#
# If +replacement+ is a string, that string will determine
# the replacing string that is to be substituted for the matched text.
#
# Each of the examples above uses a simple string as the replacing string.
#
# +String+ +replacement+ may contain back-references to the pattern's captures:
#
# - <tt>\n</tt> (_n_ a non-negative integer) refers to <tt>$n</tt>.
# - <tt>\k<name></tt> refers to the named capture +name+.
#
# See Regexp for details.
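#
# As a small sketch of numbered and named back-references
# (single-quoted literals are used here so that each backslash is kept as written):
#
# ```ruby
#   'hello'.gsub(/([aeiou])/, '<\1>')         # => "h<e>ll<o>"
#   'John Smith'.sub(/(\w+) (\w+)/, '\2, \1') # => "Smith, John"
#   'hello'.sub(/(?<first>\w)/, '\k<first>-') # => "h-ello"
# ```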
#
# Note that within the string +replacement+, a character combination
# such as <tt>$&</tt> is treated as ordinary text, and not as
# a special match variable.
# However, you may refer to some special match variables using these
# combinations:
#
# - <tt>\&</tt> and <tt>\0</tt> correspond to <tt>$&</tt>,
#   which contains the complete matched text.
# - <tt>\'</tt> corresponds to <tt>$'</tt>,
#   which contains the string after the match.
# - <tt>\`</tt> corresponds to <tt>$`</tt>,
#   which contains the string before the match.
# - <tt>\\+</tt> corresponds to <tt>$+</tt>,
#   which contains the last capture group.
#
# See Regexp for details.
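#
# The combinations above might be exercised as in this sketch;
# the sample string and patterns are chosen only for illustration:
#
# ```ruby
#   'hello'.sub(/ll/, '[\&]')   # => "he[ll]o"
#   'hello'.sub(/ll/, '[\0]')   # => "he[ll]o"
#   'hello'.sub(/ll/, '[\`]')   # => "he[he]o"
#   'hello'.sub(/ll/, "[\\']")  # => "he[o]o"
#   'hello'.sub(/(h)(e)/, '\+') # => "ello"
#   # A match variable written directly is ordinary text.
#   'hello'.sub(/ll/, '$&')     # => "he$&o"
# ```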
#
# Note that <tt>\\\\</tt> is interpreted as an escape, i.e., a single backslash.
#
# Note also that a string literal consumes backslashes.
# See {String Literals}[rdoc-ref:syntax/literals.rdoc@String+Literals] for details about string literals.
#
# A back-reference is typically preceded by an additional backslash.
# For example, if you want to write a back-reference <tt>\&</tt> in
# +replacement+ with a double-quoted string literal, you need to write
# <tt>"..\\\\&.."</tt>.
#
# If you want to write a non-back-reference string <tt>\&</tt> in
# +replacement+, you need first to escape the backslash to prevent
# this method from interpreting it as a back-reference, and then you
# need to escape the backslashes again to prevent a string literal from
# consuming them: <tt>"..\\\\\\\\&.."</tt>.
#
# You may want to use the block form to avoid a lot of backslashes.
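#
# A sketch of the escaping rules above; the results are shown as
# +inspect+ would print them, so a single backslash appears as <tt>\\</tt>:
#
# ```ruby
#   # Back-reference \& written in a double-quoted literal.
#   'hello'.sub(/ell/, "[\\&]") # => "h[ell]o"
#   # A literal backslash followed by "&", written as the replacement argument.
#   'a&b'.sub('&', '\\\\&')     # => "a\\&b"
#   # Block form: the block's return value is used as is.
#   'a&b'.sub('&') { '\\&' }    # => "a\\&b"
# ```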
#
# <b>\Hash +replacement+</b>
#
# If argument +replacement+ is a hash, and +pattern+ matches one of its keys,
# the replacing string is the value for that key:
#
#   h = {'foo' => 'bar', 'baz' => 'bat'}
#   'food'.sub('foo', h) # => "bard"
#
# Note that a symbol key does not match:
#
#   h = {foo: 'bar', baz: 'bat'}
#   'food'.sub('foo', h) # => "d"
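#
# As a further sketch, a hash can drive several substitutions in one call to
# String#gsub; matched text with no key is replaced by an empty string.
# The hash contents here are illustrative only:
#
# ```ruby
#   h = {'cat' => 'feline', 'dog' => 'canine'}
#   'cat dog cow'.gsub(/cat|dog|cow/, h)  # => "feline canine "
#   # Symbol keys can be reached through the block form instead.
#   sym = {cat: 'feline'}
#   'cat'.sub(/cat/) {|m| sym[m.to_sym] } # => "feline"
# ```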
#
# <b>Block</b>
#
# In the block form, the current match string is passed to the block;
# the block's return value becomes the replacing string:
#
#   s = '@'
#   '1234'.gsub(/\d/) {|match| s.succ! } # => "ABCD"
#
# Special match variables such as <tt>$1</tt>, <tt>$2</tt>, <tt>$`</tt>,
# <tt>$&</tt>, and <tt>$'</tt> are set appropriately.
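#
# For instance, a block may use the match string or the special match
# variables directly; this is only a sketch with made-up input:
#
# ```ruby
#   # $1 and $2 refer to the captures of the current match.
#   'hello world'.gsub(/(\w)(\w*)/) { "#{$1.upcase}#{$2}" } # => "Hello World"
#   # The block parameter is the complete matched text.
#   'a1b22'.gsub(/\d+/) {|m| (m.to_i * 2).to_s }            # => "a2b44"
# ```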
#
# == Whitespace in Strings
#
# In class +String+, _whitespace_ is defined as a contiguous sequence of characters
# consisting of any mixture of the following:
#
# - NUL (null): <tt>"\x00"</tt>, <tt>"\u0000"</tt>.
# - HT (horizontal tab): <tt>"\x09"</tt>, <tt>"\t"</tt>.
# - LF (line feed): <tt>"\x0a"</tt>, <tt>"\n"</tt>.
# - VT (vertical tab): <tt>"\x0b"</tt>, <tt>"\v"</tt>.
# - FF (form feed): <tt>"\x0c"</tt>, <tt>"\f"</tt>.
# - CR (carriage return): <tt>"\x0d"</tt>, <tt>"\r"</tt>.
# - SP (space): <tt>"\x20"</tt>, <tt>" "</tt>.
#
# Whitespace is relevant for these methods:
#
# - #lstrip, #lstrip!: strip leading whitespace.
# - #rstrip, #rstrip!: strip trailing whitespace.
# - #strip, #strip!: strip leading and trailing whitespace.
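#
# A short sketch of the stripping methods; the sample string is illustrative
# and deliberately mixes several of the whitespace characters listed above:
#
# ```ruby
#   s = "\t\n\v\f\r abc \t\n"
#   s.lstrip # => "abc \t\n"
#   s.rstrip # => "\t\n\v\f\r abc"
#   s.strip  # => "abc"
#   # The bang forms return nil when there is nothing to strip.
#   t = String.new('abc')
#   t.strip! # => nil
# ```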
#
# == \String Slices
#
# A _slice_ of a string is a substring that is selected by certain criteria.
#
# These instance methods make use of slicing:
#
# - String#[] (also aliased as String#slice) returns a slice copied from +self+.
# - String#[]= replaces a slice in +self+, modifying +self+.
# - String#slice! removes a slice from +self+ and returns the removed slice.
#
# Each of the above methods takes arguments that determine the slice
# to be copied, replaced, or removed.
#
# The arguments have several forms.
# For string +string+, the forms are:
#
# - <tt>string[index]</tt>.
# - <tt>string[start, length]</tt>.
# - <tt>string[range]</tt>.
# - <tt>string[regexp, capture = 0]</tt>.
# - <tt>string[substring]</tt>.
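#
# Before the individual forms, here is a rough sketch of how the three
# slicing methods differ in what they return and whether they modify the receiver
# (the variable name is illustrative only):
#
# ```ruby
#   s = String.new('hello')
#   s[1, 3]        # => "ell"
#   s              # => "hello"
#   s[0] = 'J'     # Replaces the slice at index 0 in s.
#   s              # => "Jello"
#   s.slice!(1..2) # => "el"
#   s              # => "Jlo"
# ```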
#
# <b><tt>string[index]</tt></b>
#
# When non-negative integer argument +index+ is given,
# the slice is the 1-character substring found in +self+ at character offset +index+:
#
#   'bar'[0] # => "b"
#   'こんにちは'[4] # => "は"
#
# When negative integer +index+ is given,
# the slice begins at the offset given by counting backward from the end of +self+:
#
#   'bar'[-1] # => "r"
#   'bar'[-20] # => nil
#
# <b><tt>string[start, length]</tt></b>
#
# When non-negative integer arguments +start+ and +length+ are given,
# the slice begins at character offset +start+, if it exists,
# and continues for +length+ characters, if available:
#
#   'foo'[0, 2] # => "fo"
#   'тест'[1, 2] # => "ес"
#   'こんにちは'[2, 2] # => "にち"
#   # Zero length.
#   'foo'[2, 0] # => ""
#   # Length not entirely available.
#   'foo'[1, 200] # => "oo"
#   # Start out of range.
#   'foo'[4, 2] # => nil
#
# Special case: if +start+ is equal to the length of +self+,
# the slice is a new empty string:
#
#   'foo'[3, 2] # => ""
#   'foo'[3, 200] # => ""
#
# When negative +start+ and non-negative +length+ are given,
# the slice beginning is determined by counting backward from the end of +self+,
# and the slice continues for +length+ characters, if available:
#
#   'foo'[-2, 2] # => "oo"
#   'foo'[-2, 200] # => "oo"
#   # Start out of range.
#   'foo'[-4, 2] # => nil
#
# When negative +length+ is given, there is no slice:
#
#   'foo'[1, -1] # => nil
#   'foo'[-2, -1] # => nil
#
# <b><tt>string[range]</tt></b>
#
# When Range argument +range+ is given,
# creates a substring of +string+ using the indices in +range+.
# The slice is then determined as above:
#
#   'foo'[0..1] # => "fo"
#   'foo'[0, 2] # => "fo"
#
#   'foo'[2...2] # => ""
#   'foo'[2, 0] # => ""
#
#   'foo'[1..200] # => "oo"
#   'foo'[1, 200] # => "oo"
#
#   'foo'[4..5] # => nil
#   'foo'[4, 2] # => nil
#
#   'foo'[-4..-3] # => nil
#   'foo'[-4, 2] # => nil
#
#   'foo'[3..4] # => ""
#   'foo'[3, 2] # => ""
#
#   'foo'[-2..-1] # => "oo"
#   'foo'[-2, 2] # => "oo"
#
#   'foo'[-2..197] # => "oo"
#   'foo'[-2, 200] # => "oo"
#
# <b><tt>string[regexp, capture = 0]</tt></b>
#
# When the Regexp argument +regexp+ is given,
# and the +capture+ argument is <tt>0</tt> (the default),
# the slice is the first matching substring found in +self+:
#
#   'foo'[/o/] # => "o"
#   'foo'[/x/] # => nil
#   s = 'hello there'
#   s[/[aeiou](.)\1/] # => "ell"
#   s[/[aeiou](.)\1/, 0] # => "ell"
#
# If argument +capture+ is given and not <tt>0</tt>,
# it should be either a capture group index (integer)
# or a capture group name (string or symbol);
# the slice is the specified capture (see Regexp@Groups+and+Captures):
#
#   s = 'hello there'
#   s[/[aeiou](.)\1/, 1] # => "l"
#   s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, "non_vowel"] # => "l"
#   s[/(?<vowel>[aeiou])(?<non_vowel>[^aeiou])/, :vowel] # => "e"
#
# If an invalid capture group index is given, there is no slice.
# If an invalid capture group name is given, +IndexError+ is raised.
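#
# The two failure modes might look like this in practice; this is a sketch,
# and the group name +consonant+ is deliberately one the pattern does not define:
#
# ```ruby
#   s = 'hello there'
#   # The pattern has only one group, so index 2 is invalid.
#   s[/[aeiou](.)\1/, 2] # => nil
#   begin
#     s[/(?<vowel>[aeiou])/, :consonant]
#   rescue IndexError => e
#     e.class # => IndexError
#   end
# ```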
#
# <b><tt>string[substring]</tt></b>
#
# When the single +String+ argument +substring+ is given,
# returns the substring from +self+ if found, otherwise +nil+:
#
#   'foo'['oo'] # => "oo"
#   'foo'['xx'] # => nil
#
# == What's Here
#
# First, what's elsewhere. \Class +String+:
#
# - Inherits from {class Object}[rdoc-ref:Object@What-27s+Here].
# - Includes {module Comparable}[rdoc-ref:Comparable@What-27s+Here].
#
# Here, class +String+ provides methods that are useful for:
#
# - {Creating a String}[rdoc-ref:String@Methods+for+Creating+a+String]
# - {Frozen/Unfrozen Strings}[rdoc-ref:String@Methods+for+a+Frozen-2FUnfrozen+String]
# - {Querying}[rdoc-ref:String@Methods+for+Querying]
# - {Comparing}[rdoc-ref:String@Methods+for+Comparing]
# - {Modifying a String}[rdoc-ref:String@Methods+for+Modifying+a+String]
# - {Converting to New String}[rdoc-ref:String@Methods+for+Converting+to+New+String]
# - {Converting to Non-String}[rdoc-ref:String@Methods+for+Converting+to+Non-String]
# - {Iterating}[rdoc-ref:String@Methods+for+Iterating]
#
# === Methods for Creating a +String+
#
# - ::new: Returns a new string.
# - ::try_convert: Returns a string derived from a given object via +to_str+,
#   or +nil+ if the object cannot be converted.
#
# === Methods for a Frozen/Unfrozen String
#
# - #+@: Returns a string that is not frozen: +self+, if not frozen;
#   +self.dup+ otherwise.
# - #-@: Returns a string that is frozen: +self+, if already frozen;
#   a frozen (possibly pre-existing) copy of +self+ otherwise.
# - #freeze: Freezes +self+, if not already frozen; returns +self+.
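#
# A brief sketch of the unary operators; String.new is used so that the
# starting string is certainly not frozen, and the variable names are illustrative:
#
# ```ruby
#   s = String.new('abc') # Not frozen.
#   (+s).equal?(s)        # => true
#   f = -s                # A frozen string with the same content.
#   f.frozen?             # => true
#   f == s                # => true
#   u = +f                # An unfrozen duplicate of f.
#   u.frozen?             # => false
#   u.equal?(f)           # => false
# ```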
#
# === Methods for Querying
#
# - #length, #size: Returns the count of characters (not bytes).
# - #empty?: Returns +true+ if +self.length+ is zero; +false+ otherwise.
# - #bytesize: Returns the count of bytes.
# - #count: Returns the count of characters specified by given character selectors.
#
# - #=~: Returns the index of the first substring that matches a given
#   Regexp or other object; returns +nil+ if no match is found.
# - #index: Returns the index of the _first_ occurrence of a given substring;
#   returns +nil+ if none found.
# - #rindex: Returns the index of the _last_ occurrence of a given substring;
#   returns +nil+ if none found.
# - #include?: Returns +true+ if the string contains a given substring; +false+ otherwise.
# - #match: Returns a MatchData object if the string matches a given Regexp; +nil+ otherwise.
# - #match?: Returns +true+ if the string matches a given Regexp; +false+ otherwise.
# - #start_with?: Returns +true+ if the string begins with any of the given substrings.
# - #end_with?: Returns +true+ if the string ends with any of the given substrings.
#
# - #encoding\: Returns the Encoding object that represents the encoding of the string.
# - #unicode_normalized?: Returns +true+ if the string is in Unicode normalized form; +false+ otherwise.
# - #valid_encoding?: Returns +true+ if the string contains only characters that are valid
#   for its encoding; +false+ otherwise.
# - #ascii_only?: Returns +true+ if the string has only ASCII characters; +false+ otherwise.
#
# - #sum: Returns a basic checksum for the string: the sum of its bytes.
# - #hash: Returns the integer hash code.
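#
# A few of the querying methods in use; this sketch assumes the source
# file is UTF-8 encoded, so each Japanese character occupies three bytes:
#
# ```ruby
#   s = 'こんにちは'
#   s.length                 # => 5
#   s.bytesize               # => 15
#   s.ascii_only?            # => false
#   t = 'hello'
#   t.count('l')             # => 2
#   t =~ /l+/                # => 2
#   t.index('l')             # => 2
#   t.rindex('l')            # => 3
#   t.include?('ell')        # => true
#   t.start_with?('x', 'he') # => true
# ```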
#
# === Methods for Comparing
#
# - #==, #===: Returns +true+ if a given other string has the same content as +self+.
# - #eql?: Returns +true+ if the content is the same as the given other string.
# - #<=>: Returns -1, 0, or 1 as +self+ is smaller than,
#   equal to, or larger than a given other string.
# - #casecmp: Ignoring case, returns -1, 0, or 1 as +self+
#   is smaller than, equal to, or larger than a given other string.
# - #casecmp?: Returns +true+ if the string is equal to a given string after Unicode case folding;
#   +false+ otherwise.
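#
# The direction of the comparison is easy to get backwards, so here is a
# small sketch; the result describes the receiver relative to the argument:
#
# ```ruby
#   'a' <=> 'b'       # => -1
#   'b' <=> 'a'       # => 1
#   'a' <=> 'a'       # => 0
#   # An argument that is not comparable with a string yields nil.
#   'a' <=> 1         # => nil
#   'a'.casecmp('B')  # => -1
#   'a'.casecmp?('A') # => true
#   'a' == 'a'        # => true
#   'a'.eql?('a')     # => true
# ```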
#
# === Methods for Modifying a +String+
#
# Each of these methods modifies +self+.
#
# - #insert: Returns +self+ with a given string inserted at a given offset.
# - #<<: Returns +self+ concatenated with a given string or integer.
#
# - #sub!: Replaces the first substring that matches a given pattern with a given replacement string;
#   returns +self+ if any changes, +nil+ otherwise.
# - #gsub!: Replaces each substring that matches a given pattern with a given replacement string;
#   returns +self+ if any changes, +nil+ otherwise.
# - #succ!, #next!: Returns +self+ modified to become its own successor.
# - #replace: Returns +self+ with its entire content replaced by a given string.
# - #reverse!: Returns +self+ with its characters in reverse order.
# - #setbyte: Sets the byte at a given integer offset to a given value; returns the argument.
# - #tr!: Replaces specified characters in +self+ with specified replacement characters;
#   returns +self+ if any changes, +nil+ otherwise.
# - #tr_s!: Replaces specified characters in +self+ with specified replacement characters,
#   removing duplicates from the substrings that were modified;
#   returns +self+ if any changes, +nil+ otherwise.
#
# - #capitalize!: Upcases the initial character and downcases all others;
#   returns +self+ if any changes, +nil+ otherwise.
# - #downcase!: Downcases all characters; returns +self+ if any changes, +nil+ otherwise.
# - #upcase!: Upcases all characters; returns +self+ if any changes, +nil+ otherwise.
# - #swapcase!: Upcases each downcase character and downcases each upcase character;
#   returns +self+ if any changes, +nil+ otherwise.
#
# - #encode!: Returns +self+ with all characters transcoded from one given encoding into another.
# - #unicode_normalize!: Unicode-normalizes +self+; returns +self+.
# - #scrub!: Replaces each invalid byte with a given character; returns +self+.
# - #force_encoding: Changes the encoding to a given encoding; returns +self+.
#
# - #clear: Removes all content, so that +self+ is empty; returns +self+.
# - #slice!: Removes and returns a substring determined by a given index, start/length, range, regexp, or substring.
# - #[]=: Replaces a substring determined by a given index, start/length, range, regexp, or substring.
# - #squeeze!: Removes contiguous duplicate characters; returns +self+ if any changes, +nil+ otherwise.
# - #delete!: Removes characters as determined by the intersection of substring arguments;
#   returns +self+ if any changes, +nil+ otherwise.
# - #lstrip!: Removes leading whitespace; returns +self+ if any changes, +nil+ otherwise.
# - #rstrip!: Removes trailing whitespace; returns +self+ if any changes, +nil+ otherwise.
# - #strip!: Removes leading and trailing whitespace; returns +self+ if any changes, +nil+ otherwise.
# - #chomp!: Removes trailing record separator, if found; returns +self+ if any changes, +nil+ otherwise.
# - #chop!: Removes trailing newline characters if found; otherwise removes the last character;
#   returns +self+ if any changes, +nil+ otherwise.
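#
# Because many of these methods return +nil+ when nothing changes, chaining
# them can fail unexpectedly. A sketch, with an illustrative variable name:
#
# ```ruby
#   s = String.new('HELLO')
#   s.upcase!      # => nil (already upcased; s is unchanged)
#   s.downcase!    # => "hello"
#   s.squeeze!     # => "helo"
#   s.squeeze!     # => nil
#   # The non-bang forms always return a string, so they chain safely.
#   s.strip.upcase # => "HELO"
# ```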
#
# === Methods for Converting to New +String+
#
# Each of these methods, except where noted, returns a new +String+ based on +self+,
# often just a modified copy of +self+.
#
# - #*: Returns the concatenation of multiple copies of +self+.
# - #+: Returns the concatenation of +self+ and a given other string.
# - #center: Returns a copy of +self+ centered between pad substrings.
# - #concat: Returns the concatenation of +self+ with given other strings
#   (unlike the others listed here, modifies and returns +self+).
# - #prepend: Returns the concatenation of a given other string with +self+
#   (unlike the others listed here, modifies and returns +self+).
# - #ljust: Returns a copy of +self+ of a given length, right-padded with a given other string.
# - #rjust: Returns a copy of +self+ of a given length, left-padded with a given other string.
#
# - #b: Returns a copy of +self+ with ASCII-8BIT encoding.
# - #scrub: Returns a copy of +self+ with each invalid byte replaced with a given character.
# - #unicode_normalize: Returns a copy of +self+ with each character Unicode-normalized.
# - #encode: Returns a copy of +self+ with all characters transcoded from one given encoding into another.
#
# - #dump: Returns a copy of +self+ with all non-printing characters replaced by <tt>\xHH</tt> notation
#   and all special characters escaped.
# - #undump: Returns an unescaped copy of +self+, reversing the escaping done by #dump.
# - #sub: Returns a copy of +self+ with the first substring matching a given pattern
#   replaced with a given replacement string.
# - #gsub: Returns a copy of +self+ with each substring that matches a given pattern
#   replaced with a given replacement string.
# - #succ, #next: Returns the string that is the successor to +self+.
# - #reverse: Returns a copy of +self+ with its characters in reverse order.
# - #tr: Returns a copy of +self+ with specified characters replaced with specified replacement characters.
# - #tr_s: Returns a copy of +self+ with specified characters replaced with
#   specified replacement characters,
#   removing duplicates from the substrings that were modified.
# - #%: Returns the string resulting from formatting a given object into +self+.
#
# - #capitalize: Returns a copy of +self+ with the first character upcased
#   and all other characters downcased.
# - #downcase: Returns a copy of +self+ with all characters downcased.
# - #upcase: Returns a copy of +self+ with all characters upcased.
# - #swapcase: Returns a copy of +self+ with all upcase characters downcased
#   and all downcase characters upcased.
#
# - #delete: Returns a copy of +self+ with characters removed.
# - #delete_prefix: Returns a copy of +self+ with a given prefix removed.
# - #delete_suffix: Returns a copy of +self+ with a given suffix removed.
# - #lstrip: Returns a copy of +self+ with leading whitespace removed.
# - #rstrip: Returns a copy of +self+ with trailing whitespace removed.
# - #strip: Returns a copy of +self+ with leading and trailing whitespace removed.
# - #chomp: Returns a copy of +self+ with a trailing record separator removed, if found.
# - #chop: Returns a copy of +self+ with trailing newline characters or the last character removed.
# - #squeeze: Returns a copy of +self+ with contiguous duplicate characters removed.
# - #[], #slice: Returns a substring determined by a given index, start/length, range, regexp, or string.
# - #byteslice: Returns a substring determined by a given index, start/length, or range.
# - #chr: Returns the first character.
#
# - #to_s, #to_str: If +self+ is an instance of a subclass of +String+, returns +self+ copied into a +String+;
#   otherwise, returns +self+.
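#
# A sketch of a few of these conversions; +MyString+ is a hypothetical
# subclass introduced only to illustrate the behavior of #to_s:
#
# ```ruby
#   'abc' * 2            # => "abcabc"
#   'abc' + 'def'        # => "abcdef"
#   'abc'.center(9, '*') # => "***abc***"
#   'abc'.ljust(5, '.')  # => "abc.."
#   'abc'.rjust(5, '.')  # => "..abc"
#   # MyString is hypothetical; to_s copies its content into a plain String.
#   class MyString < String; end
#   MyString.new('abc').to_s.class # => String
# ```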
#
# === Methods for Converting to Non-+String+
#
# Each of these methods converts the contents of +self+ to a non-+String+.
#
# <em>Characters, Bytes, and Clusters</em>
#
# - #bytes: Returns an array of the bytes in +self+.
# - #chars: Returns an array of the characters in +self+.
# - #codepoints: Returns an array of the integer ordinals in +self+.
# - #getbyte: Returns an integer byte as determined by a given index.
# - #grapheme_clusters: Returns an array of the grapheme clusters in +self+.
#
# - #lines: Returns an array of the lines in +self+, as determined by a given record separator.
# - #partition: Returns a 3-element array determined by the first substring that matches
#   a given substring or regexp.
# - #rpartition: Returns a 3-element array determined by the last substring that matches
#   a given substring or regexp.
# - #split: Returns an array of substrings determined by a given delimiter -- regexp or string --
#   or, if a block is given, passes those substrings to the block.
#
# - #scan: Returns an array of substrings matching a given regexp or string, or,
#   if a block is given, passes each matching substring to the block.
# - #unpack: Returns an array of substrings extracted from +self+ according to a given format.
# - #unpack1: Returns the first substring extracted from +self+ according to a given format.
#
# - #hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits.
# - #oct: Returns the integer value of the leading characters, interpreted as octal digits.
# - #ord: Returns the integer ordinal of the first character in +self+.
# - #to_i: Returns the integer value of leading characters, interpreted as an integer.
# - #to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number.
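#
# Some of the splitting, matching, and numeric conversions above, as a sketch
# with illustrative inputs:
#
# ```ruby
#   'hello'.partition('l')  # => ["he", "l", "lo"]
#   'hello'.rpartition('l') # => ["hel", "l", "o"]
#   'a,b,c'.split(',')      # => ["a", "b", "c"]
#   'hello'.scan(/l/)       # => ["l", "l"]
#   '0x1f'.hex              # => 31
#   '17'.oct                # => 15
#   '12abc'.to_i            # => 12
#   '3.5xyz'.to_f           # => 3.5
#   'a'.ord                 # => 97
# ```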
#
# <em>Strings and Symbols</em>
#
# - #inspect: Returns a copy of +self+, enclosed in double-quotes, with special characters escaped.
# - #to_sym, #intern: Returns the symbol corresponding to +self+.
#
# === Methods for Iterating
#
# - #each_byte: Calls the given block with each successive byte in +self+.
# - #each_char: Calls the given block with each successive character in +self+.
# - #each_codepoint: Calls the given block with each successive integer codepoint in +self+.
# - #each_grapheme_cluster: Calls the given block with each successive grapheme cluster in +self+.
# - #each_line: Calls the given block with each successive line in +self+,
#   as determined by a given record separator.
# - #upto: Calls the given block with each string value returned by successive calls to #succ.
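#
# A sketch of the iterating methods; when called without a block, each
# returns an enumerator, which is converted to an array here for display:
#
# ```ruby
#   'abc'.each_char.to_a    # => ["a", "b", "c"]
#   'ab'.each_byte.to_a     # => [97, 98]
#   "a\nb\n".each_line.to_a # => ["a\n", "b\n"]
#   'a'.upto('e').to_a      # => ["a", "b", "c", "d", "e"]
#   # With a block, each element is passed to the block in turn.
#   result = []
#   'ab'.each_char {|c| result << c.upcase }
#   result                  # => ["A", "B"]
# ```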