doc/case_mapping.rdoc

= Case Mapping

Some string-oriented methods use case mapping.

In String:

- String#capitalize
- String#capitalize!
- String#casecmp
- String#casecmp?
- String#downcase
- String#downcase!
- String#swapcase
- String#swapcase!
- String#upcase
- String#upcase!

In Symbol:

- Symbol#capitalize
- Symbol#casecmp
- Symbol#casecmp?
- Symbol#downcase
- Symbol#swapcase
- Symbol#upcase

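As a quick orientation, the sketch below exercises a few of the methods listed above.
The sample values are illustrative only; the point is that the Symbol methods mirror
their String counterparts, and that the bang methods modify the receiver in place
and return +nil+ when nothing changes.

```ruby
# Illustrative values only; any cased text would do.
word = "hello World"

capitalized = word.capitalize   # first character upcased, the rest downcased
swapped     = word.swapcase     # each cased character flipped
sym_upper   = :hello.upcase     # Symbol methods return Symbols

# Bang methods mutate the receiver and return nil if nothing changed.
shout = +"LOUD"
unchanged = shout.upcase!       # already uppercase

# casecmp returns -1, 0, or 1; casecmp? returns true or false.
order = "abc".casecmp("ABD")
same  = "abc".casecmp?("ABC")

p capitalized, swapped, sym_upper, unchanged, order, same
```
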
== Default Case Mapping

By default, all of these methods use full Unicode case mapping,
which is suitable for most languages.
See {Section 3.13 (Default Case Algorithms) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf].

Non-ASCII case mapping and folding are supported for strings and symbols
in the UTF-8, UTF-16BE/LE, UTF-32BE/LE, and ISO-8859-1 through ISO-8859-16 encodings.

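A minimal sketch of the encoding support described above, assuming a Ruby recent
enough to provide Array#to_h with a block. The sample character and the particular
encodings tried are choices made for this example; each string is upcased in its
own encoding and then transcoded back to UTF-8 for comparison.

```ruby
# "\u00E9" is "é"; its uppercase form "É" is "\u00C9".
utf8 = "\u00E9"

# Transcode to a few of the encodings named above, upcase there,
# then transcode back to UTF-8 so the results can be compared.
encodings = ["UTF-8", "UTF-16LE", "UTF-32BE", "ISO-8859-1"]
results = encodings.to_h do |name|
  upcased = utf8.encode(name).upcase
  [name, upcased.encode("UTF-8")]
end

results.each { |name, value| puts "#{name}: #{value.inspect}" }
```
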
Context-dependent case mapping as described in
{Table 3-17 (Context Specification for Casing) of the Unicode standard}[https://www.unicode.org/versions/latest/ch03.pdf]
is currently not supported.

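One concrete consequence, sketched here on the assumption that the limitation above
still applies to the Ruby in use: the Final_Sigma context in that table calls for a
word-final capital sigma to become "ς", but without context-dependent mapping every
capital sigma is downcased to "σ". The sample word is chosen for this example.

```ruby
# "ΟΔΟΣ" (Greek for "street"), written with escapes.
word = "\u039F\u0394\u039F\u03A3"

lowered = word.downcase

# Context-free mapping sends U+03A3 to U+03C3 ("σ") everywhere;
# the context-aware result would end in U+03C2 ("ς") instead.
context_free  = "\u03BF\u03B4\u03BF\u03C3"
context_aware = "\u03BF\u03B4\u03BF\u03C2"

p lowered == context_free
p lowered == context_aware
```
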
In most cases, a case conversion yields a string with the same number of characters as the original.
There are exceptions (see also +:fold+ below):

  s = "\u00DF" # => "ß"
  s.upcase     # => "SS"
  s = "\u0149" # => "ŉ"
  s.upcase     # => "ʼN"

Case mapping may also depend on locale (see also +:turkic+ below):

  s = "\u0049"        # => "I"
  s.downcase          # => "i" # Dot above.
  s.downcase(:turkic) # => "ı" # No dot above.

Case changes may not be reversible:

  s = 'Hello World!' # => "Hello World!"
  s.downcase         # => "hello world!"
  s.downcase.upcase  # => "HELLO WORLD!" # Different from original s.

Case-changing methods may not maintain Unicode normalization.
See String#unicode_normalize.

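To illustrate the normalization point, the sketch below uses U+0390, a character
picked for this example rather than one taken from the text above. Its uppercase
mapping expands to a base letter plus two combining marks, and NFC would compose
the first two of those into a single character, so the upcased string is no longer
in NFC until it is normalized again.

```ruby
# U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
s = "\u0390"

upper = s.upcase                        # base letter plus combining marks
renormalized = upper.unicode_normalize  # NFC is the default form

p s.unicode_normalized?        # whether the original is in NFC
p upper.unicode_normalized?    # whether the upcased string is in NFC
p upper.codepoints.map { |c| format("U+%04X", c) }
p renormalized.codepoints.map { |c| format("U+%04X", c) }
```
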
== Options for Case Mapping

Except for +casecmp+ and +casecmp?+,
each of the case-mapping methods listed above
accepts optional arguments, <tt>*options</tt>.

The arguments may be:

- +:ascii+ only.
- +:fold+ only.
- +:turkic+ or +:lithuanian+ or both.

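The allowed combinations above can be probed with a short sketch.
The helper +accepted?+ is hypothetical, written only for this example; it assumes
that a combination outside the listed forms is rejected with ArgumentError.

```ruby
# Hypothetical helper for this example: reports whether String#downcase
# accepts the given option list without raising ArgumentError.
def accepted?(*options)
  "Abc".downcase(*options)
  true
rescue ArgumentError
  false
end

p accepted?(:ascii)                 # one of the listed forms
p accepted?(:fold)
p accepted?(:turkic)
p accepted?(:turkic, :lithuanian)
p accepted?(:ascii, :fold)          # not one of the listed forms
p accepted?(:ascii, :turkic)
```
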
The options:

- +:ascii+:
  ASCII-only mapping:
  only the ASCII letters ('A'..'Z' and 'a'..'z') are case-mapped;
  other characters are not changed.

    s = "Foo \u00D8 \u00F8 Bar" # => "Foo Ø ø Bar"
    s.upcase                    # => "FOO Ø Ø BAR"
    s.downcase                  # => "foo ø ø bar"
    s.upcase(:ascii)            # => "FOO Ø ø BAR"
    s.downcase(:ascii)          # => "foo Ø ø bar"

- +:turkic+:
  Full Unicode case mapping, adapted for the Turkic languages
  that distinguish dotted and dotless I, for example Turkish and Azeri.

    s = 'Türkiye'       # => "Türkiye"
    s.upcase            # => "TÜRKIYE"
    s.upcase(:turkic)   # => "TÜRKİYE" # Dot above.

    s = 'TÜRKIYE'       # => "TÜRKIYE"
    s.downcase          # => "türkiye"
    s.downcase(:turkic) # => "türkıye" # No dot above.

- +:lithuanian+:
  Not yet implemented.

- +:fold+ (available only for String#downcase, String#downcase!,
  and Symbol#downcase):
  Unicode case folding,
  which is more far-reaching than Unicode case mapping.

    s = "\u00DF"      # => "ß"
    s.downcase        # => "ß"
    s.downcase(:fold) # => "ss"
    s.upcase          # => "SS"

    s = "\uFB04"      # => "ﬄ"
    s.downcase        # => "ﬄ"
    s.upcase          # => "FFL"
    s.downcase(:fold) # => "ffl"
 115     s.upcase          # => "FFL"
 116     s.downcase(:fold) # => "ffl"