Most of the Mozilla code uses a C++ class hierarchy to pass string data,
rather than using raw pointers. This guide documents the string classes which
are visible to code within the Mozilla codebase (code which is linked into
``libxul``).

Introduction
------------

The string classes are a library of C++ classes which are used to manage
buffers of wide (16-bit) and narrow (8-bit) character strings. The headers
and implementation are in the `xpcom/string
<https://searchfox.org/mozilla-central/source/xpcom/string>`_ directory. All
strings are stored as a single contiguous buffer of characters.

The 8-bit and 16-bit string classes have completely separate base classes,
but share the same APIs. As a result, you cannot assign an 8-bit string to a
16-bit string without some kind of conversion helper class or routine. For
the purpose of this document, we will refer to the 16-bit string classes in
class documentation. Every 16-bit class has an equivalent 8-bit class:

===================== ======================
Wide                  Narrow
===================== ======================
``nsAString``         ``nsACString``
``nsString``          ``nsCString``
``nsAutoString``      ``nsAutoCString``
``nsDependentString`` ``nsDependentCString``
===================== ======================

The string classes distinguish, as part of the type hierarchy, between
strings that must have a null-terminator at the end of their buffer
(``ns[C]String``) and strings that are not required to have a null-terminator
(``nsA[C]String``). ``nsA[C]String`` is the base of the string classes (since it
imposes fewer requirements) and ``ns[C]String`` is a class derived from it.
Functions taking strings as parameters should generally take one of these
four types.

In order to avoid unnecessary copying of string data (which can have
significant performance cost), the string classes support different ownership
models. All string classes support the following three ownership models
dynamically:

* reference counted, copy-on-write, buffers (the default)

* adopted buffers (a buffer that the string class owns, but is not reference
  counted, because it came from somewhere else)

* dependent buffers, that is, an underlying buffer that the string class does
  not own, but that the caller that constructed the string guarantees will
  outlive the string instance

Auto strings will prefer reference counting an existing reference-counted
buffer over their stack buffer, but will otherwise use their stack buffer for
anything that will fit in it.

There are a number of additional string classes:

* Classes which exist primarily as constructors for the other types,
  particularly ``nsDependent[C]String`` and ``nsDependent[C]Substring``. These
  types are really just convenient notation for constructing an
  ``nsA[C]String`` with a non-default ownership mode; they should not be
  thought of as different types.

* ``nsLiteral[C]String`` which should rarely be constructed explicitly but
  usually through the ``""_ns`` and ``u""_ns`` user-defined string literals.
  ``nsLiteral[C]String`` is trivially constructible and destructible, and
  therefore does not emit construction/destruction code when stored in statics,
  as opposed to the other string classes.

The Major String Classes
------------------------

The list below describes the main base classes. Once you are familiar with
them, see the appendix describing What Class to Use When.

* **nsAString**/**nsACString**: the abstract base class for all strings. It
  provides an API for assignment, individual character access, basic
  manipulation of characters in the string, and string comparison. This class
  corresponds to the XPIDL ``AString`` or ``ACString`` parameter types.
  ``nsA[C]String`` is not necessarily null-terminated.

* **nsString**/**nsCString**: builds on ``nsA[C]String`` by guaranteeing
  null-terminated storage. This allows for a method (``.get()``) to access the
  underlying character buffer.

The remainder of the string classes inherit from either ``nsA[C]String`` or
``ns[C]String``. Thus, every string class is compatible with ``nsA[C]String``.

.. note::

    In code which is generic over string width, ``nsA[C]String`` is sometimes
    known as ``nsTSubstring<CharT>``. ``nsAString`` is a type alias for
    ``nsTSubstring<char16_t>``, and ``nsACString`` is a type alias for
    ``nsTSubstring<char>``.

.. note::

    The type ``nsLiteral[C]String`` technically does not inherit from
    ``nsA[C]String``, but instead inherits from ``nsStringRepr<CharT>``. This
    allows the type to not generate destructors when stored in static
    storage.

    It can be implicitly coerced to ``const ns[C]String&`` (though it can never
    be accessed mutably) and generally acts as if it were a subclass of
    ``ns[C]String`` in most cases.

Since every string derives from ``nsAString`` (or ``nsACString``), they all
share a simple API. Common read-only methods include:

* ``.Length()`` - the number of code units (bytes for 8-bit string classes and ``char16_t`` for 16-bit string classes) in the string.
* ``.IsEmpty()`` - the fastest way of determining if the string has any value. Use this instead of testing ``string.Length() == 0``.
* ``.Equals(string)`` - ``true`` if the given string has the same value as the current string. Approximately the same as ``operator==``.

Common methods that modify the string:

* ``.Assign(string)`` - Assigns a new value to the string. Approximately the same as ``operator=``.
* ``.Append(string)`` - Appends a value to the string.
* ``.Insert(string, position)`` - Inserts the given string before the code unit at position.
* ``.Truncate(length)`` - shortens the string to the given length.

More complete documentation can be found in the `Class Reference`_.

As function parameters
~~~~~~~~~~~~~~~~~~~~~~

In general, use ``nsA[C]String`` references to pass strings across modules. For example:

.. code-block:: cpp

    // when passing a string to a method, use const nsAString&
    nsFoo::PrintString(const nsAString& str);

    // when getting a string from a method, use nsAString&
    nsFoo::GetString(nsAString& result);

The Concrete Classes - which classes to use when
------------------------------------------------

The concrete classes are for use in code that actually needs to store string
data. The most common uses of the concrete classes are as local variables,
and members in classes or structs.

.. digraph:: concreteclasses

    node [shape=rectangle]

    "nsA[C]String" -> "ns[C]String";
    "ns[C]String" -> "nsDependent[C]String";
    "nsA[C]String" -> "nsDependent[C]Substring";
    "nsA[C]String" -> "ns[C]SubstringTuple";
    "ns[C]String" -> "nsAuto[C]StringN";
    "ns[C]String" -> "nsLiteral[C]String" [style=dashed];
    "nsAuto[C]StringN" -> "nsPromiseFlat[C]String";
    "nsAuto[C]StringN" -> "nsPrintfCString";

The following is a list of the most common concrete classes. Once you are
familiar with them, see the appendix describing What Class to Use When.

* ``ns[C]String`` - a null-terminated string whose buffer is allocated on the
  heap. Destroys its buffer when the string object goes away.

* ``nsAuto[C]String`` - derived from ``nsString``, a string which owns a 64
  code unit buffer in the same storage space as the string itself. If a string
  shorter than 64 code units is assigned to an ``nsAutoString``, then no extra
  storage will be allocated. For larger strings, a new buffer is allocated on
  the heap.

  If you want a number other than 64, use the templated types ``nsAutoStringN``
  / ``nsAutoCStringN``. (``nsAutoString`` and ``nsAutoCString`` are just
  typedefs for ``nsAutoStringN<64>`` and ``nsAutoCStringN<64>``, respectively.)

* ``nsDependent[C]String`` - derived from ``nsString``, this string does not
  own its buffer. It is useful for converting a raw string pointer (``const
  char16_t*`` or ``const char*``) into a class of type ``nsAString``. Note that
  you must null-terminate buffers used by ``nsDependentString``. If you
  don't want to or can't null-terminate the buffer, use
  ``nsDependentSubstring``.

* ``nsPrintfCString`` - derived from ``nsCString``, this string behaves like an
  ``nsAutoCString``. The constructor takes parameters which allow it to
  construct an 8-bit string from a printf-style format string and parameter
  list.

There are also a number of concrete classes that are created as a side-effect
of helper routines, etc. You should avoid direct use of these classes. Let
the string library create the class for you.

* ``ns[C]SubstringTuple`` - created via string concatenation
* ``nsDependent[C]Substring`` - created through ``Substring()``
* ``nsPromiseFlat[C]String`` - created through ``PromiseFlatString()``
* ``nsLiteral[C]String`` - created through the ``""_ns`` and ``u""_ns`` user-defined literals

Of course, there are times when it is necessary to reference these string
classes in your code, but as a general rule they should be avoided.

Iterators
---------

Because Mozilla strings are always a single buffer, iteration over the
characters in the string is done using raw pointers:

.. code-block:: cpp

    /**
     * Find whether there is a tab character in `data`
     */
    bool HasTab(const nsAString& data) {
      const char16_t* cur = data.BeginReading();
      const char16_t* end = data.EndReading();

      for (; cur < end; ++cur) {
        if (char16_t('\t') == *cur) {
          return true;
        }
      }
      return false;
    }

Note that ``end`` points to the character after the end of the string buffer.
It should never be dereferenced.

Writing to a mutable string is also simple:

.. code-block:: cpp

    /**
     * Replace every tab character in `data` with a space.
     */
    void ReplaceTabs(nsAString& data) {
      char16_t* cur = data.BeginWriting();
      char16_t* end = data.EndWriting();

      for (; cur < end; ++cur) {
        if (char16_t('\t') == *cur) {
          *cur = char16_t(' ');
        }
      }
    }

You may change the length of a string via ``SetLength()``. Note that
iterators become invalid after changing the length of a string. If a string
buffer becomes smaller while writing it, use ``SetLength()`` to inform the
string class of the new size:

.. code-block:: cpp

    /**
     * Remove every tab character from `data`
     */
    void RemoveTabs(nsAString& data) {
      auto len = data.Length();
      char16_t* cur = data.BeginWriting();
      char16_t* end = data.EndWriting();

      while (cur < end) {
        if (char16_t('\t') == *cur) {
          // Drop the tab by shifting the rest of the buffer down by one.
          len -= 1;
          end -= 1;
          memmove(cur, cur + 1, (end - cur) * sizeof(char16_t));
        } else {
          cur += 1;
        }
      }

      data.SetLength(len);
    }

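
The shrink-in-place pattern can also be tried outside the tree. The following is a minimal standalone sketch, not Gecko code: it assumes ``std::u16string`` as a stand-in for ``nsAString``, with ``resize()`` playing the role of ``SetLength()``, and the name ``RemoveTabsStd`` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Standalone analogue of RemoveTabs(); std::u16string stands in for nsAString.
void RemoveTabsStd(std::u16string& data) {
  char16_t* read = &data[0];                 // next code unit to inspect
  char16_t* write = &data[0];                // next slot to keep a code unit in
  const char16_t* end = read + data.size();  // one past the end; never dereferenced

  for (; read < end; ++read) {
    if (*read != u'\t') {
      *write++ = *read;  // keep everything that is not a tab
    }
  }

  // Tell the string its new logical length, as SetLength() does in Gecko.
  data.resize(static_cast<std::size_t>(write - &data[0]));
}
```

Using separate read and write pointers makes this a single pass over the buffer, where the ``memmove()`` variant shifts the tail once per tab. Calling it on ``u"a\tb\t\tc"`` leaves ``u"abc"``.
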
Note that using ``BeginWriting()`` to make a string longer is not OK.
``BeginWriting()`` must not be used to write past the logical length of the
string indicated by ``EndWriting()`` or ``Length()``. Calling
``SetCapacity()`` before ``BeginWriting()`` does not change this rule. To make
the string longer, call ``SetLength()`` before ``BeginWriting()`` or use the
``BulkWrite()`` API described below.

Bulk Write
----------

``BulkWrite()`` allows capacity-aware cache-friendly low-level writes to the
string's buffer.

Capacity-aware means that the caller is made aware of how the
caller-requested buffer capacity was rounded up to mozjemalloc buckets. This
is useful when initially requesting best-case buffer size without yet knowing
the true size needed. If the data that actually needs to be written is larger
than the best-case estimate but still fits within the rounded-up capacity,
there is no need to reallocate despite requesting the best-case capacity.

Cache-friendly means that the zero terminator for C compatibility is written
after the new content of the string has been written, so the result is a
forward-only linear write access pattern instead of a non-linear
back-and-forth sequence resulting from using ``SetLength()`` followed by
``BeginWriting()``.

Low-level means that writing via a raw pointer is possible as with
``BeginWriting()``.

``BulkWrite()`` takes three arguments: the new capacity (which may be rounded
up), the number of code units at the beginning of the string to preserve
(typically the old logical length), and a boolean indicating whether
reallocating a smaller buffer is OK if the requested capacity would fit in a
buffer that's smaller than the current one. It returns a ``mozilla::Result``
which contains either a usable ``mozilla::BulkWriteHandle<T>`` (where ``T`` is
the string's ``char_type``) or an ``nsresult`` explaining why none can be had
(presumably OOM).

The actual writes are performed through the returned
``mozilla::BulkWriteHandle<T>``. You must not access the string except via this
handle until you are done with the handle. In the success case, that means
calling ``Finish()`` on the handle. In the failure case, let the handle go out
of scope without calling ``Finish()``; the destructor of the handle then puts
the string in a mostly harmless but consistent state (containing a single
REPLACEMENT CHARACTER if a capacity greater than 0 was requested, or in the
``char`` case if the three-byte UTF-8 representation of the REPLACEMENT
CHARACTER doesn't fit, an ASCII SUBSTITUTE).

``mozilla::BulkWriteHandle<T>`` autoconverts to a writable
``mozilla::Span<T>`` and also provides explicit access to itself as ``Span``
(``AsSpan()``) or via component accessors named consistently with those on
``Span``: ``Elements()`` and ``Length()``. (The latter is not the logical
length of the string but the writable length of the buffer.) The buffer
exposed via these methods includes the prefix that you may have requested to
be preserved. It's up to you to skip past it so as to not overwrite it.

If there's a need to request a different capacity before you are ready to
call ``Finish()``, you can call ``RestartBulkWrite()`` on the handle. It
takes three arguments that match the three arguments of ``BulkWrite()``. It
returns ``mozilla::Result<mozilla::Ok, nsresult>`` to indicate success or OOM.
Calling ``RestartBulkWrite()`` invalidates any previously obtained span, raw
pointer, or length.

Once you are done writing, call ``Finish()``. It takes two arguments: the new
logical length of the string (which must not exceed the capacity returned by
the ``Length()`` method of the handle) and a boolean indicating whether it's
OK to attempt to reallocate a smaller buffer in case a smaller mozjemalloc
bucket could accommodate the new logical length.

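
The request, write, finish protocol can be illustrated with a toy model. The sketch below is not the Gecko API: ``ToyString``, ``RoundUpCapacity``, ``ToyBulkWrite`` and ``ToyFinish`` are hypothetical names, there is no ``Result`` or handle type, the buffer is always reallocated, and rounding up to a power of two merely stands in for mozjemalloc's bucket sizes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy 8-bit string: storage plus a logical length.
struct ToyString {
  std::vector<char> buf;   // backing storage, including room for the terminator
  std::size_t length = 0;  // logical length, excluding the terminator
};

// Assumed bucket policy: round up to a power of two (at least 16 bytes),
// then subtract the byte reserved for the zero terminator.
std::size_t RoundUpCapacity(std::size_t requested) {
  std::size_t bucket = 16;
  while (bucket < requested + 1) {
    bucket *= 2;
  }
  return bucket - 1;
}

// Starts a bulk write. Keeps the first `preserve` code units of the old
// contents and returns the writable capacity, which may exceed the request.
std::size_t ToyBulkWrite(ToyString& s, std::size_t capacity, std::size_t preserve) {
  preserve = std::min(preserve, s.length);
  const std::size_t usable = RoundUpCapacity(std::max(capacity, preserve));
  std::vector<char> fresh(usable + 1);
  if (preserve > 0) {
    std::memcpy(fresh.data(), s.buf.data(), preserve);
  }
  s.buf.swap(fresh);
  return usable;
}

// Finishes the write: records the logical length and writes the terminator
// last, so the writes before it form a forward-only sequence.
void ToyFinish(ToyString& s, std::size_t newLength) {
  s.length = newLength;
  s.buf[newLength] = '\0';
}
```

In this model, requesting a best-case capacity of 5 yields a writable capacity of 15, so a 12-byte payload can be written and finished without a second allocation.
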
Helper Classes and Functions
----------------------------

Converting Cocoa strings
~~~~~~~~~~~~~~~~~~~~~~~~

Use ``mozilla::CopyCocoaStringToXPCOMString()`` in
``mozilla/MacStringHelpers.h`` to convert Cocoa strings to XPCOM strings.

Searching strings - looking for substrings, characters, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``nsReadableUtils.h`` header provides helper methods for searching in strings.

.. code-block:: cpp

    bool FindInReadable(const nsAString& pattern,
                        nsAString::const_iterator& start, nsAString::const_iterator& end,
                        const nsStringComparator& aComparator = nsDefaultStringComparator());

To use this, ``start`` and ``end`` should point to the beginning and end of a
string that you would like to search. If the search string is found,
``start`` and ``end`` will be adjusted to point to the beginning and end of
the found pattern. The return value is ``true`` or ``false``, indicating
whether or not the string was found.

For example:

.. code-block:: cpp

    const nsAString& str = GetSomeString();
    nsAString::const_iterator start, end;

    str.BeginReading(start);
    str.EndReading(end);

    constexpr auto valuePrefix = u"value="_ns;

    if (FindInReadable(valuePrefix, start, end)) {
      // end now points to the character after the pattern
    }

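
The in/out iterator convention can be sketched with the standard library. This is a standalone analogue, assuming ``std::u16string`` iterators in place of ``nsAString::const_iterator``; ``FindInRange`` is a hypothetical name, and leaving ``start == end`` on failure is this sketch's own convention rather than a statement about ``FindInReadable()``.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Searches [start, end) for `pattern`. On success, narrows [start, end) to
// the first match and returns true. On failure, sets start = end.
bool FindInRange(const std::u16string& pattern,
                 std::u16string::const_iterator& start,
                 std::u16string::const_iterator& end) {
  auto found = std::search(start, end, pattern.begin(), pattern.end());
  if (found == end && !pattern.empty()) {
    start = end;
    return false;
  }
  start = found;
  end = found + static_cast<std::ptrdiff_t>(pattern.size());
  return true;
}
```

After a successful search for ``u"value="`` in ``u"name=x;value=42"``, ``start`` is at index 7 and the text from ``end`` onwards is ``u"42"``, which is the usual way to read the value that follows a prefix.
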
Checking for Memory Allocation failure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Like other types in Gecko, the string classes use infallible memory
allocation by default, so you do not need to check for success when
allocating/resizing "normal" strings.

Most functions that modify strings (``Assign()``, ``SetLength()``, etc.) also
have an overload that takes a ``mozilla::fallible_t`` parameter. These
overloads return ``false`` instead of aborting if allocation fails. Use them
when creating/allocating strings which may be very large, and which the
program could recover from if the allocation fails.

Substrings (string fragments)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is very simple to refer to a substring of an existing string without
actually allocating new space and copying the characters into that substring.
``Substring()`` is the preferred method to create a reference to such a
string.

.. code-block:: cpp

    void ProcessString(const nsAString& str) {
      const nsAString& firstFive = Substring(str, 0, 5); // from index 0, length 5
      // firstFive is now a string representing the first 5 characters
    }

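
What "without allocating or copying" means can be shown with a plain pointer-and-length pair. The sketch below is a standalone illustration, not the Gecko implementation; ``Fragment`` and ``MakeFragment`` are hypothetical names, and ``std::u16string`` stands in for ``nsAString``.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// A non-owning view: pointer plus length, with no terminator requirement.
struct Fragment {
  const char16_t* data;
  std::size_t length;
};

// Hypothetical helper in the spirit of Substring(str, start, length).
// Out-of-range requests are clamped to the source string.
Fragment MakeFragment(const std::u16string& str, std::size_t start,
                      std::size_t length) {
  start = std::min(start, str.size());
  length = std::min(length, str.size() - start);
  return Fragment{str.data() + start, length};
}
```

The fragment points into the source buffer, so it is only valid while the source string is alive and unmodified. That is the same contract as a dependent buffer.
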
Unicode Conversion
------------------

Strings can be stored in two basic formats: 8-bit code unit (byte/``char``)
strings, or 16-bit code unit (``char16_t``) strings. Any string class with a
capital "C" in the classname contains 8-bit bytes. These classes include
``nsCString``, ``nsDependentCString``, and so forth. Any string class without
the "C" contains 16-bit code units.

An 8-bit string can be in one of many character encodings while a 16-bit
string is always in potentially-invalid UTF-16. (You can make a 16-bit string
guaranteed-valid UTF-16 by passing it to ``EnsureUTF16Validity()``.) The most
common encodings are:

* ASCII - 7-bit encoding for basic English-only strings. Each ASCII value
  is stored in exactly one byte in the array with the most-significant 8th bit
  set to zero.

* `UCS2 <http://www.unicode.org/glossary/#UCS_2>`_ - 16-bit encoding for a
  subset of Unicode, `BMP <http://www.unicode.org/glossary/#BMP>`_. The Unicode
  value of a character stored in UCS2 is stored in exactly one 16-bit
  ``char16_t`` in a string class.

* `UTF-8 <http://www.faqs.org/rfcs/rfc3629.html>`_ - 8-bit encoding for
  Unicode characters. Each Unicode character is stored in up to 4 bytes in a
  string class. UTF-8 is capable of representing the entire Unicode character
  repertoire, and it efficiently maps to `UTF-32
  <http://www.unicode.org/glossary/#UTF_32>`_. (Gtk and Rust natively use
  UTF-8.)

* `UTF-16 <http://www.unicode.org/glossary/#UTF_16>`_ - 16-bit encoding for
  Unicode storage, backwards compatible with UCS2. The Unicode value of a
  character stored in UTF-16 may require one or two 16-bit ``char16_t`` in a
  string class. The contents of ``nsAString`` always have to be regarded as in
  this encoding instead of UCS2. UTF-16 is capable of representing the entire
  Unicode character repertoire, and it efficiently maps to UTF-32. (Win32 W
  APIs and Mac OS X natively use UTF-16.)

* Latin1 - 8-bit encoding for the first 256 Unicode code points. Used for
  HTTP headers and for size-optimized storage in text nodes and SpiderMonkey
  strings. Latin1 converts to UTF-16 by zero-extending each byte to a 16-bit
  code unit. Note that this kind of "Latin1" is not available for encoding
  HTML, CSS, JS, etc. Specifying ``charset=latin1`` means the same as
  ``charset=windows-1252``. Windows-1252 is a similar but different encoding
  used for interchange.

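
The difference in code unit counts between these encodings is easy to check with standard C++ strings. The constants below are illustrative names only (nothing here is Gecko code), and the characters are spelled with escapes so the example does not depend on the encoding of the source file.

```cpp
#include <string>

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) in three encodings.
const std::string kLatin1Accent = "\xE9";        // 1 byte
const std::string kUtf8Accent = "\xC3\xA9";      // 2 bytes
const std::u16string kUtf16Accent = u"\u00E9";   // 1 code unit

// U+1F600, a code point outside the BMP. It has no Latin1 or UCS2 form.
const std::string kUtf8Emoji = "\xF0\x9F\x98\x80";   // 4 bytes
const std::u16string kUtf16Emoji = u"\U0001F600";    // 2 code units (surrogate pair)
```

Calling ``size()`` on each of these returns the number of code units, not the number of characters, which is also what ``.Length()`` reports for the Mozilla string classes.
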
In addition, there exist multiple other (legacy) encodings. The Web-relevant
ones are defined in the `Encoding Standard <https://encoding.spec.whatwg.org/>`_.
Conversions from these encodings to
UTF-8 and UTF-16 are provided by `mozilla::Encoding
<https://searchfox.org/mozilla-central/source/intl/Encoding.h#109>`_.
Additionally, on Windows there are some rare cases (e.g. drag&drop) where it's
necessary to call a system API with data encoded in the Windows
locale-dependent legacy encoding instead of UTF-16. In those rare cases, use
``MultiByteToWideChar``/``WideCharToMultiByte`` from kernel32.dll. Do not use
``iconv`` on \*nix. We only support UTF-8-encoded file paths on \*nix, non-path
Gtk strings are always UTF-8 and Cocoa and Java strings are always UTF-16.

When working with existing code, it is important to examine the current usage
of the strings that you are manipulating, to determine the correct conversion
mechanism.

When writing new code, it can be confusing to know which storage class and
encoding is the most appropriate. There is no single answer to this question,
but the important points are:

* **Surprisingly many strings are very often just ASCII.** ASCII is a subset of
  UTF-8 and is, therefore, efficient to represent as UTF-8. Representing ASCII
  as UTF-16 is bad both for memory usage and cache locality.

* **Rust strongly prefers UTF-8.** If your C++ code is interacting with Rust
  code, using UTF-8 in ``nsACString`` and merely validating it when converting
  to Rust strings is more efficient than using ``nsAString`` on the C++ side.

* **Networking code prefers 8-bit strings.** Networking code tends to use 8-bit
  strings: either with UTF-8 or Latin1 (byte value is the Unicode scalar value)
  encoding.

* **JS and DOM prefer UTF-16.** Most Gecko code uses UTF-16 for compatibility
  with JS strings and DOM strings, which are potentially-invalid UTF-16. However,
  both DOM text nodes and JS strings store strings that only contain code points
  below U+0100 as Latin1 (byte value is the Unicode scalar value).

* **Windows and Cocoa use UTF-16.** Windows system APIs take UTF-16. Cocoa
  ``NSString`` is UTF-16.

* **Gtk uses UTF-8.** Gtk APIs take UTF-8 for non-file paths. In the Gecko
  case, we support only UTF-8 file paths outside Windows, so all Gtk strings
  are UTF-8 for our purposes though file paths received from Gtk may not be
  valid UTF-8.

To assist with ASCII, Latin1, UTF-8, and UTF-16 conversions, there are some
helper methods and classes. Some of these classes look like functions,
because they are most often used as temporary objects on the stack.

Short zero-terminated ASCII strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have a short zero-terminated string that you are certain is always
ASCII, use these special-case methods instead of the conversions described in
the later sections.

* If you are assigning an ASCII literal to an ``nsACString``, use
  ``AssignLiteral()``.
* If you are assigning a literal to an ``nsAString``, use ``AssignLiteral()``
  and make the literal a ``u""`` literal. If the literal has to be a ``""``
  literal (as opposed to ``u""``) and is ASCII, still use ``AssignLiteral()``,
  but be aware that this involves a run-time inflation.
* If you are assigning a zero-terminated ASCII string that's not a literal from
  the compiler's point of view at the call site and you don't know the length
  of the string either (e.g. because it was looked up from an array of literals
  of varying lengths), use ``AssignASCII()``.

UTF-8 / UTF-16 conversion
~~~~~~~~~~~~~~~~~~~~~~~~~

.. cpp:function:: NS_ConvertUTF8toUTF16(const nsACString&)

    A ``nsAutoString`` subclass that converts a UTF-8 encoded ``nsACString``
    or ``const char*`` to a 16-bit UTF-16 string. If you need a ``const
    char16_t*`` buffer, you can use the ``.get()`` method. For example:

    .. code-block:: cpp

        /* signature: void HandleUnicodeString(const nsAString& str); */
        object->HandleUnicodeString(NS_ConvertUTF8toUTF16(utf8String));

        /* signature: void HandleUnicodeBuffer(const char16_t* str); */
        object->HandleUnicodeBuffer(NS_ConvertUTF8toUTF16(utf8String).get());

.. cpp:function:: NS_ConvertUTF16toUTF8(const nsAString&)

    A ``nsAutoCString`` which converts a 16-bit UTF-16 string (``nsAString``)
    to a UTF-8 encoded string. As above, you can use ``.get()`` to access a
    ``const char*`` buffer.

    .. code-block:: cpp

        /* signature: void HandleUTF8String(const nsACString& str); */
        object->HandleUTF8String(NS_ConvertUTF16toUTF8(utf16String));

        /* signature: void HandleUTF8Buffer(const char* str); */
        object->HandleUTF8Buffer(NS_ConvertUTF16toUTF8(utf16String).get());

.. cpp:function:: CopyUTF8toUTF16(const nsACString&, nsAString&)

    Converts and copies:

    .. code-block:: cpp

        // return a UTF-16 value
        void Foo::GetUnicodeValue(nsAString& result) {
          CopyUTF8toUTF16(mLocalUTF8Value, result);
        }

.. cpp:function:: AppendUTF8toUTF16(const nsACString&, nsAString&)

    Converts and appends:

    .. code-block:: cpp

        // return a UTF-16 value
        void Foo::GetUnicodeValue(nsAString& result) {
          result.AssignLiteral("prefix:");
          AppendUTF8toUTF16(mLocalUTF8Value, result);
        }

.. cpp:function:: CopyUTF16toUTF8(const nsAString&, nsACString&)

    Converts and copies:

    .. code-block:: cpp

        // return a UTF-8 value
        void Foo::GetUTF8Value(nsACString& result) {
          CopyUTF16toUTF8(mLocalUTF16Value, result);
        }

.. cpp:function:: AppendUTF16toUTF8(const nsAString&, nsACString&)

    Converts and appends:

    .. code-block:: cpp

        // return a UTF-8 value
        void Foo::GetUTF8Value(nsACString& result) {
          result.AssignLiteral("prefix:");
          AppendUTF16toUTF8(mLocalUTF16Value, result);
        }

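
To make concrete what a UTF-16 to UTF-8 conversion has to do, here is a minimal standalone encoder. It is a sketch under stated assumptions, not the code behind the helpers above: ``Utf16ToUtf8`` is a hypothetical name, standard library strings replace the Mozilla classes, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes potentially-invalid UTF-16 as UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // Combine a surrogate pair into one supplementary-plane code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // unpaired surrogate
    }

    if (cp < 0x80) {  // 1 byte: ASCII
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {  // 2 bytes
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes: rest of the BMP
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {  // 4 bytes: supplementary planes
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

Note that ASCII input passes through unchanged, one byte per code unit, which is why UTF-8 is the compact choice for mostly-ASCII data.
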
Latin1 / UTF-16 Conversion
~~~~~~~~~~~~~~~~~~~~~~~~~~

The following should only be used when you can guarantee that the original
string is ASCII or Latin1 (in the sense that the byte value is the Unicode
scalar value; not in the windows-1252 sense). These helpers are very similar
to the UTF-8 / UTF-16 conversion helpers above.

UTF-16 to Latin1 converters
```````````````````````````

These converters are **very dangerous** because they **lose information**
during the conversion process. You should **avoid UTF-16 to Latin1
conversions** unless your strings are guaranteed to be Latin1 or ASCII. (In
the future, these conversions may start asserting in debug builds that their
input is in the permissible range.) If the input is actually in the Latin1
range, each 16-bit code unit is narrowed to an 8-bit byte by removing the
high half. Unicode code points above U+00FF result in garbage whose nature
must not be relied upon. (In the future the nature of the garbage will be CPU
architecture-dependent.) If you want to ``printf()`` something and don't care
what happens to non-ASCII, please convert to UTF-8 instead.

.. cpp:function:: NS_LossyConvertUTF16toASCII(const nsAString&)

    A ``nsAutoCString`` which holds a temporary buffer containing the Latin1
    value of the string.

.. cpp:function:: void LossyCopyUTF16toASCII(Span<const char16_t>, nsACString&)

    Does an in-place conversion from UTF-16 into a Latin1 string object.

.. cpp:function:: void LossyAppendUTF16toASCII(Span<const char16_t>, nsACString&)

    Appends a UTF-16 string to a Latin1 string.

Latin1 to UTF-16 converters
```````````````````````````

These converters are very dangerous because they will **produce wrong results
for non-ASCII UTF-8 or windows-1252 input**, turning it into a meaningless
UTF-16 string. You should **avoid ASCII to UTF-16 conversions** unless your
strings are guaranteed to be ASCII or Latin1 in the sense of the byte value
being the Unicode scalar value. Every byte is zero-extended into a 16-bit code
unit.

It is correct to use these on most HTTP header values, but **it's always
wrong to use these on HTTP response bodies!** (Use ``mozilla::Encoding`` to
deal with response bodies.)

.. cpp:function:: NS_ConvertASCIItoUTF16(const nsACString&)

    A ``nsAutoString`` which holds a temporary buffer containing the value of
    the Latin1 to UTF-16 conversion.

.. cpp:function:: void CopyASCIItoUTF16(Span<const char>, nsAString&)

    Does an in-place conversion from Latin1 to UTF-16.

.. cpp:function:: void AppendASCIItoUTF16(Span<const char>, nsAString&)

    Appends a Latin1 string to a UTF-16 string.

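
Both directions are simple enough to write out, which also shows where they go wrong. The helpers below are hypothetical standalone stand-ins (``InflateLatin1`` and ``NarrowToLatin1`` are not Gecko functions). Keeping the low byte for code points above U+00FF is just one possible kind of garbage, chosen here for illustration.

```cpp
#include <string>

// Latin1 to UTF-16: zero-extend each byte to a 16-bit code unit.
std::u16string InflateLatin1(const std::string& in) {
  std::u16string out;
  out.reserve(in.size());
  for (char c : in) {
    out += static_cast<char16_t>(static_cast<unsigned char>(c));
  }
  return out;
}

// UTF-16 to Latin1: keep only the low byte of each code unit.
// Lossy for anything above U+00FF.
std::string NarrowToLatin1(const std::u16string& in) {
  std::string out;
  out.reserve(in.size());
  for (char16_t c : in) {
    out += static_cast<char>(c & 0xFF);
  }
  return out;
}
```

Feeding the UTF-8 bytes ``C3 A9`` (U+00E9) to ``InflateLatin1()`` produces the two code units U+00C3 U+00A9 instead of the single U+00E9. Narrowing U+20AC yields the unrelated byte ``AC``. These are the failure modes the warnings above describe.
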
Comparing ns*Strings with C strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can compare ``ns*Strings`` with C strings by converting the ``ns*String``
to a C string, or by comparing directly against a C String.

.. cpp:function:: bool nsAString::EqualsASCII(const char*)

    Compares with an ASCII C string.

.. cpp:function:: bool nsAString::EqualsLiteral(...)

    Compares with a string literal.

Common Patterns
---------------

Literal Strings
~~~~~~~~~~~~~~~

A literal string is a raw string value that is written in some C++ code. For
example, in the statement ``printf("Hello World\n");`` the value ``"Hello
World\n"`` is a literal string. It is often necessary to insert literal
string values when an ``nsAString`` or ``nsACString`` is required. Two
user-defined literals are provided that implicitly convert to ``const
nsString&`` and ``const nsCString&``, respectively:

* ``""_ns`` for 8-bit literals, converting implicitly to ``const nsCString&``
* ``u""_ns`` for 16-bit literals, converting implicitly to ``const nsString&``

The benefits of the user-defined literals may seem unclear, given that
``nsDependentCString`` will also wrap a string value in an ``nsCString``. The
advantage of the user-defined literals is twofold.

* The length of these strings is calculated at compile time, so the string does
  not need to be scanned at runtime to determine its length.

* Literal strings live for the lifetime of the binary, and can be moved between
  the ``ns[C]String`` classes without being copied or freed.

Here are some examples of proper usage of the literals (both standard and
user-defined):

.. code-block:: cpp

    // call Init(const nsLiteralString&) - enforces that it's only called with literals
    Init(u"start value"_ns);

    // call Init(const nsAString&)
    Init(u"start value"_ns);

    // call Init(const nsACString&)
    Init("start value"_ns);

In case a literal is defined via a macro, you can just convert it to
``nsLiteralString`` or ``nsLiteralCString`` using their constructor. You
could consider not using a macro at all but a named ``constexpr`` constant
instead.

In some cases, an 8-bit literal is defined via a macro, either within code or
from the environment, but it can't be changed or is used both as an 8-bit and
a 16-bit string. In these cases, you can use the
``NS_LITERAL_STRING_FROM_CSTRING`` macro to construct a ``nsLiteralString``
and do the conversion at compile-time.

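
The claim that a literal's length is known at compile time can be demonstrated with a small user-defined literal in standard C++. This is only an illustration of the language mechanism: ``LiteralView`` and the ``_lit`` suffix are hypothetical and are not how ``_ns`` is implemented.

```cpp
#include <cstddef>

// Hypothetical non-owning view over a string literal.
struct LiteralView {
  const char16_t* data;
  std::size_t length;
};

// The compiler passes the literal's length to the operator, so no runtime
// scan for the terminator is needed.
constexpr LiteralView operator""_lit(const char16_t* str, std::size_t len) {
  return LiteralView{str, len};
}

constexpr auto kValuePrefix = u"value="_lit;
static_assert(kValuePrefix.length == 6, "length is computed at compile time");
```

Because the view is a ``constexpr`` object pointing at static storage, it needs neither a constructor call at startup nor a destructor at shutdown, which mirrors the property described for ``nsLiteral[C]String``.
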
String Concatenation
~~~~~~~~~~~~~~~~~~~~

Strings can be concatenated together using the + operator. The resulting
string is a ``const nsSubstringTuple`` object. The resulting object can be
treated and referenced similarly to a ``nsAString`` object. Concatenation *does
not copy the substrings*. The strings are only copied when the concatenation
is assigned into another string object. The ``nsSubstringTuple`` object holds
pointers to the original strings. Therefore, the ``nsSubstringTuple`` object is
dependent on all of its substrings, meaning that their lifetime must be at
least as long as the ``nsSubstringTuple`` object.

For example, you can use the value of two strings and pass their
concatenation on to another function which takes a ``const nsAString&``:

.. code-block:: cpp

    void HandleTwoStrings(const nsAString& one, const nsAString& two) {
      // call HandleString(const nsAString&)
      HandleString(one + two);
    }

NOTE: The two strings are implicitly combined into a temporary ``nsString``
in this case, and the temporary string is passed into ``HandleString``. If
``HandleString`` assigns its input into another ``nsString``, then the string
buffer will be shared in this case, negating the cost of the intermediate
temporary. You can concatenate N strings and store the result in a temporary
variable:

.. code-block:: cpp

    constexpr auto start = u"start "_ns;
    constexpr auto middle = u"middle "_ns;
    constexpr auto end = u"end"_ns;
    // the tuple holds 3 dependent fragments; they are copied once, when the
    // result is assigned into combinedString
    nsString combinedString = start + middle + end;

    // call void HandleString(const nsAString&);
    HandleString(combinedString);

It is safe to concatenate user-defined literals because the temporary
``nsLiteral[C]String`` objects will live as long as the temporary
concatenation object (of type ``nsSubstringTuple``).

.. code-block:: cpp

    // call HandlePage(const nsAString&);
    // safe because the concatenated-string will live as long as its substrings
    HandlePage(u"start "_ns + u"end"_ns);

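
The idea of a concatenation object that only refers to its operands can be modeled in a few lines. The following standalone sketch assumes ``std::u16string`` operands; ``Concat``, ``Join`` and ``Materialize`` are hypothetical names and this is not how ``nsSubstringTuple`` is implemented.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a substring tuple: it remembers its two operands
// by reference and copies nothing until it is materialized.
struct Concat {
  const std::u16string& left;
  const std::u16string& right;

  std::size_t Length() const { return left.size() + right.size(); }

  // The only place where characters are copied: one allocation, two appends.
  std::u16string Materialize() const {
    std::u16string out;
    out.reserve(Length());
    out += left;
    out += right;
    return out;
  }
};

// Builds the lazy concatenation. Both operands must outlive the result.
Concat Join(const std::u16string& a, const std::u16string& b) {
  return Concat{a, b};
}
```

Because ``Concat`` stores references, passing temporaries to ``Join()`` and keeping the result past the end of the full expression would leave it dangling. This is the same lifetime rule the text states for ``nsSubstringTuple``.
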
Local Variables
~~~~~~~~~~~~~~~

Local variables within a function are usually stored on the stack. The
``nsAutoString``/``nsAutoCString`` classes are subclasses of the
``nsString``/``nsCString`` classes. They own a 64-character buffer allocated
in the same storage space as the string itself. If the ``nsAutoString`` is
allocated on the stack, then it has at its disposal a 64-character stack
buffer. This allows the implementation to avoid allocating extra memory when
dealing with small strings. ``nsAutoStringN``/``nsAutoCStringN`` are more
general alternatives that let you choose the number of characters in the
inline buffer.

.. code-block:: cpp

    nsAutoString value;
    GetValue(value); // if the result is less than 64 code units,
                     // then this just saved us an allocation

Member Variables
~~~~~~~~~~~~~~~~

In general, you should use the concrete classes ``nsString`` and
``nsCString`` for member variables.

.. code-block:: cpp

    class Foo {
      // these store UTF-8 and UTF-16 values respectively
      nsCString mLocalName;
      nsString mTitle;
    };

A common incorrect pattern is to use ``nsAutoString``/``nsAutoCString``
for member variables. As described in `Local Variables`_, these classes have
a built-in buffer that makes them very large. This means that if you include
them in a class, they bloat the class by 64 bytes (``nsAutoCString``) or 128
bytes (``nsAutoString``).

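
The trade-off can be seen in a small standalone model. The sketch below uses
only the C++17 standard library; ``InlineBufferString`` and
``RecordWithAutoMember`` are hypothetical names, and the class is not the real
``nsAutoString`` (for instance, it ignores null termination and buffer
sharing). It shows that short values avoid a heap allocation, and that the
inline buffer is paid for by every object that embeds such a string.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <string>
    #include <string_view>

    // Hypothetical model, not the real nsAutoString: a string whose first N
    // code units live inside the object itself.
    template <std::size_t N>
    class InlineBufferString {
     public:
      void Assign(std::u16string_view value) {
        if (value.size() <= N) {
          mHeap.reset();  // small value: no allocation needed
          value.copy(mInline, value.size());
        } else {
          // large value: spill to the heap
          mHeap = std::make_unique<char16_t[]>(value.size());
          value.copy(mHeap.get(), value.size());
        }
        mLength = value.size();
      }

      bool UsesHeap() const { return mHeap != nullptr; }

      std::u16string_view View() const {
        const char16_t* data = UsesHeap() ? mHeap.get() : mInline;
        return std::u16string_view(data, mLength);
      }

     private:
      std::unique_ptr<char16_t[]> mHeap;
      std::size_t mLength = 0;
      char16_t mInline[N];
    };

    // A class that embeds the inline-buffer string pays for the buffer in
    // every instance, whether or not the buffer is ever used.
    struct RecordWithAutoMember {
      InlineBufferString<64> name;
    };
    static_assert(sizeof(RecordWithAutoMember) >= 64 * sizeof(char16_t),
                  "the inline buffer is part of every instance");

    inline void InlineBufferDemo() {
      InlineBufferString<64> value;
      value.Assign(u"short");
      assert(!value.UsesHeap());  // fits in the inline buffer
      assert(value.View() == u"short");

      value.Assign(std::u16string(100, u'x'));
      assert(value.UsesHeap());  // too long for the inline buffer
      assert(value.View().size() == 100);
    }

For a short-lived local variable the extra bytes sit on the stack and cost
almost nothing. For a member variable they are carried by every instance of
the containing class for its whole lifetime.
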
Raw Character Pointers
~~~~~~~~~~~~~~~~~~~~~~

``PromiseFlatString()`` and ``PromiseFlatCString()`` return a temporary
object that holds a null-terminated buffer containing the same value as the
source string. A temporary buffer is created only if necessary. This is most
often used in order to pass an ``nsAString`` to an API which requires a
null-terminated string.

In the following example, an ``nsAString`` is combined with a literal string,
and the result is passed to an API which requires a simple character buffer.

.. code-block:: cpp

    // Modify the URL and pass to AddPage(const char16_t* url)
    void AddModifiedPage(const nsAString& url) {
      constexpr auto httpPrefix = u"http://"_ns;
      const nsAString& modifiedURL = httpPrefix + url;

      // creates a temporary buffer
      AddPage(PromiseFlatString(modifiedURL).get());
    }

``PromiseFlatString()`` is smart when handed a string that is already
null-terminated. It avoids creating the temporary buffer in such cases.

.. code-block:: cpp

    // Modify the URL and pass to AddPage(const char16_t* url)
    void AddModifiedPage(const nsAString& url, bool addPrefix) {
      if (addPrefix) {
        // MUST create a temporary buffer - string is multi-fragmented
        constexpr auto httpPrefix = u"http://"_ns;
        AddPage(PromiseFlatString(httpPrefix + url).get());
      } else {
        // MIGHT create a temporary buffer, does a runtime check
        AddPage(PromiseFlatString(url).get());
      }
    }

It is **not** possible to efficiently transfer ownership of a string
class's internal buffer into an owned ``char*`` which can be safely
freed by other components, due to the COW (copy-on-write) optimization.

If working with a legacy API which requires ``malloc``-allocated ``char*``
buffers, prefer using ``ToNewUnicode``, ``ToNewCString`` or
``ToNewUTF8String`` over ``strdup`` to create owned ``char*`` pointers.

``printf`` and a UTF-16 string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For debugging, it's useful to ``printf`` a UTF-16 string (``nsString``,
``nsAutoString``, etc). This usually requires converting it to an 8-bit
string, because that's what ``printf`` expects. Use:

.. code-block:: cpp

    printf("%s\n", NS_ConvertUTF16toUTF8(yourString).get());

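
The conversion step can be illustrated without the XPCOM classes. The sketch
below is a standalone C++17 model of what a UTF-16 to UTF-8 conversion has to
do before ``printf`` can consume the text. ``Utf16ToUtf8`` and ``PrintUtf16``
are hypothetical helpers written for this example, not the implementation
behind ``NS_ConvertUTF16toUTF8``, and replacing unpaired surrogates with
U+FFFD is an assumption of the sketch.

.. code-block:: cpp

    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <string_view>

    // Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
    inline std::string Utf16ToUtf8(std::u16string_view input) {
      std::string out;
      for (std::size_t i = 0; i < input.size(); ++i) {
        char32_t cp = input[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < input.size() && input[i + 1] >= 0xDC00 &&
            input[i + 1] <= 0xDFFF) {
          // Combine a surrogate pair into a single code point.
          cp = 0x10000 + ((cp - 0xD800) << 10) + (input[i + 1] - 0xDC00);
          ++i;
        } else if (isHigh || isLow) {
          cp = 0xFFFD;  // unpaired surrogate (assumption: emit U+FFFD)
        }
        if (cp < 0x80) {
          out += static_cast<char>(cp);
        } else if (cp < 0x800) {
          out += static_cast<char>(0xC0 | (cp >> 6));
          out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
          out += static_cast<char>(0xE0 | (cp >> 12));
          out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
          out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
          out += static_cast<char>(0xF0 | (cp >> 18));
          out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
          out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
          out += static_cast<char>(0x80 | (cp & 0x3F));
        }
      }
      return out;
    }

    inline void PrintUtf16(std::u16string_view text) {
      // The temporary std::string lives until the end of this statement, so
      // the pointer from c_str() stays valid for the whole printf call.
      std::printf("%s\n", Utf16ToUtf8(text).c_str());
    }

The same lifetime reasoning applies to the one-line pattern above: the pointer
obtained from a temporary conversion object is only meant to be used within
the statement that creates it.
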
Sequence of appends without reallocating
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``SetCapacity()`` allows you to give the string a hint of the future string
length caused by a sequence of appends (excluding appends that convert
between UTF-16 and UTF-8 in either direction) in order to avoid multiple
allocations during the sequence of appends. However, the other
allocation-avoidance features of XPCOM strings interact badly with
``SetCapacity()``, making it something of a footgun.

``SetCapacity()`` is appropriate to use before a sequence of multiple
operations from the following list (without operations that are not on the
list between the ``SetCapacity()`` call and operations from the list):

* ``Append()``
* ``AppendLiteral()``
* ``LossyAppendUTF16toASCII()``
* ``AppendASCIItoUTF16()``

**DO NOT** call ``SetCapacity()`` if the subsequent operations on the string
do not meet the criteria above. Operations that undo the benefits of
``SetCapacity()`` include but are not limited to:

* ``AssignLiteral()``
* ``CopyASCIItoUTF16()``
* ``LossyCopyUTF16toASCII()``
* ``AppendUTF16toUTF8()``
* ``AppendUTF8toUTF16()``
* ``CopyUTF16toUTF8()``
* ``CopyUTF8toUTF16()``

If your string is an ``nsAuto[C]String`` and you are calling
``SetCapacity()`` with a constant ``N``, please instead declare the string as
``nsAuto[C]StringN<N+1>`` without calling ``SetCapacity()`` (while being
mindful of not using such a large ``N`` as to overflow the run-time stack).

There is no need to include room for the null terminator: it is the job of
the string class.

Note: Calling ``SetCapacity()`` does not give you permission to use the
pointer obtained from ``BeginWriting()`` to write past the current length (as
returned by ``Length()``) of the string. Please use either ``BulkWrite()`` or
``SetLength()`` instead.

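
The basic idea of reserving once before a run of appends can be sketched with
``std::string``, whose ``reserve()`` plays a role loosely comparable to
``SetCapacity()``. This is an analogy under stated assumptions, not XPCOM
code: ``BuildGreeting`` is a hypothetical function, and the XPCOM-specific
caveats above (operations that discard the reserved buffer) have no
counterpart in this model. The capacity check relies on the behaviour of
common standard-library implementations.

.. code-block:: cpp

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <string_view>

    // Standard-library analogue of "SetCapacity() then a run of appends":
    // reserve the final length once, then append without further growth.
    inline std::string BuildGreeting(std::string_view name) {
      constexpr std::string_view kPrefix = "Hello, ";
      std::string out;
      // +1 is for the final '!'; as with the XPCOM classes, the null
      // terminator is the string class's business, not the caller's.
      out.reserve(kPrefix.size() + name.size() + 1);

      const std::size_t reserved = out.capacity();
      assert(out.empty());  // reserving changes the capacity, not the length

      out.append(kPrefix);
      out.append(name);
      out.push_back('!');

      // On common standard-library implementations the appends above fit in
      // the reserved buffer, so the capacity has not grown.
      assert(out.capacity() == reserved);
      return out;
    }

As in the note above, reserving capacity leaves the logical length unchanged,
so writing into the reserved space directly is not allowed in this model
either; the length has to be extended first (``resize()`` here, ``SetLength()``
or ``BulkWrite()`` for XPCOM strings).
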
.. _stringguide.xpidl:

The string library is also available through IDL. By declaring attributes and
methods using the specially defined IDL types, string classes are used as
parameters to the corresponding methods.

The C++ signatures follow the abstract-type convention described above, such
that all method parameters are based on the abstract classes. The following
table describes the purpose of each string type in IDL.

+-----------------+----------------+----------------------------------------------------------------------------------+
| XPIDL Type      | C++ Type       | Purpose                                                                          |
+=================+================+==================================================================================+
| ``string``      | ``char*``      | Raw character pointer to ASCII (7-bit) string, no string classes used.           |
|                 |                |                                                                                  |
|                 |                | High bit is not guaranteed across XPConnect boundaries.                          |
+-----------------+----------------+----------------------------------------------------------------------------------+
| ``wstring``     | ``char16_t*``  | Raw character pointer to UTF-16 string, no string classes used.                  |
+-----------------+----------------+----------------------------------------------------------------------------------+
| ``AString``     | ``nsAString``  | UTF-16 string.                                                                   |
+-----------------+----------------+----------------------------------------------------------------------------------+
| ``ACString``    | ``nsACString`` | 8-bit string. All bits are preserved across XPConnect boundaries.                |
+-----------------+----------------+----------------------------------------------------------------------------------+
| ``AUTF8String`` | ``nsACString`` | UTF-8 string.                                                                    |
|                 |                |                                                                                  |
|                 |                | Converted to UTF-16 as necessary when value is used across XPConnect boundaries. |
+-----------------+----------------+----------------------------------------------------------------------------------+

Callers should prefer using the string classes ``AString``, ``ACString`` and
``AUTF8String`` over the raw pointer types ``string`` and ``wstring`` in
almost all situations.

In XPIDL, ``in`` parameters are read-only, and the C++ signatures for
``*String`` parameters follow the above guidelines by using ``const
nsAString&`` for these parameters. ``out`` and ``inout`` parameters are
defined simply as ``nsAString&`` so that the callee can write to them.

.. code-block:: text

    interface nsIFoo : nsISupports {
        attribute AString utf16String;
        AUTF8String getValue(in ACString key);
    };

.. code-block:: cpp

    class nsIFoo : public nsISupports {
     public:
      NS_IMETHOD GetUtf16String(nsAString& aResult) = 0;
      NS_IMETHOD SetUtf16String(const nsAString& aValue) = 0;
      NS_IMETHOD GetValue(const nsACString& aKey, nsACString& aResult) = 0;
    };

In the above example, ``utf16String`` is treated as a UTF-16 string. The
implementation of ``GetUtf16String()`` will use ``aResult.Assign`` to
"return" the value. In ``SetUtf16String()`` the value of the string can be
used through a variety of methods including `Iterators`_,
``PromiseFlatString``, and assignment to other strings.

In ``GetValue()``, the first parameter, ``aKey``, is treated as a raw
sequence of 8-bit values. Any non-ASCII characters in ``aKey`` will be
preserved when crossing XPConnect boundaries. The implementation of
``GetValue()`` will assign a UTF-8 encoded 8-bit string into ``aResult``. If
this method is called across XPConnect boundaries, such as from a script,
then the result will be decoded from UTF-8 into UTF-16 and used as a Unicode
value.

Follow these simple rules in your code to keep your fellow developers,
reviewers, and users happy.

* Use the most abstract string class that you can. Usually this is:

  * ``nsAString`` for function parameters
  * ``nsString`` for member variables
  * ``nsAutoString`` for local (stack-based) variables

* Use the ``""_ns`` and ``u""_ns`` user-defined literals to represent literal strings (e.g. ``"foo"_ns``) as nsAString-compatible objects.
* Use string concatenation (i.e. the "+" operator) when combining strings.
* Use ``nsDependentString`` when you have a raw character pointer that you need to convert to an nsAString-compatible string.
* Use ``Substring()`` to extract fragments of existing strings.
* Use `iterators`_ to parse and extract string fragments.

.. cpp:class:: template<T> nsTSubstring<T>

    The ``nsTSubstring<char_type>`` class is usually written as
    ``nsAString`` or ``nsACString``.

    .. cpp:function:: size_type Length() const

    .. cpp:function:: bool IsEmpty() const

    .. cpp:function:: bool IsVoid() const

    .. cpp:function:: const char_type* BeginReading() const

    .. cpp:function:: const char_type* EndReading() const

    .. cpp:function:: bool Equals(const self_type&, comparator_type = ...) const

    .. cpp:function:: char_type First() const

    .. cpp:function:: char_type Last() const

    .. cpp:function:: size_type CountChar(char_type) const

    .. cpp:function:: int32_t FindChar(char_type, index_type aOffset = 0) const

    .. cpp:function:: void Assign(const self_type&)

    .. cpp:function:: void Append(const self_type&)

    .. cpp:function:: void Insert(const self_type&, index_type aPos)

    .. cpp:function:: void Cut(index_type aCutStart, size_type aCutLength)

    .. cpp:function:: void Replace(index_type aCutStart, size_type aCutLength, const self_type& aStr)

    .. cpp:function:: void Truncate(size_type aLength)

    .. cpp:function:: void SetIsVoid(bool)

        Marks the string as void (null). XPConnect and WebIDL will convert
        void nsAStrings to JavaScript ``null``.

    .. cpp:function:: char_type* BeginWriting()

    .. cpp:function:: char_type* EndWriting()

    .. cpp:function:: void SetCapacity(size_type)

        Informs the string of the buffer size needed before a sequence of
        calls to ``Append()`` or of appends that convert between UTF-16 and
        Latin1 in either direction. (Don't use it with appends that convert
        between UTF-16 and UTF-8 in either direction.) Calling this method
        does not give you permission to use ``BeginWriting()`` to write past
        the logical length of the string. Use ``SetLength()`` or
        ``BulkWrite()`` as appropriate.

    .. cpp:function:: void SetLength(size_type)

    .. cpp:function:: Result<BulkWriteHandle<char_type>, nsresult> BulkWrite(size_type aCapacity, size_type aPrefixToPreserve, bool aAllowShrinking)

Original Document Information
-----------------------------

This document was originally hosted on MDN as part of the XPCOM guide.

* Author: `Alec Flett <mailto:alecf@flett.org>`_
* Copyright Information: Portions of this content are © 1998–2007 by individual mozilla.org contributors; content available under a Creative Commons license.
* Thanks to David Baron for `actual docs <http://dbaron.org/mozilla/coding-practices>`_
* Peter Annema for lots of direction
* Myk Melez for some more docs
* David Bradley for a diagram
* Revised by Darin Fisher for Mozilla 1.7
* Revised by Jungshik Shin to clarify character encoding issues
* Migrated to in-tree documentation by Nika Layzell