gcc/ada/doc/gnat_rm/the_implementation_of_standard_i_o.rst

   1 .. _The_Implementation_of_Standard_I/O:
   2
   3 **********************************
   4 The Implementation of Standard I/O
   5 **********************************
   6
   7 GNAT implements all the required input-output facilities described in
   8 A.6 through A.14.  These sections of the Ada Reference Manual describe the
   9 required behavior of these packages from the Ada point of view, and if
  10 you are writing a portable Ada program that does not need to know the
  11 exact manner in which Ada maps to the outside world when it comes to
  12 reading or writing external files, then you do not need to read this
  13 chapter.  As long as your files are all regular files (not pipes or
  14 devices), and as long as you write and read the files only from Ada, the
  15 description in the Ada Reference Manual is sufficient.
  16
  17 However, if you want to do input-output to pipes or other devices, such
  18 as the keyboard or screen, or if the files you are dealing with are
  19 either generated by some other language, or to be read by some other
  20 language, then you need to know more about the details of how the GNAT
  21 implementation of these input-output facilities behaves.
  22
  23 In this chapter we give a detailed description of exactly how GNAT
  24 interfaces to the file system.  As always, the sources of the system are
  25 available to you for answering questions at an even more detailed level,
  26 but for most purposes the information in this chapter will suffice.
  27
  28 Another reason that you may need to know more about how input-output is
  29 implemented arises when you have a program written in mixed languages
  30 where, for example, files are shared between the C and Ada sections of
  31 the same program.  GNAT provides some additional facilities, in the form
  32 of additional child library packages, that facilitate this sharing, and
  33 these additional facilities are also described in this chapter.
  34
  35 .. _Standard_I/O_Packages:
  36
  37 Standard I/O Packages
  38 =====================
  39
  40 The Standard I/O packages described in Annex A for
  41
  42 *
  43   Ada.Text_IO
  44 *
  45   Ada.Text_IO.Complex_IO
  46 *
  47   Ada.Text_IO.Text_Streams
  48 *
  49   Ada.Wide_Text_IO
  50 *
  51   Ada.Wide_Text_IO.Complex_IO
  52 *
  53   Ada.Wide_Text_IO.Text_Streams
  54 *
  55   Ada.Wide_Wide_Text_IO
  56 *
  57   Ada.Wide_Wide_Text_IO.Complex_IO
  58 *
  59   Ada.Wide_Wide_Text_IO.Text_Streams
  60 *
  61   Ada.Streams.Stream_IO
  62 *
  63   Ada.Sequential_IO
  64 *
  65   Ada.Direct_IO
  66
  67 are implemented using the C
  68 library streams facility, where:
  69
  70 *
  71   All files are opened using ``fopen``.
  72 *
  73   All input/output operations use ``fread``/``fwrite``.
  74
  75 There is no internal buffering of any kind at the Ada library level. The only
  76 buffering is that provided at the system level in the implementation of the
  77 library routines that support streams. This facilitates shared use of these
  78 streams by mixed language programs. Note though that system level buffering is
  79 explicitly enabled at elaboration of the standard I/O packages and that can
  80 have an impact on mixed language programs, in particular those using I/O before
  81 calling the Ada elaboration routine (e.g., ``adainit``). It is recommended to call
  82 the Ada elaboration routine before performing any I/O or, when that is impractical,
  83 to flush the common I/O streams, in particular ``Standard_Output``, before
  84 elaborating the Ada code.
  85
  86 .. _FORM_Strings:
  87
  88 FORM Strings
  89 ============
  90
  91 The format of a FORM string in GNAT is:
  92
  93
  94 ::
  95
  96   "keyword=value,keyword=value,...,keyword=value"
  97
  98
  99 where letters may be in upper or lower case, and there are no spaces
 100 between values.  The order of the entries is not important.  Currently
 101 the following keywords are defined:
 102
 103
 104 ::
 105
 106   TEXT_TRANSLATION=[YES|NO|TEXT|BINARY|U8TEXT|WTEXT|U16TEXT]
 107   SHARED=[YES|NO]
 108   WCEM=[n|h|u|s|e|8|b]
 109   ENCODING=[UTF8|8BITS]
 110
 111
 112 The use of these parameters is described later in this section. If an
 113 unrecognized keyword appears in a form string, it is silently ignored
 114 and not considered invalid.
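As a minimal sketch of how a FORM string is passed, the following creates a
text file using two of the keywords listed above.  The procedure name
``Form_Demo`` and the file name are assumptions made for illustration only.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Form_Demo is
      F : File_Type;
   begin
      --  Keywords may be in either case and are separated by commas,
      --  with no spaces between the entries.
      Create (F, Out_File, "form_demo.txt",
              Form => "shared=no,text_translation=no");
      Put_Line (F, "written with an explicit FORM string");
      Close (F);
   end Form_Demo;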
 115
 116 .. _Direct_IO:
 117
 118 Direct_IO
 119 =========
 120
 121 Direct_IO can only be instantiated for definite types.  This is a
 122 restriction of the Ada language, which means that the records are fixed
 123 length (the length being determined by ``type'Size``, rounded
 124 up to the next storage unit boundary if necessary).
 125
 126 The records of a Direct_IO file are simply written to the file in index
 127 sequence, with the first record starting at offset zero, and subsequent
 128 records following.  There is no control information of any kind.  For
 129 example, if 32-bit integers are being written, each record takes
 130 4 bytes, so the record at index ``K`` starts at offset ``(K-1)*4``.
 131
 132 There is no limit on the size of Direct_IO files; they are expanded as
 133 necessary to accommodate whatever records are written to the file.
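The record layout described above can be illustrated with a short sketch.
It assumes a 32-bit element type (``Interfaces.Integer_32``); the procedure
name and file name are hypothetical.

.. code-block:: ada

   with Ada.Direct_IO;
   with Ada.Text_IO;
   with Interfaces;

   procedure Direct_Demo is
      package Int_IO is new Ada.Direct_IO (Interfaces.Integer_32);
      use Int_IO;

      F : File_Type;
      V : Interfaces.Integer_32;
   begin
      Create (F, Inout_File, "direct_demo.bin");

      --  Record K occupies the 4 bytes starting at offset (K-1)*4
      for K in 1 .. 3 loop
         Write (F, Interfaces.Integer_32 (K * 100), To => Positive_Count (K));
      end loop;

      --  Index 2 corresponds to bytes 4 .. 7 of the external file
      Read (F, V, From => 2);
      Ada.Text_IO.Put_Line (Interfaces.Integer_32'Image (V));
      Close (F);
   end Direct_Demo;

With this layout the resulting file is 12 bytes long, and the value read back
from index 2 is 200.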
 134
 135 .. _Sequential_IO:
 136
 137 Sequential_IO
 138 =============
 139
 140 Sequential_IO may be instantiated with either a definite (constrained)
 141 or indefinite (unconstrained) type.
 142
 143 For the definite type case, the elements written to the file are simply
 144 the memory images of the data values with no control information of any
 145 kind.  The resulting file should be read using the same type; no validity
 146 checking is performed on input.
 147
 148 For the indefinite type case, the elements written consist of two
 149 parts.  First is the size of the data item, written as the memory image
 150 of an ``Interfaces.C.size_t`` value, followed by the memory image of
 151 the data value.  The resulting file can only be read using the same
 152 (unconstrained) type.  Normal assignment checks are performed on these
 153 read operations, and if these checks fail, ``Data_Error`` is
 154 raised.  In particular, in the array case, the lengths must match, and in
 155 the variant record case, if the variable for a particular read operation
 156 is constrained, the discriminants must match.
 157
 158 Note that it is not possible to use Sequential_IO to write variable
 159 length array items, and then read the data back into different length
 160 arrays.  For example, the following will raise ``Data_Error``:
 161
 162
 163 .. code-block:: ada
 164
 165    package IO is new Sequential_IO (String);
 166    F : IO.File_Type;
 167    S : String (1..4);
 168    ...
 169    IO.Create (F);
 170    IO.Write (F, "hello!");
 171    IO.Reset (F, Mode => IO.In_File);
 172    IO.Read (F, S);
 173    Put_Line (S);
 174
 175
 176
 177 On some Ada implementations, this will print ``hell``, but the program is
 178 clearly incorrect, since there is only one element in the file, and that
 179 element is the string ``hello!``.
 180
 181 In Ada 95 and Ada 2005, this kind of behavior can be legitimately achieved
 182 using Stream_IO, and this is the preferred mechanism.  In particular, the
 183 above program fragment rewritten to use Stream_IO will work correctly.
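One possible Stream_IO version of the fragment is sketched below.  It relies
on ``String'Write`` and ``String'Read``, which transfer only the characters
and no bounds; the procedure name and file name are assumptions.

.. code-block:: ada

   with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
   with Ada.Text_IO;

   procedure Stream_Demo is
      F : File_Type;
      S : String (1 .. 4);
   begin
      Create (F, Out_File, "stream_demo.bin");

      --  Writes the six characters only, with no length or bounds
      String'Write (Stream (F), "hello!");

      Reset (F, In_File);

      --  Reads exactly S'Length characters from the stream
      String'Read (Stream (F), S);
      Ada.Text_IO.Put_Line (S);
      Close (F);
   end Stream_Demo;

Here the read fills ``S`` with the first four characters, so the program
prints ``hell`` without raising ``Data_Error``.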
 184
 185 .. _Text_IO:
 186
 187 Text_IO
 188 =======
 189
 190 Text_IO files consist of a stream of characters containing the following
 191 special control characters:
 192
 193
 194 ::
 195
 196   LF (line feed, 16#0A#) Line Mark
 197   FF (form feed, 16#0C#) Page Mark
 198
 199
 200 A canonical Text_IO file is defined as one in which the following
 201 conditions are met:
 202
 203 *
 204   The character ``LF`` is used only as a line mark, i.e., to mark the end
 205   of the line.
 206
 207 *
 208   The character ``FF`` is used only as a page mark, i.e., to mark the
 209   end of a page and consequently can appear only immediately following a
 210   ``LF`` (line mark) character.
 211
 212 *
 213   The file ends with either ``LF`` (line mark) or ``LF``-``FF``
 214   (line mark, page mark).  In the former case, the page mark is implicitly
 215   assumed to be present.
 216
 217 A file written using Text_IO will be in canonical form provided that no
 218 explicit ``LF`` or ``FF`` characters are written using ``Put``
 219 or ``Put_Line``.  There will be no ``FF`` character at the end of
 220 the file unless an explicit ``New_Page`` operation was performed
 221 before closing the file.
 222
 223 A canonical Text_IO file that is a regular file (i.e., not a device or a
 224 pipe) can be read using any of the routines in Text_IO.  The
 225 semantics in this case will be exactly as defined in the Ada Reference
 226 Manual, and all the routines in Text_IO are fully implemented.
 227
 228 A text file that does not meet the requirements for a canonical Text_IO
 229 file has one of the following:
 230
 231 *
 232   The file contains ``FF`` characters not immediately following a
 233   ``LF`` character.
 234
 235 *
 236   The file contains ``LF`` or ``FF`` characters written by
 237   ``Put`` or ``Put_Line``, which are not logically considered to be
 238   line marks or page marks.
 239
 240 *
 241   The file ends in a character other than ``LF`` or ``FF``,
 242   i.e., there is no explicit line mark or page mark at the end of the file.
 243
 244 Text_IO can be used to read such non-standard text files, but subprograms
 245 that deal with line or page numbers do not have defined meanings.  In
 246 particular, a ``FF`` character that does not follow a ``LF``
 247 character may or may not be treated as a page mark from the point of
 248 view of page and line numbering.  Every ``LF`` character is considered
 249 to end a line, and there is an implied ``LF`` character at the end of
 250 the file.
 251
 252 .. _Stream_Pointer_Positioning:
 253
 254 Stream Pointer Positioning
 255 --------------------------
 256
 257 ``Ada.Text_IO`` has a definition of current position for a file that
 258 is being read.  No internal buffering occurs in Text_IO, and usually the
 259 physical position in the stream used to implement the file corresponds
 260 to this logical position defined by Text_IO.  There are two exceptions:
 261
 262 *
 263   After a call to ``End_Of_Page`` that returns ``True``, the stream
 264   is positioned past the ``LF`` (line mark) that precedes the page
 265   mark.  Text_IO maintains an internal flag so that subsequent read
 266   operations properly handle the logical position which is unchanged by
 267   the ``End_Of_Page`` call.
 268
 269 *
 270   After a call to ``End_Of_File`` that returns ``True``, if the
 271   Text_IO file was positioned before the line mark at the end of file
 272   before the call, then the logical position is unchanged, but the stream
 273   is physically positioned right at the end of file (past the line mark,
 274   and past a possible page mark following the line mark).  Again Text_IO
 275   maintains internal flags so that subsequent read operations properly
 276   handle the logical position.
 277
 278 These discrepancies have no effect on the observable behavior of
 279 Text_IO, but if a single Ada stream is shared between a C program and
 280 Ada program, or shared (using ``shared=yes`` in the form string)
 281 between two Ada files, then the difference may be observable in some
 282 situations.
 283
 284 .. _Reading_and_Writing_Non-Regular_Files:
 285
 286 Reading and Writing Non-Regular Files
 287 -------------------------------------
 288
 289 A non-regular file is a device (such as a keyboard), or a pipe.  Text_IO
 290 can be used for reading and writing.  Writing is not affected and the
 291 sequence of characters output is identical to the normal file case, but
 292 for reading, the behavior of Text_IO is modified to avoid undesirable
 293 look-ahead as follows:
 294
 295 An input file that is not a regular file is considered to have no page
 296 marks.  Any ``Ascii.FF`` characters (the character normally used for a
 297 page mark) appearing in the file are considered to be data
 298 characters.  In particular:
 299
 300 *
 301   ``Get_Line`` and ``Skip_Line`` do not test for a page mark
 302   following a line mark.  If a page mark appears, it will be treated as a
 303   data character.
 304
 305
 306   This avoids the need to wait for an extra character to be typed or
 307   entered from the pipe to complete one of these operations.
 308
 309 *
 310   ``End_Of_Page`` always returns ``False``.
 311
 312 *
 313   ``End_Of_File`` will return ``False`` if there is a page mark at
 314   the end of the file.
 315
 316 Output to non-regular files is the same as for regular files.  Page marks
 317 may be written to non-regular files using ``New_Page``, but as noted
 318 above they will not be treated as page marks on input if the output is
 319 piped to another Ada program.
 320
 321 Another important discrepancy when reading non-regular files is that the end
 322 of file indication is not 'sticky'.  If an end of file is entered, e.g., by
 323 pressing the :kbd:`EOT` key,
 324 then end of file
 325 is signaled once (i.e., the test ``End_Of_File``
 326 will yield ``True``, or a read will
 327 raise ``End_Error``), but then reading can resume
 328 to read data past that end of
 329 file indication, until another end of file indication is entered.
 330
 331 .. _Get_Immediate:
 332
 333 Get_Immediate
 334 -------------
 335
 336 .. index:: Get_Immediate
 337
 338 Get_Immediate returns the next character (including control characters)
 339 from the input file.  In particular, Get_Immediate will return LF or FF
 340 characters used as line marks or page marks.  Such operations leave the
 341 file positioned past the control character, and it is thus not treated
 342 as having its normal function.  This means that page, line and column
 343 counts after this kind of Get_Immediate call are set as though the mark
 344 did not occur.  In the case where a Get_Immediate leaves the file
 345 positioned between the line mark and page mark (which is not normally
 346 possible), it is undefined whether the FF character will be treated as a
 347 page mark.
 348
 349 .. _Treating_Text_IO_Files_as_Streams:
 350
 351 Treating Text_IO Files as Streams
 352 ---------------------------------
 353
 354 .. index:: Stream files
 355
 356 The package ``Ada.Text_IO.Text_Streams`` allows a ``Text_IO`` file to be treated
 357 as a stream.  Data written to a ``Text_IO`` file in this stream mode is
 358 binary data.  If this binary data contains bytes 16#0A# (``LF``) or
 359 16#0C# (``FF``), the resulting file may have non-standard
 360 format.  Similarly if read operations are used to read from a Text_IO
 361 file treated as a stream, then ``LF`` and ``FF`` characters may be
 362 skipped and the effect is similar to that described above for
 363 ``Get_Immediate``.
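As a brief sketch, the following obtains a stream for standard output and
writes raw bytes to it.  The procedure name is hypothetical; the ``LF`` is
written explicitly as data, which is how such a file can end up in
non-standard format.

.. code-block:: ada

   with Ada.Text_IO;              use Ada.Text_IO;
   with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;

   procedure Text_Stream_Demo is
      --  Stream access value for the standard output Text_IO file
      S : constant Stream_Access := Stream (Standard_Output);
   begin
      --  The LF here is a data byte written through the stream
      String'Write (S, "raw bytes" & ASCII.LF);
   end Text_Stream_Demo;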
 364
 365 .. _Text_IO_Extensions:
 366
 367 Text_IO Extensions
 368 ------------------
 369
 370 .. index:: Text_IO extensions
 371
 372 The package ``GNAT.IO_Aux`` in the GNAT library provides some useful extensions
 373 to the standard ``Text_IO`` package:
 374
 375 * ``function File_Exists (Name : String) return Boolean;``
 376   Determines if a file of the given name exists.
 377
 378 * ``function Get_Line return String;``
 379   Reads a string from the standard input file.  The length of the value
 380   returned is exactly the length of the line that was read.
 381
 382 * ``function Get_Line (File : Ada.Text_IO.File_Type) return String;``
 383   Similar, except that the parameter ``File`` specifies the file from which
 384   the string is to be read.
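A short usage sketch follows.  The file name and procedure name are
assumptions chosen for illustration.

.. code-block:: ada

   with Ada.Text_IO;
   with GNAT.IO_Aux;

   procedure IO_Aux_Demo is
      Name : constant String := "io_aux_demo.txt";  --  hypothetical file name
      F    : Ada.Text_IO.File_Type;
   begin
      if GNAT.IO_Aux.File_Exists (Name) then
         Ada.Text_IO.Open (F, Ada.Text_IO.In_File, Name);

         if not Ada.Text_IO.End_Of_File (F) then
            --  The result is sized to fit the line, so no buffer is declared
            Ada.Text_IO.Put_Line (GNAT.IO_Aux.Get_Line (F));
         end if;

         Ada.Text_IO.Close (F);
      else
         Ada.Text_IO.Put_Line (Name & " does not exist");
      end if;
   end IO_Aux_Demo;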
 385
 386
 387 .. _Text_IO_Facilities_for_Unbounded_Strings:
 388
 389 Text_IO Facilities for Unbounded Strings
 390 ----------------------------------------
 391
 392 .. index:: Text_IO for unbounded strings
 393
 394 .. index:: Unbounded_String, Text_IO operations
 395
 396 The package ``Ada.Strings.Unbounded.Text_IO``
 397 in library files :file:`a-suteio.ads/adb` contains some GNAT-specific
 398 subprograms useful for Text_IO operations on unbounded strings:
 399
 400
 401 * ``function Get_Line (File : File_Type) return Unbounded_String;``
 402   Reads a line from the specified file
 403   and returns the result as an unbounded string.
 404
 405 * ``procedure Put (File : File_Type; U : Unbounded_String);``
 406   Writes the value of the given unbounded string to the specified file.
 407   Similar to the effect of
 408   ``Put (To_String (U))`` except that an extra copy is avoided.
 409
 410 * ``procedure Put_Line (File : File_Type; U : Unbounded_String);``
 411   Writes the value of the given unbounded string to the specified file,
 412   followed by a ``New_Line``.
 413   Similar to the effect of ``Put_Line (To_String (U))`` except
 414   that an extra copy is avoided.
 415
 416 In the above subprograms, ``File`` is of type ``Ada.Text_IO.File_Type``
 415
 416 In the above procedures, ``File`` is of type ``Ada.Text_IO.File_Type``
 417 and is optional.  If the parameter is omitted, then the standard input or
 418 output file is referenced as appropriate.
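The following sketch writes an unbounded string to a file and reads it back
with these subprograms.  The procedure name and file name are hypothetical.

.. code-block:: ada

   with Ada.Text_IO;
   with Ada.Strings.Unbounded;         use Ada.Strings.Unbounded;
   with Ada.Strings.Unbounded.Text_IO; use Ada.Strings.Unbounded.Text_IO;

   procedure Unbounded_Demo is
      F    : Ada.Text_IO.File_Type;
      Line : Unbounded_String;
   begin
      Ada.Text_IO.Create (F, Ada.Text_IO.Out_File, "unbounded_demo.txt");
      Put_Line (F, To_Unbounded_String ("first line"));

      Ada.Text_IO.Reset (F, Ada.Text_IO.In_File);
      Line := Get_Line (F);   --  no fixed-size buffer is needed
      Ada.Text_IO.Close (F);

      Put_Line (Line);        --  File omitted: standard output is used
   end Unbounded_Demo;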
 419
 420 The package ``Ada.Strings.Wide_Unbounded.Wide_Text_IO`` in library
 421 files :file:`a-swuwti.ads` and :file:`a-swuwti.adb` provides similar extended
 422 ``Wide_Text_IO`` functionality for unbounded wide strings.
 423
 424 The package ``Ada.Strings.Wide_Wide_Unbounded.Wide_Wide_Text_IO`` in library
 425 files :file:`a-szuzti.ads` and :file:`a-szuzti.adb` provides similar extended
 426 ``Wide_Wide_Text_IO`` functionality for unbounded wide wide strings.
 427
 428 .. _Wide_Text_IO:
 429
 430 Wide_Text_IO
 431 ============
 432
 433 ``Wide_Text_IO`` is similar in most respects to Text_IO, except that
 434 both input and output files may contain special sequences that represent
 435 wide character values.  The encoding scheme for a given file may be
 436 specified using a FORM parameter:
 437
 438
 439 ::
 440
 441   WCEM=x
 442
 443
 444 as part of the FORM string (WCEM = wide character encoding method),
 445 where ``x`` is one of the following characters:
 446
 447 ========== ====================
 448 Character  Encoding
 449 ========== ====================
 450 *h*        Hex ESC encoding
 451 *u*        Upper half encoding
 452 *s*        Shift-JIS encoding
 453 *e*        EUC Encoding
 454 *8*        UTF-8 encoding
 455 *b*        Brackets encoding
 456 ========== ====================
 457
 458 The encoding methods match those that
 459 can be used in a source
 460 program, but there is no requirement that the encoding method used for
 461 the source program be the same as the encoding method used for files,
 462 and different files may use different encoding methods.
 463
 464 The default encoding method for the standard files, and for opened files
 465 for which no WCEM parameter is given in the FORM string, matches the
 466 wide character encoding specified for the main program (the default
 467 being brackets encoding if no coding method was specified with ``-gnatW``).
 468
 469
 470
 471 *Hex Coding*
 472   In this encoding, a wide character is represented by a five character
 473   sequence:
 474
 475
 476 ::
 477
 478     ESC a b c d
 479
 480 ..
 481
 482   where ``a``, ``b``, ``c``, ``d`` are the four hexadecimal
 483   characters (using upper case letters) of the wide character code.  For
 484   example, ESC A345 is used to represent the wide character with code
 485   16#A345#.  This scheme is compatible with use of the full
 486   ``Wide_Character`` set.
 487
 488
 489 *Upper Half Coding*
 490   The wide character with encoding 16#abcd#, where the upper bit is on
 491   (i.e., a is in the range 8-F) is represented as two bytes 16#ab# and
 492   16#cd#.  The second byte may never be a format control character, but is
 493   not required to be in the upper half.  This method can also be used for
 494   shift-JIS or EUC where the internal coding matches the external coding.
 495
 496
 497 *Shift JIS Coding*
 498   A wide character is represented by a two character sequence 16#ab# and
 499   16#cd#, with the restrictions described above for upper half
 500   encoding.  The internal character code is the corresponding JIS
 501   character according to the standard algorithm for Shift-JIS
 502   conversion.  Only characters defined in the JIS code set table can be
 503   used with this encoding method.
 504
 505
 506 *EUC Coding*
 507   A wide character is represented by a two character sequence 16#ab# and
 508   16#cd#, with both characters being in the upper half.  The internal
 509   character code is the corresponding JIS character according to the EUC
 510   encoding algorithm.  Only characters defined in the JIS code set table
 511   can be used with this encoding method.
 512
 513
 514 *UTF-8 Coding*
 515   A wide character is represented using
 516   UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
 517   10646-1/Am.2.  Depending on the character value, the representation
 518   is a one, two, or three byte sequence:
 519
 520
 521 ::
 522
 523     16#0000#-16#007f#: 2#0xxxxxxx#
 524     16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
 525     16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
 526
 527 ..
 528
 529   where the ``xxx`` bits correspond to the left-padded bits of the
 530   16-bit character value.  Note that all lower half ASCII characters
 531   are represented as ASCII bytes and all upper half characters and
 532   other wide characters are represented as sequences of upper-half
 533   characters.  (The full UTF-8 scheme allows for encoding 31-bit characters as
 534   6-byte sequences, but in this implementation, all UTF-8 sequences
 535   of four or more bytes in length will raise ``Constraint_Error``, as
 536   will all invalid UTF-8 sequences.)
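The three rows of the UTF-8 table can be worked through with a small sketch.
This is only an illustration of the bit layout under the stated ranges, not
the encoder used inside the GNAT run-time; all names in it are hypothetical.

.. code-block:: ada

   with Ada.Text_IO;         use Ada.Text_IO;
   with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;

   procedure UTF8_Sketch is
      subtype Code_16 is Natural range 0 .. 16#FFFF#;
      type Byte_Array is array (Positive range <>) of Natural;

      --  Split a 16-bit code into one, two, or three byte values
      function Encode (Code : Code_16) return Byte_Array is
      begin
         if Code <= 16#7F# then
            return (1 => Code);                       --  0xxxxxxx
         elsif Code <= 16#7FF# then
            return (16#C0# + Code / 64,               --  110xxxxx
                    16#80# + Code mod 64);            --  10xxxxxx
         else
            return (16#E0# + Code / 4096,             --  1110xxxx
                    16#80# + (Code / 64) mod 64,      --  10xxxxxx
                    16#80# + Code mod 64);            --  10xxxxxx
         end if;
      end Encode;

      Bytes : constant Byte_Array := Encode (16#E9#);
   begin
      for I in Bytes'Range loop
         Put (Bytes (I), Width => 7, Base => 16);
      end loop;
      New_Line;
   end UTF8_Sketch;

For the code ``16#E9#``, which falls in the second row, the two bytes are
``16#C3#`` and ``16#A9#``.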
 537
 538
 539 *Brackets Coding*
 540   In this encoding, a wide character is represented by the following eight
 541   character sequence:
 542
 543
 544 ::
 545
 546     [ " a b c d " ]
 547
 548 ..
 549
 550   where ``a``, ``b``, ``c``, ``d`` are the four hexadecimal
 551   characters (using uppercase letters) of the wide character code.  For
 552   example, ``["A345"]`` is used to represent the wide character with code
 553   ``16#A345#``.
 554   This scheme is compatible with use of the full Wide_Character set.
 555   On input, brackets coding can also be used for upper half characters,
 556   e.g., ``["C1"]`` for upper case A with acute accent.  However, on output, brackets notation
 557   is only used for wide characters with a code greater than ``16#FF#``.
 558
 559   Note that brackets coding is not normally used in the context of
 560   Wide_Text_IO or Wide_Wide_Text_IO, since it is really just designed as
 561   a portable way of encoding source files. In the context of Wide_Text_IO
 562   or Wide_Wide_Text_IO, it can only be used if the file does not contain
 563   any instance of the left bracket character other than to encode wide
 564   character values using the brackets encoding method. In practice it is
 565   expected that some standard wide character encoding method such
 566   as UTF-8 will be used for text input output.
 567
 568   If brackets notation is used, then any occurrence of a left bracket
 569   in the input file which is not the start of a valid wide character
 570   sequence will cause Constraint_Error to be raised. It is possible to
 571   encode a left bracket as ``["5B"]`` and Wide_Text_IO and Wide_Wide_Text_IO
 572   input will interpret this as a left bracket.
 573
 574   However, when a left bracket is output, it will be output as a left bracket
 575   and not as ``["5B"]``. We make this decision because for normal use of
 576   Wide_Text_IO for outputting messages, it is unpleasant to clobber left
 577   brackets. For example, if we write:
 578
 579
 580   .. code-block:: ada
 581
 582        Put_Line ("Start of output [first run]");
 583
 584
 585   we really do not want to have the left bracket in this message clobbered so
 586   that the output reads:
 587
 588
 589 ::
 590
 591        Start of output ["5B"]first run]
 592
 593 ..
 594
 595   In practice brackets encoding is reasonably useful for normal Put_Line use
 596   since we won't get confused between left brackets and wide character
 597   sequences in the output. But for input, or when files are written out
 598   and read back in, it really makes better sense to use one of the standard
 599   encoding methods such as UTF-8.
 600
 601
 602 For the coding schemes other than UTF-8, Hex, or Brackets encoding,
 603 not all wide character
 604 values can be represented.  An attempt to output a character that cannot
 605 be represented using the encoding scheme for the file causes
 606 Constraint_Error to be raised.  An invalid wide character sequence on
 607 input also causes Constraint_Error to be raised.
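As a sketch of selecting an encoding method for one file, the following
creates a ``Wide_Text_IO`` file with ``WCEM=8`` regardless of the encoding
used for the source program.  The procedure name and file name are
assumptions.

.. code-block:: ada

   with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

   procedure Wcem_Demo is
      F : File_Type;
   begin
      --  UTF-8 is requested for this file only
      Create (F, Out_File, "wcem_demo.txt", Form => "WCEM=8");

      --  16#E9# is outside the lower half, so it is written as a
      --  multi-byte sequence rather than as a single byte
      Put_Line (F, "caf" & Wide_Character'Val (16#E9#));
      Close (F);
   end Wcem_Demo;

Given the UTF-8 ranges described above, the final character should appear in
the external file as the two bytes ``16#C3#`` and ``16#A9#``.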
 608
 609 .. _Stream_Pointer_Positioning_1:
 610
 611 Stream Pointer Positioning
 612 --------------------------
 613
 614 ``Ada.Wide_Text_IO`` is similar to ``Ada.Text_IO`` in its handling
 615 of stream pointer positioning (:ref:`Text_IO`).  There is one additional
 616 case:
 617
 618 If ``Ada.Wide_Text_IO.Look_Ahead`` reads a character outside the
 619 normal lower ASCII set, i.e. a character in the range:
 620
 621
 622 .. code-block:: ada
 623
 624   Wide_Character'Val (16#0080#) .. Wide_Character'Val (16#FFFF#)
 625
 626
 627 then although the logical position of the file pointer is unchanged by
 628 the ``Look_Ahead`` call, the stream is physically positioned past the
 629 wide character sequence.  Again this is to avoid the need for buffering
 630 or backup, and all ``Wide_Text_IO`` routines check the internal
 631 indication that this situation has occurred so that this is not visible
 632 to a normal program using ``Wide_Text_IO``.  However, this discrepancy
 633 can be observed if the wide text file shares a stream with another file.
 634
 635 .. _Reading_and_Writing_Non-Regular_Files_1:
 636
 637 Reading and Writing Non-Regular Files
 638 -------------------------------------
 639
 640 As in the case of Text_IO, when a non-regular file is read, it is
 641 assumed that the file contains no page marks (any form feed characters are
 642 treated as data characters), and ``End_Of_Page`` always returns
 643 ``False``.  Similarly, the end of file indication is not sticky, so
 644 it is possible to read beyond an end of file.
 645
 646 .. _Wide_Wide_Text_IO:
 647
 648 Wide_Wide_Text_IO
 649 =================
 650
 651 ``Wide_Wide_Text_IO`` is similar in most respects to Text_IO, except that
 652 both input and output files may contain special sequences that represent
 653 wide wide character values.  The encoding scheme for a given file may be
 654 specified using a FORM parameter:
 655
 656
 657 ::
 658
 659   WCEM=x
 660
 661
 662 as part of the FORM string (WCEM = wide character encoding method),
 663 where ``x`` is one of the following characters:
 664
 665 ========== ====================
 666 Character  Encoding
 667 ========== ====================
 668 *h*        Hex ESC encoding
 669 *u*        Upper half encoding
 670 *s*        Shift-JIS encoding
 671 *e*        EUC Encoding
 672 *8*        UTF-8 encoding
 673 *b*        Brackets encoding
 674 ========== ====================
 675
 676
 677 The encoding methods match those that
 678 can be used in a source
 679 program, but there is no requirement that the encoding method used for
 680 the source program be the same as the encoding method used for files,
 681 and different files may use different encoding methods.
 682
 683 The default encoding method for the standard files, and for opened files
 684 for which no WCEM parameter is given in the FORM string, matches the
 685 wide character encoding specified for the main program (the default
 686 being brackets encoding if no coding method was specified with ``-gnatW``).
 687
 688
 689
 690 *UTF-8 Coding*
 691   A wide wide character is represented using
 692   UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO
 693   10646-1/Am.2.  Depending on the character value, the representation
 694   is a one, two, three, or four byte sequence:
 695
 696
 697 ::
 698
 699     16#000000#-16#00007f#: 2#0xxxxxxx#
 700     16#000080#-16#0007ff#: 2#110xxxxx# 2#10xxxxxx#
 701     16#000800#-16#00ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
 702     16#010000#-16#10ffff#: 2#11110xxx# 2#10xxxxxx# 2#10xxxxxx# 2#10xxxxxx#
 703
 704 ..
 705
 706   where the ``xxx`` bits correspond to the left-padded bits of the
 707   21-bit character value.  Note that all lower half ASCII characters
 708   are represented as ASCII bytes and all upper half characters and
 709   other wide characters are represented as sequences of upper-half
 710   characters.
 711
 712
 713 *Brackets Coding*
 714   In this encoding, a wide wide character is represented by the following eight
 715   character sequence if it is in the ``Wide_Character`` range:
 716
 717
 718 ::
 719
 720     [ " a b c d " ]
 721
 722 ..
 723
 724   and by the following ten character sequence if not:
 725
 726
 727 ::
 728
 729     [ " a b c d e f " ]
 730
 731 ..
 732
 733   where ``a``, ``b``, ``c``, ``d``, ``e``, and ``f``
 734   are the four or six hexadecimal
 735   characters (using uppercase letters) of the wide wide character code.  For
 736   example, ``["01A345"]`` is used to represent the wide wide character
 737   with code ``16#01A345#``.
 738
 739   This scheme is compatible with use of the full Wide_Wide_Character set.
 740   On input, brackets coding can also be used for upper half characters,
 741   e.g., ``["C1"]`` for upper case A with acute accent.  However, on output, brackets notation
 742   is only used for wide characters with a code greater than ``16#FF#``.
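The choice between the eight and ten character forms can be sketched as
follows.  This is an illustrative helper, not the routine used by the GNAT
run-time; the names ``Brackets_Sketch`` and ``Brackets`` are hypothetical.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Brackets_Sketch is
      Hex : constant String := "0123456789ABCDEF";

      --  Build the brackets form for a character code: four hex digits
      --  inside the Wide_Character range, six hex digits otherwise
      function Brackets (Code : Natural) return String is
         Four : String (1 .. 4);
         Six  : String (1 .. 6);
         V    : Natural := Code;
      begin
         if Code <= 16#FFFF# then
            for I in reverse Four'Range loop
               Four (I) := Hex (V mod 16 + 1);
               V := V / 16;
            end loop;
            return "[""" & Four & """]";
         else
            for I in reverse Six'Range loop
               Six (I) := Hex (V mod 16 + 1);
               V := V / 16;
            end loop;
            return "[""" & Six & """]";
         end if;
      end Brackets;
   begin
      Put_Line (Brackets (16#A345#));
      Put_Line (Brackets (16#01A345#));
   end Brackets_Sketch;

The two calls print ``["A345"]`` and ``["01A345"]`` respectively.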
 743
 744
 745 It is also possible to use the other Wide_Character encoding methods,
 746 such as Shift-JIS, but the other schemes cannot support the full range
 747 of wide wide characters.
 748 An attempt to output a character that cannot
 749 be represented using the encoding scheme for the file causes
 750 Constraint_Error to be raised.  An invalid wide character sequence on
 751 input also causes Constraint_Error to be raised.
 752
 753 .. _Stream_Pointer_Positioning_2:
 754
 755 Stream Pointer Positioning
 756 --------------------------
 757
 758 ``Ada.Wide_Wide_Text_IO`` is similar to ``Ada.Text_IO`` in its handling
 759 of stream pointer positioning (:ref:`Text_IO`).  There is one additional
 760 case:
 761
 762 If ``Ada.Wide_Wide_Text_IO.Look_Ahead`` reads a character outside the
 763 normal lower ASCII set, i.e. a character in the range:
 764
 765
 766 .. code-block:: ada
 767
 768   Wide_Wide_Character'Val (16#0080#) .. Wide_Wide_Character'Val (16#10FFFF#)
 769
 770
 771 then although the logical position of the file pointer is unchanged by
 772 the ``Look_Ahead`` call, the stream is physically positioned past the
 773 wide character sequence.  Again this is to avoid the need for buffering
 774 or backup, and all ``Wide_Wide_Text_IO`` routines check the internal
 775 indication that this situation has occurred so that this is not visible
 776 to a normal program using ``Wide_Wide_Text_IO``.  However, this discrepancy
 777 can be observed if the wide text file shares a stream with another file.
 778
 779 .. _Reading_and_Writing_Non-Regular_Files_2:
 780
 781 Reading and Writing Non-Regular Files
 782 -------------------------------------
 783
 784 As in the case of Text_IO, when a non-regular file is read, it is
 785 assumed that the file contains no page marks (any form feed characters are
 786 treated as data characters), and ``End_Of_Page`` always returns
 787 ``False``.  Similarly, the end of file indication is not sticky, so
 788 it is possible to read beyond an end of file.
 789
 790 .. _Stream_IO:
 791
 792 Stream_IO
 793 =========
 794
 795 A stream file is a sequence of bytes, where individual elements are
 796 written to the file as described in the Ada Reference Manual.  The type
 797 ``Stream_Element`` is simply a byte.  There are two ways to read or
 798 write a stream file.
 799
 800 *
 801   The operations ``Read`` and ``Write`` directly read or write a
 802   sequence of stream elements with no control information.
 803
 804 *
 805   The stream attributes applied to a stream file transfer data in the
 806   manner described for stream attributes.
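
The two approaches can be mixed on the same file.  The following is a
minimal sketch rather than code taken from the GNAT sources; the procedure
name ``Stream_Demo`` and the file name ``demo.bin`` are illustrative
assumptions.

.. code-block:: ada

   with Ada.Streams;           use Ada.Streams;
   with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
   with Ada.Text_IO;

   procedure Stream_Demo is
      F    : File_Type;
      Raw  : constant Stream_Element_Array (1 .. 4) :=
               (16#DE#, 16#AD#, 16#BE#, 16#EF#);
      Buf  : Stream_Element_Array (1 .. 4);
      Last : Stream_Element_Offset;
      N    : Integer;
   begin
      Create (F, Out_File, "demo.bin");   --  hypothetical file name
      Write (F, Raw);                     --  raw stream elements, no control information
      Integer'Write (Stream (F), 42);     --  stream attribute applied to the file
      Close (F);

      Open (F, In_File, "demo.bin");
      Read (F, Buf, Last);                --  reads back the four raw elements
      Integer'Read (Stream (F), N);       --  reads back the value written by 'Write
      Close (F);

      Ada.Text_IO.Put_Line (Integer'Image (N));
   end Stream_Demo;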
 807
 808 .. _Text_Translation:
 809
 810 Text Translation
 811 ================
 812
 813 ``Text_Translation=xxx`` may be used in the ``Form`` parameter
 814 passed to ``Text_IO.Create`` and ``Text_IO.Open``.  It
 815 has no effect on Unix systems.  Possible values are:
 816
 817
 818 *
 819   ``Yes`` or ``Text`` is the default, which means to
 820   translate LF to/from CR/LF on Windows systems.
 821
 822 *
 823   ``No`` disables this translation, i.e., it uses binary mode.
 824   For output files, ``Text_Translation=No`` may be used to
 825   create Unix-style files on Windows.
 826
 827 *
 828   ``wtext`` enables translation in Unicode mode
 829   (corresponds to ``_O_WTEXT``).
 830
 831 *
 832   ``u8text`` enables translation in Unicode UTF-8 mode
 833   (corresponds to ``_O_U8TEXT``).
 834
 835 *
 836   ``u16text`` enables translation in Unicode UTF-16
 837   mode (corresponds to ``_O_U16TEXT``).
 838
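As an illustration, the form string is passed like any other ``Form``
argument.  This is a minimal sketch; the procedure name and the file name
``unix_style.txt`` are assumptions made for the example.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Unix_Style_Output is
      F : File_Type;
   begin
      --  Request binary mode so that no LF to CR/LF translation is
      --  applied on Windows.  The file name is hypothetical.
      Create (F, Out_File, "unix_style.txt",
              Form => "Text_Translation=No");
      Put_Line (F, "one line of text");
      Close (F);
   end Unix_Style_Output;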
 839
 840 .. _Shared_Files:
 841
 842 Shared Files
 843 ============
 844
 845 Section A.14 of the Ada Reference Manual allows implementations to
 846 provide a wide variety of behavior if an attempt is made to access the
 847 same external file with two or more internal files.
 848
 849 To provide a full range of functionality, while at the same time
 850 minimizing the problems of portability caused by this implementation
 851 dependence, GNAT handles file sharing as follows:
 852
 853 *
 854   In the absence of a ``shared=xxx`` form parameter, an attempt
 855   to open two or more files with the same full name is considered an error
 856   and is not supported.  The exception ``Use_Error`` will be
 857   raised.  Note that a file that is not explicitly closed by the program
 858   remains open until the program terminates.
 859
 860 *
 861   If the form parameter ``shared=no`` appears in the form string, the
 862   file can be opened or created with its own separate stream identifier,
 863   regardless of whether other files sharing the same external file are
 864   opened.  The exact effect depends on how the C stream routines handle
 865   multiple accesses to the same external files using separate streams.
 866
 867 *
 868   If the form parameter ``shared=yes`` appears in the form string for
 869   each of two or more files opened using the same full name, the same
 870   stream is shared between these files, and the semantics are as described
 871   in Ada Reference Manual, Section A.14.
 872
 873 When a program that opens multiple files with the same name is ported
 874 from another Ada compiler to GNAT, the effect will be that
 875 ``Use_Error`` is raised.
 876
 877 The documentation of the original compiler and the documentation of the
 878 program should then be examined to determine if file sharing was
 879 expected, and ``shared=xxx`` parameters added to ``Open``
 880 and ``Create`` calls as required.
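
The default behavior described above can be observed with a small test
program.  This is a sketch under the assumption that no ``shared=xxx``
form parameter is given; the procedure name and file name are
illustrative only.

.. code-block:: ada

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Shared_Demo is
      A, B : File_Type;
   begin
      Create (A, Out_File, "shared_demo.txt");   --  hypothetical file name
      begin
         --  Second internal file on the same external file, with no
         --  shared=xxx form parameter: Use_Error is expected here.
         Open (B, In_File, "shared_demo.txt");
         Close (B);
      exception
         when Use_Error =>
            Put_Line ("second open rejected with Use_Error");
      end;
      Close (A);
   end Shared_Demo;

Adding an appropriate ``shared=xxx`` value to the ``Form`` argument of both
calls is the change a ported program would need if it relies on sharing.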
 881
 882 When a program is ported from GNAT to some other Ada compiler, no
 883 special attention is required unless the ``shared=xxx`` form
 884 parameter is used in the program.  In this case, you must examine the
 885 documentation of the new compiler to see if it supports the required
 886 file sharing semantics, and modify the form strings appropriately.  Of
 887 course it may be the case that the program cannot be ported if the
 888 target compiler does not support the required functionality.  The best
 889 approach in writing portable code is to avoid file sharing (and hence
 890 the use of the ``shared=xxx`` parameter in the form string)
 891 completely.
 892
 893 One common use of file sharing in Ada 83 is the use of instantiations of
 894 Sequential_IO on the same file with different types, to achieve
 895 heterogeneous input-output.  Although this approach will work in GNAT if
 896 ``shared=yes`` is specified, it is preferable in Ada to use Stream_IO
 897 for this purpose (using the stream attributes).
 898
 899 .. _Filenames_encoding:
 900
 901 Filenames encoding
 902 ==================
 903
 904 The form parameter ``encoding=xxx`` can be used to specify the
 905 filename encoding.
 906
 907 *
 908   If the form parameter ``encoding=utf8`` appears in the form string, the
 909   filename must be encoded in UTF-8.
 910
 911 *
 912   If the form parameter ``encoding=8bits`` appears in the form
 913   string, the filename must be a standard 8-bit string.
 914
 915 In the absence of an ``encoding=xxx`` form parameter, the
 916 encoding is controlled by the ``GNAT_CODE_PAGE`` environment
 917 variable.  If the variable is not set, ``utf8`` is assumed.
 918
 919 The possible values of ``GNAT_CODE_PAGE`` are:
 920
 921 *CP_ACP*
 922   The current system Windows ANSI code page.
 923
 924 *CP_UTF8*
 925   UTF-8 encoding
 926
 927 This encoding form parameter is only supported on the Windows
 928 platform.  On other operating systems the run-time supports
 929 UTF-8 natively.
 930
 931 .. _File_content_encoding:
 932
 933 File content encoding
 934 =====================
 935
 936 For text files it is possible to specify the encoding to use.  This is
 937 controlled by the ``GNAT_CCS_ENCODING`` environment
 938 variable.  If the variable is not set, ``TEXT`` is assumed.
 939
 940 The possible values are those supported on Windows:
 941
 942
 943
 944 *TEXT*
 945   Translated text mode
 946
 947 *WTEXT*
 948   Translated Unicode encoding
 949
 950 *U16TEXT*
 951   Unicode UTF-16 encoding
 952
 953 *U8TEXT*
 954   Unicode UTF-8 encoding
 955
 956 This encoding is only supported on the Windows platform.
 957
 958 .. _Open_Modes:
 959
 960 Open Modes
 961 ==========
 962
 963 ``Open`` and ``Create`` calls result in a call to ``fopen``
 964 using the mode shown in the following table:
 965
 966 +----------------------------+---------------+------------------+
 967 |           ``Open`` and ``Create`` Call Modes                  |
 968 +----------------------------+---------------+------------------+
 969 |                            |   **OPEN**    |     **CREATE**   |
 970 +============================+===============+==================+
 971 | Append_File                |   "r+"        |    "w+"          |
 972 +----------------------------+---------------+------------------+
 973 | In_File                    |   "r"         |    "w+"          |
 974 +----------------------------+---------------+------------------+
 975 | Out_File (Direct_IO)       |   "r+"        |    "w"           |
 976 +----------------------------+---------------+------------------+
 977 | Out_File (all other cases) |   "w"         |    "w"           |
 978 +----------------------------+---------------+------------------+
 979 | Inout_File                 |   "r+"        |    "w+"          |
 980 +----------------------------+---------------+------------------+
 981
 982
 983 If text file translation is required, then either ``b`` or ``t``
 984 is added to the mode, depending on the setting of Text.  Text file
 985 translation refers to the mapping of CR/LF sequences in an external file
 986 to LF characters internally.  This mapping only occurs in DOS and
 987 DOS-like systems, and is not relevant to other systems.
 988
 989 A special case occurs with Stream_IO.  As shown in the above table, the
 990 file is initially opened in ``r`` or ``w`` mode for the
 991 ``In_File`` and ``Out_File`` cases.  If a ``Set_Mode`` operation
 992 subsequently requires switching from reading to writing or vice versa,
 993 then the file is reopened in ``r+`` mode to permit the required operation.
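
A mode switch of this kind looks as follows from the Ada side.  This is a
minimal sketch; the procedure name and the file name ``mode_demo.bin`` are
assumptions, and the reopening described above happens inside the run-time
without any action by the program.

.. code-block:: ada

   with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

   procedure Mode_Switch is
      F : File_Type;
      C : Character;
   begin
      Create (F, Out_File, "mode_demo.bin");   --  hypothetical file name
      Character'Write (Stream (F), 'A');

      --  Switch from writing to reading on the same internal file.
      Set_Mode (F, In_File);

      --  Set_Mode leaves the position unchanged for In_File, so move
      --  back to the first element before reading.
      Set_Index (F, 1);
      Character'Read (Stream (F), C);
      Close (F);
   end Mode_Switch;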
 994
 995 .. _Operations_on_C_Streams:
 996
 997 Operations on C Streams
 998 =======================
 999
1000 The package ``Interfaces.C_Streams`` provides an Ada program with direct
1001 access to the C library functions for operations on C streams:
1002
1003
1004 .. code-block:: ada
1005
1006   package Interfaces.C_Streams is
1007     -- Note: the reason we do not use the types that are in
1008     -- Interfaces.C is that we want to avoid dragging in the
1009     -- code in this unit if possible.
1010     subtype chars is System.Address;
1011     -- Pointer to null-terminated array of characters
1012     subtype FILEs is System.Address;
1013     -- Corresponds to the C type FILE*
1014     subtype voids is System.Address;
1015     -- Corresponds to the C type void*
1016     subtype int is Integer;
1017     subtype long is Long_Integer;
1018     -- Note: the above types are subtypes deliberately, and it
1019     -- is part of this spec that the above correspondences are
1020     -- guaranteed.  This means that it is legitimate to, for
1021     -- example, use Integer instead of int.  We provide these
1022     -- synonyms for clarity, but in some cases it may be
1023     -- convenient to use the underlying types (for example to
1024     -- avoid an unnecessary dependency of a spec on the spec
1025     -- of this unit).
1026     type size_t is mod 2 ** Standard'Address_Size;
1027     NULL_Stream : constant FILEs;
1028     -- Value returned (NULL in C) to indicate an
1029     -- fdopen/fopen/tmpfile error
1030     ----------------------------------
1031     -- Constants Defined in stdio.h --
1032     ----------------------------------
1033     EOF : constant int;
1034     -- Used by a number of routines to indicate error or
1035     -- end of file
1036     IOFBF : constant int;
1037     IOLBF : constant int;
1038     IONBF : constant int;
1039     -- Used to indicate buffering mode for setvbuf call
1040     SEEK_CUR : constant int;
1041     SEEK_END : constant int;
1042     SEEK_SET : constant int;
1043     -- Used to indicate origin for fseek call
1044     function stdin return FILEs;
1045     function stdout return FILEs;
1046     function stderr return FILEs;
1047     -- Streams associated with standard files
1048     --------------------------
1049     -- Standard C functions --
1050     --------------------------
1051     -- The functions selected below are ones that are
1052     -- available in UNIX (but not necessarily in ANSI C).
1053     -- These are very thin interfaces
1054     -- which copy exactly the C headers.  For more
1055     -- documentation on these functions, see the Microsoft C
1056     -- "Run-Time Library Reference" (Microsoft Press, 1990,
1057     -- ISBN 1-55615-225-6), which includes useful information
1058     -- on system compatibility.
1059     procedure clearerr (stream : FILEs);
1060     function fclose (stream : FILEs) return int;
1061     function fdopen (handle : int; mode : chars) return FILEs;
1062     function feof (stream : FILEs) return int;
1063     function ferror (stream : FILEs) return int;
1064     function fflush (stream : FILEs) return int;
1065     function fgetc (stream : FILEs) return int;
1066     function fgets (strng : chars; n : int; stream : FILEs)
1067         return chars;
1068     function fileno (stream : FILEs) return int;
1069     function fopen (filename : chars; Mode : chars)
1070         return FILEs;
1071     -- Note: to maintain target independence, use
1072     -- text_translation_required, a boolean variable defined in
1073     -- a-sysdep.c to deal with the target dependent text
1074     -- translation requirement.  If this variable is set,
1075     -- then b/t should be appended to the standard mode
1076     -- argument to set the text translation mode off or on
1077     -- as required.
1078     function fputc (C : int; stream : FILEs) return int;
1079     function fputs (Strng : chars; Stream : FILEs) return int;
1080     function fread
1081        (buffer : voids;
1082         size : size_t;
1083         count : size_t;
1084         stream : FILEs)
1085         return size_t;
1086     function freopen
1087        (filename : chars;
1088         mode : chars;
1089         stream : FILEs)
1090         return FILEs;
1091     function fseek
1092        (stream : FILEs;
1093         offset : long;
1094         origin : int)
1095         return int;
1096     function ftell (stream : FILEs) return long;
1097     function fwrite
1098        (buffer : voids;
1099         size : size_t;
1100         count : size_t;
1101         stream : FILEs)
1102         return size_t;
1103     function isatty (handle : int) return int;
1104     procedure mktemp (template : chars);
1105     -- The return value (which is just a pointer to template)
1106     -- is discarded
1107     procedure rewind (stream : FILEs);
1108     function rmtmp return int;
1109     function setvbuf
1110        (stream : FILEs;
1111         buffer : chars;
1112         mode : int;
1113         size : size_t)
1114         return int;
1115
1116     function tmpfile return FILEs;
1117     function ungetc (c : int; stream : FILEs) return int;
1118     function unlink (filename : chars) return int;
1119     ---------------------
1120     -- Extra functions --
1121     ---------------------
1122     -- These functions supply slightly thicker bindings than
1123     -- those above.  They are derived from functions in the
1124     -- C Run-Time Library, but may do a bit more work than
1125     -- just directly calling one of the Library functions.
1126     function is_regular_file (handle : int) return int;
1127     -- Tests if given handle is for a regular file (result 1)
1128     -- or for a non-regular file (pipe or device, result 0).
1129     ---------------------------------
1130     -- Control of Text/Binary Mode --
1131     ---------------------------------
1132     -- If text_translation_required is true, then the following
1133     -- functions may be used to dynamically switch a file from
1134     -- binary to text mode or vice versa.  These functions have
1135     -- no effect if text_translation_required is false (i.e., in
1136     -- normal UNIX mode).  Use fileno to get a stream handle.
1137     procedure set_binary_mode (handle : int);
1138     procedure set_text_mode (handle : int);
1139     ----------------------------
1140     -- Full Path Name support --
1141     ----------------------------
1142     procedure full_name (nam : chars; buffer : chars);
1143     -- Given a NUL terminated string representing a file
1144     -- name, returns in buffer a NUL terminated string
1145     -- representing the full path name for the file name.
1146     -- On systems where it is relevant the drive is also
1147     -- part of the full path name.  It is the responsibility
1148     -- of the caller to pass an actual parameter for buffer
1149     -- that is big enough for any full path name.  Use
1150     -- max_path_len given below as the size of buffer.
1151     max_path_len : integer;
1152     -- Maximum length of an allowable full path name on the
1153     -- system, including a terminating NUL character.
1154   end Interfaces.C_Streams;
1155
1156
1157 .. _Interfacing_to_C_Streams:
1158
1159 Interfacing to C Streams
1160 ========================
1161
1162 The packages in this section permit interfacing Ada files to C Stream
1163 operations.
1164
1165
1166 .. code-block:: ada
1167
1168    with Interfaces.C_Streams;
1169    package Ada.Sequential_IO.C_Streams is
1170       function C_Stream (F : File_Type)
1171          return Interfaces.C_Streams.FILEs;
1172       procedure Open
1173         (File : in out File_Type;
1174          Mode : in File_Mode;
1175          C_Stream : in Interfaces.C_Streams.FILEs;
1176          Form : in String := "");
1177    end Ada.Sequential_IO.C_Streams;
1178
1179    with Interfaces.C_Streams;
1180    package Ada.Direct_IO.C_Streams is
1181       function C_Stream (F : File_Type)
1182          return Interfaces.C_Streams.FILEs;
1183       procedure Open
1184         (File : in out File_Type;
1185          Mode : in File_Mode;
1186          C_Stream : in Interfaces.C_Streams.FILEs;
1187          Form : in String := "");
1188    end Ada.Direct_IO.C_Streams;
1189
1190    with Interfaces.C_Streams;
1191    package Ada.Text_IO.C_Streams is
1192       function C_Stream (F : File_Type)
1193          return Interfaces.C_Streams.FILEs;
1194       procedure Open
1195         (File : in out File_Type;
1196          Mode : in File_Mode;
1197          C_Stream : in Interfaces.C_Streams.FILEs;
1198          Form : in String := "");
1199    end Ada.Text_IO.C_Streams;
1200
1201    with Interfaces.C_Streams;
1202    package Ada.Wide_Text_IO.C_Streams is
1203       function C_Stream (F : File_Type)
1204          return Interfaces.C_Streams.FILEs;
1205       procedure Open
1206         (File : in out File_Type;
1207          Mode : in File_Mode;
1208          C_Stream : in Interfaces.C_Streams.FILEs;
1209          Form : in String := "");
1210    end Ada.Wide_Text_IO.C_Streams;
1211
1212    with Interfaces.C_Streams;
1213    package Ada.Wide_Wide_Text_IO.C_Streams is
1214       function C_Stream (F : File_Type)
1215          return Interfaces.C_Streams.FILEs;
1216       procedure Open
1217         (File : in out File_Type;
1218          Mode : in File_Mode;
1219          C_Stream : in Interfaces.C_Streams.FILEs;
1220          Form : in String := "");
1221    end Ada.Wide_Wide_Text_IO.C_Streams;
1222
1223    with Interfaces.C_Streams;
1224    package Ada.Streams.Stream_IO.C_Streams is
1225       function C_Stream (F : File_Type)
1226          return Interfaces.C_Streams.FILEs;
1227       procedure Open
1228         (File : in out File_Type;
1229          Mode : in File_Mode;
1230          C_Stream : in Interfaces.C_Streams.FILEs;
1231          Form : in String := "");
1232    end Ada.Streams.Stream_IO.C_Streams;
1233
1234
1235 In each of these six packages, the ``C_Stream`` function obtains the
1236 ``FILE`` pointer from a currently opened Ada file.  It is then
1237 possible to use the ``Interfaces.C_Streams`` package to operate on
1238 this stream, or the stream can be passed to a C program which can
1239 operate on it directly.  Of course the program is responsible for
1240 ensuring that only appropriate sequences of operations are executed.
1241
1242 One particular use of relevance to an Ada program is that the
1243 ``setvbuf`` function can be used to control the buffering of the
1244 stream used by an Ada file.  In the absence of such a call the standard
1245 default buffering is used.
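
For example, buffering can be turned off for standard output.  The
following is a sketch only: the procedure name is an assumption, and the
call is placed before any output because the C library generally expects
``setvbuf`` to be called before other operations on the stream.

.. code-block:: ada

   with Ada.Text_IO;           use Ada.Text_IO;
   with Ada.Text_IO.C_Streams;
   with Interfaces.C_Streams;  use Interfaces.C_Streams;
   with System;

   procedure Unbuffered_Output is
      Status : int;
   begin
      --  Obtain the C stream behind Standard_Output and request
      --  unbuffered mode (no user buffer, size 0).
      Status := setvbuf
        (Ada.Text_IO.C_Streams.C_Stream (Standard_Output),
         System.Null_Address,
         IONBF,
         0);

      if Status /= 0 then
         Put_Line (Standard_Error, "setvbuf failed");
      end if;

      Put_Line ("output written through an unbuffered stream");
   end Unbuffered_Output;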
1246
1247 The ``Open`` procedures in these packages open a file giving an
1248 existing C Stream instead of a file name.  Typically this stream is
1249 imported from a C program, allowing an Ada file to operate on an
1250 existing C file.
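
The sketch below shows the shape of such a call.  To stay self-contained it
uses the ``stderr`` function of ``Interfaces.C_Streams`` as a stand-in for
a stream obtained from C code; the procedure name is an assumption.

.. code-block:: ada

   with Ada.Text_IO;           use Ada.Text_IO;
   with Ada.Text_IO.C_Streams;
   with Interfaces.C_Streams;

   procedure Wrap_C_Stream is
      Err : File_Type;
   begin
      --  Layer an Ada text file on top of an already open C stream.
      Ada.Text_IO.C_Streams.Open
        (File     => Err,
         Mode     => Out_File,
         C_Stream => Interfaces.C_Streams.stderr);

      Put_Line (Err, "written through an Ada file on a C stream");
   end Wrap_C_Stream;

The file is deliberately not closed here, since closing it would also
affect the underlying C stream that other code may still be using.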