manual/io.texi

@node I/O Overview, I/O on Streams, Pattern Matching, Top
@c %MENU% Introduction to the I/O facilities
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful.  The GNU
C library provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output.  Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{I/O on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level I/O}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, which covers a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminals or other serial devices are processed.
@end itemize


@menu
* I/O Concepts::       Some basic information and terminology.
* File Names::         How to refer to a file.
@end menu

@node I/O Concepts, File Names,  , I/O Overview
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file.  This process is
called @dfn{opening} the file.  You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor.  You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on.  Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file.  Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.

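As a rough illustration of the open, operate, close sequence described
above, here is a minimal sketch that uses the stream interface.  The
function name @code{count_bytes} is made up for this example; only
@code{fopen}, @code{getc}, @code{ferror} and @code{fclose} are library
functions.

@smallexample
#include <stdio.h>

/* Count the bytes in the file called NAME.  Return -1 if the file
   cannot be opened or if reading or closing it fails.  */
long
count_bytes (const char *name)
@{
  FILE *stream = fopen (name, "r");     /* Open for reading.  */
  long count = 0;
  int failed;

  if (stream == NULL)
    return -1;
  while (getc (stream) != EOF)          /* Read through the stream.  */
    count++;
  failed = ferror (stream);
  if (fclose (stream) != 0)             /* Close; STREAM is now unusable.  */
    failed = 1;
  return failed ? -1 : count;
@}
@end smallexample
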
@menu
* Streams and File Descriptors::    The GNU Library provides two ways
                                     to access the contents of files.
* File Position::                   The number of bytes from the
                                     beginning of the file.
@end menu

@node Streams and File Descriptors, File Position,  , I/O Concepts
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams.  File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations.  Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file.  But, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way.  You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).

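To make the point about special modes concrete, the following sketch
turns on nonblocking mode for a descriptor with @code{fcntl} and then
reads from an empty pipe.  The helper names @code{set_nonblocking} and
@code{empty_pipe_would_block} are invented for this illustration, and
the exact error reported by a nonblocking read is assumed to be
@code{EAGAIN} or @code{EWOULDBLOCK}.

@smallexample
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on nonblocking mode for the descriptor FD.  Return 0 on
   success and -1 on failure.  */
int
set_nonblocking (int fd)
@{
  int flags = fcntl (fd, F_GETFL, 0);

  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
@}

/* Read from an empty pipe in nonblocking mode.  Return 1 if the read
   fails at once with EAGAIN or EWOULDBLOCK instead of waiting, 0 if
   it does anything else, and -1 if the setup fails.  */
int
empty_pipe_would_block (void)
@{
  int fds[2];
  int result = -1;
  char c;

  if (pipe (fds) == -1)
    return -1;
  if (set_nonblocking (fds[0]) == 0)
    @{
      ssize_t n = read (fds[0], &c, 1);
      result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    @}
  close (fds[0]);
  close (fds[1]);
  return result;
@}
@end smallexample
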
Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities.  The stream interface treats all
kinds of files pretty much alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors.  The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.
@c !!! glibc has dprintf, which lets you do printf on an fd.

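The difference in convenience can be seen in a small sketch that
writes the same line of text through each interface.  The names
@code{report_stream} and @code{report_fd} are hypothetical; the
descriptor version assumes that a 64-byte buffer is large enough for
the formatted text and reports failure otherwise.

@smallexample
#include <stdio.h>
#include <unistd.h>

/* Stream version: fprintf formats the text and transfers it in one
   call.  Return 0 on success and -1 on failure.  */
int
report_stream (FILE *stream, int count)
@{
  return fprintf (stream, "count = %d\n", count) < 0 ? -1 : 0;
@}

/* Descriptor version: write only transfers a block of bytes, so the
   text has to be formatted into a buffer first.  */
int
report_fd (int fd, int count)
@{
  char buf[64];
  int len = snprintf (buf, sizeof buf, "count = %d\n", count);

  if (len < 0 || (size_t) len >= sizeof buf)
    return -1;
  return write (fd, buf, len) == len ? 0 : -1;
@}
@end smallexample
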
Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor.  You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.

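Both directions can be sketched with @code{fdopen} and @code{fileno}.
The function names below are invented for the example, and
@code{isatty} merely stands in for any operation that is available only
on descriptors.

@smallexample
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open NAME as a file descriptor first, then associate a stream with
   that descriptor.  Return NULL on failure.  */
FILE *
open_stream_via_descriptor (const char *name)
@{
  int fd = open (name, O_RDONLY);
  FILE *stream;

  if (fd == -1)
    return NULL;
  stream = fdopen (fd, "r");
  if (stream == NULL)
    close (fd);         /* No stream was made, so close FD ourselves.  */
  return stream;
@}

/* Go the other way: extract the descriptor from STREAM in order to
   ask a question that only the descriptor interface can answer.
   Return 1 if STREAM is connected to a terminal, 0 otherwise.  */
int
stream_is_terminal (FILE *stream)
@{
  return isatty (fileno (stream));
@}
@end smallexample
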
In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor.  If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams.  You can expect any system with an @w{ISO C}
implementation to support streams, but non-GNU systems may not support
file descriptors at all, or may only implement a subset of the GNU
functions that operate on file descriptors.  Most of the file descriptor
functions in the GNU library are included in the POSIX.1 standard,
however.

@node File Position,  , Streams and File Descriptors, I/O Concepts
@subsection File Position

One of the attributes of an open file is its @dfn{file position}, which
keeps track of where in the file the next character is to be read or
written.  In the GNU system, and all POSIX.1 systems, the file position
is simply an integer representing the number of bytes from the beginning
of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented.  In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files

Ordinary files permit read or write operations at any position within
the file.  Some other kinds of files may also permit this.  Files which
do permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}).  If you try to change the file
position on a file that doesn't support random access, you get the
@code{ESPIPE} error.
@cindex random-access files

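One way to find out whether a descriptor supports random access is
simply to try a harmless @code{lseek} and look for @code{ESPIPE}.  The
helper below is only a sketch, and its name @code{is_seekable} is made
up for this example.

@smallexample
#include <errno.h>
#include <unistd.h>

/* Return 1 if the descriptor FD supports random access, 0 if lseek
   fails with ESPIPE, and -1 for any other error.  */
int
is_seekable (int fd)
@{
  /* Asking to move zero bytes from the current position changes
     nothing, but it still fails on a pipe, FIFO or socket.  */
  if (lseek (fd, 0, SEEK_CUR) != (off_t) -1)
    return 1;
  return errno == ESPIPE ? 0 : -1;
@}
@end smallexample
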
Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position.  However, the file position is still used to control where in
the file reading is done.
@cindex append-access files

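The following sketch shows this behavior on a descriptor opened with
@code{O_APPEND}.  The function name @code{append_after_rewind} is
hypothetical, and the file called @var{name} is assumed to exist
already.

@smallexample
#include <fcntl.h>
#include <unistd.h>

/* Open NAME for append access, move the file position back to the
   beginning, and write LEN bytes from DATA.  Return the size of the
   file afterwards, or -1 on failure.  */
off_t
append_after_rewind (const char *name, const char *data, size_t len)
@{
  int fd = open (name, O_WRONLY | O_APPEND);
  off_t size = -1;

  if (fd == -1)
    return -1;
  /* The file position is set to the beginning here, but the write
     still lands at the end of the file.  */
  if (lseek (fd, 0, SEEK_SET) != (off_t) -1
      && write (fd, data, len) == (ssize_t) len)
    size = lseek (fd, 0, SEEK_END);
  close (fd);
  return size;
@}
@end smallexample
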
If you think about it, you'll realize that several programs can read a
given file at the same time.  In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice even in the same program, you get two
streams or descriptors with independent file positions.

By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.

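The difference between opening a file twice and duplicating a
descriptor can be checked with a short sketch such as this one.  The
function name is invented for the example, and the file called
@var{name} is assumed to be an ordinary file that the program may read.

@smallexample
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if a second opening of NAME keeps its own file position
   while a duplicated descriptor shares one, and 0 otherwise.  */
int
positions_behave_as_described (const char *name)
@{
  int first = open (name, O_RDONLY);
  int second = open (name, O_RDONLY);   /* A separate opening.  */
  int copy = dup (first);               /* A duplicate of FIRST.  */
  int ok = 0;

  if (first != -1 && second != -1 && copy != -1
      && lseek (first, 3, SEEK_SET) == 3)
    ok = (lseek (second, 0, SEEK_CUR) == 0      /* Unaffected.  */
          && lseek (copy, 0, SEEK_CUR) == 3);   /* Moved with FIRST.  */
  if (first != -1)
    close (first);
  if (second != -1)
    close (second);
  if (copy != -1)
    close (copy);
  return ok;
@}
@end smallexample
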
@node File Names,  , I/O Concepts, I/O Overview
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file.  Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals.  These strings are called
@dfn{file names}.  You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories::                 Directories contain entries for files.
* File Name Resolution::        A file name specifies how to look up a file.
* File Name Errors::            Error conditions relating to file names.
* File Name Portability::       File name portability and syntax issues.
@end menu


@node Directories, File Name Resolution,  , File Names
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}.  Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}.  In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}).  A
file name which is just one component names a file with respect to its
directory.  A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either @dfn{filename}
or @dfn{pathname component} for what this manual calls a file name
component.  We don't use this terminology because a ``path'' is
something completely different (a list of directories to search), and we
think that ``pathname'' used for something else will confuse users.  We
always use ``file name'' and ``file name component'' (or sometimes just
``component'', where the context is obvious) in GNU documentation.  Some
macros use the POSIX terminology in their names, such as
@code{PATH_MAX}.  These macros are defined by the POSIX standard, so we
cannot change their names.

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution, File Name Errors, Directories, File Names
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters.  On the systems that the GNU C library supports,
multiple successive @samp{/} characters are equivalent to a single
@samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}.  This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component.  Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process (usually all
processes on the system have the same root directory).  Such a file name
is called an @dfn{absolute file name}.
@c !!! xref here to chroot, if we ever document chroot. -rm

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}).  This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings.  Every directory has entries for these file name
components.  The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question).  As a special case, @file{..} in the root
directory refers to the root directory itself, since it has no parent;
thus @file{/..} is the same as @file{/}.

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

@c An empty string may ``work'', but I think it's confusing to
@c try to describe it.  It's not a useful thing for users to use--rms.
A file name that names a directory may optionally end in a @samp{/}.
You can specify a file name of @file{/} to refer to the root directory,
but the empty string is not a meaningful file name.  If you want to
refer to the current working directory, use a file name of @file{.} or
@file{./}.

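If you want to convince yourself that two spellings resolve to the same
file, one possible check is to compare the device and file serial
numbers that @code{stat} reports for each name.  This is only a sketch,
and the function name @code{same_file} is made up for the example.

@smallexample
#include <sys/stat.h>

/* Return 1 if the file names A and B resolve to the same file, 0 if
   they resolve to different files, and -1 if either name cannot be
   resolved.  */
int
same_file (const char *a, const char *b)
@{
  struct stat sa, sb;

  if (stat (a, &sa) == -1 || stat (b, &sb) == -1)
    return -1;
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
@}
@end smallexample

With this helper, @code{same_file ("/", "/..")} and
@code{same_file (".", "./")} should both return 1, in line with the
rules described above.
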
Unlike some other operating systems, the GNU system doesn't have any
built-in support for file types (or extensions) or file versions as part
of its file name syntax.  Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.

@node File Name Errors, File Name Portability, File Name Resolution, File Names
@subsection File Name Errors

@cindex file name errors
@cindex usual file name errors

Functions that accept file name arguments usually detect these
@code{errno} error conditions relating to the file name syntax or
trouble finding the named file.  These errors are referred to throughout
this manual as the @dfn{usual file name errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component
of the file name.

@item ENAMETOOLONG
This error is used either when the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}.  @xref{Limits for Files}.

In the GNU system, there is no imposed limit on overall file name
length, but some file systems may place limits on the length of a
component.

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist, or when a component is a symbolic link
whose target file does not exist.  @xref{Symbolic Links}.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.

@item ELOOP
Too many symbolic links were resolved while trying to look up the file
name.  The system has an arbitrary limit on the number of symbolic links
that may be resolved in looking up a single file name, as a primitive
way to detect loops.  @xref{Symbolic Links}.
@end table


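A program can observe the usual file name errors by inspecting
@code{errno} after a failed call.  The sketch below uses @code{open} as
a representative function that accepts a file name; the helper name
@code{open_error} is invented for this example.

@smallexample
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open NAME for reading.  Return 0 if that works, and
   otherwise the errno value that open reported.  */
int
open_error (const char *name)
@{
  int fd = open (name, O_RDONLY);

  if (fd == -1)
    return errno;
  close (fd);
  return 0;
@}
@end smallexample

For instance, if @file{/tmp/data} is an ordinary file, then
@code{open_error ("/tmp/data/x")} would be expected to return
@code{ENOTDIR}, because @file{data} is used as a directory component
but is not a directory.
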
@node File Name Portability,  , File Name Errors, File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by the GNU system and by other POSIX
systems.  However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions.  For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The @w{ISO C} standard says very little about file name syntax, only that
file names are strings.  In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions.  Some concepts such as file versions might be supported by
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning which characters are
permitted in file names and the length of file name and file name
component strings.  However, in the GNU system, you do not need to worry
about these restrictions; any character except the null character is
permitted in a file name string, and there are no limits on the length
of file name strings.
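
A program that must also run on systems that do impose such limits can
ask about them at run time instead of assuming a value.  The following
sketch uses @code{pathconf} with @code{_PC_NAME_MAX}; the function name
@code{component_limit} is made up for the example, and a return value
of zero is merely this sketch's own convention for ``no limit''.

@smallexample
#include <errno.h>
#include <unistd.h>

/* Return the maximum length of a file name component in the
   directory DIR, 0 if the system imposes no limit there, and -1 if
   the limit cannot be determined.  */
long
component_limit (const char *dir)
@{
  long limit;

  /* pathconf returns -1 both for "no limit" and for an error, so
     errno has to be cleared first to tell the two cases apart.  */
  errno = 0;
  limit = pathconf (dir, _PC_NAME_MAX);
  if (limit == -1)
    return errno == 0 ? 0 : -1;
  return limit;
@}
@end smallexample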