C/the.ansi.c.programming.language/c.programming.notes.int/sx2f.html

   1 <!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
   2 <!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
   3 <!-- This material may be freely redistributed and used -->
   4 <!-- but may not be republished or sold without permission. -->
   5 <html>
   6 <head>
   7 <link rev="owner" href="mailto:scs@eskimo.com">
   8 <link rev="made" href="mailto:scs@eskimo.com">
   9 <title>16.6: Formatted Input (scanf)</title>
  10 <link href="sx2e.html" rev=precedes>
  11 <link href="sx2g.html" rel=precedes>
  12 <link href="sx2.html" rev=subdocument>
  13 </head>
  14 <body>
  15 <H2>16.6: Formatted Input (<TT>scanf</TT>)</H2>
  16
  17 <p>Just as <TT>putchar</TT> has its <TT>getchar</TT>
  18 and <TT>fputs</TT> has its <TT>fgets</TT>,
  19 there's an input analog to <TT>printf</TT>,
  20 namely <TT>scanf</TT>.
  21 <TT>scanf</TT> reads characters
  22 from standard input,
  23 under control of a format string,
  24 perhaps converting some components of the string
  25 and storing them into variables.
  26 For example,
  27 just as you could use the call
  28 <pre>
  29         printf("(%d, %d)", x, y);
  30 </pre>
  31 to print two integer values and some surrounding punctuation,
  32 you could use the call
  33 <pre>
  34         scanf("(%d, %d)", &amp;x, &amp;y);
  35 </pre>
  36 to attempt to extract two integer values
  37 from some input containing similar punctuation.
  38 </p><p><TT>scanf</TT> interprets a format string,
  39 much like <TT>printf</TT>,
  40 with the first difference being
  41 that <TT>scanf</TT> attempts to read characters
  42 and match them against the format string,
  43 rather than printing under control of the format string.
  44 For each ordinary character in the format string,
  45 <TT>scanf</TT> expects to see that character on the input;
  46 if not, it fails.
  47 For each format specifier in the format string,
  48 <TT>scanf</TT> attempts to match and convert
  49 a string appropriate to the format specifier,
  50 storing the converted result into a variable
  51 pointed to by the corresponding argument.
  52 If it can't find any characters matching the format specifier,
  53 it fails.
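As a minimal sketch of this matching behavior,
the hypothetical helper <TT>parse_point</TT> below
applies the format from the example above to a string.
It assumes <TT>sscanf</TT>,
which follows the same format rules as <TT>scanf</TT>
but takes its input from a string.

```c
#include <stdio.h>

/* Hypothetical helper: parse text such as "(3, 4)".
 * Returns the number of values converted (0, 1, or 2). */
int parse_point(const char *text, int *x, int *y)
{
    /* '(' and ',' must appear literally in the input.  The space in
     * the format matches any amount of whitespace, including none. */
    return sscanf(text, "(%d, %d)", x, y);
}
```

With the input <TT>(3; 4)</TT> the first value is converted,
the comma fails to match, and the call returns 1.
Note that the return value cannot tell you whether the final <TT>)</TT> matched,
since no conversion follows it.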
  54 </p><p>Since <TT>scanf</TT> "returns" many values
  55 (one for each format specifier in the format string),
  56 it must do so using pointers which the caller passes.
  57 For each value to be converted,
  58 the caller passes a pointer to the variable
  59 (or other location)
  60 where <TT>scanf</TT> should write the converted value.
  61 All arguments passed to <TT>scanf</TT> must be pointers.
  62 </p><p>The format strings used by <TT>scanf</TT>
  63 are similar to those used by <TT>printf</TT>,
  64 but there are several differences.
  65 </p><p>The optional <I>width</I>
  66 gives the maximum number of characters to read
  67 while performing the conversion requested by a particular format specifier.
  68 (If there are many adjacent characters which could satisfy
  69 a request--many
  70 digits for one of the numeric conversions,
  71 or many characters for <TT>%s</TT>
  72 conversion--the
  73 <I>width</I> keeps <TT>scanf</TT> from gobbling all of them up at once.)
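For instance, a width can split a run of adjacent digits into fields.
The sketch below assumes an eight-digit date in year-month-day order;
the helper name <TT>parse_yyyymmdd</TT> is hypothetical,
and <TT>sscanf</TT> is used so the input can be a plain string.

```c
#include <stdio.h>

/* Hypothetical helper: split a date such as "19991225" into fields.
 * Returns the number of fields converted (3 on success). */
int parse_yyyymmdd(const char *text, int *year, int *month, int *day)
{
    /* Without the widths, the first %d would consume all eight digits. */
    return sscanf(text, "%4d%2d%2d", year, month, day);
}
```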
  74 </p><p>There is no equivalent to the <I>precision</I> modifier.
  75 </p><p>If the <TT>*</TT> flag appears,
  76 it indicates that the converted value should be discarded,
  77 not written to a location
  78 pointed to by one
  79
  80 of the pointers in the argument list.
  81 (In other words,
  82 there is no corresponding argument.)
  83 Since <TT>*</TT> is usurped for this function,
  84 there is no way to use a variable field width
  85 from the argument list
  86 with <TT>scanf</TT>.
  87 There are no other <I>flags</I>.
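A short sketch of assignment suppression,
using a hypothetical helper <TT>parse_labeled_value</TT>
and assuming input made of a one-word label followed by a number:

```c
#include <stdio.h>

/* Hypothetical helper: read the number from text such as "width 42".
 * Returns 1 if the number was converted. */
int parse_labeled_value(const char *text, int *value)
{
    /* %*s matches the label but stores nothing, so only one pointer
     * argument is passed, and only %d counts toward the return value. */
    return sscanf(text, "%*s %d", value);
}
```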
  88 </p><p>The <I>modifier</I> characters are more significant.
  89 An <TT>h</TT> indicates that the corresponding integer pointer argument
  90 (for <TT>%d</TT>, <TT>%i</TT>, <TT>%u</TT>, <TT>%o</TT>, or <TT>%x</TT>)
  91 is a <TT>short int *</TT> or <TT>unsigned short int *</TT>.
  92 An <TT>l</TT> indicates that the corresponding integer pointer argument
  93 (for <TT>%d</TT>, <TT>%i</TT>, <TT>%u</TT>, <TT>%o</TT>, or <TT>%x</TT>)
  94 is a <TT>long int *</TT> or
  95 <TT>unsigned long int *</TT>,
  96 or that the floating-point pointer argument
  97 (for <TT>%e</TT>, <TT>%f</TT>, or <TT>%g</TT>)
  98 is a <TT>double *</TT> rather than a <TT>float *</TT>.
  99 (Similarly,
 100 an <TT>L</TT> indicates a <TT>long double *</TT>.)
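The following sketch pairs each modifier with the pointer type it requires.
The helper name <TT>parse_sizes</TT> and the three-field input layout
are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read a short, a long, and a double from one string.
 * Returns the number of values converted (3 on success). */
int parse_sizes(const char *text, short *s, long *l, double *d)
{
    /* %hd needs a short *, %ld a long *, and %lf a double *.
     * A mismatch between modifier and pointer type is undefined behavior. */
    return sscanf(text, "%hd %ld %lf", s, l, d);
}
```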
 101 </p><p>The <TT>%c</TT> format will read more than one character
 102 if an explicit <I>width</I> greater than 1 is specified.
 103 The corresponding argument must be a pointer to enough space
 104 to hold all the characters read.
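Unlike <TT>%s</TT>, <TT>%c</TT> does not append a <TT>\0</TT>,
so a caller that wants a string must add one.
A minimal sketch, with a hypothetical helper <TT>read_three</TT>
that assumes at least three characters of input are available:

```c
#include <stdio.h>

/* Hypothetical helper: copy exactly three characters from text.
 * out must have room for the three characters plus a terminator. */
int read_three(const char *text, char out[4])
{
    int n = sscanf(text, "%3c", out);   /* stores 3 chars, no '\0' */
    if (n == 1)
        out[3] = '\0';                  /* terminate it ourselves */
    return n;
}
```

Because <TT>%c</TT> does not skip leading whitespace,
an input beginning with a space yields that space as the first character stored.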
 105 </p><p>The <TT>%e</TT>, <TT>%f</TT>, and <TT>%g</TT> formats
 106 all read strings in either scientific notation
 107 or conventional decimal fraction <TT>m.n</TT> notation.
 108 (In other words,
 109 the three formats
 110 act
 111 just
 112 the same.)
 113 However,
 114 they assume a <TT>float *</TT> argument
 115 unless the <TT>l</TT> modifier appears,
 116 in which case they expect a <TT>double *</TT>.
 117 (This is in contrast to <TT>printf</TT>,
 118 which accepts either <TT>float</TT> or <TT>double</TT> arguments
 119 for <TT>%e</TT>, <TT>%f</TT>, and <TT>%g</TT>,
 120 due to the default argument promotions.)
 121 </p><p>The <TT>%i</TT> format
 122 will read a number in decimal, octal, or hexadecimal,
 123 taking a leading <TT>0</TT> to indicate octal
 124 and a leading <TT>0x</TT> (or <TT>0X</TT>) to indicate hexadecimal,
 125 i.e. the same rules as used by C constants.
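A small sketch of the three prefixes,
using a hypothetical helper <TT>parse_any_base</TT>:

```c
#include <stdio.h>

/* Hypothetical helper: read an integer written in decimal, octal,
 * or hexadecimal, as selected by its prefix. */
int parse_any_base(const char *text, int *value)
{
    return sscanf(text, "%i", value);
}
```

Given <TT>17</TT>, <TT>017</TT>, and <TT>0x1F</TT>,
this stores 17, 15, and 31 respectively.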
 126 </p><p>The <TT>%n</TT> format causes the number of characters read so far
 127 (by this call to <TT>scanf</TT>)
 128 to be stored in the integer pointed to by the corresponding argument.
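One common use of <TT>%n</TT> is checking that nothing unexpected
follows the value just converted.
The sketch below is one way to do that with <TT>sscanf</TT>;
the helper name <TT>parse_whole_int</TT> is hypothetical.
It relies on <TT>%n</TT> not being counted in the return value.

```c
#include <stdio.h>

/* Hypothetical helper: succeed only if text holds an integer and
 * nothing else except optional surrounding whitespace.
 * Returns 1 on success, 0 otherwise. */
int parse_whole_int(const char *text, int *value)
{
    int used = 0;

    /* The space skips trailing whitespace; %n then records how many
     * characters have been consumed.  Only %d counts toward the result. */
    if (sscanf(text, "%d %n", value, &used) != 1)
        return 0;
    return text[used] == '\0';
}
```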
 129 </p><p>The <TT>%s</TT> format will read a string,
 130 up to the next whitespace character,
 131 and copy the string,
 132 terminated by a <TT>\0</TT>,
 133 to the corresponding argument,
 134 which must be a <TT>char *</TT>.
 135 The caller must ensure (perhaps by using an explicit <I>width</I>)
 136 that there is enough space to hold the received characters.
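A minimal sketch of a bounded <TT>%s</TT> conversion follows.
The helper name <TT>first_word</TT> and the 16-byte buffer are assumptions;
the point is that the width must be one less than the buffer size,
leaving room for the <TT>\0</TT>.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of
 * text into word, storing at most 15 characters plus the '\0'. */
int first_word(const char *text, char word[16])
{
    return sscanf(text, "%15s", word);
}
```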
 137 </p><p><TT>scanf</TT> has a special format specifier <TT>%[</TT>...<TT>]</TT>,
 138 which matches any string composed of characters specified in the <TT>[]</TT>.
 139 For example,
 140 <TT>%[abc]</TT>
 141 would match any string composed of a's, b's, and c's.
 142 The corresponding argument is a <TT>char *</TT>;
 143 the matched string is written to the location pointed to,
 144 followed by a <TT>\0</TT>.
 145 The caller must ensure
 146 (perhaps by using an explicit <I>width</I>)
 147 that there is enough space to hold the received characters.
 148 A second form,
 149 <TT>%[^</TT>...<TT>]</TT>,
 150 matches a string of characters <em>not</em> found in the set.
 151 For example,
 152 <TT>scanf("(%[^)])", s)</TT> reads, into the string <TT>s</TT>,
 153 a string of characters (possibly including whitespace)
 154 from an input in which the string appears enclosed in parentheses.
 155 It may also be possible to specify ranges of characters
 156 (e.g. <TT>%[a-z]</TT>, <TT>%[0-9]</TT>, etc.),
 157 but these are not as portable.
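The parenthesized example above can be written with an explicit width
so that the destination cannot overflow.
This is a sketch; the helper name <TT>parse_parenthesized</TT>
and the 32-byte buffer are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: copy the text between '(' and ')' into out,
 * storing at most 31 characters plus the '\0'. */
int parse_parenthesized(const char *text, char out[32])
{
    /* %[^)] matches one or more characters that are not ')'. */
    return sscanf(text, "(%31[^)])", out);
}
```

A scanset must match at least one character,
so the input <TT>()</TT> converts nothing and the call returns 0.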
 158 </p><p>With the exception of <TT>%c</TT>, <TT>%n</TT>, and <TT>%[</TT>,
 159 all of the conversion specifiers skip any leading whitespace
 160 (spaces, tabs, or newlines)
 161 which might precede the value or string converted.
 162 Also,
 163 any whitespace character in the format string
 164 matches any number of whitespace characters in the input.
 165 Therefore,
 166 the format <TT>"%d %d"</TT>
 167 would match the input <TT>"12 34"</TT>
 168 or <TT>"12    34"</TT>
 169 or <TT>"12\t34"</TT>.
 170 However,
 171 the format <TT>"%d%d"</TT> would match all of these inputs as well,
 172 since the second <TT>%d</TT> first
 173
 174 scans past any whitespace preceding the <TT>34</TT>.
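The sketch below contrasts a conversion that skips whitespace by itself
with one that does not.
The three helper names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: %d skips leading whitespace on its own,
 * so no space is needed between the two specifiers. */
int parse_pair_tight(const char *text, int *a, int *b)
{
    return sscanf(text, "%d%d", a, b);
}

/* Hypothetical helper: %c takes the very next character,
 * even if it is a space. */
int raw_first_char(const char *text, char *c)
{
    return sscanf(text, "%c", c);
}

/* Hypothetical helper: a space before %c in the format
 * skips whitespace first. */
int first_nonblank(const char *text, char *c)
{
    return sscanf(text, " %c", c);
}
```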
 175 </p><p><TT>scanf</TT> returns the number of items
 176 it successfully converts and stores.
 177 It will return a number less than expected
 178 (less than the number of format specifiers not containing <TT>*</TT>,
 179 or less than the number of corresponding pointer arguments)
 180 if the conversion fails at any point,
 181 and it will leave any unrecognized characters
 182 (i.e. the ones that caused the last match to fail)
 183 waiting in the input for next time.
 184 <TT>scanf</TT> returns <TT>EOF</TT>
 185 if it encounters end-of-file before converting anything.
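The possible outcomes for a two-value format can be sketched as follows.
The helper name <TT>scan_status</TT> and its messages are hypothetical;
<TT>sscanf</TT> stands in for <TT>scanf</TT>,
with the end of the string playing the role of end-of-file.

```c
#include <stdio.h>

/* Hypothetical helper: describe how far a two-integer scan got. */
const char *scan_status(const char *text)
{
    int a, b;

    switch (sscanf(text, "%d %d", &a, &b)) {
    case 2:  return "both values converted";
    case 1:  return "only the first value converted";
    case 0:  return "nothing converted";
    default: return "end of input before any conversion";  /* EOF */
    }
}
```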
 186 </p><p>If you want to read characters from an arbitrary stream,
 187 you can use <TT>fscanf</TT>,
 188 which takes an initial <TT>FILE *</TT> argument.
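A minimal sketch of <TT>fscanf</TT> in a loop,
assuming a stream of whitespace-separated integers;
the helper name <TT>sum_ints</TT> is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: add up integers read from fp, stopping at
 * end-of-file or at the first input that is not an integer. */
long sum_ints(FILE *fp)
{
    long total = 0;
    int value;

    /* Compare against 1, not against EOF: a matching failure returns 0,
     * and looping on "!= EOF" would then spin forever on the bad input. */
    while (fscanf(fp, "%d", &value) == 1)
        total += value;
    return total;
}
```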
 189 </p><p>You can scan and convert characters from a string
 190 (rather than from a stream)
 191 using <TT>sscanf</TT>.
 192 For example,
 193 <pre>
 194         int x, y;
 195         sscanf("12 34", "%d %d", &amp;x, &amp;y);
 196 </pre>
 197 would place 12 in <TT>x</TT> and 34 in <TT>y</TT>.
 198 </p><p><TT>scanf</TT> and <TT>fscanf</TT> are seductively useful,
 199 but they have a number of drawbacks in practice.
 200 They seem to make it very easy to,
 201 say,
 202 prompt the user for a number:
 203 <pre>
 204         int x;
 205         printf("Type a number:\n");
 206         scanf("%d", &amp;x);
 207 </pre>
 208 But what happens if the user fumbles,
 209 and types something other than a number?
 210 Even if the code checks <TT>scanf</TT>'s return value,
 211 and prompts the user again if <TT>scanf</TT> returns 0,
 212 the non-numeric input remains on the input,
 213 and will be encountered by the next call to <TT>scanf</TT>
 214 unless some other steps are taken.
 215 (That is,
 216 <TT>scanf</TT> will rediscover the user's old, bad input
 217 before it gets to any new input.)
 218 It's also easy to write things like
 219 <pre>
 220         scanf("%d\n", &amp;x);
 221 </pre>
 222 but this code does <em>not</em> work as intended;
 223 the <TT>\n</TT> in the format string is a whitespace character,
 224 which asks <TT>scanf</TT> to discard any number of whitespace characters,
 225 so it will <em>keep reading</em> characters
 226 as long as they are whitespace characters,
 227 that is,
 228 it will read characters
 229 until it finds something that is not a whitespace character.
 230 It won't consume that eventual non-whitespace character once it finds it,
 231 but in the process of looking for it
 232 it will seem to jam your program,
 233 since the call to <TT>scanf</TT> won't return
 234 right after the user types a number.
 235
 236 </p><p>Therefore,
 237 it's much better to read interactive user input
 238 a line at a time,
 239 and then use functions like <TT>atoi</TT>
 240 (or perhaps <TT>sscanf</TT>)
 241 to interpret the line that the user typed.
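One way to follow this advice is sketched below,
using <TT>fgets</TT> to read the line and <TT>sscanf</TT> to interpret it.
The helper name <TT>read_int_line</TT> and the 128-byte line buffer are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp and try to interpret it
 * as an integer.  Returns 1 on success, 0 if the line did not begin
 * with a number, or EOF if no line could be read. */
int read_int_line(FILE *fp, int *value)
{
    char line[128];   /* assumed maximum; longer lines are split */

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;

    /* The whole line has already been consumed, so bad input cannot
     * linger on the stream and be rediscovered by the next call. */
    return sscanf(line, "%d", value) == 1;
}
```

A caller prompting interactively could pass <TT>stdin</TT>
and simply prompt again whenever the function returns 0.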
 242
 243 </p><hr>
 244 <p>
 245 Read sequentially:
 246 <a href="sx2e.html" rev=precedes>prev</a>
 247 <a href="sx2g.html" rel=precedes>next</a>
 248 <a href="sx2.html" rev=subdocument>up</a>
 249 <a href="top.html">top</a>
 250 </p>
 251 <p>
 252 This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
 253 // <a href="copyright.html">Copyright</a> 1996-1999
 254 // <a href="mailto:scs@eskimo.com">mail feedback</a>
 255 </p>
 256 </body>
 257 </html>