<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>6.2 Character Input and Output</title>
<link href="sx6a.html" rev=precedes>
<link href="sx6c.html" rel=precedes>
<link href="sx6.html" rev=subdocument>
</head>
<body>
<H2>6.2 Character Input and Output</H2>
<p>[This section corresponds to K&amp;R Sec. 1.5]
</p><p>Unless a program can read some input,
it's hard to keep it from doing exactly the same thing every time it's run,
and thus being rather boring after a while.
</p><p>The most basic way of reading input is by calling the function
<TT>getchar</TT>.
<TT>getchar</TT> reads one character from the ``standard input,''
which is usually the user's keyboard, but which can sometimes
be redirected by the operating system.
<TT>getchar</TT> returns (rather obviously) the character it reads,
or, if there are no more characters available,
the special value
<TT>EOF</TT> (``end of file'').
</p><p>A companion function is <TT>putchar</TT>, which writes one
character to the ``standard output.''
(The standard output is,
again, not surprisingly,
usually the user's screen, although it, too, can be redirected.
<TT>printf</TT>, like <TT>putchar</TT>, prints to the
standard output;
in fact, you can imagine that <TT>printf</TT> calls
<TT>putchar</TT> to actually print each of the characters it
formats.)
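</p><p>To make that idea concrete, here is a toy sketch of an output
routine that produces everything it prints by calling <TT>putchar</TT>.
The function name <TT>put_string</TT> is made up for this illustration;
it writes a string to the standard output one character at a time,
roughly what <TT>printf("%s", s)</TT> does,
and returns the number of characters it wrote.
Real <TT>printf</TT> implementations are organized differently;
this only shows that output can be built up one character at a time.
<pre>
#include &lt;stdio.h&gt;

/* Write the string s to the standard output using nothing but
   putchar, and return the number of characters written.
   (A toy, not how any real printf is written.) */
int put_string(const char s[])
{
	int i;

	for(i = 0; s[i] != '\0'; i++)
		putchar(s[i]);	/* one character at a time */

	return i;
}
</pre>
For example, <TT>put_string("hello, world\n")</TT> writes 13 characters
(including the newline) and returns 13.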
</p><p>Using these two functions, we can write a very basic program
to copy the input,
a character at a time,
to the output:
<pre>
#include &lt;stdio.h&gt;

/* copy input to output */

int main()
{
	int c;

	c = getchar();

	while(c != EOF)
		{
		putchar(c);
		c = getchar();
		}

	return 0;
}
</pre>
</p><p>This code is straightforward,
and I encourage you to type it in and try it out.
It reads one character,
and if it is not the <TT>EOF</TT> code,
enters a <TT>while</TT> loop,
printing one character and reading another,
as long as the character read is not <TT>EOF</TT>.
The loop itself is simple,
although there's one mystery
surrounding the declaration of the variable <TT>c</TT>:
if it holds characters, why is it an <TT>int</TT>?
</p><p>We said earlier
that a <TT>char</TT> variable could hold integers
corresponding to character set values,
and that an <TT>int</TT> could hold integers of more
arbitrary values
(guaranteed to cover at least the range -32767 to +32767).
Since most character sets contain a few hundred characters
(nowhere near 32767),
an <TT>int</TT> variable can
in general
comfortably hold all <TT>char</TT> values, and then some.
Therefore, there's nothing wrong with declaring
<TT>c</TT>
as an <TT>int</TT>.
But in fact,
it's important to do so,
because <TT>getchar</TT> can return every character value,
<em>plus</em> that special, non-character value <TT>EOF</TT>,
indicating that there are no more characters.
Type <TT>char</TT> is only guaranteed to be able to hold all
the character values;
it is <em>not</em> guaranteed to be able to hold this
``no more characters'' value without possibly mixing it up with
some actual character value.
(It's like trying to cram five pounds of books into a four-pound box,
or 13 eggs into a carton that holds a dozen.)
Therefore,
you should always
remember to use an <TT>int</TT> for anything you assign
<TT>getchar</TT>'s return value to.
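</p><p>A small experiment shows what can go wrong.
This sketch rests on some assumptions worth stating:
it uses <TT>getc(fp)</TT>, which is like <TT>getchar()</TT>
but reads from a stream you name
(<TT>getchar()</TT> is equivalent to <TT>getc(stdin)</TT>),
so that it can be tried on a file;
the function names are made up for the illustration;
and the expected results assume a typical system with 8-bit characters
and <TT>EOF</TT> defined as -1.
The broken version deliberately uses <TT>signed char</TT>:
a byte with value 255 is stored in it as -1,
compares equal to <TT>EOF</TT>,
and ends the loop early.
(With <TT>unsigned char</TT> the opposite happens:
<TT>EOF</TT> is stored as 255, never compares equal to <TT>EOF</TT>,
and the loop never ends.
Plain <TT>char</TT> is signed on some machines and unsigned on others,
so a program that uses it can fail in either way.)
<pre>
#include &lt;stdio.h&gt;

/* Count the characters in a stream, holding getc's return value
   in an int, as recommended above. */
long count_with_int(FILE *fp)
{
	long n = 0;
	int c;

	c = getc(fp);
	while(c != EOF)
		{
		n++;
		c = getc(fp);
		}
	return n;
}

/* The same loop with the WRONG type.  Assuming 8-bit chars and an
   EOF of -1, a byte with value 255 becomes -1 here, compares equal
   to EOF, and ends the loop early. */
long count_with_signed_char(FILE *fp)
{
	long n = 0;
	signed char c;		/* WRONG */

	c = getc(fp);
	while(c != EOF)
		{
		n++;
		c = getc(fp);
		}
	return n;
}
</pre>
On such a system, for a file holding the five bytes
<TT>a</TT>, <TT>b</TT>, 255, <TT>c</TT>, <TT>d</TT>,
<TT>count_with_int</TT> returns 5 but
<TT>count_with_signed_char</TT> returns 2.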
</p><p>When you run the character copying program,
and it begins copying its input (your typing) to its output (your screen),
you may find yourself wondering how to stop it.
It stops when it receives end-of-file (EOF), but how do you send EOF?
The answer depends on what kind of computer you're using.
On Unix and Unix-related systems, it's almost always control-D
(typed at the beginning of a line).
On MS-DOS machines, it's control-Z followed by the RETURN key.
Under Think C on the Macintosh, it's control-D, just like Unix.
On other systems, you may have to do some research to learn how to send EOF.
</p><p>(Note, too, that
the character you type to generate an end-of-file condition from the keyboard
is <em>not</em> the same as
the special <TT>EOF</TT> value returned by <TT>getchar</TT>.
The <TT>EOF</TT> value returned by <TT>getchar</TT> is a code
indicating that the input system has detected an end-of-file condition,
whether it's reading the keyboard or a file
or a magnetic tape or a network connection or anything else.
In a disk file, at least,
there is not likely to be any character <em>in</em> the file
corresponding to <TT>EOF</TT>;
as far as your program is concerned,
<TT>EOF</TT> indicates the absence of any more characters to read.)
</p><p>Another excellent thing to know when doing any kind of programming
is how to terminate a runaway program.
If a program is running forever waiting for input,
you can usually stop it by sending it an end-of-file, as above,
but if it's running forever <em>not</em> waiting for something,
you'll have to take more drastic measures.
Under Unix, control-C
(or, occasionally, the DELETE key)
will terminate the current program,
almost no matter what.
Under MS-DOS, control-C or control-BREAK will sometimes
terminate the current program,
but by default MS-DOS only checks for control-C when it's looking for input,
so an infinite loop can be unkillable.
There's a DOS command,
<pre>
break on
</pre>
which tells DOS to look for control-C more often,
and I recommend using this command if you're doing any programming.
(If a program is in a really tight infinite loop under MS-DOS,
there may be no way of killing it short of rebooting.)
On the Mac, try command-period or command-option-ESCAPE.
</p><p>Finally, don't be disappointed
(as I was)
the first time you run
the character copying program.
You'll type a character, and see it on the screen right away,
and assume it's your program working,
but it's only your computer echoing every key you type,
as it always does.
When you hit RETURN,
a full line of characters is made available to your program.
It then zips several times through its loop,
reading and printing all the characters in the line in quick succession.
In other words, when you run this program,
it will probably seem to copy the input a line at a time,
rather than a character at a time.
You may wonder how a program
could instead
read a character right away,
without waiting for the user to hit RETURN.
That's an excellent question,
but unfortunately the answer is rather complicated,
and beyond the scope of
our discussion here.
(Among other things,
how to read a character right away is one of the things that's
not defined by the C language,
and it's not defined by any of the standard library functions,
either.
How to do it depends on which operating system you're using.)
</p><p>Stylistically,
the character-copying program above
can be said to have
one minor flaw:
it contains two calls to <TT>getchar</TT>,
one which reads the first character and one which reads
(by virtue of the fact that it's in the body of the loop) all
the other characters.
This seems inelegant and perhaps unnecessary, and it can also
be risky:
if there were more things going on within the loop, and if we
ever changed the way we read characters, it would be easy to
change one of the <TT>getchar</TT> calls but forget to change
the other one.
Is there a way to rewrite the loop so that there is only one
call to <TT>getchar</TT>, responsible for reading all the
characters?
Is there a way to read a character, test it for <TT>EOF</TT>, and assign
it to the variable <TT>c</TT>, all at the same time?
</p><p>There is.
It relies on the fact that the assignment operator, <TT>=</TT>,
is just another operator in C.
An assignment is not
(necessarily) a standalone statement;
it is an expression, and it has a value
(the value that's assigned to the variable on the left-hand side),
and it can therefore participate in a larger, surrounding expression.
Therefore, most C programmers would write the character-copying
loop like this:
<pre>
while((c = getchar()) != EOF)
	putchar(c);
</pre>
What does this mean?
The function <TT>getchar</TT> is called, as before,
and its return value is assigned to the variable <TT>c</TT>.
Then the value is immediately compared against the value <TT>EOF</TT>.
Finally, the true/false value of the comparison controls the
<TT>while</TT> loop: as long as the value is not <TT>EOF</TT>,
the loop continues executing, but as soon as an <TT>EOF</TT> is received,
no more trips through the loop are taken, and it exits.
The net result
is that the call to <TT>getchar</TT> happens inside
the test at the top of the <TT>while</TT> loop,
and doesn't have to be repeated before the loop and within the loop
(more on this in a bit).
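</p><p>Because <TT>getchar()</TT> is equivalent to <TT>getc(stdin)</TT>
and <TT>putchar(c)</TT> is equivalent to <TT>putc(c, stdout)</TT>,
the same idiom works on any pair of streams.
Here is a sketch that packages the single-<TT>getchar</TT> loop as a function;
the name <TT>copy_stream</TT>, and the count it returns,
are inventions for this example.
Calling <TT>copy_stream(stdin, stdout)</TT> from <TT>main</TT>
should behave like the character-copying program above.
<pre>
#include &lt;stdio.h&gt;

/* Copy everything from in to out, one character at a time,
   with the only call to getc inside the loop test.
   Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
	int c;		/* int, not char, so that it can hold EOF */
	long n = 0;

	while((c = getc(in)) != EOF)
		{
		putc(c, out);
		n++;
		}

	return n;
}
</pre>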
</p><p>Stated another way, the syntax of a <TT>while</TT> loop is always
<pre>
while( <I>expression</I> ) ...
</pre>
A comparison (using the <TT>!=</TT> operator)
is of course an expression;
the syntax is
<pre>
<I>expression</I> != <I>expression</I>
</pre>
And an assignment is an expression;
the syntax is
<pre>
<I>expression</I> = <I>expression</I>
</pre>
What we're seeing is just another example of
the fact that expressions can be combined with essentially
limitless generality
and therefore
infinite variety.
The left-hand side of the <TT>!=</TT> operator
(its first <I>expression</I>)
is the (sub)expression <TT>c = getchar()</TT>,
and the combined expression is the <I>expression</I>
needed by the <TT>while</TT> loop.
</p><p>The extra parentheses around
<pre>
(c = getchar())
</pre>
are important,
and are there because
the <dfn>precedence</dfn>
of the <TT>!=</TT> operator is higher than
that of the <TT>=</TT> operator.
If we (incorrectly) wrote
<pre>
while(c = getchar() != EOF) /* WRONG */
</pre>
the compiler would interpret it as
<pre>
while(c = (getchar() != EOF))
</pre>
That is,
it would assign the result of the <TT>!=</TT> operator
(1 or 0, that is, true or false)
to the variable <TT>c</TT>,
which is <em>not</em> what we want.
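</p><p>One way to see the difference is to run both versions
and look at what <TT>c</TT> holds inside the loop.
In this sketch (the function names are made up,
and <TT>getc</TT> on a named stream stands in for <TT>getchar</TT>
so that the functions can be tried on a file),
the correctly parenthesized loop sees the actual characters,
while in the mis-parenthesized one <TT>c</TT> is 1
(the value of a true comparison) on every trip,
and the characters themselves are thrown away.
Compilers such as gcc will warn about the second loop
when warnings are enabled (for example, with <TT>-Wall</TT>).
<pre>
#include &lt;stdio.h&gt;

/* Each function returns the value c held on the last trip
   through its loop, or EOF if the loop body never ran. */
int last_right(FILE *fp)
{
	int c, last = EOF;

	while((c = getc(fp)) != EOF)
		last = c;	/* c is the character just read */

	return last;
}

int last_wrong(FILE *fp)
{
	int c, last = EOF;

	while(c = getc(fp) != EOF)	/* WRONG: c = (getc(fp) != EOF) */
		last = c;	/* c is always 1 here */

	return last;
}
</pre>
For the input <TT>xyz</TT>, <TT>last_right</TT> returns <TT>'z'</TT>,
but <TT>last_wrong</TT> returns 1.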
</p><p>(``Precedence'' refers to the rules for
which operators are applied to their operands
in which order,
that is,
to the rules controlling the default grouping
of expressions and subexpressions.
For example,
the multiplication operator <TT>*</TT> has higher precedence
than the addition operator <TT>+</TT>,
which means that the expression <TT>a + b * c</TT>
is parsed as <TT>a + (b * c)</TT>.
We'll have more to say about precedence later.)
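</p><p>As a quick numerical check of that rule
(the function names are made up for the illustration):
with <TT>a</TT>, <TT>b</TT>, and <TT>c</TT> equal to 2, 3, and 4,
the default grouping gives 2 + (3 * 4), or 14,
while explicit parentheses force (2 + 3) * 4, or 20.
<pre>
/* Multiplication has higher precedence than addition,
   so a + b * c is grouped as a + (b * c). */
int default_grouping(int a, int b, int c)
{
	return a + b * c;
}

/* Parentheses override the default grouping. */
int forced_grouping(int a, int b, int c)
{
	return (a + b) * c;
}
</pre>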
</p><p>The line
<pre>
while((c = getchar()) != EOF)
</pre>
epitomizes the cryptic brevity which C is notorious for.
You may find this terseness infuriating
(and you're not alone!),
and it can certainly be carried too far,
but bear with me for a moment while I defend it.
</p><p>The simple example
we've been discussing
illustrates the tradeoffs
well.
We have four things to do:
<OL><li>call <TT>getchar</TT>,
<li>assign its return value to a variable,
<li>test the return value against <TT>EOF</TT>, and
<li>process the character
(in this case, print it out again).
</OL>We can't eliminate any of these steps.
We have to assign <TT>getchar</TT>'s value to a variable
(we can't just use it directly)
because we have to do two different things with it
(test, and print).
Therefore, compressing the assignment and test into the same line
is the only
good
way of avoiding two distinct calls to <TT>getchar</TT>.
You may not agree that the compressed idiom is better for being
more compact or easier to read,
but the fact that there is now only one call to
<TT>getchar</TT> <em>is</em> a real virtue.
</p><p>Don't think that you'll have to write compressed lines like
<pre>
while((c = getchar()) != EOF)
</pre>
right away, or in order to be an ``expert C programmer.''
But, for better or worse, most experienced C programmers do
like to use these idioms
(whether they're justified or not),
so you'll need to be able to at least recognize and understand
them when you're reading other people's code.
</p><hr>
<p>
Read sequentially:
<a href="sx6a.html" rev=precedes>prev</a>
<a href="sx6c.html" rel=precedes>next</a>
<a href="sx6.html" rev=subdocument>up</a>
<a href="top.html">top</a>
</p>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1995-1997
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>