<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995, 1996 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>section 1.9: Character Arrays</title>
<link href="sx4l.html" rev=precedes>
<link href="sx4n.html" rel=precedes>
<link href="sx4.html" rev=subdocument>
</head>
<body>
<H2>section 1.9: Character Arrays</H2>
<p>Pay attention to the way this program is developed first in ``pseudocode,''
and then refined into real C code.
A clear pseudocode statement
not only makes it easier to think about
the structure of the eventual real code,
but if you make the eventual real code mimic the pseudocode,
the real code will be equally straightforward and easy to read.
</p><p>The function <TT>getline</TT>,
introduced here,
is extremely useful,
and we'll have as much use for it in our own programs as the
authors do in theirs.
(In other words,
they have succeeded in their goal of making it
``useful in other contexts.''
In fact,
I've been using a <TT>getline</TT> function much like this one
ever since I learned C from K&amp;R,
and I generally find it preferable
to the standard library's line-reading function.)
</p><p>Pages 28 through 30 introduce quite a lot of material all at once;
you'll probably want to read them several times,
especially if arrays or character strings are new to you.
</p><p>Earlier we said that C provided no particular built-in support
for composite objects such as character strings, and here we
begin to see the significance of that omission.
A string is just an array of characters,
and you can access the characters within a string exactly as easily
(because you use exactly the same syntax)
as you access the elements within any other array.
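</p><p>As a small illustration of that point
(the function <TT>count_char</TT> is our own invention for this sketch,
not something from the book or the standard library),
here is one way to count how many times a character appears in a string,
using nothing but ordinary array subscripts:

```c
/* Hypothetical helper: count how many times c occurs in the string s. */
int count_char(const char s[], char c)
{
    int i, n = 0;

    /* s[i] is an ordinary array access; the loop stops at the '\0' */
    for (i = 0; s[i] != '\0'; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

A call such as <TT>count_char("hello, world", 'o')</TT> would return 2.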
</p><p>If you've used BASIC,
you will probably wonder where C's SUBSTR function is.
C doesn't have one, for two reasons.
First of all, there's less of a need for one,
because it's so easy to get at the individual characters
within a string in C.
More importantly, a SUBSTR function implies that you take a
string and extract a substring as a new string.
However, creating a new string
(i.e. the extracted substring) involves allocating arbitrary
amounts of memory to hold the string,
and C rarely if ever allocates memory implicitly for you.
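</p><p>If you do want something like SUBSTR,
a rough sketch might look like the following.
The name <TT>substr</TT> and its argument order are assumptions of ours,
not a standard function;
the point is that the caller, not the function,
supplies the array which receives the new string:

```c
/* Hypothetical helper: copy up to len characters of src, starting at
 * index start, into dst.  The caller must have allocated dst with room
 * for at least len + 1 characters.  Assumes start is not past the end
 * of src.  Returns the number of characters copied. */
int substr(char dst[], const char src[], int start, int len)
{
    int i;

    for (i = 0; i < len && src[start + i] != '\0'; i++)
        dst[i] = src[start + i];
    dst[i] = '\0';              /* the result must be terminated, too */
    return i;
}
```

Given <TT>char piece[6];</TT>,
the call <TT>substr(piece, "hello, world", 7, 5)</TT>
would leave the string <TT>"world"</TT> in <TT>piece</TT>.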
</p><p>If anything, it's too easy to access the individual characters
within strings in C.
String handling illustrates one of the potentially frustrating
aspects of C we mentioned earlier:
the language doesn't define
any high-level string handling features for you,
so you're free to do whatever low-level string processing you wish.
The downside is that constantly manipulating strings down at
the character level,
and always having to remember to allocate memory for new strings,
can get tedious after a while.
</p><p>The preceding paragraph is not meant to discourage you,
but just to point out a reality:
any C program which manipulates strings
(and this includes most C programs)
will find itself doing a certain amount of character-level fiddling
and a certain amount of memory allocation.
Such a program
will also find that it can do just about anything it wants to do
(and that its programmer has the patience to do)
with the strings it manipulates.
</p><p>Since string processing
at this relatively low level
is so common in C,
you'll want to pay careful attention to the discussion
on page 30
of how strings are stored in character arrays,
and particularly to the fact that a <TT>'\0'</TT> character
is always present to mark the end of a string.
(It's easy to forget to count the <TT>'\0'</TT> character when
allocating space for a string, for instance.)
Notice the nice picture on page 30;
this is a good way of thinking about data structures
(and not just simple character arrays, either).
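</p><p>To make the ``count the <TT>'\0'</TT>'' point concrete,
here is a sketch which makes a copy of a string in newly allocated memory.
It runs ahead of the book,
since it uses pointers and <TT>malloc</TT>,
neither of which has been introduced yet,
and the name <TT>copy_string</TT> is made up for this example.
The part to notice is the <TT>+ 1</TT>:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of s, or NULL
 * if malloc fails.  The caller is responsible for calling free(). */
char *copy_string(const char *s)
{
    size_t len = strlen(s);     /* strlen does NOT count the '\0' */
    char *p = malloc(len + 1);  /* so add 1 to make room for it */

    if (p != NULL)
        memcpy(p, s, len + 1);  /* copies the characters and the '\0' */
    return p;
}
```

The five-character string <TT>"hello"</TT> therefore needs six characters' worth of space.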
</p><p>page 29
</p><p>Note that the program explicitly allocates space for the two
strings it manipulates:
the current line <TT>line</TT>,
and the longest line <TT>longest</TT>.
(It only needs these two strings at any one time,
even though the input consists of arbitrarily many lines.)
Note that it cannot simply assign one string to another
(because C provides no built-in support
for composite objects such as character strings);
the program
calls the <TT>copy</TT> function to do so.
(The authors write their own <TT>copy</TT> function for
explanatory purposes;
the standard library contains a string-copying function which
would normally be used.)
The only strings that aren't explicitly allocated are the
arrays in the <TT>getline</TT> and <TT>copy</TT> functions;
as the discussion briefly mentions,
these do not need to be allocated because they're already
allocated in the caller.
(There are a number of subtleties about array parameters to functions;
we'll have more to say about them later.)
</p><p>The code on page 29 contains a number of examples
of compressed assignments and tests;
evidently the authors expect you to get used to this style in a hurry.
The line
<pre>    while ((len = getline(line, MAXLINE)) &gt; 0)
</pre>is similar to the <TT>getchar</TT> loops earlier in this chapter;
it calls <TT>getline</TT>,
saves its return value in the variable <TT>len</TT>,
and tests it against 0.
</p><p>The comparison
<pre>    i&lt;lim-1 &amp;&amp; (c=getchar())!=EOF &amp;&amp; c!='\n'
</pre>in the <TT>for</TT> loop in the <TT>getline</TT> function does
several things:
it makes sure there is room for another character in the array;
it calls, assigns, and tests <TT>getchar</TT>'s return value
against EOF, as before;
and it also tests the returned character against <TT>'\n'</TT>,
to detect end of line.
The surrounding code is mildly clumsy
in that it has to check for <TT>\n</TT> a second time;
later, when we learn more about loops,
we may find a way of writing it more cleanly.
You may also notice that the code deals correctly with the
possibility that EOF is seen without a <TT>\n</TT>.
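</p><p>Putting those pieces together,
a function along the lines of the book's <TT>getline</TT>
might be sketched as follows.
We've called it <TT>get_line</TT> here,
on the assumption that you may be compiling on a POSIX system,
where <TT>&lt;stdio.h&gt;</TT> declares a quite different function
named <TT>getline</TT> and the two names would clash.
We've also given <TT>c</TT> an initial value,
as a precaution of our own for the case where <TT>lim</TT> is too small
for the loop to run at all:

```c
#include <stdio.h>

/* get_line: read a line into s (at most lim-1 characters plus '\0')
 * and return its length.  Modeled on the getline function discussed
 * above; the name and the initialization of c are our assumptions. */
int get_line(char s[], int lim)
{
    int c = 0, i;

    /* stop when the array is full, at end of input, or at end of line */
    for (i = 0; i < lim - 1 && (c = getchar()) != EOF && c != '\n'; i++)
        s[i] = c;
    if (c == '\n') {    /* the second '\n' test: keep the newline */
        s[i] = c;
        i++;
    }
    s[i] = '\0';        /* always terminate the string */
    return i;
}
```

If the input ends with the two characters <TT>hi</TT> and no newline,
the loop stops on EOF with <TT>i</TT> equal to 2,
the second test fails,
and the function still returns a properly terminated two-character string.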
</p><p>The line
<pre>    while ((to[i] = from[i]) != '\0')
</pre>in the <TT>copy</TT> function does two things at once:
it copies characters from the <TT>from</TT> array to the <TT>to</TT> array,
and at the same time it compares the copied character against <TT>'\0'</TT>,
so that it stops at the end of the string.
(If you think this is cryptic,
wait 'til we get to page 106 in chapter 5!)
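</p><p>To see the two things separately,
here is a sketch of a <TT>copy</TT> function in the compressed style,
next to a longer version
(<TT>copy_long</TT>, a name we made up)
which does the same job one step at a time.
The <TT>const</TT> qualifiers are our addition,
and both functions assume that <TT>to</TT> is big enough:

```c
/* copy: copy from into to, including the terminating '\0'. */
void copy(char to[], const char from[])
{
    int i = 0;

    while ((to[i] = from[i]) != '\0')   /* copy and test in one step */
        i++;
}

/* copy_long: the same loop with the two steps written out separately. */
void copy_long(char to[], const char from[])
{
    int i = 0;

    for (;;) {
        to[i] = from[i];        /* step 1: copy one character */
        if (to[i] == '\0')      /* step 2: was it the terminator? */
            break;
        i++;
    }
}
```

In both versions the <TT>'\0'</TT> itself is copied before the loop stops,
so the result is a properly terminated string.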
</p><p>We've also just learned another <TT>printf</TT>
conversion specifier: <TT>%s</TT> prints a string.
</p><p>page 30
</p><p>Deep sentence:
<blockquote>There is no way for a user of <TT>getline</TT>
to know in advance how long an input line might be,
so <TT>getline</TT> checks for overflow.
</blockquote>Because dynamically allocating memory for arbitrary-length
strings is mildly tedious in C,
it's tempting to use fixed-size arrays.
(It's so tempting, in fact,
that that's what most programs do,
and since fixed-size arrays are also considerably easier to
discuss, all of our early example programs will use them.)
Using fixed-size arrays is fine,
as long as you make sure that they don't overflow.
Unfortunately, it's also tempting
(and easy)
to forget to guard against array overflow,
perhaps by deluding yourself into thinking that too-long inputs
``can't happen.''
Murphy's law says that they do happen,
and the various corollaries to Murphy's law say that they
happen in the most unpleasant way and at the least convenient
time.
Don't be cavalier about arrays;
do make sure that they're big enough <em>and</em> that you
guard against overflowing them.
(In another mark of C's general insensitivity to beginning
programmers,
most compilers do <em>not</em> check for array overflow;
if you write more data to an array than it is declared to hold,
you quietly scribble on other parts of memory,
usually with disastrous
results.)
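</p><p>One way of guarding against overflow
is to pass the size of the destination array along with the array,
just as <TT>getline</TT> does with its <TT>lim</TT> argument.
The following is only a sketch;
<TT>bounded_copy</TT> is a hypothetical name of ours,
not a standard library function,
and it assumes that <TT>size</TT> is at least 1:

```c
/* Hypothetical helper: copy from into to, never writing more than
 * size characters (including the '\0').  Returns 1 if the whole
 * string fit, or 0 if it had to be truncated. */
int bounded_copy(char to[], int size, const char from[])
{
    int i;

    /* leave room for the '\0' by stopping at size-1 characters */
    for (i = 0; i < size - 1 && from[i] != '\0'; i++)
        to[i] = from[i];
    to[i] = '\0';
    return from[i] == '\0';
}
```

With <TT>char small[4];</TT>,
the call <TT>bounded_copy(small, 4, "abcdef")</TT>
stores <TT>"abc"</TT> and returns 0,
so the caller can tell that the input was too long
instead of having it scribble past the end of the array.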
</p><hr>
<p>
Read sequentially:
<a href="sx4l.html" rev=precedes>prev</a>
<a href="sx4n.html" rel=precedes>next</a>
<a href="sx4.html" rev=subdocument>up</a>
<a href="top.html">top</a>
</p>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1995, 1996
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>