<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995, 1996 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>10.8 Example: Breaking a Line into ``Words''</title>
<link href="sx10g.html" rev=precedes>
<link href="sx11.html" rel=precedes>
<link href="sx10.html" rev=subdocument>
</head>
<body>
<H2>10.8 Example: Breaking a Line into ``Words''</H2>
<p>In an earlier assignment,
an ``extra credit'' version of a problem
asked you to write a little checkbook balancing program
that accepted a series of lines of the form
<pre>
deposit 1000
check 10
check 12.34
deposit 50
check 20
</pre>
It was a surprising nuisance to do this
in an <I>ad hoc</I> way,
using only the tools we had at the time.
It was easy to read each line, but it was cumbersome to break
it up into the word (``deposit'' or ``check'')
and the amount.
</p><p>I find it very convenient
to use a more general approach:
first, break lines like these into
a series of whitespace-separated words,
then deal with each word separately.
To do this,
we will use an <em>array of pointers to</em> <TT>char</TT>,
which we can also think of as an ``array of strings,''
since a string is an array of <TT>char</TT>,
and a pointer-to-<TT>char</TT> can easily point at a string.
Here is the declaration of such an array:
<pre>
char *words[10];
</pre>
This is the first complicated C declaration we've seen:
it says that <TT>words</TT> is an array of 10 pointers to <TT>char</TT>.
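To make the ``array of strings'' idea concrete, here is a small sketch
in which three cells of such an array are pointed at strings
and then handed to a function.
The names <TT>total_length</TT> and <TT>demo_total</TT> and the sample strings
are made up for illustration; they are not part of the program developed in this section.

```c
#include <string.h>

/* Hypothetical helper: add up the lengths of the first n strings
   that the cells of an array of pointers to char point at. */
size_t total_length(char *strs[], int n)
{
    size_t total = 0;
    int i;

    for(i = 0; i < n; i++)
        total += strlen(strs[i]);   /* strs[i] is a pointer to char */
    return total;
}

/* Fill in the first three cells of a 10-cell array and measure them. */
size_t demo_total(void)
{
    char *words[10];        /* array of 10 pointers to char */

    words[0] = "deposit";   /* each cell points at the first char of a string */
    words[1] = "check";
    words[2] = "12.34";
    return total_length(words, 3);
}
```

Only the pointers live in the array; the characters themselves are stored elsewhere,
which is why the array needs just one cell per string no matter how long the strings are.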
We're going to write a function,
<TT>getwords</TT>,
which we can call like this:
<pre>
int nwords;
nwords = getwords(line, words, 10);
</pre>
where <TT>line</TT> is the line we're breaking into words,
<TT>words</TT> is the array to be filled in with the
(pointers to the) words,
and <TT>nwords</TT> (the return value from <TT>getwords</TT>)
is the number of words that the function finds.
(As with <TT>getline</TT>,
we tell the function the size of the array
so that if the line should happen to contain more words than that,
it won't overflow the array.)
</p><p>Here is the definition of the
<TT>getwords</TT>
function.
It finds the beginning of each word,
places a pointer to it in the array,
finds the end of that word
(which is signified by at least one whitespace character),
and terminates the word by
placing a <TT>'\0'</TT> character after it.
(The <TT>'\0'</TT> character will overwrite
the first whitespace character following the word.)
Note that the original input string is therefore modified by <TT>getwords</TT>:
if you were to try to print the input line after calling <TT>getwords</TT>,
it would appear to contain only its first word
(because of the first inserted <TT>'\0'</TT>).
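<pre>
#include &lt;stddef.h&gt;
#include &lt;ctype.h&gt;

/* Break line into whitespace-separated words, in place.
   Returns the number of words stored in words[].
   The size check comes after a word is stored, so maxwords must be at least 1. */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(1)
    {
        /* skip whitespace before the next word;
           the cast keeps negative char values away from isspace */
        while(isspace((unsigned char)*p))
            p++;

        if(*p == '\0')
            return nwords;

        words[nwords++] = p;    /* remember where this word starts */

        /* find the end of the word */
        while(!isspace((unsigned char)*p) &amp;&amp; *p != '\0')
            p++;

        if(*p == '\0')
            return nwords;

        *p++ = '\0';            /* terminate the word */

        if(nwords &gt;= maxwords)
            return nwords;
    }
}
</pre>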
Each time through the outer <TT>while</TT> loop,
the function tries to find another word.
First it skips over whitespace
(which might be leading spaces on the line,
or the space(s) separating this word from the previous one).
The <TT>isspace</TT> function is new:
it's in the standard library,
declared in the header file <TT>&lt;ctype.h&gt;</TT>,
and it returns nonzero (``true'')
if the character you hand it is a whitespace character
(a space, a tab, a newline, or any other whitespace character
there might happen to be).
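As a small illustration of <TT>isspace</TT> on its own,
the following sketch counts the whitespace characters in a string.
The function name <TT>count_spaces</TT> is hypothetical.
The cast to <TT>unsigned char</TT> is a common precaution,
since the <TT>&lt;ctype.h&gt;</TT> functions are only defined
for arguments that fit in an <TT>unsigned char</TT> (or are <TT>EOF</TT>).

```c
#include <ctype.h>

/* Hypothetical helper: count the whitespace characters in s. */
int count_spaces(const char *s)
{
    int n = 0;

    for(; *s != '\0'; s++)
    {
        /* cast so a negative char value is never handed to isspace */
        if(isspace((unsigned char)*s))
            n++;
    }
    return n;
}
```

Notice that the result of <TT>isspace</TT> is tested directly rather than compared against 1:
the function promises only a nonzero value for whitespace, not any particular one.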
</p><p>When the function finds a non-whitespace character,
it has found the beginning of another word,
so it places the pointer
to that character
in the next cell of the <TT>words</TT> array.
Then it steps through the word,
looking at non-whitespace characters,
until it finds another whitespace character,
or the <TT>\0</TT> at the end of the line.
If it finds the <TT>\0</TT>,
it's done
with the entire line;
otherwise,
it changes the whitespace character to a <TT>\0</TT>,
to terminate the word it's just found,
and continues.
(If it's found as many words as will fit in the <TT>words</TT> array,
it returns prematurely.)
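The two side effects described so far,
the modification of the input line and the early return when the array is full,
can be observed with a sketch like the one below.
It repeats the <TT>getwords</TT> function from this section
(written with an explicit <TT>int</TT> return type) so that it compiles by itself;
the two demonstration functions and their names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* getwords from this section, repeated so the sketch is self-contained */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(1)
    {
        while(isspace((unsigned char)*p))
            p++;
        if(*p == '\0')
            return nwords;
        words[nwords++] = p;
        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if(*p == '\0')
            return nwords;
        *p++ = '\0';
        if(nwords >= maxwords)
            return nwords;
    }
}

/* Hypothetical demonstration of the in-place modification. */
size_t visible_length_after_split(void)
{
    char line[] = "this is a test";
    char *words[10];

    getwords(line, words, 10);
    /* the buffer now holds "this\0is\0a\0test", so strlen stops
       at the '\0' written after the first word */
    return strlen(line);
}

/* Hypothetical demonstration of the early return: only 2 cells are offered. */
int count_with_small_array(void)
{
    char line[] = "this is a test";
    char *words[2];

    return getwords(line, words, 2);
}
```

With this input, the first function should return 4 (the length of "this")
and the second should return 2, leaving the rest of the line unsplit.
If you need the original line afterwards, make a copy before calling <TT>getwords</TT>.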
</p><p>Each time it finds a word,
the function increments the number of words (<TT>nwords</TT>)
it has found.
Since arrays in C start at <TT>[0]</TT>,
the number of words the function has found so far
is also the index of the cell in the <TT>words</TT> array
where the next word should be stored.
The function actually assigns the next word and increments
<TT>nwords</TT> in one expression:
<pre>
words[nwords++] = p;
</pre>
You should convince yourself that this arrangement works,
and that (in this case)
the preincrement form
<pre>
words[++nwords] = p;	/* WRONG */
</pre>
would <em>not</em> behave as desired.
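One way to convince yourself is to try both forms on a small array.
The sketch below uses an array of <TT>int</TT> rather than pointers, purely to keep it short;
the function names <TT>fill_post</TT> and <TT>fill_pre</TT> are made up for this illustration.

```c
/* Store three values using the postincrement form.
   Cells [0], [1], and [2] are used.  Returns the final count. */
int fill_post(int a[])
{
    int n = 0;

    a[n++] = 10;    /* stores into a[0], then n becomes 1 */
    a[n++] = 20;    /* stores into a[1], then n becomes 2 */
    a[n++] = 30;    /* stores into a[2], then n becomes 3 */
    return n;
}

/* The same three stores using the preincrement form.
   Cells [1], [2], and [3] are used; a[0] is never set.
   The caller must supply an array of at least 4 cells. */
int fill_pre(int a[])
{
    int n = 0;

    a[++n] = 10;    /* n becomes 1 first, then stores into a[1] */
    a[++n] = 20;    /* n becomes 2 first, then stores into a[2] */
    a[++n] = 30;    /* n becomes 3 first, then stores into a[3] */
    return n;
}
```

Both functions return 3, but only the postincrement version leaves the values
in the cells that a loop from 0 to <TT>n-1</TT> would visit.
In <TT>getwords</TT>, the preincrement form would also store the last pointer
one cell past the end of a full <TT>words</TT> array.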
</p><p>When the function is done
(when it finds the <TT>\0</TT> terminating the input line,
or when it runs out of cells in the <TT>words</TT> array),
it returns the number of words it has found.
</p><p>Here is a complete example
of calling <TT>getwords</TT>:
<pre>
char line[] = "this is a test";
char *words[10];
int nwords;
int i;

nwords = getwords(line, words, 10);
for(i = 0; i &lt; nwords; i++)
	printf("%s\n", words[i]);
</pre>
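Returning to the checkbook lines that motivated this section,
here is one way <TT>getwords</TT> might be put to work on them.
This is only a sketch under some assumptions:
the helper name <TT>apply_line</TT> and its return convention are invented here,
and the amount is converted with the standard function <TT>atof</TT>,
which does no error checking.
The <TT>getwords</TT> function is repeated so that the sketch compiles by itself.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* getwords from this section, repeated so the sketch is self-contained */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while(1)
    {
        while(isspace((unsigned char)*p))
            p++;
        if(*p == '\0')
            return nwords;
        words[nwords++] = p;
        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if(*p == '\0')
            return nwords;
        *p++ = '\0';
        if(nwords >= maxwords)
            return nwords;
    }
}

/* Hypothetical helper: apply one line such as "deposit 1000" or
   "check 12.34" to *balance.  Returns 1 if the line was understood,
   0 otherwise.  The line is modified, because getwords modifies it. */
int apply_line(char *line, double *balance)
{
    char *words[10];
    int nwords = getwords(line, words, 10);

    if(nwords != 2)
        return 0;       /* expect exactly a word and an amount */

    if(strcmp(words[0], "deposit") == 0)
        *balance += atof(words[1]);
    else if(strcmp(words[0], "check") == 0)
        *balance -= atof(words[1]);
    else
        return 0;       /* unrecognized word */

    return 1;
}
```

Once the line has been broken into words, the first word can be compared with <TT>strcmp</TT>
and the second converted to a number, with no character-by-character scanning in the caller.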
</p><hr>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1995, 1996
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>