C/the.ansi.c.programming.language/c.programming.notes/sx10h.html

   1 <!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
   2 <!-- This collection of hypertext pages is Copyright 1995, 1996 by Steve Summit. -->
   3 <!-- This material may be freely redistributed and used -->
   4 <!-- but may not be republished or sold without permission. -->
   5 <html>
   6 <head>
   7 <link rev="owner" href="mailto:scs@eskimo.com">
   8 <link rev="made" href="mailto:scs@eskimo.com">
   9 <title>10.8 Example: Breaking a Line into ``Words''</title>
  10 <link href="sx10g.html" rev=precedes>
  11 <link href="sx11.html" rel=precedes>
  12 <link href="sx10.html" rev=subdocument>
  13 </head>
  14 <body>
  15 <H2>10.8 Example: Breaking a Line into ``Words''</H2>
  16
  17 <p>In
  18 an earlier
  19 assignment,
  20 an ``extra credit'' version of a problem
  21 asked you to write a little checkbook balancing program
  22 that accepted a series of lines of the form
  23 <pre>
  24         deposit 1000
  25         check 10
  26         check 12.34
  27         deposit 50
  28         check 20
  29 </pre>
  30 It was a surprising nuisance to do this
  31 in an <I>ad hoc</I> way,
  32 using only the tools we
  33 had at the time.
  34 It was easy to read each line, but it was cumbersome to break
  35 it up into the word (``deposit'' or ``check'')
  36 and the amount.
  37 </p><p>I find it very convenient
  38 to
  39 use a more general approach:
  40 first,
  41 break lines like these into
  42 a series of
  43 whitespace-separated words,
  44 then deal with each word separately.
  45 To do this,
  46 we will use an <em>array of pointers to</em> <TT>char</TT>,
  47 which we can also think of as an ``array of strings,''
  48 since a string is an array of <TT>char</TT>,
  49 and a pointer-to-<TT>char</TT> can easily point at a string.
  50 Here is the declaration of such an array:
  51 <pre>
  52         char *words[10];
  53 </pre>
  54 This is the first complicated C declaration we've seen:
  55 it says that <TT>words</TT> is an array of 10 pointers to <TT>char</TT>.
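</p><p>To make the declaration concrete, here is a minimal sketch of an array of pointers to <TT>char</TT> being filled in by hand.
The function names <TT>point_at_strings</TT> and <TT>print_words</TT>, and the two sample strings, are made up for this illustration; they are not part of the program we are about to write.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the text): make the first cells of
   words[] point at two strings, and return how many cells were used. */
int point_at_strings(char *words[], int maxwords)
{
    static char first[] = "deposit";
    static char second[] = "1000";
    int n = 0;

    if (n < maxwords)
        words[n++] = first;     /* words[0] points at 'd' */
    if (n < maxwords)
        words[n++] = second;    /* words[1] points at '1' */
    return n;
}

/* Hypothetical helper: print each string the array points at. */
void print_words(char *words[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        printf("words[%d] points at \"%s\" (%d characters)\n",
               i, words[i], (int)strlen(words[i]));
}
```

Each cell of the array holds only a pointer; the characters themselves live in the arrays <TT>first</TT> and <TT>second</TT>.
</p><p>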
  56 We're going to write a function,
  57 <TT>getwords</TT>,
  58 which we can call like this:
  59 <pre>
  60         int nwords;
  61         nwords = getwords(line, words, 10);
  62 </pre>
  63 where <TT>line</TT> is the line we're breaking into words,
  64 <TT>words</TT> is the array to be filled in with the
  65 (pointers to the)
  66 words,
  67 and <TT>nwords</TT> (the return value from <TT>getwords</TT>)
  68 is the number of words which the function finds.
  69 (As with <TT>getline</TT>,
  70 we tell the function the size of the array
  71 so that if the line should happen to contain more words than that,
  72 it won't overflow the array.)
  73 </p><p>Here is the definition of the
  74 <TT>getwords</TT>
  75 function.
  76 It
  77 finds the beginning of each word,
  78 places a pointer to it in the array,
  79 finds the end of that word
  80 (which is signified by a whitespace character, or by the end of the line)
  81 and terminates the word by
  82 placing a <TT>'\0'</TT> character after it.
  83 (The <TT>'\0'</TT> character will overwrite
  84 the first whitespace character following the word.)
  85 Note that the original input string is therefore modified by <TT>getwords</TT>:
  86 if you were to try to print the input line after calling <TT>getwords</TT>,
  87 it would appear to contain only its first word
  88 (because of the first inserted <TT>'\0'</TT>).
  89 <pre>
  90 #include &lt;stddef.h&gt;
  91 #include &lt;ctype.h&gt;
  92
  93 int getwords(char *line, char *words[], int maxwords)
  94 {
  95 char *p = line;
  96 int nwords = 0;
  97
  98 while(1)
  99         {
 100         while(isspace((unsigned char)*p))
 101                 p++;
 102
 103         if(*p == '\0')
 104                 return nwords;
 105
 106         words[nwords++] = p;
 107
 108         while(!isspace((unsigned char)*p) &amp;&amp; *p != '\0')
 109                 p++;
 110
 111         if(*p == '\0')
 112                 return nwords;
 113
 114         *p++ = '\0';
 115
 116         if(nwords &gt;= maxwords)
 117                 return nwords;
 118         }
 119 }
 120 </pre>
 121 Each time through the outer <TT>while</TT> loop,
 122 the function tries to find another word.
 123 First it skips over whitespace
 124 (which might be leading spaces on the line,
 125 or the space(s) separating this word from the previous one).
 126 The <TT>isspace</TT> function is
 127 new:
 128 it's in the standard library,
 129 declared in the header file <TT>&lt;ctype.h&gt;</TT>,
 130 and it returns nonzero (``true'')
 131 if the character you hand it is a whitespace character
 132 (a space, a tab, a newline, or any other whitespace character
 133 there might happen to be).
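</p><p>As a small illustration of <TT>isspace</TT> on its own, here is a sketch that counts the whitespace characters at the start of a string, which is the same job the first inner loop of <TT>getwords</TT> does.
The name <TT>count_leading_space</TT> is an invention for this example.
The cast to <TT>unsigned char</TT> is a precaution for systems where plain <TT>char</TT> is signed and the string might contain characters with negative values.

```c
#include <ctype.h>

/* Hypothetical helper, not from the text: count the whitespace
   characters at the start of s. */
int count_leading_space(const char *s)
{
    int n = 0;

    /* isspace returns nonzero for space, tab, newline and the other
       whitespace characters; the loop stops at the first character
       that is not whitespace, which includes the terminating '\0'. */
    while (isspace((unsigned char)s[n]))
        n++;
    return n;
}
```

Because <TT>'\0'</TT> is not a whitespace character, the loop cannot run off the end of the string.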
 134 </p><p>When the function finds a non-whitespace character,
 135 it has found the beginning of another word,
 136 so it places the pointer
 137 to that character
 138 in the next cell of the <TT>words</TT> array.
 139 Then it steps through the word,
 140 looking at non-whitespace characters,
 141 until it finds another whitespace character,
 142 or the <TT>\0</TT> at the end of the line.
 143 If it finds the <TT>\0</TT>,
 144 it's done
 145 with the entire line;
 146 otherwise,
 147 it changes the whitespace character to a <TT>\0</TT>,
 148 to terminate the word it's just found,
 149 and continues.
 150 (If it's found as many words as will fit in the <TT>words</TT> array,
 151 it returns prematurely.)
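</p><p>One way to see what <TT>getwords</TT> has done to the line is to print the whole buffer afterwards, byte by byte.
The sketch below repeats <TT>getwords</TT> (with an explicit <TT>int</TT> return type and <TT>unsigned char</TT> casts) so that it compiles by itself, and adds a helper, <TT>show_buffer</TT>, which is hypothetical and exists only for this demonstration.

```c
#include <ctype.h>
#include <stdio.h>

/* getwords as described in the text, repeated so this compiles alone. */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while (1) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return nwords;
        words[nwords++] = p;
        while (!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if (*p == '\0')
            return nwords;
        *p++ = '\0';
        if (nwords >= maxwords)
            return nwords;
    }
}

/* Hypothetical helper: print len bytes of buf, writing each '\0'
   as the two visible characters \0. */
void show_buffer(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == '\0')
            fputs("\\0", stdout);
        else
            putchar(buf[i]);
    }
    putchar('\n');
}
```

If <TT>line</TT> is declared as <TT>char line[] = "this is a test";</TT> then after calling <TT>getwords(line, words, 10)</TT>, the call <TT>show_buffer(line, sizeof line)</TT> prints <TT>this\0is\0a\0test\0</TT>.
Each pointer in <TT>words</TT> points into that same buffer, so if you need the original line later, copy it before calling <TT>getwords</TT>.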
 152 </p><p>Each time it finds a word,
 153 the function increments the number of words (<TT>nwords</TT>)
 154 it has found.
 155 Since arrays
 156 in C
 157 start at <TT>[0]</TT>,
 158 the number of words the function has found so far
 159 is also the index of the cell in the <TT>words</TT> array
 160 where the next word should be stored.
 161 The function actually assigns the next word and increments
 162 <TT>nwords</TT> in one expression:
 163 <pre>
 164         words[nwords++] = p;
 165 </pre>
 166 You should convince yourself that this arrangement works,
 167 and that (in this case)
 168 the preincrement form
 169 <pre>
 170         words[++nwords] = p;            /* WRONG */
 171 </pre>
 172 would <em>not</em> behave as desired.
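</p><p>If you would rather check than take it on faith, here is a minimal sketch that isolates the two forms.
The functions <TT>store_post</TT> and <TT>store_pre</TT> are hypothetical; each stores one value into an array at position <TT>n</TT> and returns the updated count.

```c
/* Hypothetical helper: store value using the postincrement form.
   The old value of n is the index, so the first value lands in
   cells[0]. */
int store_post(int cells[], int n, int value)
{
    cells[n++] = value;
    return n;
}

/* Hypothetical helper: store value using the preincrement form.
   n is incremented first, so the first value lands in cells[1]
   and cells[0] is never filled in. */
int store_pre(int cells[], int n, int value)
{
    cells[++n] = value;
    return n;
}
```

Starting from <TT>n</TT> equal to 0, both functions return 1, but only <TT>store_post</TT> leaves the value in cell 0.
With the preincrement form, the last store into a full array would also land one cell past the end.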
 173 </p><p>When the function is done
 174 (when it finds the <TT>\0</TT> terminating the input line,
 175 or when it runs out of cells in the <TT>words</TT> array)
 176 it returns the number of words it has found.
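</p><p>One detail worth noticing is that <TT>getwords</TT> as written stores a pointer before it tests <TT>maxwords</TT>, so it assumes the array has room for at least one word.
The following variant is only a sketch of one possible rearrangement, under the assumption that you want a <TT>maxwords</TT> of zero to be harmless; the name <TT>getwords_checked</TT> is made up, and for <TT>maxwords</TT> of 1 or more it should behave the same as the version above.

```c
#include <ctype.h>

/* Hypothetical variant of getwords: the limit is tested before each
   store, so nothing is written when maxwords is 0 or negative. */
int getwords_checked(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while (nwords < maxwords) {
        /* skip whitespace before the next word */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        words[nwords++] = p;            /* start of a word */

        /* find the end of the word */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        *p++ = '\0';                    /* terminate the word */
    }
    return nwords;
}
```

Called with a two-cell array on the line <TT>"one two three"</TT>, it returns 2 and leaves the rest of the line untouched after the second word.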
 177 </p><p>Here is an example
 178 of calling <TT>getwords</TT>, using the declarations of <TT>words</TT> and <TT>nwords</TT> shown earlier:
 179 <pre>
 180         char line[] = "this is a test";
 181         int i;
 182
 183         nwords = getwords(line, words, 10);
 184         for(i = 0; i &lt; nwords; i++)
 185                 printf("%s\n", words[i]);
 186 </pre>
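</p><p>To come back to the checkbook problem that motivated all this, here is a sketch of how one input line might be handled once it has been broken into words.
The function <TT>apply_line</TT> is hypothetical, as is its convention of returning 1 for a line it understood and 0 otherwise; <TT>getwords</TT> is repeated so that the example compiles by itself.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* getwords as described in the text, repeated so this compiles alone. */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    while (1) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return nwords;
        words[nwords++] = p;
        while (!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if (*p == '\0')
            return nwords;
        *p++ = '\0';
        if (nwords >= maxwords)
            return nwords;
    }
}

/* Hypothetical helper: apply one "deposit N" or "check N" line to
   *balance.  Returns 1 if the line was understood, 0 if not (in
   which case *balance is left alone).  The line is modified. */
int apply_line(char *line, double *balance)
{
    char *words[10];
    int nwords = getwords(line, words, 10);

    if (nwords != 2)
        return 0;

    if (strcmp(words[0], "deposit") == 0)
        *balance += atof(words[1]);
    else if (strcmp(words[0], "check") == 0)
        *balance -= atof(words[1]);
    else
        return 0;

    return 1;
}
```

Once the line is an array of words, deciding what to do is just a matter of comparing <TT>words[0]</TT> and converting <TT>words[1]</TT>.
Note that <TT>atof</TT> does no error checking, so a line like <TT>deposit xyz</TT> is treated as a deposit of zero.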
 187 </p><p></p><hr>
 188 <p>
 189 Read sequentially:
 190 <a href="sx10g.html" rev=precedes>prev</a>
 191 <a href="sx11.html" rel=precedes>next</a>
 192 <a href="sx10.html" rev=subdocument>up</a>
 193 <a href="top.html">top</a>
 194 </p>
 195 <p>
 196 This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
 197 // <a href="copyright.html">Copyright</a> 1995, 1996
 198 // <a href="mailto:scs@eskimo.com">mail feedback</a>
 199 </p>
 200 </body>
 201 </html>