<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>Chapter 19: Returning Arrays</title>
<link href="sx4cc.html" rev=precedes>
<link href="sx6.html" rel=precedes>
<link href="top.html" rev=subdocument>
</head>
<body>
<H1>Chapter 19: Returning Arrays</H1>

<p>Arrays are ``second-class citizens'' in C.
Just as arrays can't be assigned,
they can't be returned by functions, either;
that is,
there is no such type as ``function returning array of ...''.
In this chapter we'll study three workarounds,
three ways to implement a function
which attempts to return a string
(that is, an array of <TT>char</TT>)
or an array of some other type.
</p><p>In the last chapter, we looked at some code for converting an integer
into a string of digits representing its value.
This operation is the inverse of what
the standard function <TT>atoi</TT> does.
Suppose we wanted to wrap our digit-generating code
up in a function and call it <TT>itoa</TT>.
How would it return the generated string of digits?
We'll use this example to demonstrate all three techniques.
For simplicity, though,
we won't repeat the <TT>do</TT>/<TT>while</TT> loop
in each example function;
instead, we'll simply call <TT>sprintf</TT>.
(In fact, since calling <TT>sprintf</TT> is so easy,
most C programs call it directly when they need to
convert integers to strings,
and perhaps for that reason there is no standard <TT>itoa</TT> function.)
</p><p>First, let's look at the way that <em>won't</em> work,
so that we can set it aside and make sure we never use it.
What if we wrote <TT>itoa</TT> like this?
<pre>
        char *itoa(int n)
        {
        char retbuf[25];
        sprintf(retbuf, "%d", n);
        return retbuf;
        }
</pre>
This looks superficially reasonable,
and it might well be what we'd write at first if we weren't
being careful.
(It might even seem to work, at first.)
However, it has a serious, fatal flaw:
let's think about that local array, <TT>retbuf</TT>.
Since it's a regular local variable,
it has <dfn>automatic</dfn> duration,
which means that it springs into existence when the function is called
<em>and disappears when the function returns</em>.
Therefore,
the pointer that this version of <TT>itoa</TT> returns
points to an array which no longer exists by the time the caller
receives the pointer.
(Remember that the statement <TT>return retbuf;</TT>
returns a pointer to the first character in <TT>retbuf</TT>;
by the ``equivalence of arrays and pointers,''
the mention of the array <TT>retbuf</TT> in this context
is equivalent to <TT>&amp;retbuf[0]</TT>.)
When the caller tries to use the pointer,
the string created by <TT>itoa</TT> might happen to still be there,
or the memory might already have been re-used by some other function.
(Formally, using such a pointer is undefined behavior,
so there's no telling what will happen.)
Therefore, this first version of <TT>itoa</TT> is <em>not</em>
adequate and <em>not</em> acceptable.
Functions must never return pointers to local,
automatic-duration arrays.
</p><p>Since the problem with returning a pointer to a local array
is that the array has automatic duration by default,
the simplest fix to the above non-functional version of <TT>itoa</TT>,
and the first of our three working methods of returning arrays from functions,
is to declare the array <TT>static</TT>, instead:
<pre>
        char *itoa(int n)
        {
        static char retbuf[25];
        sprintf(retbuf, "%d", n);
        return retbuf;
        }
</pre>
Now <TT>retbuf</TT> has static duration:
it exists for the entire run of the program,
so it does not disappear when <TT>itoa</TT> returns,
and the pointer is still valid by the time the caller uses it.
</p><p>Returning a pointer to a <TT>static</TT> array
is a practical and popular solution
to the problem of ``returning'' an array,
but it has one drawback.
Each time you call the function,
it re-uses the same array and returns the same pointer.
Therefore,
when you call the function a second time,
whatever information it ``returned'' to you last time
will be overwritten.
(More precisely, the information
that the function returned a pointer to
will be overwritten.)
For example,
suppose we had occasion
to save the pointer returned by <TT>itoa</TT> for a little while,
with the intention of using it later,
after calling <TT>itoa</TT> again in the meantime:
<pre>
        int i = 23;
        char *p1, *p2;
        p1 = itoa(i);
        i = i + 10;
        p2 = itoa(i);
        printf("old i = %s, new i = %s\n", p1, p2);
</pre>
But this won't work as we expect.
The second call to <TT>itoa</TT> will overwrite the string
that the first call stored in
<TT>itoa</TT>'s static <TT>retbuf</TT> array.
Instead of printing <TT>i</TT>'s old and new values,
the last line will print the new value, 33, twice.
Both <TT>p1</TT> and <TT>p2</TT> will point to the same place,
to the <TT>retbuf</TT> array down inside <TT>itoa</TT>,
because each call to <TT>itoa</TT> always returns
the same pointer to that same array.
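</p><p>Here is a small, self-contained sketch of the <TT>p1</TT>/<TT>p2</TT>
problem, along with one common way around it:
copying the string out of <TT>itoa</TT>'s buffer
into an array the caller owns
before calling <TT>itoa</TT> again.
The function name <TT>print_old_and_new</TT> is made up for this illustration.
</p>

```c
#include <stdio.h>
#include <string.h>

/* The static-buffer version of itoa from the text. */
char *itoa(int n)
{
    static char retbuf[25];
    sprintf(retbuf, "%d", n);
    return retbuf;
}

/* Hypothetical caller: prints i's old and new values correctly by
   copying the first result before the second call overwrites retbuf. */
void print_old_and_new(int i)
{
    char old[25];           /* same size as itoa's retbuf */
    strcpy(old, itoa(i));   /* save a private copy of the first result */
    i = i + 10;
    printf("old i = %s, new i = %s\n", old, itoa(i));
}
```

<p>Called as <TT>print_old_and_new(23)</TT>, this prints
<TT>old i = 23, new i = 33</TT>.
The price is a <TT>strcpy</TT> and an extra array in the caller,
which anticipates the second technique, described below.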
</p><p>We can see the same problem in an even simpler example.
Suppose we had never heard of
the <TT>%d</TT> format specifier in <TT>printf</TT>.
We might try to call something like this:
<pre>
        printf("i = %s, j = %s\n", itoa(i), itoa(j));
</pre>
where <TT>i</TT> and <TT>j</TT> are
two different <TT>int</TT> variables.
What will happen?
The compiler will arrange for one of the two calls to <TT>itoa</TT>
to happen first and the other second.
(The C standard doesn't specify
the order in which a function's arguments are evaluated;
different compilers behave differently in this respect.)
Both calls finish before <TT>printf</TT> itself is called,
and both return the same pointer,
so whichever call to <TT>itoa</TT> happens <em>second</em>
will be the one that
gets to keep its return value in
<TT>retbuf</TT>.
The <TT>printf</TT> call will either print <TT>i</TT>'s value twice,
or <TT>j</TT>'s value twice,
but it won't be able to print two distinct values.
</p><p>The moral is that
although the <TT>static</TT> return array technique will work,
the caller has to be a little bit careful,
and must never expect the pointer returned by one call to the function
to still point to the same string after a later call to the function.
Sometimes this restriction is a real problem;
other times it's perfectly acceptable.
(Some of the functions in the standard C library use this technique;
one example is <TT>ctime</TT>,
which converts timestamp values to printable strings.
When you see a cryptic sentence like
``The returned pointer is to static data
which is overwritten with each call''
in the documentation for a library function,
it means that the function is using this technique.)
When this restriction <em>would</em> be too onerous on the caller,
we should use one of the other two techniques, described next.
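</p><p>As a sketch of how a caller can live with a library function like
<TT>ctime</TT> that returns a pointer to static data,
the hypothetical helper <TT>ctime_copy</TT> below
copies each result into a caller-supplied buffer
before anything else gets a chance to overwrite it.
(The helper and its name are inventions for this example;
<TT>ctime</TT> itself is standard.)
</p>

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: format *t with ctime, then copy the string out of
   ctime's static buffer into buf, which has room for bufsize chars.
   Returns buf, or NULL if ctime fails or the string won't fit. */
char *ctime_copy(const time_t *t, char buf[], size_t bufsize)
{
    char *s = ctime(t);     /* points into ctime's static storage */
    if (s == NULL || strlen(s) >= bufsize)
        return NULL;
    strcpy(buf, s);         /* the caller's copy survives later ctime calls */
    return buf;
}
```

<p>With two timestamps one day apart,
two calls to <TT>ctime_copy</TT> leave two different strings
in the caller's two buffers,
whereas two saved <TT>ctime</TT> pointers could both end up
pointing at whichever string was formatted last.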
</p><p>If the function can't use a local or local <TT>static</TT> array
to hold the return value,
the next option is to have the <em>caller</em> allocate an array,
and use that.
In this case,
the function accepts
at least one extra argument
(besides any data to be operated on):
a pointer to the location to write the result back to.
Our familiar <TT>getline</TT> function has worked this way all along.
If we rewrote <TT>itoa</TT>
along these lines,
it might look like this:
<pre>
        char *itoa(int n, char buf[])
        {
        sprintf(buf, "%d", n);
        return buf;
        }
</pre>
Now the caller must pass an <TT>int</TT> value to be converted
<em>and</em> an array to hold the converted result:
<pre>
        int i = 23;
        char buf[25];
        char *str = itoa(i, buf);
</pre>
There are two differences between this
version of <TT>itoa</TT> and our old <TT>getline</TT> function.
(Well, three, really;
of course
the two functions do totally different things.)
One difference is that
<TT>getline</TT> accepted another extra argument
which was the <em>size</em> of the array in the caller,
so that <TT>getline</TT> could promise not to overflow that array.
Our latest version of <TT>itoa</TT> does not accept such an argument,
which is a deficiency.
If the caller ever passes an array
which is too small to hold all the digits of the converted integer,
<TT>itoa</TT> (actually, <TT>sprintf</TT>)
will sail off the end of the array
and scribble on some other part of memory.
(Needless to say, this can be a disaster.)
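</p><p>One way to remove that deficiency,
sketched here under the assumption that a C99 (or later) library is available,
is to have the caller pass the array's size as well,
just as with <TT>getline</TT>,
and to let <TT>snprintf</TT>
(which is not part of ANSI C89)
do the bounds checking.
Returning a null pointer when the digits don't fit
is a design choice made for this sketch, not something the text prescribes.
</p>

```c
#include <stdio.h>

/* Caller-allocated buffer, plus its size.  snprintf (C99) never writes more
   than bufsize chars, including the terminating '\0', and returns the length
   the full result would have had. */
char *itoa(int n, char buf[], size_t bufsize)
{
    int len = snprintf(buf, bufsize, "%d", n);
    if (len < 0 || (size_t)len >= bufsize)
        return NULL;    /* output error, or the digits didn't all fit */
    return buf;
}
```

<p>With a 25-character buffer, <TT>itoa(12345, buf, sizeof buf)</TT>
returns <TT>buf</TT> holding <TT>"12345"</TT>;
with a 4-character buffer, it returns a null pointer,
and the buffer holds only the truncated (but still terminated) <TT>"123"</TT>.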
</p><p>Another difference is that the return value
of this latest version of <TT>itoa</TT>
isn't terribly useful.
The pointer which this version of <TT>itoa</TT> returns
is always the same as the pointer you handed it.
Even if this version of <TT>itoa</TT> didn't return anything
as its formal return value,
you could still get your hands on the string it created,
since it would be sitting right there in your own array
(the one that you passed to <TT>itoa</TT>).
In the case of <TT>getline</TT>,
we had a second thing to return as the formal return value,
namely the length of the line we'd just read.
</p><p>However, this second strategy is also popular and workable.
Besides our own <TT>getline</TT> function,
the standard library functions <TT>fgets</TT> and <TT>fread</TT>
both use this technique.
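</p><p>For comparison, here is a sketch of a caller using <TT>fgets</TT>,
which takes both the caller's array and its size.
The helper <TT>count_lines</TT> is hypothetical:
it counts the lines in a stream through a deliberately small 16-character buffer,
relying on <TT>fgets</TT> never to store more than
15 characters plus the terminating <TT>'\0'</TT>.
A line too long for the buffer simply arrives in pieces,
so the sketch counts a line only when it sees the newline that ends it
(and therefore skips a final line with no trailing newline).
</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count newline-terminated lines in fp, reading
   through a small caller-allocated buffer.  fgets stores at most
   sizeof line - 1 chars plus '\0', so long lines can't overflow it. */
int count_lines(FILE *fp)
{
    char line[16];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        if (strchr(line, '\n') != NULL)  /* this piece ends a line */
            count++;
    return count;
}
```

<p>The buffer and its size both belong to the caller,
so the caller, not <TT>fgets</TT>, decides how much memory to devote to each read.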
</p><p>When the limit of a single static return array within the function
would be unacceptable,
and when it would be a nuisance for the caller
to have to declare or otherwise allocate return arrays,
a third option
is for the function to dynamically allocate some memory
for the returned array
by calling <TT>malloc</TT>.
Here is our last version of <TT>itoa</TT>,
demonstrating this technique:
<pre>
        char *itoa(int n)
        {
        char *retbuf = malloc(25);
        if(retbuf == NULL)
                return NULL;
        sprintf(retbuf, "%d", n);
        return retbuf;
        }
</pre>
Now the caller can go back to saying simple things like
<pre>
        char *p = itoa(i);
</pre>
and it no longer has to worry about the possibility that
a later call to <TT>itoa</TT>
will overwrite the results of the first.
However, the caller now has two <em>new</em> things to worry about:
<OL><li>This version of <TT>itoa</TT> returns a null pointer if
<TT>malloc</TT> fails to allocate the memory that <TT>itoa</TT> needs.
The caller should really be checking for this null pointer return
each time it calls <TT>itoa</TT>,
before using the pointer.
<li>If the caller calls <TT>itoa</TT> 10,000 times,
we'll have allocated
25 <TT>*</TT> 10,000 = 250,000 bytes of memory,
or about a quarter of a megabyte.
Unless someone is careful to call <TT>free</TT>
to deallocate all of that memory,
it will be wasted.
Few programs can afford to waste that much memory.
(Once upon a time,
few programs could get that much memory, period.)
The ``someone''
who is going to have to call <TT>free</TT>
isn't <TT>itoa</TT>;
it has no idea when the caller is done
with the memory returned by a previous call to <TT>itoa</TT>,
and in fact <TT>itoa</TT> might never get called again.
So it will be the caller's responsibility
to keep track of each pointer returned by <TT>itoa</TT>,
and to free it when it's no longer needed,
or else memory will gradually leak away.
</OL>We can work around the first problem.
If
we expect that there will usually be enough memory,
such that the call to <TT>malloc</TT> will rarely if ever fail,
and if all the caller would do in an out-of-memory situation is
print an error message and abort,
we can move the test down into the function:
<pre>
        char *retbuf = malloc(25);
        if(retbuf == NULL)
                {
                fprintf(stderr, "out of memory\n");
                exit(EXIT_FAILURE);
                }
</pre>
Now the function never returns a null pointer,
so the caller doesn't have to check.
(When <TT>malloc</TT> fails, the function doesn't return at all.)
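</p><p>Putting the pieces together, here is a sketch of the <TT>malloc</TT>
version of <TT>itoa</TT> with the out-of-memory test moved inside,
plus a hypothetical caller, <TT>print_range</TT>,
that takes on the caller's responsibility described above
by freeing each string as soon as it's done with it.
</p>

```c
#include <stdio.h>
#include <stdlib.h>

/* Each call returns a freshly malloc'ed string; the caller must free it.
   On allocation failure it prints a message and exits, so it never
   returns a null pointer. */
char *itoa(int n)
{
    char *retbuf = malloc(25);
    if (retbuf == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    sprintf(retbuf, "%d", n);
    return retbuf;
}

/* Hypothetical caller: prints lo through hi, one per line, freeing each
   string after use so that repeated calls don't leak memory. */
void print_range(int lo, int hi)
{
    int i;
    for (i = lo; i <= hi; i++) {
        char *p = itoa(i);
        printf("%s\n", p);
        free(p);
    }
}
```

<p>Unlike the <TT>static</TT> version,
two calls here return two different pointers,
so the caller can keep both results as long as it likes,
provided it eventually frees each one.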
</p><p>In summary, we've seen three ways
of ``returning'' arrays from functions,
none of which is perfect.
The <TT>static</TT> array technique is usually convenient for the caller,
but it suits only functions
that the caller is unlikely to call several times
while still needing the earlier return values.
(The <TT>static</TT> array technique is also imperfect
in that it violates the notion
that calling code shouldn't need to know
about the inner, implementation details of a called function.)
The caller-passes-an-array technique
is useful when the caller might need the results of several calls at once,
as long as that number is small and fixed,
so that the caller can easily declare and keep track
of a number of return arrays
(if necessary).
Finally,
when there might be an arbitrary number of calls to the function,
or when maximum flexibility is otherwise needed,
the function-calls-<TT>malloc</TT> technique is appropriate,
but with its extra flexibility come some costs,
the most important of which is that the caller must remember to
free the returned pointers.
</p><hr>
<p>
Read sequentially:
<a href="sx4cc.html" rev=precedes>prev</a>
<a href="sx6.html" rel=precedes>next</a>
<a href="top.html" rev=subdocument>up</a>
<a href="top.html">top</a>
</p>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1996-1999
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>