From a4833af9f0b9077f1b5acf568e4b39baa79d4b4b Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 2 Jan 2007 23:06:07 +0000 Subject: [PATCH] * doc/autoconf.texi (Integer Overflow): Greatly expand and rewrite, taking notions from the recent discussion on the gcc and autoconf mailing lists; please see http://lists.gnu.org/archive/html/autoconf-patches/2006-12/msg00091.html and follow the many links. (Integer Overflow Basics, Signed Overflow Examples): (Optimization and Wraparound, Signed Overflow Advice): (Signed Integer Division): New sections. SCALAR(0x828309c) Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc. --- ChangeLog | 15 ++- doc/autoconf.texi | 309 ++++++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 304 insertions(+), 20 deletions(-) diff --git a/ChangeLog b/ChangeLog index 501f72b9..953c4f81 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,14 @@ +2007-01-02 Paul Eggert + + * doc/autoconf.texi (Integer Overflow): Greatly expand and + rewrite, taking notions from the recent discussion on the gcc and + autoconf mailing lists; please see + http://lists.gnu.org/archive/html/autoconf-patches/2006-12/msg00091.html + and follow the many links. + (Integer Overflow Basics, Signed Overflow Examples): + (Optimization and Wraparound, Signed Overflow Advice): + (Signed Integer Division): New sections. + 2006-12-28 Steven G. Johnson * lib/autoconf/general.m4 (AC_DEFINE_TRACE): Don't include @@ -13034,8 +13045,8 @@ ----- - Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006 Free Software - Foundation, Inc. + Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007 Free + Software Foundation, Inc. This file is part of GNU Autoconf. diff --git a/doc/autoconf.texi b/doc/autoconf.texi index f225e794..73255da0 100644 --- a/doc/autoconf.texi +++ b/doc/autoconf.texi @@ -183,7 +183,7 @@ a package for creating scripts to configure source code packages using templates and an M4 macro package. Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, -2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc. +2001, 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -14949,15 +14949,296 @@ the programs work well enough in practice. @node Integer Overflow @section Integer Overflow -@cindex overflow, arithmetic - -In C, signed integer overflow leads to undefined behavior. However, -many programs and Autoconf tests assume that signed integer overflow after -addition, subtraction, or multiplication silently -wraps around modulo a power of two, using two's complement arithmetic, -so long as you cast the resulting value -to an integer type or store it into an integer variable. Such programs -are portable to the vast majority of modern platforms. However, signed +@cindex integer overflow +@cindex overflow, signed integer +@cindex signed integer overflow +@cindex wraparound arithmetic + +Many portable C programs assume that signed integer overflow wraps +around reliably using two's complement arithmetic. Yet the C standard +says that program behavior is undefined on overflow, and in a few cases +C programs do not work on some modern implementations because their +overflows do not wrap around as their authors intended. Conversely, in +at least one common case related to overflow, the C standard requires +behavior that is commonly not implemented. + +@menu +* Integer Overflow Basics:: Why integer overflow is a problem +* Signed Overflow Examples:: Examples of code assuming wraparound +* Optimization and Wraparound:: Optimizations that break uses of wraparound +* Signed Overflow Advice:: Practical advice for signed overflow issues +* Signed Integer Division:: @code{INT_MIN / -1} and @code{INT_MIN % -1} +@end menu + +@node Integer Overflow Basics +@subsection Basics of Integer Overflow +@cindex integer overflow +@cindex overflow, signed integer +@cindex signed integer overflow +@cindex wraparound arithmetic + +In languages like C, unsigned integer overflow reliably wraps around +modulo the word size. This is guaranteed by the C standard and is +portable in practice, unless you specify aggressive optimization options +suitable only for special applications. + +In contrast, the C standard says that signed integer overflow leads to +undefined behavior where a program can do anything, including dumping +core or overrunning a buffer. The misbehavior can even precede the +overflow. Such an overflow can occur during addition, subtraction, +multiplication, division, and left shift. + +Despite this requirement of the standard, many C programs and Autoconf +tests assume that signed integer overflow silently wraps around modulo a +power of two, using two's complement arithmetic, so long as you cast the +resulting value to a signed integer type or store it into a signed +integer variable. If you use conservative optimization flags, such +programs are generally portable to the vast majority of modern +platforms, with a few exceptions discussed later. + +For historical reasons the C standard also allows implementations with +ones' complement or signed magnitude arithmetic, but it is safe to +assume two's complement nowadays. + +@node Signed Overflow Examples +@subsection Examples of Code Assuming Wraparound Overflow +@cindex integer overflow +@cindex overflow, signed integer +@cindex signed integer overflow +@cindex wraparound arithmetic + +There has long been a tension between what the C standard requires for +signed integer overflow, and what C programs commonly assume. The +standard allows aggressive optimizations based on assumptions that +overflow never occurs, but many practical C programs rely on overflow +wrapping around. These programs do not conform to the standard, but +they commonly work in practice because compiler writers are +understandably reluctant to implement optimizations that would break +many programs, unless perhaps a user specifies aggressive optimization. + +The C Standard says that if a program has signed integer overflow its +behavior is undefined, and the undefined behavior can even precede the +overflow. To take an extreme example: + +@c Inspired by Robert Dewar's example in +@c (2007-01-01). +@example +if (password == expected_password) + allow_superuser_privileges (); +else + printf ("%d password mismatches\n", counter++); +@end example + +@noindent +If @code{counter} is an @code{int} and a compiler can deduce that +@code{counter == INT_MAX} or that @code{counter} previously overflowed, +the C standard allows the compiler to optimize away the password test +and generate code that allows superuser privileges unconditionally. + +Despite this requirement by the standard, it has long been common for C +code to assume wraparound arithmetic after signed overflow, and all +known practical C implementations support some C idioms that assume +wraparound signed arithmetic, even if the idioms does not conform +strictly to the standard. If your code looks like the following +examples it will almost surely work with real-world compilers. + +Here is an example derived from the 7th Edition Unix implementation of +@code{atoi} (1979-01-10): + +@example +char *p; +int f, n; +@dots{} +while (*p >= '0' && *p <= '9') + n = n * 10 + *p++ - '0'; +return (f ? -n : n); +@end example + +@noindent +Even if the input string is in range, on most modern machines this has +signed overflow when computing the most negative integer (the @code{-n} +overflows) or a value near an extreme integer (the first @code{+} +overflows). + +Here is another example, taken from the 7th Edition implementation of +@code{rand} (1979-01-10). Here the programmer expects both +multiplication and addition to wrap on overflow: + +@example +static long int randx = 1; +@dots{} +randx = randx * 1103515245 + 12345; +return (randx >> 16) & 077777; +@end example + +In the following example, derived from the @acronym{GNU} C Library 2.5 +implementation of @code{mktime} (2006-09-09), the code assumes +wraparound arithmetic in @code{+} to detect signed overflow: + +@example +time_t t, t1, t2; +int sec_requested, sec_adjustment; +@dots{} +t1 = t + sec_requested; +t2 = t1 + sec_adjustment; +if (((t1 < t) != (sec_requested < 0)) + | ((t2 < t1) != (sec_adjustment < 0))) + return -1; +@end example + +If your code looks like these examples, it is probably safe even though +it does not strictly conform to the C standard. This might lead one to +believe that one can generally assume wraparound on overflow, but that +is not always true, as can be seen in the next section. + +@node Optimization and Wraparound +@subsection Optimizations That Break Wraparound Arithmetic +@cindex loop induction + +Compilers sometimes generate code that is incompatible with wraparound +integer arithmetic. A simple example is an algebraic simplification: a +compiler might translate @code{(i * 2000) / 1000} to @code{i * 2} +because it assumes that @code{i * 2000} does not overflow. The +translation is not equivalent to the original when overflow occurs: +e.g., in the typical case of 32-bit signed two's complement wraparound +@code{int}, if @code{i} has type @code{int} and value @code{1073742}, +the original expression returns @minus{}2147483 but the optimized +version returns the mathematically correct value 2147484. + +More subtly, loop induction optimizations often exploit the undefined +behavior of signed overflow. Consider the following contrived function +@code{sumc}: + +@example +int +sumc (int lo, int hi) +@{ + int sum = 0; + int i; + for (i = lo; i <= hi; i++) + sum ^= i * 53; + return sum; +@} +@end example + +@noindent +To avoid multiplying by 53 each time through the loop, an optimizing +compiler might internally transform @code{sumc} to the equivalent of the +following: + +@example +int +transformed_sumc (int lo, int hi) +@{ + int sum = 0; + int hic = hi * 53; + int ic; + for (ic = lo * 53; ic <= hic; ic += 53) + sum ^= ic; + return sum; +@} +@end example + +@noindent +This transformation is allowed by the C standard, but it is invalid for +wraparound arithmetic when @code{INT_MAX / 53 < hi}, because then the +overflow in computing expressions like @code{hi * 53} can cause the +expression @code{i <= hi} to yield a different value from the +transformed expression @code{ic <= hic}. + +For this reason, compilers that use loop induction and similar +techniques often do not support reliable wraparound arithmetic when a +loop induction variable like @code{ic} is involved. Since loop +induction variables are generated by the compiler, and are not visible +in the source code, it is not always trivial to say whether the problem +affects your code. + +Hardly any code actually depends on wraparound arithmetic in cases like +these, so in practice these loop induction optimizations are almost +always useful. However, edge cases in this area can cause problems. +For example: + +@example +int j; +for (j = 1; 0 < j; j *= 2) + test (j); +@end example + +@noindent +Here, the loop attempts to iterate through all powers of 2 that +@code{int} can represent, but some test versions of @acronym{GCC} +optimize away the comparison to zero and thus generate an infinite loop, +under the argument that behavior is undefined on overflow. As of this +writing this optimization is not done by any production version of +@acronym{GCC} with @option{-O2}, but it might be performed by more +aggressive @acronym{GCC} optimization options, or by other compilers. + +@node Signed Overflow Advice +@subsection Practical Advice for Signed Overflow Issues +@cindex integer overflow +@cindex overflow, signed integer +@cindex signed integer overflow +@cindex wraparound arithmetic + +Ideally the safest approach is to avoid signed integer overflow +entirely. For example, instead of multiplying two signed integers, you +can convert them to unsigned integers, multiply the unsigned values, +then test whether the result is in signed range. + +Rewriting code in this way will be inconvenient, though, particularly if +the signed values might be negative. Also, it will probably hurt +performance. Using unsigned arithmetic to check for overflow is +particularly painful to do portably and efficiently when dealing with an +integer type like @code{uid_t} whose width and signedness vary from +platform to platform. + +Furthermore, many C applications pervasively assume wraparound behavior +and typically it is not easy to find and remove all these assumptions. +Hence it is often useful to maintain nonstandard code that assumes +wraparound on overflow, instead of rewriting the code. The rest of this +section attempts to give practical advice for this situation. + +If your code uses a signed loop index, make sure that the index cannot +overflow, along with all signed expressions derived from the index. +Here is a contrived example of problematic code with two instances of +overflow. + +@example +for (i = INT_MAX - 10; i <= INT_MAX; i++) + if (i + 1 < 0) + @{ + report_overflow (); + break; + @} +@end example + +@noindent +Because of the two overflows, a compiler might optimize away or +transform the two comparisons in a way that is incompatible with the +wraparound assumption. + +If your code uses an expression like @code{(i * 2000) / 1000} and you +actually want the multiplication to wrap around reliably, put the +product into a temporary variable and divide that by 1000. This +inhibits the algebraic optimization on many platforms. + +If your code assumes wraparound behavior and you want to insulate it +against any @acronym{GCC} optimizations that would fail to support that +behavior, you should use @acronym{GCC}'s @option{-fwrapv} option, which +causes signed overflow to wrap around reliably (except for division and +remainder, as discussed in the next section). + +If you need to port to platforms where signed integer overflow does not +reliably wrap around (e.g., due to hardware overflow checking, or to +highly aggressive optimizations), you should consider using +@acronym{GCC}'s @option{-ftrapv} option, which causes signed overflow to +raise an exception. + +@node Signed Integer Division +@subsection Signed Integer Division and Integer Overflow +@cindex division, integer + +Overflow in signed integer division is not always harmless: for example, on CPUs of the i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal which by default terminates the program. Worse, taking the remainder @@ -14965,14 +15246,6 @@ of these two values typically yields the same signal on these CPUs, even though the C standard requires @code{INT_MIN % -1} to yield zero because the expression does not overflow. -@acronym{GCC} users might consider using the -@option{-ftrapv} option if they are worried about porting their code to -the rare platforms where signed integer overflow does not wrap around -after addition, subtraction, or multiplication. - -Unsigned integer overflow reliably wraps around modulo the word size. -This is guaranteed by the C standard and is portable in practice. - @node Null Pointers @section Properties of Null Pointers @cindex null pointers -- 2.11.4.GIT