1 // This small program does some raytracing. It tests Valgrind's handling of
2 // FP operations. It apparently does a lot of trigonometry operations.
4 // Licensing: This program is closely based on the one of the same name from
5 // http://www.fourmilab.ch/. The front page of that site says:
7 // "Except for a few clearly-marked exceptions, all the material on this
8 // site is in the public domain and may be used in any manner without
9 // permission, restriction, attribution, or compensation."
11 /* This program can be used in two ways. If INTRIG is undefined, sin,
12 cos, tan, etc, will be used as supplied by <math.h>. If it is
13 defined, then the program calculates all this stuff from first
14 principles (so to speak) and does not use the libc facilities. For
15 benchmarking purposes it seems better to avoid the libc stuff, so
16 that the inner loops (sin, sqrt) present a workload independent of
17 libc implementations on different platforms. Hence: */
24 John Walker's Floating Point Benchmark, derived from...
26 Marinchip Interactive Lens Design System
28 John Walker December 1980
31 http://www.fourmilab.ch/
33 This program may be used, distributed, and modified freely as
34 long as the origin information is preserved.
36 This is a complete optical design raytracing algorithm,
37 stripped of its user interface and recast into portable C. It
38 not only determines execution speed on an extremely floating
39 point (including trig function) intensive real-world
40 application, it checks accuracy on an algorithm that is
41 exquisitely sensitive to errors. The performance of this
42 program is typically far more sensitive to changes in the
43 efficiency of the trigonometric library routines than the
44 average floating point program.
46 The benchmark may be compiled in two modes. If the symbol
47 INTRIG is defined, built-in trigonometric and square root
48 routines will be used for all calculations. Timings made with
49 INTRIG defined reflect the machine's basic floating point
50 performance for the arithmetic operators. If INTRIG is not
51 defined, the system library <math.h> functions are used.
52 Results with INTRIG not defined reflect the system's library
53 performance and/or floating point hardware support for trig
54 functions and square root. Results with INTRIG defined are a
55 good guide to general floating point performance, while
56 results with INTRIG undefined indicate the performance of an
57 application which is math function intensive.
59 Special note regarding errors in accuracy: this program has
60 generated numbers identical to the last digit it formats and
61 checks on the following machines, floating point
62 architectures, and languages:
64 Marinchip 9900 QBASIC IBM 370 double-precision (REAL * 8) format
66 IBM PC / XT / AT Lattice C IEEE 64 bit, 80 bit temporaries
67 High C same, in line 80x87 code
68 BASICA "Double precision"
69 Quick BASIC IEEE double precision, software routines
71 Sun 3 C IEEE 64 bit, 80 bit temporaries,
72 in-line 68881 code, in-line FPA code.
74 MicroVAX II C Vax "G" format floating point
76 Macintosh Plus MPW C SANE floating point, IEEE 64 bit format
79 Inaccuracies reported by this program should be taken VERY
80 SERIOUSLY INDEED, as the program has been demonstrated to be
81 invariant under changes in floating point format, as long as
82 the format is a recognised double precision format. If you
83 encounter errors, please remember that they are just as likely
84 to be in the floating point editing library or the
85 trigonometric libraries as in the low level operator code.
87 The benchmark assumes that results are basically reliable, and
88 only tests the last result computed against the reference. If
89 you're running on a suspect system you can compile this
90 program with ACCURACY defined. This will generate a version
91 which executes as an infinite loop, performing the ray trace
92 and checking the results on every pass. All incorrect results
95 Representative timings are given below. All have been
96 normalised as if run for 1000 iterations.
98 Time in seconds Computer, Compiler, and notes
101 3466.00 4031.00 Commodore 128, 2 Mhz 8510 with software floating
102 point. Abacus Software/Data-Becker Super-C 128,
103 version 3.00, run in fast (2 Mhz) mode. Note:
104 the results generated by this system differed
105 from the reference results in the 8th to 10th
108 3290.00 IBM PC/AT 6 Mhz, Microsoft/IBM BASICA version A3.00.
109 Run with the "/d" switch, software floating point.
111 2131.50 IBM PC/AT 6 Mhz, Lattice C version 2.14, small model.
112 This version of Lattice compiles subroutine
113 calls which either do software floating point
114 or use the 80x87. The machine on which I ran
115 this had an 80287, but the results were so bad
116 I wonder if it was being used.
118 1598.00 Macintosh Plus, MPW C, SANE Software floating point.
120 1582.13 Marinchip 9900 2 Mhz, QBASIC compiler with software
121 floating point. This was a QBASIC version of the
122 program which contained the identical algorithm.
124 404.00 IBM PC/AT 6 Mhz, Microsoft QuickBASIC version 2.0.
125 Software floating point.
127 165.15 IBM PC/AT 6 Mhz, Metaware High C version 1.3, small
128 model. This was compiled to call subroutines for
129 floating point, and the machine contained an 80287
130 which was used by the subroutines.
132 143.20 Macintosh II, MPW C, SANE calls. I was unable to
133 determine whether SANE was using the 68881 chip or
136 121.80 Sun 3/160 16 Mhz, Sun C. Compiled with -fsoft switch
137 which executes floating point in software.
139 78.78 110.11 IBM RT PC (Model 6150). IBM AIX 1.0 C compiler
142 75.2 254.0 Microsoft Quick C 1.0, in-line 8087 instructions,
143 compiled with 80286 optimisation on. (Switches
144 were -Ol -FPi87-G2 -AS). Small memory model.
146 69.50 IBM PC/AT 6Mhz, Borland Turbo BASIC 1.0. Compiled
147 in "8087 required" mode to generate in-line
148 code for the math coprocessor.
150 66.96 IBM PC/AT 6Mhz, Microsoft QuickBASIC 4.0. This
151 release of QuickBASIC compiles code for the
152 80287 math coprocessor.
154 66.36 206.35 IBM PC/AT 6Mhz, Metaware High C version 1.3, small
155 model. This was compiled with in-line code for the
156 80287 math coprocessor. Trig functions still call
159 63.07 220.43 IBM PC/AT, 6Mhz, Borland Turbo C, in-line 8087 code,
160 small model, word alignment, no stack checking,
163 17.18 Apollo DN-3000, 12 Mhz 68020 with 68881, compiled
164 with in-line code for the 68881 coprocessor.
165 According to Apollo, the library routines are chosen
166 at runtime based on coprocessor presence. Since the
167 coprocessor was present, the library is supposed to
168 use in-line floating point code.
170 15.55 27.56 VAXstation II GPX. Compiled and executed under
173 15.14 37.93 Macintosh II, Unix system V. Green Hills 68020
174 Unix compiler with in-line code for the 68881
175 coprocessor (-O -ZI switches).
177 12.69 Sun 3/160 16 Mhz, Sun C. Compiled with -fswitch,
178 which calls a subroutine to select the fastest
179 floating point processor. This was using the 68881.
181 11.74 26.73 Compaq Deskpro 386, 16 Mhz 80386 with 16 Mhz 80387.
182 Metaware High C version 1.3, compiled with in-line
183 for the math coprocessor (but not optimised for the
184 80386/80387). Trig functions still call library
187 8.43 30.49 Sun 3/160 16 Mhz, Sun C. Compiled with -f68881,
188 generating in-line MC68881 instructions. Trig
189 functions still call library routines.
191 6.29 25.17 Sun 3/260 25 Mhz, Sun C. Compiled with -f68881,
192 generating in-line MC68881 instructions. Trig
193 functions still call library routines.
195 4.57 Sun 3/260 25 Mhz, Sun FORTRAN 77. Compiled with
196 -O -f68881, generating in-line MC68881 instructions.
197 Trig functions are compiled in-line. This used
198 the FORTRAN 77 version of the program, FBFORT77.F.
200 4.00 14.20 Sun386i/25 Mhz model 250, Sun C compiler.
202 4.00 14.00 Sun386i/25 Mhz model 250, Metaware C.
204 3.10 12.00 Compaq 386/387 25 Mhz running SCO Xenix 2.
205 Compiled with Metaware HighC 386, optimized
208 3.00 12.00 Compaq 386/387 25MHZ optimized for 386/387.
210 2.96 5.17 Sun 4/260, Sparc RISC processor. Sun C,
211 compiled with the -O2 switch for global
214 2.47 COMPAQ 486/25, secondary cache disabled, High C,
215 486/387, inline f.p., small memory model.
217 2.20 3.40 Data General Motorola 88000, 16 Mhz, Gnu C.
219 1.56 COMPAQ 486/25, 128K secondary cache, High C, 486/387,
220 inline f.p., small memory model.
222 0.66 1.50 DEC Pmax, Mips processor.
224 0.63 0.91 Sun SparcStation 2, Sun C (SunOS 4.1.1) with
225 -O4 optimisation and "/usr/lib/libm.il" inline
228 0.60 1.07 Intel 860 RISC processor, 33 Mhz, Greenhills
231 0.40 0.90 Dec 3MAX, MIPS 3000 processor, -O4.
233 0.31 0.90 IBM RS/6000, -O.
235 0.1129 0.2119 Dell Dimension XPS P133c, Pentium 133 MHz,
236 Windows 95, Microsoft Visual C 5.0.
238 0.0883 0.2166 Silicon Graphics Indigo², MIPS R4400,
241 0.0351 0.0561 Dell Dimension XPS R100, Pentium II 400 MHz,
242 Windows 98, Microsoft Visual C 5.0.
244 0.0312 0.0542 Sun Ultra 2, UltraSPARC V9, 300 MHz, Solaris
247 0.00862 0.01074 Dell Inspiron 9100, Pentium 4, 3.4 GHz, gcc -O3.
259 #define cot(x) (1.0 / tan(x))
264 #define max_surfaces 10
266 /* Local variables */
268 /* static char tbfr[132]; */
270 static short current_surfaces
;
271 static short paraxial
;
273 static double clear_aperture
;
275 static double aberr_lspher
;
276 static double aberr_osc
;
277 static double aberr_lchrom
;
279 static double max_lspher
;
280 static double max_osc
;
281 static double max_lchrom
;
283 static double radius_of_curvature
;
284 static double object_distance
;
285 static double ray_height
;
286 static double axis_slope_angle
;
287 static double from_index
;
288 static double to_index
;
290 static double spectral_line
[9];
291 static double s
[max_surfaces
][5];
292 static double od_sa
[2][2];
294 static char outarr
[8][80]; /* Computed output of program goes here */
296 int itercount
; /* The iteration counter for the main loop
297 in the program is made global so that
298 the compiler should not be allowed to
299 optimise out the loop over the ray
303 #define ITERATIONS /*1000*/ /*500000*/ 80000
305 int niter
= ITERATIONS
; /* Iteration counter */
307 static char *refarr
[] = { /* Reference results. These happen to
308 be derived from a run on Microsoft
309 Quick BASIC on the IBM PC/AT. */
311 " Marginal ray 47.09479120920 0.04178472683",
312 " Paraxial ray 47.08372160249 0.04177864821",
313 "Longitudinal spherical aberration: -0.01106960671",
314 " (Maximum permissible): 0.05306749907",
315 "Offense against sine condition (coma): 0.00008954761",
316 " (Maximum permissible): 0.00250000000",
317 "Axial chromatic aberration: 0.00448229032",
318 " (Maximum permissible): 0.05306749907"
321 /* The test case used in this program is the design for a 4 inch
322 achromatic telescope objective used as the example in Wyld's
323 classic work on ray tracing by hand, given in Amateur Telescope
326 static double testcase
[4][4] = {
327 {27.05, 1.5137, 63.6, 0.52},
328 {-16.68, 1, 0, 0.138},
329 {-16.68, 1.6164, 36.7, 0.38},
333 /* Internal trig functions (used only if INTRIG is defined). These
334 standard functions may be enabled to obtain timings that reflect
335 the machine's floating point performance rather than the speed of
336 its trig function evaluation. */
340 /* The following definitions should keep you from getting intro trouble
341 with compilers which don't let you redefine intrinsic functions. */
348 #define atan2 I_atan2
351 #define fabs(x) ((x < 0.0) ? -x : x)
353 #define pic 3.1415926535897932
355 /* Commonly used constants */
357 static double pi
= pic
,
360 fouroverpi
= 4.0 / pic
,
363 /* Coefficients for ATAN evaluation */
365 static double atanc
[] = {
367 0.4636476090008061165,
368 0.7853981633974483094,
369 0.98279372324732906714,
370 1.1071487177940905022,
371 1.1902899496825317322,
372 1.2490457723982544262,
373 1.2924966677897852673,
374 1.3258176636680324644
377 /* aint(x) Return integer part of number. Truncates towards 0 */
379 double aint(double x
)
383 /* Note that this routine cannot handle the full floating point
384 number range. This function should be in the machine-dependent
385 floating point library! */
388 if ((int)(-0.5) != 0 && l
< 0 )
394 /* sin(x) Return sine, x in radians */
396 static double sin(double x
)
401 x
= (((sign
= (x
< 0.0)) != 0) ? -x
: x
);
404 x
-= (aint(x
/ twopi
) * twopi
);
417 r
= y
* (((((((-0.202253129293E-13 * z
+ 0.69481520350522E-11) * z
-
418 0.17572474176170806E-8) * z
+ 0.313361688917325348E-6) * z
-
419 0.365762041821464001E-4) * z
+ 0.249039457019271628E-2) * z
-
420 0.0807455121882807815) * z
+ 0.785398163397448310);
422 y
= (piover2
- x
) * fouroverpi
;
424 r
= ((((((-0.38577620372E-12 * z
+ 0.11500497024263E-9) * z
-
425 0.2461136382637005E-7) * z
+ 0.359086044588581953E-5) * z
-
426 0.325991886926687550E-3) * z
+ 0.0158543442438154109) * z
-
427 0.308425137534042452) * z
+ 1.0;
429 return sign
? -r
: r
;
432 /* cos(x) Return cosine, x in radians, by identity */
434 static double cos(double x
)
436 x
= (x
< 0.0) ? -x
: x
;
437 if (x
> twopi
) /* Do range reduction here to limit */
438 x
= x
- (aint(x
/ twopi
) * twopi
); /* roundoff on add of PI/2 */
439 return sin(x
+ piover2
);
442 /* tan(x) Return tangent, x in radians, by identity */
444 static double tan(double x
)
446 return sin(x
) / cos(x
);
449 /* sqrt(x) Return square root. Initial guess, then Newton-
450 Raphson refinement */
452 double sqrt(double x
)
462 "\nGood work! You tried to take the square root of %g",
465 "\nunfortunately, that is too complex for me to handle.\n");
469 y
= (0.154116 + 1.893872 * x
) / (1.0 + 1.047988 * x
);
471 c
= (y
- x
/ y
) / 2.0;
473 for (n
= 50; c
!= cl
&& n
--;) {
476 c
= (y
- x
/ y
) / 2.0;
481 /* atan(x) Return arctangent in radians,
482 range -pi/2 to pi/2 */
484 static double atan(double x
)
489 x
= (((sign
= (x
< 0.0)) != 0) ? -x
: x
);
506 x
= (x
- z
) / (x
* z
+ 1);
510 b
= ((((893025.0 * z
+ 49116375.0) * z
+ 425675250.0) * z
+
511 1277025750.0) * z
+ 1550674125.0) * z
+ 654729075.0;
512 a
= (((13852575.0 * z
+ 216602100.0) * z
+ 891080190.0) * z
+
513 1332431100.0) * z
+ 654729075.0;
514 a
= (a
/ b
) * x
+ atanc
[y
];
517 return sign
? -a
: a
;
520 /* atan2(y,x) Return arctangent in radians of y/x,
523 static double atan2(double y
, double x
)
528 if (y
== 0.0) /* Special case: atan2(0,0) = 0 */
545 /* asin(x) Return arcsine in radians of x */
547 static double asin(double x
)
551 "\nInverse trig functions lose much of their gloss when");
553 "\ntheir arguments are greater than 1, such as the");
555 "\nvalue %g you passed.\n", x
);
558 return atan2(x
, sqrt(1 - x
* x
));
562 /* Calculate passage through surface
564 If the variable PARAXIAL is true, the trace through the
565 surface will be done using the paraxial approximations.
566 Otherwise, the normal trigonometric trace will be done.
568 This routine takes the following inputs:
570 RADIUS_OF_CURVATURE Radius of curvature of surface
571 being crossed. If 0, surface is
574 OBJECT_DISTANCE Distance of object focus from
575 lens vertex. If 0, incoming
576 rays are parallel and
577 the following must be specified:
579 RAY_HEIGHT Height of ray from axis. Only
580 relevant if OBJECT.DISTANCE == 0
582 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
585 FROM_INDEX Refractive index of medium being left
587 TO_INDEX Refractive index of medium being
590 The outputs are the following variables:
592 OBJECT_DISTANCE Distance from vertex to object focus
595 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
596 at intercept after refraction.
600 static void transit_surface() {
601 double iang
, /* Incidence angle */
602 rang
, /* Refraction angle */
603 iang_sin
, /* Incidence angle sin */
604 rang_sin
, /* Refraction angle sin */
605 old_axis_slope_angle
, sagitta
;
608 if (radius_of_curvature
!= 0.0) {
609 if (object_distance
== 0.0) {
610 axis_slope_angle
= 0.0;
611 iang_sin
= ray_height
/ radius_of_curvature
;
613 iang_sin
= ((object_distance
-
614 radius_of_curvature
) / radius_of_curvature
) *
617 rang_sin
= (from_index
/ to_index
) *
619 old_axis_slope_angle
= axis_slope_angle
;
620 axis_slope_angle
= axis_slope_angle
+
622 if (object_distance
!= 0.0)
623 ray_height
= object_distance
* old_axis_slope_angle
;
624 object_distance
= ray_height
/ axis_slope_angle
;
627 object_distance
= object_distance
* (to_index
/ from_index
);
628 axis_slope_angle
= axis_slope_angle
* (from_index
/ to_index
);
632 if (radius_of_curvature
!= 0.0) {
633 if (object_distance
== 0.0) {
634 axis_slope_angle
= 0.0;
635 iang_sin
= ray_height
/ radius_of_curvature
;
637 iang_sin
= ((object_distance
-
638 radius_of_curvature
) / radius_of_curvature
) *
639 sin(axis_slope_angle
);
641 iang
= asin(iang_sin
);
642 rang_sin
= (from_index
/ to_index
) *
644 old_axis_slope_angle
= axis_slope_angle
;
645 axis_slope_angle
= axis_slope_angle
+
646 iang
- asin(rang_sin
);
647 sagitta
= sin((old_axis_slope_angle
+ iang
) / 2.0);
648 sagitta
= 2.0 * radius_of_curvature
*sagitta
*sagitta
;
649 object_distance
= ((radius_of_curvature
* sin(
650 old_axis_slope_angle
+ iang
)) *
651 cot(axis_slope_angle
)) + sagitta
;
655 rang
= -asin((from_index
/ to_index
) *
656 sin(axis_slope_angle
));
657 object_distance
= object_distance
* ((to_index
*
658 cos(-rang
)) / (from_index
*
659 cos(axis_slope_angle
)));
660 axis_slope_angle
= -rang
;
663 /* Perform ray trace in specific spectral line */
665 static void trace_line(int line
, double ray_h
)
669 object_distance
= 0.0;
673 for (i
= 1; i
<= current_surfaces
; i
++) {
674 radius_of_curvature
= s
[i
][1];
677 to_index
= to_index
+ ((spectral_line
[4] -
678 spectral_line
[line
]) /
679 (spectral_line
[3] - spectral_line
[6])) * ((s
[i
][2] - 1.0) /
682 from_index
= to_index
;
683 if (i
< current_surfaces
)
684 object_distance
= object_distance
- s
[i
][4];
688 /* Initialise when called the first time */
690 int main(int argc
, char* argv
[])
693 double od_fline
, od_cline
;
698 spectral_line
[1] = 7621.0; /* A */
699 spectral_line
[2] = 6869.955; /* B */
700 spectral_line
[3] = 6562.816; /* C */
701 spectral_line
[4] = 5895.944; /* D */
702 spectral_line
[5] = 5269.557; /* E */
703 spectral_line
[6] = 4861.344; /* F */
704 spectral_line
[7] = 4340.477; /* G'*/
705 spectral_line
[8] = 3968.494; /* H */
707 /* Process the number of iterations argument, if one is supplied. */
710 niter
= atoi(argv
[1]);
711 if (*argv
[1] == '-' || niter
< 1) {
712 printf("This is John Walker's floating point accuracy and\n");
713 printf("performance benchmark program. You call it with\n");
714 printf("\nfbench <itercount>\n\n");
715 printf("where <itercount> is the number of iterations\n");
716 printf("to be executed. Archival timings should be made\n");
717 printf("with the iteration count set so that roughly five\n");
718 printf("minutes of execution is timed.\n");
723 /* Load test case into working array */
725 clear_aperture
= 4.0;
726 current_surfaces
= 4;
727 for (i
= 0; i
< current_surfaces
; i
++)
728 for (j
= 0; j
< 4; j
++)
729 s
[i
+ 1][j
+ 1] = testcase
[i
][j
];
732 printf("Beginning execution of floating point accuracy test...\n");
735 printf("Ready to begin John Walker's floating point accuracy\n");
736 printf("and performance benchmark. %d iterations will be made.\n\n",
739 printf("\nMeasured run time in seconds should be divided by %.f\n", niter
/ 1000.0);
740 printf("to normalise for reporting results. For archival results,\n");
741 printf("adjust iteration count so the benchmark runs about five minutes.\n\n");
743 //printf("Press return to begin benchmark:");
747 /* Perform ray trace the specified number of times. */
752 if ((passes
% 100L) == 0) {
753 printf("Pass %ld.\n", passes
);
756 for (itercount
= 0; itercount
< niter
; itercount
++) {
759 for (paraxial
= 0; paraxial
<= 1; paraxial
++) {
761 /* Do main trace in D light */
763 trace_line(4, clear_aperture
/ 2.0);
764 od_sa
[paraxial
][0] = object_distance
;
765 od_sa
[paraxial
][1] = axis_slope_angle
;
769 /* Trace marginal ray in C */
771 trace_line(3, clear_aperture
/ 2.0);
772 od_cline
= object_distance
;
774 /* Trace marginal ray in F */
776 trace_line(6, clear_aperture
/ 2.0);
777 od_fline
= object_distance
;
779 aberr_lspher
= od_sa
[1][0] - od_sa
[0][0];
780 aberr_osc
= 1.0 - (od_sa
[1][0] * od_sa
[1][1]) /
781 (sin(od_sa
[0][1]) * od_sa
[0][0]);
782 aberr_lchrom
= od_fline
- od_cline
;
783 max_lspher
= sin(od_sa
[0][1]);
787 max_lspher
= 0.0000926 / (max_lspher
* max_lspher
);
789 max_lchrom
= max_lspher
;
793 //printf("Stop the timer:\007");
797 /* Now evaluate the accuracy of the results from the last ray trace */
799 sprintf(outarr
[0], "%15s %21.11f %14.11f",
800 "Marginal ray", od_sa
[0][0], od_sa
[0][1]);
801 sprintf(outarr
[1], "%15s %21.11f %14.11f",
802 "Paraxial ray", od_sa
[1][0], od_sa
[1][1]);
804 "Longitudinal spherical aberration: %16.11f",
807 " (Maximum permissible): %16.11f",
810 "Offense against sine condition (coma): %16.11f",
813 " (Maximum permissible): %16.11f",
816 "Axial chromatic aberration: %16.11f",
819 " (Maximum permissible): %16.11f",
822 /* Now compare the edited results with the master values from
823 reference executions of this program. */
826 for (i
= 0; i
< 8; i
++) {
827 if (strcmp(outarr
[i
], refarr
[i
]) != 0) {
829 printf("\nError in pass %ld for results on line %d...\n",
832 printf("\nError in results on line %d...\n", i
+ 1);
834 printf("Expected: \"%s\"\n", refarr
[i
]);
835 printf("Received: \"%s\"\n", outarr
[i
]);
837 k
= strlen(refarr
[i
]);
838 for (j
= 0; j
< k
; j
++) {
839 printf("%c", refarr
[i
][j
] == outarr
[i
][j
] ? ' ' : '^');
840 if (refarr
[i
][j
] != outarr
[i
][j
])
850 printf("\n%d error%s in results. This is VERY SERIOUS.\n",
851 errors
, errors
> 1 ? "s" : "");
853 printf("\nNo errors in results.\n");