1 // This small program does some raytracing. It tests Valgrind's handling of
2 // FP operations. It apparently does a lot of trigonometry operations.
4 // Licensing: This program is closely based on the one of the same name from
5 // http://www.fourmilab.ch/. The front page of that site says:
7 // "Except for a few clearly-marked exceptions, all the material on this
8 // site is in the public domain and may be used in any manner without
9 // permission, restriction, attribution, or compensation."
11 /* This program can be used in two ways. If INTRIG is undefined, sin,
12 cos, tan, etc, will be used as supplied by <math.h>. If it is
13 defined, then the program calculates all this stuff from first
14 principles (so to speak) and does not use the libc facilities. For
15 benchmarking purposes it seems better to avoid the libc stuff, so
16 that the inner loops (sin, sqrt) present a workload independent of
17 libc implementations on different platforms. Hence: */
24 John Walker's Floating Point Benchmark, derived from...
26 Marinchip Interactive Lens Design System
28 John Walker December 1980
31 http://www.fourmilab.ch/
33 This program may be used, distributed, and modified freely as
34 long as the origin information is preserved.
36 This is a complete optical design raytracing algorithm,
37 stripped of its user interface and recast into portable C. It
38 not only determines execution speed on an extremely floating
39 point (including trig function) intensive real-world
40 application, it checks accuracy on an algorithm that is
41 exquisitely sensitive to errors. The performance of this
42 program is typically far more sensitive to changes in the
43 efficiency of the trigonometric library routines than the
44 average floating point program.
46 The benchmark may be compiled in two modes. If the symbol
47 INTRIG is defined, built-in trigonometric and square root
48 routines will be used for all calculations. Timings made with
49 INTRIG defined reflect the machine's basic floating point
50 performance for the arithmetic operators. If INTRIG is not
51 defined, the system library <math.h> functions are used.
52 Results with INTRIG not defined reflect the system's library
53 performance and/or floating point hardware support for trig
54 functions and square root. Results with INTRIG defined are a
55 good guide to general floating point performance, while
56 results with INTRIG undefined indicate the performance of an
57 application which is math function intensive.
59 Special note regarding errors in accuracy: this program has
60 generated numbers identical to the last digit it formats and
61 checks on the following machines, floating point
62 architectures, and languages:
64 Marinchip 9900 QBASIC IBM 370 double-precision (REAL * 8) format
66 IBM PC / XT / AT Lattice C IEEE 64 bit, 80 bit temporaries
67 High C same, in line 80x87 code
68 BASICA "Double precision"
69 Quick BASIC IEEE double precision, software routines
71 Sun 3 C IEEE 64 bit, 80 bit temporaries,
72 in-line 68881 code, in-line FPA code.
74 MicroVAX II C Vax "G" format floating point
76 Macintosh Plus MPW C SANE floating point, IEEE 64 bit format
79 Inaccuracies reported by this program should be taken VERY
80 SERIOUSLY INDEED, as the program has been demonstrated to be
81 invariant under changes in floating point format, as long as
82 the format is a recognised double precision format. If you
83 encounter errors, please remember that they are just as likely
84 to be in the floating point editing library or the
85 trigonometric libraries as in the low level operator code.
87 The benchmark assumes that results are basically reliable, and
88 only tests the last result computed against the reference. If
89 you're running on a suspect system you can compile this
90 program with ACCURACY defined. This will generate a version
91 which executes as an infinite loop, performing the ray trace
92 and checking the results on every pass. All incorrect results
95 Representative timings are given below. All have been
96 normalised as if run for 1000 iterations.
98 Time in seconds Computer, Compiler, and notes
101 3466.00 4031.00 Commodore 128, 2 Mhz 8510 with software floating
102 point. Abacus Software/Data-Becker Super-C 128,
103 version 3.00, run in fast (2 Mhz) mode. Note:
104 the results generated by this system differed
105 from the reference results in the 8th to 10th
108 3290.00 IBM PC/AT 6 Mhz, Microsoft/IBM BASICA version A3.00.
109 Run with the "/d" switch, software floating point.
111 2131.50 IBM PC/AT 6 Mhz, Lattice C version 2.14, small model.
112 This version of Lattice compiles subroutine
113 calls which either do software floating point
114 or use the 80x87. The machine on which I ran
115 this had an 80287, but the results were so bad
116 I wonder if it was being used.
118 1598.00 Macintosh Plus, MPW C, SANE Software floating point.
120 1582.13 Marinchip 9900 2 Mhz, QBASIC compiler with software
121 floating point. This was a QBASIC version of the
122 program which contained the identical algorithm.
124 404.00 IBM PC/AT 6 Mhz, Microsoft QuickBASIC version 2.0.
125 Software floating point.
127 165.15 IBM PC/AT 6 Mhz, Metaware High C version 1.3, small
128 model. This was compiled to call subroutines for
129 floating point, and the machine contained an 80287
130 which was used by the subroutines.
132 143.20 Macintosh II, MPW C, SANE calls. I was unable to
133 determine whether SANE was using the 68881 chip or
136 121.80 Sun 3/160 16 Mhz, Sun C. Compiled with -fsoft switch
137 which executes floating point in software.
139 78.78 110.11 IBM RT PC (Model 6150). IBM AIX 1.0 C compiler
142 75.2 254.0 Microsoft Quick C 1.0, in-line 8087 instructions,
143 compiled with 80286 optimisation on. (Switches
144 were -Ol -FPi87-G2 -AS). Small memory model.
146 69.50 IBM PC/AT 6Mhz, Borland Turbo BASIC 1.0. Compiled
147 in "8087 required" mode to generate in-line
148 code for the math coprocessor.
150 66.96 IBM PC/AT 6Mhz, Microsoft QuickBASIC 4.0. This
151 release of QuickBASIC compiles code for the
152 80287 math coprocessor.
154 66.36 206.35 IBM PC/AT 6Mhz, Metaware High C version 1.3, small
155 model. This was compiled with in-line code for the
156 80287 math coprocessor. Trig functions still call
159 63.07 220.43 IBM PC/AT, 6Mhz, Borland Turbo C, in-line 8087 code,
160 small model, word alignment, no stack checking,
163 17.18 Apollo DN-3000, 12 Mhz 68020 with 68881, compiled
164 with in-line code for the 68881 coprocessor.
165 According to Apollo, the library routines are chosen
166 at runtime based on coprocessor presence. Since the
167 coprocessor was present, the library is supposed to
168 use in-line floating point code.
170 15.55 27.56 VAXstation II GPX. Compiled and executed under
173 15.14 37.93 Macintosh II, Unix system V. Green Hills 68020
174 Unix compiler with in-line code for the 68881
175 coprocessor (-O -ZI switches).
177 12.69 Sun 3/160 16 Mhz, Sun C. Compiled with -fswitch,
178 which calls a subroutine to select the fastest
179 floating point processor. This was using the 68881.
181 11.74 26.73 Compaq Deskpro 386, 16 Mhz 80386 with 16 Mhz 80387.
182 Metaware High C version 1.3, compiled with in-line
183 for the math coprocessor (but not optimised for the
184 80386/80387). Trig functions still call library
187 8.43 30.49 Sun 3/160 16 Mhz, Sun C. Compiled with -f68881,
188 generating in-line MC68881 instructions. Trig
189 functions still call library routines.
191 6.29 25.17 Sun 3/260 25 Mhz, Sun C. Compiled with -f68881,
192 generating in-line MC68881 instructions. Trig
193 functions still call library routines.
195 4.57 Sun 3/260 25 Mhz, Sun FORTRAN 77. Compiled with
196 -O -f68881, generating in-line MC68881 instructions.
197 Trig functions are compiled in-line. This used
198 the FORTRAN 77 version of the program, FBFORT77.F.
200 4.00 14.20 Sun386i/25 Mhz model 250, Sun C compiler.
202 4.00 14.00 Sun386i/25 Mhz model 250, Metaware C.
204 3.10 12.00 Compaq 386/387 25 Mhz running SCO Xenix 2.
205 Compiled with Metaware HighC 386, optimized
208 3.00 12.00 Compaq 386/387 25MHZ optimized for 386/387.
210 2.96 5.17 Sun 4/260, Sparc RISC processor. Sun C,
211 compiled with the -O2 switch for global
214 2.47 COMPAQ 486/25, secondary cache disabled, High C,
215 486/387, inline f.p., small memory model.
217 2.20 3.40 Data General Motorola 88000, 16 Mhz, Gnu C.
219 1.56 COMPAQ 486/25, 128K secondary cache, High C, 486/387,
220 inline f.p., small memory model.
222 0.66 1.50 DEC Pmax, Mips processor.
224 0.63 0.91 Sun SparcStation 2, Sun C (SunOS 4.1.1) with
225 -O4 optimisation and "/usr/lib/libm.il" inline
228 0.60 1.07 Intel 860 RISC processor, 33 Mhz, Greenhills
231 0.40 0.90 Dec 3MAX, MIPS 3000 processor, -O4.
233 0.31 0.90 IBM RS/6000, -O.
235 0.1129 0.2119 Dell Dimension XPS P133c, Pentium 133 MHz,
236 Windows 95, Microsoft Visual C 5.0.
238 0.0883 0.2166 Silicon Graphics Indigo², MIPS R4400,
241 0.0351 0.0561 Dell Dimension XPS R100, Pentium II 400 MHz,
242 Windows 98, Microsoft Visual C 5.0.
244 0.0312 0.0542 Sun Ultra 2, UltraSPARC V9, 300 MHz, Solaris
247 0.00862 0.01074 Dell Inspiron 9100, Pentium 4, 3.4 GHz, gcc -O3.
259 #define cot(x) (1.0 / tan(x))
264 #define max_surfaces 10
266 /* Local variables */
268 /* static char tbfr[132]; */
270 static short current_surfaces
;
271 static short paraxial
;
273 static double clear_aperture
;
275 static double aberr_lspher
;
276 static double aberr_osc
;
277 static double aberr_lchrom
;
279 static double max_lspher
;
280 static double max_osc
;
281 static double max_lchrom
;
283 static double radius_of_curvature
;
284 static double object_distance
;
285 static double ray_height
;
286 static double axis_slope_angle
;
287 static double from_index
;
288 static double to_index
;
290 static double spectral_line
[9];
291 static double s
[max_surfaces
][5];
292 static double od_sa
[2][2];
294 static char outarr
[8][80]; /* Computed output of program goes here */
296 int itercount
; /* The iteration counter for the main loop
297 in the program is made global so that
298 the compiler should not be allowed to
299 optimise out the loop over the ray
303 #define ITERATIONS /*1000*/ /*500000*/ 80000
305 int niter
= ITERATIONS
; /* Iteration counter */
307 static char *refarr
[] = { /* Reference results. These happen to
308 be derived from a run on Microsoft
309 Quick BASIC on the IBM PC/AT. */
311 " Marginal ray 47.09479120920 0.04178472683",
312 " Paraxial ray 47.08372160249 0.04177864821",
313 "Longitudinal spherical aberration: -0.01106960671",
314 " (Maximum permissible): 0.05306749907",
315 "Offense against sine condition (coma): 0.00008954761",
316 " (Maximum permissible): 0.00250000000",
317 "Axial chromatic aberration: 0.00448229032",
318 " (Maximum permissible): 0.05306749907"
321 /* The test case used in this program is the design for a 4 inch
322 achromatic telescope objective used as the example in Wyld's
323 classic work on ray tracing by hand, given in Amateur Telescope
326 static double testcase
[4][4] = {
327 {27.05, 1.5137, 63.6, 0.52},
328 {-16.68, 1, 0, 0.138},
329 {-16.68, 1.6164, 36.7, 0.38},
333 /* Internal trig functions (used only if INTRIG is defined). These
334 standard functions may be enabled to obtain timings that reflect
335 the machine's floating point performance rather than the speed of
336 its trig function evaluation. */
340 /* The following definitions should keep you from getting intro trouble
341 with compilers which don't let you redefine intrinsic functions. */
348 #define atan2 I_atan2
351 #define fabs(x) ((x < 0.0) ? -x : x)
353 #define pic 3.1415926535897932
355 /* Commonly used constants */
357 static double pi
= pic
,
360 fouroverpi
= 4.0 / pic
,
363 /* Coefficients for ATAN evaluation */
365 static double atanc
[] = {
367 0.4636476090008061165,
368 0.7853981633974483094,
369 0.98279372324732906714,
370 1.1071487177940905022,
371 1.1902899496825317322,
372 1.2490457723982544262,
373 1.2924966677897852673,
374 1.3258176636680324644
377 /* aint(x) Return integer part of number. Truncates towards 0 */
384 /* Note that this routine cannot handle the full floating point
385 number range. This function should be in the machine-dependent
386 floating point library! */
389 if ((int)(-0.5) != 0 && l
< 0 )
395 /* sin(x) Return sine, x in radians */
403 x
= (((sign
= (x
< 0.0)) != 0) ? -x
: x
);
406 x
-= (aint(x
/ twopi
) * twopi
);
419 r
= y
* (((((((-0.202253129293E-13 * z
+ 0.69481520350522E-11) * z
-
420 0.17572474176170806E-8) * z
+ 0.313361688917325348E-6) * z
-
421 0.365762041821464001E-4) * z
+ 0.249039457019271628E-2) * z
-
422 0.0807455121882807815) * z
+ 0.785398163397448310);
424 y
= (piover2
- x
) * fouroverpi
;
426 r
= ((((((-0.38577620372E-12 * z
+ 0.11500497024263E-9) * z
-
427 0.2461136382637005E-7) * z
+ 0.359086044588581953E-5) * z
-
428 0.325991886926687550E-3) * z
+ 0.0158543442438154109) * z
-
429 0.308425137534042452) * z
+ 1.0;
431 return sign
? -r
: r
;
434 /* cos(x) Return cosine, x in radians, by identity */
439 x
= (x
< 0.0) ? -x
: x
;
440 if (x
> twopi
) /* Do range reduction here to limit */
441 x
= x
- (aint(x
/ twopi
) * twopi
); /* roundoff on add of PI/2 */
442 return sin(x
+ piover2
);
445 /* tan(x) Return tangent, x in radians, by identity */
450 return sin(x
) / cos(x
);
453 /* sqrt(x) Return square root. Initial guess, then Newton-
454 Raphson refinement */
467 "\nGood work! You tried to take the square root of %g",
470 "\nunfortunately, that is too complex for me to handle.\n");
474 y
= (0.154116 + 1.893872 * x
) / (1.0 + 1.047988 * x
);
476 c
= (y
- x
/ y
) / 2.0;
478 for (n
= 50; c
!= cl
&& n
--;) {
481 c
= (y
- x
/ y
) / 2.0;
486 /* atan(x) Return arctangent in radians,
487 range -pi/2 to pi/2 */
489 static double atan(x
)
495 x
= (((sign
= (x
< 0.0)) != 0) ? -x
: x
);
512 x
= (x
- z
) / (x
* z
+ 1);
516 b
= ((((893025.0 * z
+ 49116375.0) * z
+ 425675250.0) * z
+
517 1277025750.0) * z
+ 1550674125.0) * z
+ 654729075.0;
518 a
= (((13852575.0 * z
+ 216602100.0) * z
+ 891080190.0) * z
+
519 1332431100.0) * z
+ 654729075.0;
520 a
= (a
/ b
) * x
+ atanc
[y
];
523 return sign
? -a
: a
;
526 /* atan2(y,x) Return arctangent in radians of y/x,
529 static double atan2(y
, x
)
535 if (y
== 0.0) /* Special case: atan2(0,0) = 0 */
552 /* asin(x) Return arcsine in radians of x */
554 static double asin(x
)
559 "\nInverse trig functions lose much of their gloss when");
561 "\ntheir arguments are greater than 1, such as the");
563 "\nvalue %g you passed.\n", x
);
566 return atan2(x
, sqrt(1 - x
* x
));
570 /* Calculate passage through surface
572 If the variable PARAXIAL is true, the trace through the
573 surface will be done using the paraxial approximations.
574 Otherwise, the normal trigonometric trace will be done.
576 This routine takes the following inputs:
578 RADIUS_OF_CURVATURE Radius of curvature of surface
579 being crossed. If 0, surface is
582 OBJECT_DISTANCE Distance of object focus from
583 lens vertex. If 0, incoming
584 rays are parallel and
585 the following must be specified:
587 RAY_HEIGHT Height of ray from axis. Only
588 relevant if OBJECT.DISTANCE == 0
590 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
593 FROM_INDEX Refractive index of medium being left
595 TO_INDEX Refractive index of medium being
598 The outputs are the following variables:
600 OBJECT_DISTANCE Distance from vertex to object focus
603 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
604 at intercept after refraction.
608 static void transit_surface() {
609 double iang
, /* Incidence angle */
610 rang
, /* Refraction angle */
611 iang_sin
, /* Incidence angle sin */
612 rang_sin
, /* Refraction angle sin */
613 old_axis_slope_angle
, sagitta
;
616 if (radius_of_curvature
!= 0.0) {
617 if (object_distance
== 0.0) {
618 axis_slope_angle
= 0.0;
619 iang_sin
= ray_height
/ radius_of_curvature
;
621 iang_sin
= ((object_distance
-
622 radius_of_curvature
) / radius_of_curvature
) *
625 rang_sin
= (from_index
/ to_index
) *
627 old_axis_slope_angle
= axis_slope_angle
;
628 axis_slope_angle
= axis_slope_angle
+
630 if (object_distance
!= 0.0)
631 ray_height
= object_distance
* old_axis_slope_angle
;
632 object_distance
= ray_height
/ axis_slope_angle
;
635 object_distance
= object_distance
* (to_index
/ from_index
);
636 axis_slope_angle
= axis_slope_angle
* (from_index
/ to_index
);
640 if (radius_of_curvature
!= 0.0) {
641 if (object_distance
== 0.0) {
642 axis_slope_angle
= 0.0;
643 iang_sin
= ray_height
/ radius_of_curvature
;
645 iang_sin
= ((object_distance
-
646 radius_of_curvature
) / radius_of_curvature
) *
647 sin(axis_slope_angle
);
649 iang
= asin(iang_sin
);
650 rang_sin
= (from_index
/ to_index
) *
652 old_axis_slope_angle
= axis_slope_angle
;
653 axis_slope_angle
= axis_slope_angle
+
654 iang
- asin(rang_sin
);
655 sagitta
= sin((old_axis_slope_angle
+ iang
) / 2.0);
656 sagitta
= 2.0 * radius_of_curvature
*sagitta
*sagitta
;
657 object_distance
= ((radius_of_curvature
* sin(
658 old_axis_slope_angle
+ iang
)) *
659 cot(axis_slope_angle
)) + sagitta
;
663 rang
= -asin((from_index
/ to_index
) *
664 sin(axis_slope_angle
));
665 object_distance
= object_distance
* ((to_index
*
666 cos(-rang
)) / (from_index
*
667 cos(axis_slope_angle
)));
668 axis_slope_angle
= -rang
;
671 /* Perform ray trace in specific spectral line */
673 static void trace_line(line
, ray_h
)
679 object_distance
= 0.0;
683 for (i
= 1; i
<= current_surfaces
; i
++) {
684 radius_of_curvature
= s
[i
][1];
687 to_index
= to_index
+ ((spectral_line
[4] -
688 spectral_line
[line
]) /
689 (spectral_line
[3] - spectral_line
[6])) * ((s
[i
][2] - 1.0) /
692 from_index
= to_index
;
693 if (i
< current_surfaces
)
694 object_distance
= object_distance
- s
[i
][4];
698 /* Initialise when called the first time */
705 double od_fline
, od_cline
;
710 spectral_line
[1] = 7621.0; /* A */
711 spectral_line
[2] = 6869.955; /* B */
712 spectral_line
[3] = 6562.816; /* C */
713 spectral_line
[4] = 5895.944; /* D */
714 spectral_line
[5] = 5269.557; /* E */
715 spectral_line
[6] = 4861.344; /* F */
716 spectral_line
[7] = 4340.477; /* G'*/
717 spectral_line
[8] = 3968.494; /* H */
719 /* Process the number of iterations argument, if one is supplied. */
722 niter
= atoi(argv
[1]);
723 if (*argv
[1] == '-' || niter
< 1) {
724 printf("This is John Walker's floating point accuracy and\n");
725 printf("performance benchmark program. You call it with\n");
726 printf("\nfbench <itercount>\n\n");
727 printf("where <itercount> is the number of iterations\n");
728 printf("to be executed. Archival timings should be made\n");
729 printf("with the iteration count set so that roughly five\n");
730 printf("minutes of execution is timed.\n");
735 /* Load test case into working array */
737 clear_aperture
= 4.0;
738 current_surfaces
= 4;
739 for (i
= 0; i
< current_surfaces
; i
++)
740 for (j
= 0; j
< 4; j
++)
741 s
[i
+ 1][j
+ 1] = testcase
[i
][j
];
744 printf("Beginning execution of floating point accuracy test...\n");
747 printf("Ready to begin John Walker's floating point accuracy\n");
748 printf("and performance benchmark. %d iterations will be made.\n\n",
751 printf("\nMeasured run time in seconds should be divided by %.f\n", niter
/ 1000.0);
752 printf("to normalise for reporting results. For archival results,\n");
753 printf("adjust iteration count so the benchmark runs about five minutes.\n\n");
755 //printf("Press return to begin benchmark:");
759 /* Perform ray trace the specified number of times. */
764 if ((passes
% 100L) == 0) {
765 printf("Pass %ld.\n", passes
);
768 for (itercount
= 0; itercount
< niter
; itercount
++) {
771 for (paraxial
= 0; paraxial
<= 1; paraxial
++) {
773 /* Do main trace in D light */
775 trace_line(4, clear_aperture
/ 2.0);
776 od_sa
[paraxial
][0] = object_distance
;
777 od_sa
[paraxial
][1] = axis_slope_angle
;
781 /* Trace marginal ray in C */
783 trace_line(3, clear_aperture
/ 2.0);
784 od_cline
= object_distance
;
786 /* Trace marginal ray in F */
788 trace_line(6, clear_aperture
/ 2.0);
789 od_fline
= object_distance
;
791 aberr_lspher
= od_sa
[1][0] - od_sa
[0][0];
792 aberr_osc
= 1.0 - (od_sa
[1][0] * od_sa
[1][1]) /
793 (sin(od_sa
[0][1]) * od_sa
[0][0]);
794 aberr_lchrom
= od_fline
- od_cline
;
795 max_lspher
= sin(od_sa
[0][1]);
799 max_lspher
= 0.0000926 / (max_lspher
* max_lspher
);
801 max_lchrom
= max_lspher
;
805 //printf("Stop the timer:\007");
809 /* Now evaluate the accuracy of the results from the last ray trace */
811 sprintf(outarr
[0], "%15s %21.11f %14.11f",
812 "Marginal ray", od_sa
[0][0], od_sa
[0][1]);
813 sprintf(outarr
[1], "%15s %21.11f %14.11f",
814 "Paraxial ray", od_sa
[1][0], od_sa
[1][1]);
816 "Longitudinal spherical aberration: %16.11f",
819 " (Maximum permissible): %16.11f",
822 "Offense against sine condition (coma): %16.11f",
825 " (Maximum permissible): %16.11f",
828 "Axial chromatic aberration: %16.11f",
831 " (Maximum permissible): %16.11f",
834 /* Now compare the edited results with the master values from
835 reference executions of this program. */
838 for (i
= 0; i
< 8; i
++) {
839 if (strcmp(outarr
[i
], refarr
[i
]) != 0) {
841 printf("\nError in pass %ld for results on line %d...\n",
844 printf("\nError in results on line %d...\n", i
+ 1);
846 printf("Expected: \"%s\"\n", refarr
[i
]);
847 printf("Received: \"%s\"\n", outarr
[i
]);
849 k
= strlen(refarr
[i
]);
850 for (j
= 0; j
< k
; j
++) {
851 printf("%c", refarr
[i
][j
] == outarr
[i
][j
] ? ' ' : '^');
852 if (refarr
[i
][j
] != outarr
[i
][j
])
862 printf("\n%d error%s in results. This is VERY SERIOUS.\n",
863 errors
, errors
> 1 ? "s" : "");
865 printf("\nNo errors in results.\n");