Some notes concerning the PPC version:

- Most of the C routines that were replaced by assembly were
  put back; some have been done as PPC assembly, most notably
  the fixed-point math. The assembly routines show how to mix
  C and assembly in PPC projects. The amiga_draw.s assembly
  works, but was not used as it didn't improve the speed; more
  work on optimizing amiga_draw.s should give slightly better
  performance. The file is included so that readers can see
  how to access global variables from within PPC assembly files.

- The fixed-point routines use an integer multiply and a floating-
  point divide. The time needed to convert an integer to a floating-
  point number makes it quicker to use an integer multiply than to
  use the floating-point multiply. The integer divide takes long
  enough that it is better to do the int -> fp conversion and do
  a floating-point divide. Look at the divide code to see how to
  convert an integer to a double-precision floating-point number.
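
The trade-off above can be sketched in plain C. This is only an
illustration, not the PPC assembly shipped with the port: FixedMul and
FixedDiv follow the shape of the stock DOOM C routines, and IntToDouble
is a hypothetical helper showing the usual bit trick for CPUs that have
no direct integer-to-float instruction. Whether the divide code uses
exactly this sequence is an assumption. The sketch also assumes IEEE 754
doubles and an arithmetic right shift for negative values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int32_t fixed_t;              /* 16.16 fixed point, as in DOOM */
#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

/* Integer multiply: form the 64-bit product, then drop 16 fraction bits. */
static fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> FRACBITS);
}

/* Floating-point divide: convert both operands, divide, rescale. */
static fixed_t FixedDiv(fixed_t a, fixed_t b)
{
    /* Overflow guard in the style of the stock DOOM source: clamp the
       result if the quotient would not fit (this also covers b == 0). */
    if ((abs(a) >> 14) >= abs(b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;
    return (fixed_t)(((double)a / (double)b) * FRACUNIT);
}

/* Hypothetical helper: int -> double without a conversion instruction.
   Build the double 2^52 + (x + 2^31) directly from its bit pattern,
   then subtract the constant 2^52 + 2^31, leaving x exactly. */
static double IntToDouble(int32_t x)
{
    uint64_t biasedBits = 0x4330000000000000ULL
                        | (uint64_t)((uint32_t)x ^ 0x80000000u);
    uint64_t magicBits  = 0x4330000080000000ULL;   /* 2^52 + 2^31 */
    double biased, magic;

    memcpy(&biased, &biasedBits, sizeof biased);
    memcpy(&magic,  &magicBits,  sizeof magic);
    return biased - magic;
}
```

In assembly the two 32-bit halves are typically stored to a scratch
doubleword and loaded back into a floating-point register; the memcpy
calls stand in for that store/load pair.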

- I use the PPC's timebase for all timing in DOOM. You want to
  avoid having to switch to the 68K side as much as possible. The
  time function in DOOM is called so often that using the C time
  function or calling the timer.device for the system time slows
  the game down (3 FPS instead of 30 FPS). Do whatever you can to
  avoid calling the 68K side; if you have to, try to avoid cache
  flushing. Notice in amiga_sound.c that the doomsound.library
  functions only flush the cache if needed. This gains an extra
  15% in speed compared with flushing the caches on every call.

- In the amiga_sound.c file, notice that I use PPCCallM68k() instead
  of PPCCallOS() for the doomsound.library. Due to a bug in ppc.library
  v45.17 and earlier, if the caos structure is not 32-byte aligned,
  PPCCallOS() will crash unless the caches are flushed by specifying
  IF_CACHEFLUSHALL. PPCCallM68k() does not have this problem. Either
  require people to use v45.20 or newer, or use PPCCallM68k() to call
  functions that don't need the caches flushed.
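
Since the bug is described as depending on 32-byte alignment, it can
help to know how to check or force that alignment from C. The sketch
below is generic C, not ppc.library code: struct CallBlock is a
hypothetical stand-in (NOT the real caos layout), and malloc() stands in
for whatever allocator the project actually uses.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_BYTES 32u   /* the alignment named in the note above */

/* Hypothetical placeholder structure, for illustration only. */
struct CallBlock { uint32_t words[20]; };

/* Round an address up to the next 32-byte boundary. */
static void *AlignUp32(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)(ALIGN_BYTES - 1);
    return (void *)((addr + mask) & ~mask);
}

/* Report whether a pointer already sits on a 32-byte boundary. */
static int IsAligned32(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)(ALIGN_BYTES - 1)) == 0;
}

/* Over-allocate by 31 bytes and hand back an aligned pointer inside the
   block. *raw receives the original pointer, which is the one to free. */
static struct CallBlock *AllocAlignedBlock(void **raw)
{
    *raw = malloc(sizeof(struct CallBlock) + ALIGN_BYTES - 1);
    if (*raw == NULL)
        return NULL;
    return (struct CallBlock *)AlignUp32(*raw);
}
```

Remember to free the raw pointer, not the aligned one.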

- Using a 68K library or code segment is a great way to make use
  of the 68K side from a PPC program. The library is not much harder
  to code than a code segment and is easier to initialize and clean up
  from the PPC side; just OpenLibrary()/CloseLibrary(). In the
  doomsound.library, I set up an audio-interrupt-driven 68K routine
  that handles the audio mixing and output. The PPC is free to do
  other things while the 68K side handles all the audio computation.
  This is very helpful, especially if one is trying to do 3D audio
  in a game. The doomsound.library does stereo panning on up to 16
  sound effects and plays 16-channel stereo music with almost no effect
  on the performance of the PPC side. Adding full 3D audio would use
  more 68K processing time, but would have NO additional effect on the
  PPC side. Notice in the FillBuffer routine in the 060 version how
  a quad multiply should be replaced in an interrupt handler. It is
  not as fast as a floating-point multiply, but avoids the problems
  associated with trying to do floating-point arithmetic in an interrupt.
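
As a rough illustration of replacing a quad (32x32 -> 64 bit) multiply
with cheaper pieces, here is one common approach written in C: build
the 64-bit product from four 16x16 -> 32 multiplies. This is a sketch
of the general technique, not a transcription of FillBuffer; the names
Mul32x32 and UMulFixed are hypothetical, and the sketch handles unsigned
operands only.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 multiply using only 16x16 -> 32 products.
   The result is returned as two 32-bit halves. */
static void Mul32x32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;            /* contributes to bits  0..31 */
    uint32_t lh = al * bh;            /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;            /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;            /* contributes to bits 32..63 */

    /* Sum the column at bits 16..31; at most 3 * 0xFFFF, so no overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    *lo = (mid << 16) | (ll & 0xFFFFu);
    *hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Unsigned 16.16 fixed-point multiply built on the routine above:
   take bits 16..47 of the 64-bit product. */
static uint32_t UMulFixed(uint32_t a, uint32_t b)
{
    uint32_t hi, lo;
    Mul32x32(a, b, &hi, &lo);
    return (hi << 16) | (lo >> 16);
}
```

A signed version can take the absolute values, multiply, and negate the
result if the operand signs differ.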

- Look at amiga_main.c for an example of using the WB support provided
  by the ppc.library and ELFLoadSeg. This file also contains the
  code needed to compute the bus clock for use in converting the
  PPC timebase to microseconds. Look at this file and amiga_system.c
  to see how to scale the timebase values; look at amiga_timer.s to
  see how to read the timebase.
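
The scaling step can be sketched as follows. This is not the code from
amiga_main.c or amiga_system.c; TimebaseToMicros is a hypothetical name,
and the sketch ASSUMES the timebase advances once every four bus clocks,
which holds for the 603/604 family but should be checked for your CPU.
The bus clock must be at least 4 Hz so the divisor is nonzero.

```c
#include <stdint.h>

/* Assumption: one timebase tick per 4 bus clocks (603/604 family). */
#define BUS_CLOCKS_PER_TICK 4u

/* Convert a 64-bit timebase count to microseconds.
   Splitting into whole seconds and a remainder keeps the intermediate
   product small, so long uptimes do not overflow 64 bits. */
static uint64_t TimebaseToMicros(uint64_t ticks, uint32_t busClockHz)
{
    uint64_t tbHz = busClockHz / BUS_CLOCKS_PER_TICK;  /* ticks per second */
    uint64_t secs = ticks / tbHz;
    uint64_t rem  = ticks % tbHz;

    return secs * 1000000u + (rem * 1000000u) / tbHz;
}
```

For example, with a 50 MHz bus the timebase runs at 12.5 MHz, so
12500000 ticks come out as 1000000 microseconds.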

- Look at amiga_net.c to see an example of how to force the alignment
  of structures and includes. The readme for SAS/C PPC does not state
  what happens to a file included from somewhere other than INCLUDE: or
  PPCINCLUDE:. Such a file takes the current alignment, which is
  usually PPC; if you need includes that are not in INCLUDE: to have
  68K alignment, be SURE to use the alignment pragmas.

- amiga_c2p.s shows how I do chunky to planar on the PPC; you'll
  notice I do it one line at a time. This allows the use of screenmodes
  where the BitMap BytesPerRow is not the same as the screen width with
  less hassle. I use the brute-force conversion method because it is
  still faster than chip memory can be written; using r0 and r21 for
  every other long allows the accumulation of the next bitplane while
  the previous is being written to chip memory. As a result of the
  scheduled PPC C2P and triple buffering, I get AGA speed nearly equal
  to a video card.
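
For readers new to chunky to planar, here is a deliberately naive C
sketch of the line-at-a-time idea. It is not the scheduled assembly in
amiga_c2p.s and makes no attempt at speed. The names C2PLine and
C2PScreen are hypothetical, the depth is assumed to be 8 bitplanes, and
the width is assumed to be a multiple of 8 pixels.

```c
#include <stddef.h>
#include <stdint.h>

#define DEPTH 8   /* assumed: 8 bitplanes (256 colours) */

/* Convert one row of 8-bit chunky pixels into DEPTH bitplane rows.
   planes[p] points at the start of this row in bitplane p.
   The leftmost pixel of each group of 8 lands in the top bit. */
static void C2PLine(const uint8_t *chunky, uint8_t *planes[DEPTH],
                    size_t width)
{
    for (size_t x = 0; x < width; x += 8) {
        for (int p = 0; p < DEPTH; p++) {
            uint8_t out = 0;
            for (int i = 0; i < 8; i++)
                out = (uint8_t)((out << 1) | ((chunky[x + i] >> p) & 1u));
            planes[p][x / 8] = out;
        }
    }
}

/* Convert a whole screen one line at a time. Because each row is
   addressed separately, bytesPerRow may be larger than width / 8. */
static void C2PScreen(const uint8_t *chunky, uint8_t *planeBase[DEPTH],
                      size_t width, size_t height, size_t bytesPerRow)
{
    uint8_t *row[DEPTH];

    for (size_t y = 0; y < height; y++) {
        for (int p = 0; p < DEPTH; p++)
            row[p] = planeBase[p] + y * bytesPerRow;
        C2PLine(chunky + y * width, row, width);
    }
}
```

The per-row pointer setup in C2PScreen is what makes a BytesPerRow that
differs from the screen width painless: padding bytes at the end of each
bitplane row are simply never touched.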

- Notice in the makefile how large numbers of files must be linked
  (if using ELF format); thanks go to Frank Wille for this tip.

- I use PASM to do the PPC assembly. This is a great assembler!
  PASM is written by Frank Wille and is available on Aminet. I use
  SAS/C PPC by Steve Krueger for C compilations. I tried vbcc, but
  it was too buggy for my tastes. If you have SAS/C 6.5, get the
  PPC updater! A link to Steve's site can be found on the PPC page
  of the CyberGraphX web site (www.vgr.com).