1 <html>
2 <body>
3 <h1>This Document</h1>
4 This is a snapshot of the zpu/zpu/docs/zpu_arch.html document in CVS.
5 <p>
6 Several of the links will only work if you have checked out the zpu/zpu tree from opencores CVS. See <a href="#download">Download</a> below.
7 <h1>Index</h1>
8 <ul>
9 <li> <a href="#introduction">Introduction</a>
10 <ul>
11 <li> <a href="#license">License</a>
12 <li> <a href="#features">Features</a>
13 <li> <a href="#status">Status</a>
14 <li> <a href="#download">Download</a>
15 <li> <a href="#patch">Creating a patch</a>
16 <li> <a href="#mailinglist">Getting help - mailing list</a>
17 </ul>
18 <li> <a href="#architecture">Core Architecture</a>
19 <ul>
20 <li> <a href="#instructionset">Instruction set</a>
21 <li> <a href="#interrupts">Interrupts</a>
22 <li> <a href="#startup">Startup code (aka crt0.s)</a>
23 <li> <a href="#vectors">Jump vectors</a>
24 </ul>
25 <li> <a href="#implementations">Core Implementations</a>
26 <ul>
27 <li> <a href="#performance">Performance Summary</a>
28 <li> <a href="#zpu4_small">zpu4 small</a>
29 <li> <a href="#zpu4_medium">zpu4 medium</a>
30 <li> <a href="#alzpu_pipe">alzpu pipelined</a>
31 <li> <a href="#zealot">Zealot medium and small</a>
32 <li> <a href="#zy2000">ZY2000 SOC</a>
33 <li> <a href="#verilogwip">Un-named verilog translation</a>
34 <li> <a href="#implementing">Implementing your own ZPU</a>
35 </ul>
36 <li> <a href="#refdesign">Reference Designs</a>
37 <ul>
38 <li> <a href="#ref_min">SOC - Minimal (core+RAM)</a>
39 <li> <a href="#ref_basic">SOC - Basic (core+RAM+UART)</a>
40 <li> <a href="#ref_soc">SOC - Board (core+RAM+Wishbone+++)</a>
41 <li> <a href="#rams">Common - RAM models</a>
42 <li> <a href="#wishbone">Common - Wishbone</a>
43 <li> <a href="#uart">Common - UART</a>
44 <li> <a href="#spicontroller">Common - SPI flash controller</a>
45 </ul>
46 <li> <a href="#tools">Working with tools and core</a>
47 <ul>
48 <li> <a href="#setuplinux">Setup - Linux toolchain</a>
49 <li> <a href="#setupcygwin">Setup - Cygwin toolchain</a>
50 <li> <a href="#gcc2ram">GCC to RAM</a>
51 <li> <a href="#hdlsim">HDL simulation (ZPU4)</a>
52 <li> <a href="#gdbsim">GDB simulation (ZPU4)</a>
53 <li> <a href="#simulator">Instruction Set Simulator</a>
54 </ul>
55 <li> <a href="#misc">Miscellaneous</a>
56 <ul>
57 <li> <a href="#tuning">Speeding up the ZPU</a>
58 <li> <a href="#codesize">Optimizing for code size</a>
59 <li> <a href="#ecos">Installing eCos build tools</a>
60 <li> <a href="#memorymap">Memory map</a>
61 </ul>
62 <li> <a href="#todo">TODO</a>
63 <ul>
64 <li> <a href="#todolist">TODO list</a>
65 <li> <a href="#repository">Repository Re-org</a>
66 <li> <a href="#nextgen">Next generation ZPU</a>
67 <li> <a href="#float">Floating point support</a>
68 </ul>
69 </ul>
71 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
73 <a name="introduction"/>
74 <h1>Introduction</h1>
<P>The world's smallest 32 bit CPU with a GCC toolchain.
<P>The ZPU is a small CPU in two ways: it takes up very few resources and
the architecture itself is small. The latter can be important when learning
about CPU architectures and implementing variations of the ZPU where
aspects of CPU design are examined. In academia, students can learn VHDL and
CPU architecture in general, and complete exercises in the course of a year.</P>
81 <P>
The current ZPU instruction set and architecture have not changed for
the last couple of years and can be considered quite stable. There is
a lot of discussion about various modifications to the ZPU architecture
on the zylin-zpu mailing list, but currently no actual modifications are
planned, as the improvements that have been identified are relatively
slight (&lt;30% performance/size improvement).
88 </P>
89 <P>
There are a handful of implementations of the ZPU. Most of these
have their own strong points, and there is some movement in the direction of
consolidating improvements into a few officially recommended ZPU
implementations.
94 </P>
95 <P>
For those who are interested in the Zylin ZPU, I recommend joining
the zylin-zpu mailing list and participating in the discussion
there. The zylin-zpu list is a friendly place where people with different
skills (hardware, software, tools) meet to exchange ideas about the ZPU
and microprocessor architecture in general.
101 </P>
103 <P>Sincerely,</P>
104 <P>&Oslash;yvind Harboe <BR>Zylin AS
105 </P>
107 <a name="license"/>
108 <h2>License</h2>
109 <P>The project includes HDL, GCC toolchain and eCos HAL.
<P>The ZPU has a BSD license for the HDL and GPL for the rest.
This allows users to implement any version of the ZPU they want in
commercial products, but if improvements are made to the architecture
as such, then they need to be contributed back.
115 </P>
<P>As of Jan 1, 2008, Zylin holds the copyright for the ZPU, i.e. Zylin
is free to decide that the ZPU shall have a BSD license for HDL + GPL
for the rest.</P>
121 <a name="features"/>
122 <h2>Features</h2>
123 <UL>
<LI>Small size (see <a href="#implementations">performance summary</a>)
<LI>Code size 80% of ARM Thumb
<LI>GCC toolchain (GDB, newlib, libstdc++)
127 <LI>eCos embedded operating system support
128 </UL>
130 <a name="status"/>
131 <h2>Status</h2>
132 <UL>
133 <LI>HDL works
134 <LI>GCC toolchain works
135 <LI>eCos HAL works
136 </UL>
137 <P>... but there is a long <a href="#todo">TODO</a> list</P>
138 <P>Expect churn as we converge onto a shorter list of <a href="#implementations">implementations</a>.
140 <a name="download"/>
141 <h2>Download source code</h2>
142 The ZPU HDL source code is available as a GIT repository from <a href="http://repo.or.cz/w/zpu.git" target="_blank">http://repo.or.cz/w/zpu.git</a>.
You can download the latest source code as a snapshot without installing GIT.
145 Previously the ZPU repository was hosted as a CVS repository at www.opencores.org,
146 but that ZPU CVS repository is there only for historical reference at this point.
147 Once www.opencores.org grows a GIT hosting service, the plan is to replicate
148 the GIT repository there.
151 The GCC ZPU toolchain is available from <a href ="http://repo.or.cz/w/zpugcc.git" target ="_blank">http://repo.or.cz/w/zpugcc.git</a>. The ZPU GCC toolchain is BIG (over 100 MBytes).
152 <a name="patch"/>
153 <h2>GIT</h2>
154 For more advanced use of GIT, you will need to hit the books and read up
155 on the GIT documentation.
156 <p/>
157 That said, you can ask "silly" newbie questions about GIT on the <a href="#mailinglist">zylin-zpu mailing
158 list</a> and you should receive some friendly prodding in the right direction
159 w.r.t. finding reading material.
160 <a name="mailinglist"/>
161 <h2>Getting help - mailing list</h2>
<P>The place to get help is the <a href="http://www.zylin.com/mailinglist.html">zylin-zpu mailing list</a>.
The ZPU is an open source project, and if you demonstrate that you have
made an effort to read the documentation and have googled, then you will
normally get some help from this list if you ask clear questions.
169 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
172 <a name="architecture"/>
173 <h1>Architecture</h1>
The ZPU is a zero operand, or stack based, CPU. The opcodes have a fixed width of 8 bits.
176 Example:
178 <div style="white-space:pre;background-color:#dddddd;">
179 <code style="white-space:pre;background-color:#dddddd;">
180 IM 5 ; push 5 onto the stack
181 LOADSP 20 ; push value at memory location SP+20
ADD ; pop 2 values off the stack and push their sum
183 </code>
184 </div>
185 As can be seen, a lot of information is packed into the 8 bits, e.g. the IM instruction pushes a 7 bit signed integer onto the stack.
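<p>
To make the IM packing concrete, here is a minimal sketch in C. It assumes the IM semantics described in the instruction set section (the first IM pushes a sign extended 7 bit value, each directly following IM shifts the value left by 7 bits and fills in the low 7 bits). The function names <code>zpu_encode_im</code> and <code>zpu_decode_im</code> are hypothetical and are not part of the ZPU toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `value` as the shortest run of consecutive
 * IM opcodes (1xxx xxxx). Writes 1 to 5 bytes to `out`, returns the count. */
size_t zpu_encode_im(int32_t value, uint8_t out[5])
{
    size_t n = 1;

    /* Find the smallest n for which value fits a signed field of 7*n bits. */
    while (n < 5) {
        int bits = 7 * (int)n;
        int32_t lo = -(INT32_C(1) << (bits - 1));
        int32_t hi = (INT32_C(1) << (bits - 1)) - 1;
        if (value >= lo && value <= hi)
            break;
        n++;
    }

    /* Emit the most significant 7 bit group first. */
    for (size_t i = 0; i < n; i++) {
        unsigned shift = 7u * (unsigned)(n - 1 - i);
        out[i] = (uint8_t)(0x80u | (((uint32_t)value >> shift) & 0x7fu));
    }
    return n;
}

/* Hypothetical helper: replay n consecutive IM opcodes and return the
 * resulting top-of-stack value. */
int32_t zpu_decode_im(const uint8_t *ops, size_t n)
{
    uint32_t top = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t imm = ops[i] & 0x7fu;
        if (i == 0)
            top = (imm & 0x40u) ? (imm | 0xffffff80u) : imm; /* IDIM clear: sign extend */
        else
            top = (top << 7) | imm;                          /* IDIM set: shift in 7 bits */
    }
    return (int32_t)top;
}
```

With this sketch, the constant 5 encodes as the single byte 0x85, while 64 does not fit a signed 7 bit field and needs two IM opcodes (0x80, 0xC0).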
187 The choice of opcodes is intimately tied to the GCC toolchain capabilities.
189 <div style="white-space:pre;background-color:#dddddd;">
190 <code style="white-space:pre;background-color:#dddddd;">
/* simple program showing some interesting qualities of the ZPU toolchain */
void bar(int);
int j;
void foo(int a, int b, int c)
{
    a++;
    b+=a;
    j=c;
    bar(b);
}

foo:
loadsp 4 ; a is at memory location SP+4
im 1
add
loadsp 12 ; b is now at memory location SP+12
add
loadsp 16 ; c is now at memory location SP+16
im 24 ; «j» is at absolute memory location 24.
; Notice how the ZPU toolchain is using link-time relaxation
; to squeeze the address into a single opcode
store
im 22 ; the fn bar is at address 22
call
im 12
return ; 12 bytes of arguments + return from fn
217 </code>
218 </div>
220 <a name="instructionset"/>
221 <h2>Instruction set</h2>
222 <p>A base set of instructions must be implemented in RTL, but the rest may be implemented as RTL or as microcode. This allows a tradeoff of core size vs code size and performance.
223 <p>The instructions that may be implemented in RTL or microcode are referred to as emulated instructions. The microcode is in crt0.s. The <a href="#implementations">implementation</a> determines which instructions run as microcode.
224 <p>All operations are 32 bit wide.
225 <p>TODO Is the table broken? Fix it.
227 <table border="1">
228 <tr><td>Name</td><td>Opcode</td><td>Description</td><td>Definition</td></tr>
229 <tr>
230 <td>
231 BREAKPOINT
232 </td>
233 <td>
234 00000000
235 </td>
236 <td>
The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like
debugger interface is added, it will be convenient to be able to distinguish
between a breakpoint and an illegal (possibly emulated) instruction.
240 </td>
241 <td>
242 No effect on registers
243 </td>
244 </tr>
245 <tr>
246 <td>
IM
248 </td>
249 <td>
250 1xxx xxxx
251 </td>
252 <td>
Pushes a 7 bit sign extended integer and sets the «instruction decode interrupt mask» flag (IDIM).
254 <p>
255 If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits.
256 <p>
257 Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag.
258 <p>
259 To push a 14 bit integer onto the stack, use two consecutive IM instructions.
260 <p>
261 If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP.
262 </td>
263 <td>
264 <code style="white-space:pre;">
265 pc <= pc + 1 <br>
266 idim <= 1 <br>
267 if (idim=0) then <br>
268 sp <= sp - 1; <br>
269 for i in wordSize-1 downto 7 loop <br>
270 mem(sp)(i) <= opcode(6) <br>
271 end loop <br>
272 mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
273 else <br>
274 mem(sp)(wordSize-1 downto 7) <= mem(sp)(wordSize-8 downto 0) <br>
275 mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
276 end if
277 </code>
279 </td>
280 </tr>
281 <tr>
282 <td>
283 STORESP
284 </td>
285 <td>
286 010x xxxx
287 </td>
288 <td>
289 Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer.
290 </td>
291 <td>
292 </td>
293 </tr>
294 <tr>
295 <td>
296 LOADSP
297 </td>
298 <td>
299 011x xxxx
300 </td>
301 <td>
302 Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack.
303 </td>
304 <td>
306 </td>
307 </tr>
308 <tr>
309 <td>
310 ADDSP
311 </td>
312 <td>
313 0001 xxxx
314 </td>
315 <td>
316 Add value of memory location SP+xxxx*4 to value on top of stack.
317 </td>
318 <td>
320 </td>
321 </tr>
322 <tr>
323 <td>
324 EMULATE
325 </td>
326 <td>
327 001x xxxx
328 </td>
329 <td>
Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. See
zpupkg.vhd for the list of emulate opcode values used. zpu_core.vhd contains
reference implementations of these instructions rather than letting the ZPU execute the EMULATE instruction.
<p>
One way to improve performance of the ZPU is to implement some of
the emulated instructions in RTL.
337 </td>
338 <td>
340 </td>
341 </tr>
342 <tr>
343 <td>
344 PUSHPC
345 </td>
346 <td>
347 emulated
348 </td>
349 <td>
350 Pushes program counter onto the stack.
351 </td>
352 <td>
354 </td>
355 </tr>
356 <tr>
357 <td>
358 POPPC
359 </td>
360 <td>
361 0000 0100
362 </td>
363 <td>
364 Pops address off stack and sets PC
365 </td>
366 <td>
368 </td>
369 </tr>
370 <tr>
371 <td>
372 LOAD
373 </td>
374 <td>
375 0000 1000
376 </td>
377 <td>
Pops the address stored on the stack and pushes the value at that address onto the stack.
<p>
Bit 0 and 1 of the address are always treated as 0 (i.e. ignored) by
the HDL implementations, and C code is guaranteed by the programming
model never to use 32 bit LOAD on non-32 bit aligned addresses (i.e.
if a program does this, then it has a bug).
384 </td>
385 <td>
387 </td>
388 </tr>
389 <tr>
390 <td>
391 STORE
392 </td>
393 <td>
394 0000 1100
395 </td>
396 <td>
397 Pops address, then value from stack and stores the value into the memory location of the address.
<p>
Bit 0 and 1 of the address are always treated as 0.
400 </td>
401 <td>
403 </td>
404 </tr>
405 <tr>
406 <td>
407 PUSHSP
408 </td>
409 <td>
410 0000 0010
411 </td>
412 <td>
413 Pushes stack pointer.
414 </td>
415 <td>
417 </td>
418 </tr>
419 <tr>
420 <td>
421 POPSP
422 </td>
423 <td>
424 0000 1101
425 </td>
426 <td>
427 Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads.
428 </td>
429 <td>
431 </td>
432 </tr>
433 <tr>
434 <td>
ADD
436 </td>
437 <td>
438 0000 0101
439 </td>
440 <td>
Pops two values off the stack, adds them and pushes the result.
442 </td>
443 <td>
445 </td>
446 </tr>
447 <tr>
448 <td>
AND
450 </td>
451 <td>
452 0000 0110
453 </td>
454 <td>
Pops two values off the stack, does a bitwise and, and pushes the result onto the stack.
456 </td>
457 <td>
459 </td>
460 </tr>
461 <tr>
462 <td>
OR
464 </td>
465 <td>
466 0000 0111
467 </td>
468 <td>
469 Pops two integers, does a bitwise or and pushes result
470 </td>
471 <td>
473 </td>
474 </tr>
475 <tr>
476 <td>
NOT
478 </td>
479 <td>
480 0000 1001
481 </td>
482 <td>
483 Bitwise inverse of value on stack
485 </td>
486 <td>
488 </td>
489 </tr>
490 <tr>
491 <td>
492 FLIP
493 </td>
494 <td>
495 0000 1010
496 </td>
497 <td>
498 Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc.
500 The raison d'etre for this instruction is mainly to emulate other instructions.
501 </td>
502 <td>
504 </td>
505 </tr>
506 <tr>
507 <td>
NOP
509 </td>
510 <td>
511 0000 1011
512 </td>
513 <td>
514 No operation, clears IDIM flag as side effect, i.e. used between two
515 consecutive IM instructions to push two values onto the stack.
516 </td>
517 <td>
519 </td>
520 </tr>
521 <tr>
522 <td>
523 PUSHSPADD
524 </td>
525 <td>
527 </td>
528 <td>
529 a=sp; <br>
530 b=popIntStack()*4;<br>
531 pushIntStack(a+b);<br>
532 </td>
533 <td>
535 </td>
536 </tr>
538 <tr>
539 <td>
540 POPPCREL
541 </td>
542 <td>
544 </td>
545 <td>
546 setPc(popIntStack()+getPc());
547 </td>
548 <td>
550 </td>
551 </tr>
552 <tr>
553 <td>
SUB
555 </td>
556 <td>
558 </td>
559 <td>
560 int a=popIntStack();<br>
561 int b=popIntStack();<br>
562 pushIntStack(b-a);<br>
563 </td>
564 <td>
566 </td>
567 </tr>
568 <tr>
569 <td>
XOR
571 </td>
572 <td>
574 </td>
575 <td>
576 pushIntStack(popIntStack() ^ popIntStack());
577 </td>
578 <td>
580 </td>
581 </tr>
582 <tr>
583 <td>
584 LOADB
585 </td>
586 <td>
588 </td>
589 <td>
590 8 bit load instruction. Really only here for compatibility with
591 C programming model. Also it has a big impact on DMIPS test.
593 pushIntStack(cpuReadByte(popIntStack())&0xff);
594 </td>
595 <td>
597 </td>
598 </tr>
599 <tr>
600 <td>
601 STOREB
602 </td>
603 <td>
605 </td>
606 <td>
607 8 bit store instruction. Really only here for compatibility with
608 C programming model. Also it has a big impact on DMIPS test.
610 addr = popIntStack();<br>
611 val = popIntStack();<br>
612 cpuWriteByte(addr, val);
613 </td>
614 <td>
616 </td>
617 </tr>
618 <tr>
619 <td>
620 LOADH
621 </td>
622 <td>
624 </td>
625 <td>
627 16 bit load instruction. Really only here for compatibility with
628 C programming model.
631 pushIntStack(cpuReadWord(popIntStack()));
632 </td>
633 <td>
635 </td>
636 </tr>
637 <tr>
638 <td>
639 STOREH
640 </td>
641 <td>
643 </td>
644 <td>
645 16 bit store instruction. Really only here for compatibility with
646 C programming model.
648 addr = popIntStack();<br>
649 val = popIntStack();<br>
650 cpuWriteWord(addr, val);
651 </td>
652 <td>
654 </td>
655 </tr>
656 <tr>
657 <td>
658 LESSTHAN
659 </td>
660 <td>
662 </td>
663 <td>
664 Signed comparison<br>
665 a = popIntStack();<br>
666 b = popIntStack();<br>
667 pushIntStack((a < b) ? 1 : 0);<br>
668 </td>
669 <td>
671 </td>
672 </tr>
673 <tr>
674 <td>
675 LESSTHANOREQUAL
676 </td>
677 <td>
679 </td>
680 <td>
681 Signed comparison<br>
682 a = popIntStack();<br>
683 b = popIntStack();<br>
684 pushIntStack((a <= b) ? 1 : 0);
685 </td>
686 <td>
688 </td>
689 </tr>
690 <tr>
691 <td>
692 ULESSTHAN
693 </td>
694 <td>
696 </td>
697 <td>
698 Unsigned comparison<br>
699 long a;//long is here 64 bit signed integer<br>
700 long b;<br>
701 a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
702 b = ((long) popIntStack()) & INTMASK;<br>
703 pushIntStack((a < b) ? 1 : 0);
704 </td>
705 <td>
707 </td>
708 </tr>
709 <tr>
710 <td>
711 ULESSTHANOREQUAL
712 </td>
713 <td>
715 </td>
716 <td>
717 Unsigned comparison<br>
718 long a;//long is here 64 bit signed integer<br>
719 long b;<br>
720 a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
721 b = ((long) popIntStack()) & INTMASK;<br>
722 pushIntStack((a <= b) ? 1 : 0);
723 </td>
724 <td>
726 </td>
727 </tr>
728 <tr>
729 <td>
730 EQBRANCH
731 </td>
732 <td>
734 </td>
735 <td>
736 int compare;<br>
737 int target;<br>
738 target = popIntStack() + pc;<br>
739 compare = popIntStack();<br>
740 if (compare == 0)<br>
741 {<br>
742 setPc(target);<br>
743 } else<br>
744 {<br>
745 setPc(pc + 1);<br>
}<br>
747 </td>
748 <td>
750 </td>
751 </tr>
752 <tr>
753 <td>
754 NEQBRANCH
755 </td>
756 <td>
758 </td>
759 <td>
760 int compare;<br>
761 int target;<br>
762 target = popIntStack() + pc;<br>
763 compare = popIntStack();<br>
764 if (compare != 0)<br>
765 {<br>
766 setPc(target);<br>
767 } else<br>
768 {<br>
769 setPc(pc + 1);<br>
770 }<br>
771 </td>
772 <td>
774 </td>
775 </tr>
776 <tr>
777 <td>
778 MULT
779 </td>
780 <td>
782 </td>
783 <td>
784 Signed 32 bit multiply <br>
785 pushIntStack(popIntStack() * popIntStack());
786 </td>
787 <td>
789 </td>
790 </tr>
791 <tr>
792 <td>
DIV
794 </td>
795 <td>
797 </td>
798 <td>
799 Signed 32 bit integer divide.<br>
800 a = popIntStack();<br>
801 b = popIntStack();<br>
802 if (b == 0)<br>
803 {<br>
804 // undefined<br>
}<br>
806 pushIntStack(a / b);<br>
807 </td>
808 <td>
810 </td>
811 </tr>
812 <tr>
813 <td>
MOD
815 </td>
816 <td>
818 </td>
819 <td>
820 Signed 32 bit integer modulo.<br>
821 a = popIntStack(); <br>
822 b = popIntStack();<br>
823 if (b == 0)<br>
824 {<br>
825 // undefined <br>
826 }<br>
827 pushIntStack(a % b); <br>
828 </td>
829 <td>
831 </td>
832 </tr>
833 <tr>
834 <td>
835 LSHIFTRIGHT
836 </td>
837 <td>
839 </td>
840 <td>
841 unsigned shift right.<br>
842 long shift;<br>
843 long valX;<br>
844 int t;<br>
845 shift = ((long) popIntStack()) & INTMASK;<br>
846 valX = ((long) popIntStack()) & INTMASK;<br>
847 t = (int) (valX >> (shift & 0x3f));<br>
848 pushIntStack(t);<br>
849 </td>
850 <td>
852 </td>
853 </tr>
854 <tr>
855 <td>
856 ASHIFTLEFT
857 </td>
858 <td>
860 </td>
861 <td>
862 arithmetic(signed) shift left.<br>
864 long shift;<br>
865 long valX;<br>
866 shift = ((long) popIntStack()) & INTMASK;<br>
867 valX = ((long) popIntStack()) & INTMASK;<br>
868 int t = (int) (valX << (shift & 0x3f));<br>
869 pushIntStack(t);<br>
870 </td>
871 <td>
873 </td>
874 </tr>
875 <tr>
876 <td>
877 ASHIFTRIGHT
878 </td>
879 <td>
881 </td>
882 <td>
arithmetic (signed) shift right.<br>
884 long shift;<br>
885 int valX;<br>
886 shift = ((long) popIntStack()) & INTMASK;<br>
887 valX = popIntStack();<br>
888 int t = valX >> (shift & 0x3f);<br>
889 pushIntStack(t);<br>
891 </td>
892 <td>
894 </td>
895 </tr>
897 <tr>
898 <td>
899 CALL
900 </td>
901 <td>
903 </td>
904 <td>
905 call procedure.<br>
906 <br>
907 int address = pop();<br>
908 push(pc + 1);<br>
909 setPc(address); <br>
910 </td>
911 <td>
913 </td>
914 </tr>
915 <tr>
916 <td>
917 CALLPCREL
918 </td>
919 <td>
921 </td>
922 <td>
923 call procedure pc relative<br>
924 <br>
925 int address = pop();<br>
926 push(pc + 1);<br>
927 setPc(address+pc); </td>
928 <td>
930 </td>
931 </tr>
934 <tr>
935 <td>
EQ
937 </td>
938 <td>
940 </td>
941 <td>
pushIntStack((popIntStack() == popIntStack()) ? 1 : 0);
</td>
<td>
944 </td>
945 </tr>
946 <tr>
947 <td>
NEQ
949 </td>
950 <td>
952 </td>
953 <td>
pushIntStack((popIntStack() != popIntStack()) ? 1 : 0);
</td>
<td>
956 </td>
957 </tr>
958 <tr>
959 <td>
NEG
961 </td>
962 <td>
964 </td>
965 <td>
pushIntStack(-popIntStack());
</td>
<td>
968 </td>
969 </tr>
972 </table>
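<p>
The fixed bit patterns in the Opcode column of the table above can be told apart with a few mask comparisons. The following C sketch shows one way a simulator or disassembler might do this first level of decoding. The enum and the function <code>zpu_classify</code> are hypothetical names chosen for illustration; they are not taken from any of the ZPU implementations.

```c
#include <stdint.h>

/* Hypothetical first-level opcode classes, named after the table rows. */
typedef enum {
    ZPU_CLASS_IM,       /* 1xxx xxxx */
    ZPU_CLASS_STORESP,  /* 010x xxxx */
    ZPU_CLASS_LOADSP,   /* 011x xxxx */
    ZPU_CLASS_EMULATE,  /* 001x xxxx */
    ZPU_CLASS_ADDSP,    /* 0001 xxxx */
    ZPU_CLASS_OTHER     /* 0000 xxxx: BREAKPOINT, POPPC, LOAD, STORE, ... */
} zpu_class_t;

/* Classify an 8 bit opcode by its leading bits, most specific mask last. */
zpu_class_t zpu_classify(uint8_t opcode)
{
    if (opcode & 0x80u)
        return ZPU_CLASS_IM;
    if ((opcode & 0xE0u) == 0x40u)
        return ZPU_CLASS_STORESP;
    if ((opcode & 0xE0u) == 0x60u)
        return ZPU_CLASS_LOADSP;
    if ((opcode & 0xE0u) == 0x20u)
        return ZPU_CLASS_EMULATE;
    if ((opcode & 0xF0u) == 0x10u)
        return ZPU_CLASS_ADDSP;
    return ZPU_CLASS_OTHER;
}
```

Opcodes that fall into <code>ZPU_CLASS_OTHER</code> still need a second lookup on the low four bits, e.g. 0000 1011 for NOP.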
974 <a name="interrupts"/>
975 <h2>Interrupts</h2>
976 The ZPU supports interrupts.
To trigger an interrupt, the interrupt signal must be asserted. The ZPU does
not define any interrupt disabling mechanism; this must be implemented by the
interrupt controller and controlled via memory mapped IO.
982 Interrupts are masked when the IDIM flag is set, i.e.
983 with consecutive IM instructions.
985 The ZPU has an edge triggered interrupt. As the ZPU notices that the interrupt
986 is asserted, it will execute the interrupt instruction. The interrupt signal
987 must stay asserted until the ZPU acknowledges it.
989 When the interrupt instruction is executed, the PC will be pushed onto the
990 stack and the PC will be set to the interrupt vector address (0x20).
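<p>
The interrupt entry described above can be modelled in a few lines of C. This is only a sketch of the behaviour as stated in this section (interrupt taken only while IDIM is clear, PC pushed, PC set to 0x20); the struct layout, the byte-addressed downward-growing stack and the name <code>zpu_try_interrupt</code> are assumptions made for the example, not a description of the HDL.

```c
#include <stdbool.h>
#include <stdint.h>

#define ZPU_INTERRUPT_VECTOR 0x20u

/* Hypothetical toy model of the CPU state relevant to interrupt entry. */
typedef struct {
    uint32_t pc;
    uint32_t sp;       /* byte address; assumed to grow downward in 4 byte steps */
    bool     idim;     /* set while a run of IM instructions is in progress */
    uint32_t mem[64];  /* 256 bytes of word-organised toy memory */
} zpu_model_t;

/* Take the interrupt if the signal is asserted and IDIM does not mask it.
 * Returns true if the interrupt was taken. The caller must keep sp in range. */
bool zpu_try_interrupt(zpu_model_t *cpu, bool irq_asserted)
{
    if (!irq_asserted || cpu->idim)
        return false;            /* masked: retry after the IM sequence ends */

    cpu->sp -= 4;                /* make room on the stack */
    cpu->mem[cpu->sp / 4] = cpu->pc;   /* push the interrupted PC */
    cpu->pc = ZPU_INTERRUPT_VECTOR;    /* continue at the interrupt vector */
    return true;
}
```

In this model the interrupt source would keep <code>irq_asserted</code> true until the call returns true, which mirrors the requirement that the signal stays asserted until it is acknowledged.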
Note that the GCC compiler requires four registers, r0, r1, r2 and r3, for some
rather uncommon operations. These 32 bit registers are mapped to memory locations 0x0,
0x4, 0x8, 0xc. The default interrupt vector at address 0x20 will load the
values of these memory locations onto the stack, call _zpu_interrupt and
restore them.
998 See <a href="../hdl/zpu4/test/interrupt/">zpu/hdl/zpu4/test/interrupt/</a> for C code and <a href ="../hdl/example/simzpu_interrupt.do">zpu/hdl/example/simzpu_interrupt.do</a>
999 for simulation example.
1001 <a name="startup"/>
1002 <h2>Custom startup code (aka crt0.s)</h2>
1003 To minimize the size of an application, one important trick is to
1004 strip down the startup code. The startup code contains microcode for emulation
1005 of instructions that may never be used by a particular application, or are made redundant because the instructions are implemented in RTL.
The startup code is found in the GCC source code under gcc/libgloss/zpu,
but to make the startup code more available, it has been duplicated
into <a href="../sw/startup">zpu/sw/startup</a>.
1011 On the <a href="#todo">TODO</a> list is work to make it easier to reduce code size.
1013 TODO is the following actually useful? if not remove or elaborate.
1015 To minimize startup size, see <a href="../roadshow/roadshow/codesize/">codesize</a>
1016 demo. This is pretty standard GCC stuff and simple enough once you've
1017 been over it a couple of times.
1020 <a name="vectors"/>
1021 <h3>Vectors</h3>
1022 <table border="1">
1023 <tr><td>Address</td><td>Name</td><td>Description</td></tr>
1024 <tr>
1025 <td>0x000</td>
1026 <td>Reset</td>
1027 <td>
1028 1.When the ZPU boots, this is the first instruction to be executed.
1029 <br>
1030 2.The stack pointer is initialised to maximum RAM address
1031 </td>
1032 </tr>
1033 <tr>
1034 <td>0x020</td>
1035 <td>Interrupt</td>
1036 <td>
1037 This is the entry point for interrupts.
1038 </td>
1039 </tr>
1040 <tr>
1041 <td>0x040-</td>
1042 <td>Emulated instructions</td>
1043 <td>
Entry points for emulated instructions, 32 bytes apart, starting with emulated opcode 34 at 0x040. Note that opcode 32 and opcode 33 are not normally used to emulate instructions as these memory addresses are already used by the boot vector, the GCC registers and the interrupt vector.
1045 </td>
1046 </tr>
1047 </table>
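<p>
The vector addresses in the table above follow directly from the EMULATE definition (PC is set to xxxxx*32, where xxxxx are the low five bits of the opcode). A small C sketch of that arithmetic, using the hypothetical function name <code>zpu_emulate_vector</code>:

```c
#include <stdint.h>

/* Return the address an EMULATE opcode (001x xxxx) jumps to,
 * or -1 if the opcode is not in the EMULATE range 32..63. */
int32_t zpu_emulate_vector(uint8_t opcode)
{
    if ((opcode & 0xE0u) != 0x20u)
        return -1;
    return (int32_t)(opcode & 0x1Fu) * 32;
}
```

Opcode 34 therefore lands at 0x040, while opcodes 32 and 33 would land at 0x000 and 0x020, which is why they collide with the reset and interrupt vectors.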
1049 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1051 <a name="implementations"/>
1052 <h1>Core Implementations</h1>
zpu4 (superseding zpu3) is original work by &Oslash;yvind Harboe. All other implementations derive from zpu4.
High on the <a href="#todo">TODO</a> list is to reduce the number of implementations, taking the best from all. For example, interrupts are not universally implemented, IO naming is inconsistent and memory architectures differ.
<p>
Ultimately we should try to get closer to the opencores coding standard. You can find the document in the opencores cvsroot/common.
<p>
For now, if you are starting a design, zpu4 or zealot are probably the safest choices. zealot offers more customization through generics, but lacks interrupts. zpu4 gets more attention. Take your pick.
1061 <a name="performance"/>
1062 <h2>Performance Summary</h2>
1064 <a href="#todo">TODO</a> fill in performance table for Altera and Lattice.
1066 Tests are done with the <a href="#zealot">Zealot</a>
1067 SoC-System and Xilinx ISE 12.2 with standard settings.
1069 <TABLE WIDTH=604 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7 CELLSPACING=0 STYLE="page-break-after: avoid">
1070 <TR VALIGN=TOP>
1071 <TD WIDTH=85> <P><B>CORE/Config</B></P> </TD>
1072 <TD WIDTH=85> <P><B>Spartan-3</B></P> </TD>
1073 <TD WIDTH=85> <P><B>Spartan-3E</B></P> </TD>
1074 <TD WIDTH=85> <P><B>Spartan-6</B></P> </TD>
1075 <TD WIDTH=85> <P><B>Virtex-5</B></P> </TD>
1076 <TD WIDTH=85> <P><B>Cyclone-3</B></P> </TD>
1077 <TD WIDTH=85> <P><B>DMIPS</B></P> </TD>
1078 </TR>
1080 <TR VALIGN=TOP>
1081 <TD WIDTH=85> <P>
1082 zpu4 small
1083 maxAddrBit=16
1084 </P> </TD>
1085 <TD WIDTH=85> <PRE>
1086 <!-- Spartan-3 -->
1087 591 LUT
1088 389 REG
1089 0 MULT18x18
1090 16 BRAM
1091 90 fmax
1092 </PRE> </TD>
1093 <TD WIDTH=85> <PRE>
1094 <!-- Spartan-3E -->
1095 626 LUT
1096 389 REG
1097 0 MULT18x18
1098 16 BRAM
1099 100 fmax
1100 </PRE> </TD>
1101 <TD WIDTH=85> <PRE>
1102 <!-- Spartan-6 -->
1103 639 LUT
1104 372 REG
1105 0 MULT18x18
1106 16 BRAM
1107 100 fmax
1108 </PRE> </TD>
1109 <TD WIDTH=85> <PRE>
1110 <!-- Virtex-5 -->
1111 561 LUT
1112 391 REG
1113 0 MULT18x18
1114 8 BRAM (RAMB36)
1115 175 fmax
1116 </PRE> </TD>
1117 <TD WIDTH=85> <PRE>
1118 <!-- Cyclone -->
1119 ? LUT
1120 ? REG
1121 ? MULT18x18
1122 ? M4K
1123 ? fmax
1124 </PRE> </TD>
1125 <TD WIDTH=85> <!-- DMIPS --> <P>0.5</P> </TD>
1126 </TR>
1128 <TR VALIGN=TOP> <TD WIDTH=85> <P>zpu4 medium</P> </TD>
1129 <TD WIDTH=85> <PRE>
1130 <!-- Spartan-3 -->
1131 1760 LUT
1132 514 REG
1133 3 MULT18x18
1134 16 BRAM (RAMB16)
1135 75 fmax
1136 </PRE> </TD>
1137 <TD WIDTH=85> <PRE>
1138 <!-- Spartan-3E -->
1139 1754 LUT
1140 509 REG
1141 3 MULT18x18
1142 16 BRAM (RAMB16)
1143 75 fmax
1144 </PRE> </TD>
1145 <TD WIDTH=85> <PRE>
1146 <!-- Spartan-6 -->
1147 1162 LUT
1148 481 REG
1149 3 MULT (DSP48A1)
1150 16 BRAM (RAMB16)
1151 80 fmax
1152 </PRE> </TD>
1153 <TD WIDTH=85> <PRE>
1154 <!-- Virtex-5 -->
1155 1299 LUT
1156 490 REG
1157 3 MULT (DSP48E)
1158 8 BRAM (RAMB36)
1159 125 fmax
1160 </PRE> </TD>
1161 <TD WIDTH=85> <PRE>
1162 <!-- Cyclone -->
1163 ? LUT
1164 ? REG
1165 ? MULT18x18
1166 ? M4K
1167 ? fmax
1168 </PRE> </TD>
1169 <TD WIDTH=85><!-- DMIPS --><P>2.6</P> </TD>
1170 </TR>
1172 </TABLE>
1174 <a name="zpu4_small"/>
1175 <h2>zpu4 small</h2>
1176 Found in <a href="../hdl/zpu4/core/zpu_core_small.vhd">zpu/zpu/hdl/zpu4/core/zpu_core_small.vhd</a>
1178 The small ZPU4 implements the minimum instruction set. It is optimized for size and simplicity
1179 serving as a reference in both regards.
1181 It uses a RAM (dual port RAM w/read/write to both ports) as data & code storage and
1182 is implemented as a simple state machine.
Essentially it has four states:
1185 <ol>
1186 <li>Fetch - starts fetch of next instruction
1187 <li>FetchNext - sets up operands for execute cycle
1188 <li>Decode - decodes instruction
1189 <li>Execute - well.. executes instruction
1190 </ol>
The tricky bit is that there is a tiny bit of interleaving of
states since the BRAM takes a cycle to perform a fetch/store. The above are the
normal states the ZPU cycles through unless memory fetches, jumps, etc. take
place.
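<p>
As a rough illustration of the nominal cycle listed above, the state sequence can be written as a tiny C state machine. This sketch only covers the normal Fetch, FetchNext, Decode, Execute loop; the extra states used for memory accesses, jumps and similar cases are deliberately left out, and the identifiers are hypothetical rather than the names used in zpu_core_small.vhd.

```c
/* Hypothetical names for the four nominal states of the small core. */
typedef enum {
    ZPU_ST_FETCH,       /* start fetch of the next instruction */
    ZPU_ST_FETCH_NEXT,  /* set up operands for the execute cycle */
    ZPU_ST_DECODE,      /* decode the instruction */
    ZPU_ST_EXECUTE      /* execute the instruction */
} zpu_state_t;

/* Advance to the next state of the nominal cycle. */
zpu_state_t zpu_next_state(zpu_state_t s)
{
    switch (s) {
    case ZPU_ST_FETCH:
        return ZPU_ST_FETCH_NEXT;
    case ZPU_ST_FETCH_NEXT:
        return ZPU_ST_DECODE;
    case ZPU_ST_DECODE:
        return ZPU_ST_EXECUTE;
    case ZPU_ST_EXECUTE:
    default:
        return ZPU_ST_FETCH;
    }
}
```

In this simplified model an instruction that needs no extra memory access takes four steps before the next fetch begins.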
1196 <a name="zpu4_medium"/>
1197 <h2>zpu4 medium</h2>
1198 Found in <a href="../hdl/zpu4/core/zpu_core.vhd">zpu/zpu/hdl/zpu4/core/zpu_core.vhd</a>
1200 The medium ZPU4 has a single port memory interface. All data, code and IO is
1201 accessed through this memory interface.
It performs better (despite having less memory bandwidth than zpu_core_small.vhd)
since it implements many more instructions.
1206 <a name="alzpu_pipe"/>
1207 <h2>Alvaro's pipelined ZPU</h2>
All the rage on the mailing list. TBA.
1210 <a name="zealot"/>
1211 <h2>Zealot</h2>
1212 Small found in <a href="../hdl/zealot/zpu_small.vhdl">zpu/zpu/hdl/zealot/zpu_small.vhdl</a>
1214 Medium found in <a href="../hdl/zealot/zpu_medium.vhdl">zpu/zpu/hdl/zealot/zpu_medium.vhdl</a>
1216 README found in <a href="../hdl/zealot/0README.txt">zpu/zpu/hdl/zealot/0README.txt</a>
1218 The Zealot version of ZPU was contributed by Salvador E. Tropea.
1220 The key features are:
1223 <ul>
<li>Includes a very basic <a href="#memorymap">PHI I/O</a> synthesizable core.
It implements the 64 bit clock counter (timer), GPIO and the UART. This is enough
to run the DMIPS benchmark and a hello world application. I tested the UART
@ 9600 bps and @ 115200 bps.</li>
1228 <li>The ZPU can be customized using generics. It allows the use of more
1229 than one core in the same project without problems.</li>
1230 <li>Implements the lshiftright instruction in hardware, this gives around
1231 10% boost in the DMIPS benchmark (Medium version).</li>
<li>You can disable various instruction groups and leave them to the
emulation software, so you can experiment with various LUTs vs DMIPS
configurations (Medium version).</li>
<li>The medium version provides approx. 2.6 DMIPS @ 50 MHz and the small
0.5 DMIPS @ 50 MHz.</li>
1237 <li>Enhanced trace module, it includes the assembler for the executed
1238 instruction and can also measure how much stack was consumed during the
1239 execution.</li>
1240 <li>Includes ready to use memory images for a hello world program and the
1241 DMIPS benchmark.</li>
1242 <li>Memory and trace blocks outside ZPU. This provides better modularity.</li>
1243 <li>Much better documented code than the original version.</li>
1244 </ul>
1246 Simulation and implementation files are provided. You need 16 kB of BRAMs
1247 for the "hello world" example and 32 kB for the DMIPS benchmark. The medium
1248 version takes around 1030 slices and 3 multipliers and the small version
1249 around 430 slices.<p>
1251 The generics for the Zealot Medium ZPU are:<p>
1253 <ul>
1254 <li><b>WORD_SIZE</b> (integer:=32) Data width, only 32 bits are really
1255 tested/supported. Adding support for 16 bits should be simple, but the
1256 toolchain needs to support it.</li>
1257 <li><b>ADDR_W</b> (integer:=16) Address bus width memory+I/O space. The MSB
1258 selects the address space (1=I/O).</li>
1259 <li><b>MEM_W</b> (integer:=15) Memory address bus width. It includes program,
1260 data and stack sections.</li>
<li><b>D_CARE_VAL</b> (std_logic:='X') Value used to fill the unused bits.
For simulations this should be '0', for synthesis this is a value that your
tools interpret as "don't care". Xilinx tools could benefit from using
'X'. This is particularly true when assigning default values and for unreached
cases. Note that I didn't find it useful.</li>
1266 <li><b>MULT_PIPE</b> (boolean:=false) Enables the multiplication pipeline.
1267 This can allow faster clocks but will make the mult instruction slower (more
1268 clocks consumed).</li>
<li><b>BINOP_PIPE</b> (integer range 0 to 2:=0) Enables the pipeline for
the -, =, &lt; and &lt;= operations. This can allow faster clocks but will
make these instructions slower (more clocks consumed). This value is the
number of extra clocks added.</li>
1273 <li><b>ENA_LEVEL0</b> (boolean:=true) Enables the hardware implementation of
1274 eq, neqbranch, loadb and pushspadd instructions.</li>
1275 <li><b>ENA_LEVEL1</b> (boolean:=true) Enables the hardware implementation of
1276 lessthan, ulessthan, mult, storeb, callpcrel and sub instructions.</li>
1277 <li><b>ENA_LEVEL2</b> (boolean:=false) Enables the hardware implementation of
1278 lessthanorequal, ulessthanorequal, call and poppcrel instructions.</li>
1279 <li><b>ENA_LSHR</b> (boolean:=true) Enables the hardware implementation of
1280 lshiftright instruction.</li>
1281 <li><b>ENA_IDLE</b> (boolean:=false) Enables the enable_i usage. This signal
1282 can hold the CPU in an idle state if after reset this signal remains active.
1283 When disabled the enable_i signal isn't used and the idle state is removed.</li>
<li><b>FAST_FETCH</b> (boolean:=true) This version of the ZPU fetches 4
instructions at once (32 bits), then they are decoded (2 cycles) and finally
1286 executed. The decoded instructions are stored in a "decode cache", the first
1287 instruction is immediately moved to the "current instruction" register and a
1288 "special instruction" replaces the first slot. This "special instruction"
1289 makes the CPU go to the fetch state. When you enable this generic the FSM
1290 does the fetch instead of waiting one clock cycle to go to the fetch state.
1291 This makes instructions run a little bit faster, but it can cost area and/or
1292 frequency.</li>
1293 </ul>
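<p>The split between memory and I/O described for ADDR_W and MEM_W can be modelled in a few lines. This is a minimal Python sketch, assuming the default generic values (ADDR_W=16, MEM_W=15); the function name and return format are hypothetical and are not part of the Zealot sources.</p>

```python
ADDR_W = 16  # address bus width (memory + I/O), default generic value
MEM_W = 15   # memory address width (program, data and stack)


def decode_address(addr, addr_w=ADDR_W, mem_w=MEM_W):
    """Classify an address as memory or I/O and return (space, offset)."""
    if not 0 <= addr < (1 << addr_w):
        raise ValueError("address does not fit in ADDR_W bits")
    is_io = (addr >> (addr_w - 1)) & 1  # MSB: 1 = I/O, 0 = memory
    if is_io:
        # Offset inside the I/O space: everything below the MSB.
        return "io", addr & ((1 << (addr_w - 1)) - 1)
    # Offset inside the memory space: the low MEM_W bits.
    return "mem", addr & ((1 << mem_w) - 1)
```

<p>With the defaults, addresses 0x0000-0x7FFF decode as memory (32 kB) and 0x8000-0xFFFF as I/O.</p>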
1296 <a name="zy2000"/>
1297 <h2>ZY2000</h2>
Found in <a href="../hdl/zy2000/zpu_core.vhd">zpu/zpu/hdl/zy2000/zpu_core.vhd</a>.
A modified version of zpu4 medium for use with a Wishbone bridge.
<p>
The ZY2000 is a complete implementation including: ZPU, DRAM, soft-MAC, Wishbone bridges, GPIO subsystem, etc. It also included an eCos HAL with TCP/IP support.
1303 <a name="verilogwip"/>
1304 <h2>Verilog translation</h2>
1305 Found in <a href="../../wip/ZPU_CORE/src/zpu_core.v">zpu/wip/ZPU_CORE/src/zpu_core.v</a>
The Verilog version of the ZPU (zpu4) was contributed by Jurij Kostasenko. No one appears to be maintaining it, but it should be a useful starting point for further work. There are some useful scripts there.
1309 <a name="implementing"/>
1310 <h2>Implementing your own ZPU</h2>
One of the neat things about the ZPU is that the instruction set and architecture
are very small, so it is easy to implement a ZPU from scratch or modify the
existing ZPU implementations.
<p>
Implementing a ZPU can be done without understanding the toolchain in
detail: HDL skills and a rudimentary understanding of standard GCC/GDB
usage are sufficient.
1319 A few tips:
1320 <ul>
<li>Run zpu_core.vhd or zpu_core_small.vhd and generate an instruction trace
from ModelSim or similar. To check that your own implementation is correct,
verify that the instruction traces for the new and old
ZPU implementations match. This gives you a simple way to do regression
tests as you develop your ZPU.
1326 <li>To improve performance, you can add more instructions. The EMULATE instructions
1327 are optional in HDL since they will be emulated in software if they are not
1328 implemented in HDL. This allows you to run the ZPU executables unmodified
1329 regardless of which EMULATE instructions you implement.
<li>Run the DMIPS test to measure your overall performance.
1331 <li>Run the histogram.perl script on the instruction trace to generate
1332 histograms of the instructions. Profiling is essential to making
1333 the right choices w.r.t. optimization for your application.
1334 </ul>
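<p>The trace-matching tip can be automated with a small script. The following Python sketch treats each trace as a list of opaque text lines, since the trace file format is not described here, and reports the first line where the reference and the new implementation diverge. All function names are hypothetical.</p>

```python
from itertools import zip_longest


def first_trace_mismatch(reference_lines, candidate_lines):
    """Return (line_number, reference, candidate) for the first difference.

    Returns None when both traces are identical. A trace that ends early
    is reported with None in place of the missing line.
    """
    pairs = zip_longest(reference_lines, candidate_lines)
    for number, (ref, new) in enumerate(pairs, start=1):
        ref = None if ref is None else ref.strip()
        new = None if new is None else new.strip()
        if ref != new:
            return number, ref, new
    return None


def compare_trace_files(reference_path, candidate_path):
    """Compare two trace files on disk, line by line."""
    with open(reference_path) as ref, open(candidate_path) as new:
        return first_trace_mismatch(ref, new)
```

<p>Running compare_trace_files() on the traces of the existing core and of your own core after every change gives a simple regression check.</p>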
1336 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1339 <a name="refdesign"/>
1340 <h1>Reference Designs</h1>
1341 The zpu core is independent of IO and memory architecture. Here are three levels of reference designs a user can refer to in order to get started in their own design, regardless of chosen core.
1343 TODO converge on a single IO structure for core implementations.
1345 TODO re-org CVS to make it easy to keep appropriate SW, RTL(verilog and VHDL) , scripts, verification stuff together.
1348 <a name="ref_min"/>
1349 <h2>Minimal (core+RAM)</h2>
The minimum design is a ZPU core with true dual port RAMs attached. This is handy for size/fmax trials in a particular FPGA, and maybe HDL regression. It may not be a very useful starting point, unless you can DMA all your IO.
1352 TODO provide FPGA scripts.
1354 TODO provide HDL regression environment.
1356 <a name="ref_basic"/>
1357 <h2>Basic (core+RAM+UART+Timer)</h2>
The minimum design required for the hello_world and DMIPS applications. It requires more RAM and a UART (or something similar) for stdio. This is handy as a starting point for a new user's design, for running the DMIPS evaluation, and maybe for HDL regression.
1360 TODO provide FPGA scripts.
1362 TODO provide HDL regression environment.
1364 <a name="ref_soc"/>
1365 <h2>SOC (core+RAM+Wishbone+++)</h2>
Large design(s) for one or more chosen eval boards. Features are dictated by the board and the available IP.
1368 <a name="rams"/>
1369 <h2>Common - RAM models</h2>
Single (1RW), simple dual (1R+1W), true dual (1RW+1RW), and Xilinx distributed dual (1RW+1R) RAM models, with parameterized depth and width, and loadable from file. The goal is that the ROM be independent of the Verilog/VHDL implementation of the RAM.
1372 TODO RAM model contribution needed. What is in opencore/common is not adequate.
1374 <a name="wishbone"/>
1375 <h2>Common - Wishbone</h2>
In <a href="../hdl/wishbone" target="_blank">hdl/wishbone</a> there is an implementation
of a Wishbone bridge. It was designed to work with <a href="#zy2000">ZY2000</a>.
1379 TODO make wishbone bridge re-usable with all cores
1381 <a name="uart"/>
1382 <h2>Common - UART</h2>
All self-respecting embedded projects should have a debug channel
to print stuff to. Typically this is a standard RS232 or UART, but
it can also be something more exotic like a DCC JTAG channel.
<p>
The point is that characters (bytes) are sent to/from the ZPU
via some terminal.
1391 The ZPU defines in the memory map a UART / debug channel. This
1392 should be implemented by some suitable debug channel for
1393 the device in which the ZPU is implemented.
1395 www.opencores.org has several UART implementations. This is one
1396 of the simpler ones:
1398 <a href="http://www.opencores.org/projects.cgi/web/uart/overview">
1399 http://www.opencores.org/projects.cgi/web/uart/overview</a>
1400 <h3>Implementing your own UART / debug channel</h3>
1401 The first thing you need to do is to choose a debug channel for your
1402 hardware. This could be a UART, but it doesn't have to be.
Secondly, you should write a small HDL module that interfaces between
the ZPU memory map of the debug channel and the UART. This should
be relatively simple, as all you need to do is let the ZPU
query the in/out FIFO busy flags and allow the ZPU to read/write
data to the UART via the memory map.
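<p>From the software side, the handshake looks roughly as follows. This Python sketch models the TX register of the Phi memory map listed in this document (address 0x080A000C, bit 8 = TX buffer ready, bits 7:0 = TX byte). The read_word/write_word callbacks and the FakeUart class are hypothetical stand-ins for the bus and for your HDL module.</p>

```python
UART_TX_ADDR = 0x080A000C  # Phi memory map: debug channel / UART TX
TX_READY_BIT = 1 << 8      # bit 8: 1 = TX buffer ready, 0 = not ready (full)


def send_byte(read_word, write_word, value, max_polls=1000):
    """Poll the TX-ready flag, then write one byte. Returns the polls used."""
    for polls in range(1, max_polls + 1):
        if read_word(UART_TX_ADDR) & TX_READY_BIT:
            write_word(UART_TX_ADDR, value & 0xFF)
            return polls
    raise TimeoutError("TX buffer never became ready")


class FakeUart:
    """Hypothetical stand-in for the HDL side: ready on every second read."""

    def __init__(self):
        self.reads = 0
        self.sent = bytearray()

    def read_word(self, addr):
        self.reads += 1
        return TX_READY_BIT if self.reads % 2 == 0 else 0

    def write_word(self, addr, value):
        self.sent.append(value & 0xFF)
```

<p>The HDL module only has to provide the two behaviours the fake provides: a readable ready flag and a writable data byte.</p>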
1411 TODO explicit example with UART from opencores in the above ref designs.
1413 <!-- SPI controller -->
<a name="spicontroller"></a>
<h2>SPI flash controller (read-only)</h2>
This is a simple read-only SPI flash controller, with the following characteristics:
<ul>
<li>Fast-READ only implementation.</li>
<li>32-bit only access.</li>
<li>Fast sequential read access. Uses low-clock approach.</li>
</ul>
<h3>Version</h3>
The current version is 1.2. This is also the first public version available.
<h3>Timing overview</h3>
<p>Simple timing overview, with one nonsequential access to address 0x0, followed by a sequential access to address 0x4.
This simulation was done with Xilinx tools, after routing, using a ZPU to access the SPI.</p>
<div>
<img src="images/spi_timing_overview.png">
<p>Image 1: Timing overview</p>
</div>
In Image 2, you can see the clock almost perfectly centered on the data when we write to the SPI flash.
1439 <div>
1440 <img src="images/spi_readfast_timing.png">
1441 <p>Image 2: Issuing commands to the SPI</p>
1442 </div>
As you can see from Image 3, I assume the worst-case read delay from the SPI flash (which is 15 ns, as you can see from the marker).
1446 <div>
1447 <img src="images/spi_read_timing.png">
1448 <p>Image 3: Reading from the SPI</p>
1449 </div>
1451 <h3>Usage</h3>
1453 Simple description of SPI controller interface:
1455 <table border="1">
1456 <tr>
1457 <th>Symbol</th>
1458 <th>Direction</th>
1459 <th>Bit width</th>
1460 <th>Purpose</th>
1461 </tr>
1462 <tr><td>adr</td><td>Input</td><td>24</td><td>Address where to read from SPI</td></tr>
1463 <tr><td>dat_o</td><td>Output</td><td>32</td><td>Data read from SPI</td></tr>
1464 <tr><td>clk</td><td>Input</td><td>1</td><td>Input clock. Used for both interface and SPI</td></tr>
1465 <tr><td>ce</td><td>Input</td><td>1</td><td>Chip Enable</td></tr>
1466 <tr><td>rst</td><td>Input</td><td>1</td><td>Asynchronous reset</td></tr>
1467 <tr><td>ack</td><td>Output</td><td>1</td><td>Data valid ACK</td></tr>
1468 <tr><td>SPI_CLK</td><td>Output</td><td>1</td><td>SPI output clock</td></tr>
1469 <tr><td>SPI_MOSI</td><td>Output</td><td>1</td><td>SPI output data from controller to chip</td></tr>
1470 <tr><td>SPI_MISO</td><td>Input</td><td>1</td><td>SPI input data from chip to controller</td></tr>
1471 <tr><td>SPI_SELN</td><td>Output</td><td>1</td><td>SPI nSEL (deselect, active low) signal</td></tr>
1472 </table>
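<p>To relate the 24-bit adr input to what appears on SPI_MOSI, here is a Python sketch of a Fast-READ command frame. It assumes the opcode 0x0B followed by a 3-byte address and one dummy byte, which is common on SPI NOR flash parts but is not confirmed here for this controller; check spi_controller.v and your flash datasheet. The function name is hypothetical.</p>

```python
FAST_READ_OPCODE = 0x0B  # common SPI NOR flash value; an assumption here


def fast_read_frame(adr):
    """Bytes shifted out on SPI_MOSI before data arrives on SPI_MISO."""
    if not 0 <= adr < (1 << 24):
        raise ValueError("adr must fit in 24 bits")
    return bytes([
        FAST_READ_OPCODE,
        (adr >> 16) & 0xFF,  # address, most significant byte first
        (adr >> 8) & 0xFF,
        adr & 0xFF,
        0x00,                # dummy byte used by Fast-READ
    ])
```

<p>A sequential access can skip this frame and keep clocking data out, which would explain why sequential reads are faster than nonsequential ones.</p>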
1474 <h3>License</h3>
1475 The Verilog implementation is released under BSD license. See the file itself for more licensing details.
<h3>Download</h3>
1478 Download the Verilog code here: <a href="/files/electronics/spi/spi_controller.v">spi_controller.v</a>
1480 <h3>Troubleshooting</h3>
The current implementation is timed and optimized for my own setup. Your parameters might not be the same
as the defaults I chose, so read the code carefully. If you have any issues, let me know.
1487 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1489 <a name="tools"/>
1490 <h1>Working with the tools and core</h1>
1491 TODO discussion of tools needed and choose some to be supported by project. Need to deal with cygwin vs linux, VHDL vs verilog, open vs closed.... plus language support in simulators is sometimes lacking.
Xilinx ISE WebPack is available for Windows and Linux.
<br>
Altera Quartus web edition is Windows only.
<br>
Lattice ispLEVER starter edition is Windows only.
1499 None appear to come with a standalone simulator anymore. Not sure if any built in simulators are worth looking at... never have been in the past.
Popular simulation tools for this kind of project: ModelSim, GHDL, VeriWell, CVer, Icarus Verilog, GTKWave... others?
1505 <a name="setuplinux"/>
1506 <h2>Setup - Linux toolchain</h2>
1507 You will need Java installed to run the simulator and some other stuff.
1509 TODO setup.sh script needs to detect linux/cygwin, and should have install path option.
<pre>
$ cd zpu/zpu/sw   # path as appropriate
$ sh setup.sh     # untars the tool chain to ... TODO
$ . env.sh        # puts the tools in your path
</pre>
1516 <a name="setupcygwin"/>
1517 <h2>Setup - Cygwin toolchain</h2>
Install <a href="http://www.cygwin.com">Cygwin</a>.
You will also need Java installed to run the simulator and some other stuff.
<pre>
$ cd zpu/zpu/sw   # path as appropriate
$ sh setup.sh     # unzips the tool chain to /tmp/zpu/install/bin
$ . env.sh        # puts the tools in your path
</pre>
1526 <a name="gcc2ram"/>
1527 <h2>GCC to RAM</h2>
1528 TODO some of this is generic, some is zpu4 specific. Should move to refdesign section when ref designs exist.
The instructions are stored big endian. That is, the first instruction of each 32-bit word is stored in the most significant byte, and the fourth in the least significant byte.
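<p>A short Python sketch of this byte ordering, splitting a 32-bit memory word into its four 8-bit instructions in execution order. The function names are hypothetical and the word value in the usage note is arbitrary.</p>

```python
def opcodes_in_word(word):
    """Split a 32-bit word into four 8-bit instructions, first one first."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    # Most significant byte holds the first instruction (big endian).
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def word_from_bytes(four_bytes):
    """Pack four consecutive program bytes into one 32-bit memory word."""
    if len(four_bytes) != 4:
        raise ValueError("expected exactly four bytes")
    return int.from_bytes(four_bytes, "big")
```

<p>For example, opcodes_in_word(0x0B0C0D0E) yields the bytes 0x0B, 0x0C, 0x0D, 0x0E in that order.</p>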
1532 <h3>Generating VHDL BRAM initialization </h3>
<pre>
$ zpu-elf-objcopy -O binary hello.elf hello.bin
$ java -classpath ../simulator/zpusim.jar com.zylin.zpu.simulator.tools.MakeRam hello.bin &gt;hello.bram
</pre>
1537 <h3>Build another test application for example simulation</h3>
1538 Here is how to build a rom image for an application using the
1539 zpu/example simulation files.
<pre>
$ cd zpu/roadshow/roadshow/dhrystone
$ sh build.sh
$ cd zpu/hdl/example
$ gcc zpuromgen.c
$ ./a
Usage: ./a binary_file
$ ./a ../../roadshow/roadshow/dhrystone/dhrystone.bin &gt;app.txt
</pre>
1549 Copy and paste app.txt into helloworld.vhd.
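<p>For orientation, the conversion from a raw binary to RAM initialization text can be sketched in Python as below. The output line format is an assumption chosen to resemble a VHDL array aggregate; it is not taken from zpuromgen.c or MakeRam, whose exact output may differ. The function name is hypothetical.</p>

```python
def rom_init_lines(image):
    """Yield one aggregate-style entry per 32-bit big-endian word."""
    # Pad the image with zero bytes up to a whole number of words.
    padded = image + b"\x00" * (-len(image) % 4)
    for index in range(0, len(padded), 4):
        word = int.from_bytes(padded[index:index + 4], "big")
        yield '%d => x"%08x",' % (index // 4, word)
```

<p>Each output line carries the word index and the four instruction bytes in big-endian order, matching the storage order described above.</p>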
1552 TODO need to merge following with above.
The ZPU comes with a standard GCC toolchain and an instruction set simulator. This allows compiling, running and debugging simple test programs. The simulator has
some very basic peripherals defined: counter, timer interrupt and a debug output port.
1558 <h3>Hello world example</h3>
1559 The ZPU toolchain comes with newlib & libstdc++ support which means that many C/C++ programs can be compiled without modification.
1560 <p>
<pre>
$ cd zpu/sw/helloworld
$ zpu-elf-gcc -Os -phi hello.c -o hello.elf -Wl,--relax -Wl,--gc-sections
or ? TODO which one
$ zpu-elf-gcc -phi hello.c -o hello.elf
$ zpu-elf-size hello.elf
</pre>
1570 <a name="hdlsim"/>
1571 <h2>HDL simulation (ZPU4)</h2>
1572 TODO some of this is generic, some is zpu4 specific. Should move to refdesign section when ref design exists.
New users may also find the scripts in the zealot area useful.
<p>
You'll find working simulation scripts in hdl/example/simzpu_small.do and hdl/example_medium/simzpu_medium.do, which
show simulation of the small (zpu_core_small.vhd) and medium sized (zpu_core.vhd) ZPU. hdl/example/simzpu_interrupt.do
shows use of interrupts.
1580 When implementing the ZPU, copy the following files and modify them to your needs:
1581 <ol>
1582 <li>hdl/example/zpu_config.vhd - set up RAM size here
1583 <li>hdl/example/helloworld.vhd - dual port BRAM implementation.
1584 </ol>
1585 Obviously you must also connect the ZPU to the rest of your IO subsystem. IO is memory mapped(read/write) in the ZPU.
1587 <h3>Running example simulation</h3>
1588 The hdl/example directory has a simulation written for Xilinx WebPack ModelSim. From the ModelSim command prompt:
1589 <ol>
1590 <li>cd c:/&lt;installfolder&gt;/hdl/example
1591 <li>do zpusim_small.do
1592 </ol>
1594 After running the hello world simulation (see zpusim.do), two files are written to the hdl/example directory:
1595 <ol>
1596 <li>log.txt - contains the "Hello world!" text written to the debug channel/simplified UART.
1597 <li>trace.txt - a trace file for the CPU. The instruction set simulator has the capability of taking
1598 this file as input in order to verify that the HDL implementation matches the instruction set simulator.
1599 When a mismatch is found, the GDB debugger will break. Very handy for debugging custom ZPU implementations.
1600 </ol>
1603 <a name="gdbsim"/>
1604 <h2>GDB simulation</h2>
1605 <ol>
1606 <li>cd zpu/sw/helloworld
<li>Launch the simulator from a separate bash shell:<p>
1608 java -classpath ../simulator/zpusim.jar -Xmx512m com.zylin.zpu.simulator.Phi 4444
1610 <img src="images/zpusim.PNG" border=0>
1611 <li>Launch GDB:<p>
1612 ../install/bin/zpu-elf-gdb hello.elf
1613 <li>Connect to target, load and run application:<p>
<pre>
(gdb) target remote localhost:4444
(gdb) load
(gdb) continue
</pre>
1620 <img src="images/gccgdb.PNG">
1622 </ol>
1625 <a name="simulator"/>
1626 <h1>Simulator</h1>
1627 <P>The ZPU simulator is integrated into the Zylin Embedded CDT plugin
1628 to ease debugging of ZPU applications:</P>
1629 <P><A HREF="http://www.zylin.com/embeddedcdt.html">http://www.zylin.com/embeddedcdt.html</A></P>
1630 <P>The ZPU simulator has many features besides debugging an
1631 application:</P>
1632 <UL>
1633 <LI><P STYLE="margin-bottom: 0in">taking output from simulation(e.g.
1634 ModelSim) and matching that against the Java simulator, thus making
1635 it much easier to debug HDL implementations and also getting real
1636 world timing information
1637 </P>
1638 <LI><P STYLE="margin-bottom: 0in">can generate gprof output
1639 </P>
1640 <LI><P>generate various statistics
1641 </P>
1642 </UL>
1643 <P>The plugin is still pretty rough around the edges, and needs to
1644 get GUI support for enabling the ModelSim trace input feature.</P>
1645 <P ALIGN=CENTER><IMG SRC="images/compile.PNG" NAME="graphics7" ALIGN=BOTTOM WIDTH=669 HEIGHT=302 BORDER=0><BR><I>Compiling
1646 ZPU application</I></P>
1647 <P ALIGN=CENTER><IMG SRC="images/simulator.PNG" NAME="graphics9" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Setting
1648 up the simulator</I></P>
1649 <P ALIGN=CENTER><IMG SRC="images/simulator2.PNG" NAME="graphics11" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Choosing
1650 ZPU executable</I></P>
1651 <P ALIGN=CENTER STYLE="margin-bottom: 0in"><IMG SRC="images/simulator3.PNG" NAME="graphics13" ALIGN=BOTTOM WIDTH=1100 HEIGHT=720 BORDER=0><BR><I>Debug
1652 session</I></P>
1653 <P STYLE="margin-bottom: 0in"><BR>
1654 </P>
1657 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1659 <a name="misc"/>
1660 <h1>Misc</h1>
1661 TODO Stuff that could probably find a better home.
1663 <a name="tuning"/>
1664 <h2>Speeding up the ZPU</h2>
1665 There are two aspects of speeding up the ZPU: making it perform better
1666 for a particular application and toying around with the ZPU architecture.
1667 <h3>Performance tips</h3>
1668 <ol>
1669 <li>Profile. Create a small sample and run in a simulator that is as close
1670 to the real deployment as possible. zpu4/core/histogram.perl is a script
1671 that will tell you which instructions take the most time.
<li>Using the profile output, decide which emulated instructions
make sense to implement in HDL for your particular application. Modifying
zpu_core_small.vhd is not particularly hard. Most instructions can be
transliterated into zpu_core_small.vhd from zpu_core.vhd without too much
trouble.
1677 <li>The memory subsystem may well turn out to be where you should concentrate
1678 your efforts.
1679 </ol>
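<p>The profiling step boils down to summing clock cycles per instruction. The Python sketch below shows that idea on already-parsed (mnemonic, cycles) pairs; it is not a port of histogram.perl, and parsing the actual trace file is left out because its format is not described here. The function name and the sample mnemonics are illustrative only.</p>

```python
from collections import Counter


def cycles_by_instruction(samples):
    """Sum clock cycles per mnemonic, most expensive instruction first."""
    totals = Counter()
    for mnemonic, cycles in samples:
        totals[mnemonic] += cycles
    return totals.most_common()
```

<p>Emulated instructions that end up near the top of this list are the natural candidates for an HDL implementation.</p>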
1680 <h3>Toying around with the architecture</h3>
1681 Again: profile 90% of the time and spend the remaining 10% tinkering
1682 with the architecture.
1683 <ul>
1684 <li>There is a DMIPS program you can use to measure the performance of
1685 the ZPU in lieu of profiling a real application. The latter is obviously
1686 a superior solution.
1687 <li>Again: use histogram.perl to figure out which instructions you should add
1688 in HDL.
1689 <li>Tinker a bit with Fmax to find the maximum speed rating for your design.
1690 <li>zpu_core_small.vhd should be ca. 1 DMIPS and zpu_core.vhd should yield
1691 about 5-10 DMIPS before adding instructions runs out of steam.
1692 </ul>
If you need to get ca. 20-50 DMIPS out of the ZPU you will have to
write a heavily pipelined architecture with caches (if you are running
against DRAM). This is *tricky*, but some proof of concept work was
done to show 20 DMIPS with the ZPU (the actual result was discarded since
it was not complete and contained fatal flaws).
<p>
Achieving above 50-100 DMIPS with the current ZPU architecture is probably
a non-starter and a more conventional RISC design makes more sense here.
<p>
The unique advantage of the ZPU is its small size, in terms of both HDL and code size.
1706 <a name="codesize"/>
1707 <h2>Optimizing for code size</h2>
1708 The ZPU toolchain produces highly compact code.
1709 <ol>
1710 <li>Since the ZPU GCC toolchain supports standard ANSI C, it is easy to stumble across
1711 functionality that takes up a lot of space. E.g. the standard printf() function is a beast. Some compilers drop e.g. floating point support
1712 from the printf() function and thus boast a "smaller" printf() when in fact they have a non-standard printf(). newlib has a standard printf() function
1713 and an alternative iprintf() function that works only on integers.
<li>The ZPU ships with default startup code that works across various configurations of the ZPU, so be warned that there is some overhead that will
not occur in the final application (anywhere between 1 and 4 kBytes).
<li>Compilation and linker options matter. The ZPU benefits greatly from the "-Wl,--relax -Wl,--gc-sections" options, which are not used by
all architectures (e.g. GCC ARM does not implement/need -Wl,--relax).
1718 </ol>
1719 <h3>Small code example</h3>
<code>
zpu-elf-gcc -Os -abel smallstd.c -o smallstd.elf -Wl,--relax -Wl,--gc-sections<br>
zpu-elf-size small.elf<br>
<br>
$ zpu-elf-size small.elf<br>
text data bss dec hex filename<br>
2845 952 36 3833 ef9 small.elf<br>
<br>
</code>
1730 <h3>Even smaller code example</h3>
1731 If the ZPU implements the optional instructions, the RAM overhead can be reduced significantly.
<code>
zpu-elf-gcc -Os -abel crt0_phi.S small.c -o small.elf -Wl,--relax -Wl,--gc-sections -nostdlib <br>
zpu-elf-size small.elf<br>
<br>
$ zpu-elf-size small.elf<br>
text data bss dec hex filename<br>
56 8 0 64 40 small.elf<br>
<br>
</code>
1743 <a name="ecos"/>
1744 <h2>Installing eCos build tools</h2>
<code>
tar -xjvf ecossnapshot.tar.bz2<br>
tar -xjvf repository.tar.bz2<br>
tar -xjvf ecostools.tar.bz2<br>
# run this every time you open the shell<br>
export PATH=$PATH:`pwd`/ecos-install<br>
export ECOS_REPOSITORY=`pwd`/ecos/packages:`pwd`/repository<br>
</code>
1753 <h3>Compiling eCos tests</h3>
<code>
ecosconfig new phi default<br>
ecosconfig tree<br>
make<br>
cd kernel/current<br>
make tests<br>
</code>
1762 <h2>Code size ZPU</h2>
<pre>
$ zpu-elf-size *
   text    data     bss     dec     hex filename
  15761    1504   12060   29325    728d bin_sem0
  16907    1512   14436   32855    8057 bin_sem1
  17105    1524   30032   48661    be15 bin_sem2
  17186    1512   14436   33134    816e bin_sem3
  18986    1500   12036   32522    7f0a clock0
  15812    1504   13236   30552    7758 clock1
  25095    1972   13224   40291    9d63 clockcnv
  16437    1500   13224   31161    79b9 clocktruth
  15762    1504   12060   29326    728e cnt_sem0
  17124    1512   14436   33072    8130 cnt_sem1
  35947    1564   22512   60023    ea77 dhrystone
  16428    1500   13228   31156    79b4 except1
  15751    1504   12052   29307    727b flag0
  19145    1512   15624   36281    8db9 flag1
  20053    1516  102908  124477   1e63d fptest
  15998    1496   12092   29586    7392 intr0
  16080    1496   12200   29776    7450 kalarm0
  15327    1496   12036   28859    70bb kcache1
  15549    1496   13224   30269    763d kcache2
  18291    1500   12260   32051    7d33 kclock0
  16231    1500   13232   30963    78f3 kclock1
  16572    1496   13228   31296    7a40 kexcept1
  15618    1496   12060   29174    71f6 kflag0
  19287    1500   15624   36411    8e3b kflag1
  16887    1516   15628   34031    84ef kill
  16186    1496   12128   29810    7472 kintr0
  19724    1504   14516   35744    8ba0 klock
  18283    1500   14592   34375    8647 kmbox1
  15539    1496   12064   29099    71ab kmutex0
  16524    1504   15664   33692    839c kmutex1
  18272    1712   20348   40332    9d8c kmutex3
  18682    1608   20352   40642    9ec2 kmutex4
  15619    1496   14412   31527    7b27 ksched1
  15567    1496   12060   29123    71c3 ksem0
  17063    1500   14436   32999    80e7 ksem1
  15504    1496   13228   30228    7614 kthread0
  16167    1496   14412   32075    7d4b kthread1
  18281    1512   14580   34373    8645 mbox1
  20611    1508   14940   37059    90c3 mqueue1
  15672    1504   12064   29240    7238 mutex0
  16678    1516   15664   33858    8442 mutex1
  17694    1508   16868   36070    8ce6 mutex2
  18203    1720   20344   40267    9d4b mutex3
  16352    1508   14428   32288    7e20 release
  15890    1500   14412   31802    7c3a sched1
  44196    1612  286332  332140   5116c stress_threads
  17891    1524   16864   36279    8db7 sync2
  16943    1512   15644   34099    8533 sync3
  15467    1496   13064   30027    754b thread0
  16134    1496   14420   32050    7d32 thread1
  17560    1512   15636   34708    8794 thread2
  16279    1500   24028   41807    a34f thread_gdb
  17051    1504   20376   38931    9813 timeslice
  17146    1504   21564   40214    9d16 timeslice2
  37313    1512  422380  461205   70995 tm_basic
</pre>
1822 <h3>Code size ARM (non-thumb)</h3>
1823 Thumb does not compile out of the box w/AT91 EB40a for which this test was made.<p>
<pre>
$ arm-elf-size *
   text    data     bss     dec     hex filename
  25204     692   16976   42872    a778 bin_sem0
  26644     700   22096   49440    c120 bin_sem1
  26996     712   55584   83292   1455c bin_sem2
  27008     700   22100   49808    c290 bin_sem3
  28992     688   16944   46624    b620 clock0
  25456     692   19532   45680    b270 clock1
  34572    1160   19520   55252    d7d4 clockcnv
  26224     688   19508   46420    b554 clocktruth
  25204     692   16976   42872    a778 cnt_sem0
  26888     700   22108   49696    c220 cnt_sem1
  44180     752   27416   72348   11a9c dhrystone
  26088     688   19520   46296    b4d8 except1
  25236     692   16968   42896    a790 flag0
  29532     700   24668   54900    d674 flag1
  29508     704  109652  139864   22258 fptest
  25932     684   17016   43632    aa70 intr0
  25824     684   17112   43620    aa64 kalarm0
  24728     684   16956   42368    a580 kcache1
  25168     684   19512   45364    b134 kcache2
  28112     688   17168   45968    b390 kclock0
  25976     688   19524   46188    b46c kclock1
  26372     684   19512   46568    b5e8 kexcept1
  25140     684   16968   42792    a728 kflag0
  29824     688   24660   55172    d784 kflag1
  26896     704   24656   52256    cc20 kill
  26088     684   17028   43800    ab18 kintr0
  30812     692   22176   53680    d1b0 klock
  28504     688   22260   51452    c8fc kmbox1
  24984     684   16984   42652    a69c kmutex0
  26504     692   24704   51900    cabc kmutex1
  28792     900   34892   64584    fc48 kmutex3
  29264     796   34896   64956    fdbc kmutex4
  25240     684   22084   48008    bb88 ksched1
  25044     684   16968   42696    a6c8 ksem0
  26988     688   22100   49776    c270 ksem1
  25028     684   19512   45224    b0a8 kthread0
  25996     684   22080   48760    be78 kthread1
  28552     700   22252   51504    c930 mbox1
  31324     696   22612   54632    d568 mqueue1
  25108     692   16980   42780    a71c mutex0
  26464     704   24700   51868    ca9c mutex1
  27624     696   27280   55600    d930 mutex2
  28596     908   34884   64388    fb84 mutex3
  26156     696   22100   48952    bf38 release
  25460     688   22084   48232    bc68 sched1
  56356     828   45892  103076   192a4 stress_threads
  27900     712   27288   55900    da5c sync2
  26760     700   24692   52152    cbb8 sync3
  24924     684   19356   44964    afa4 thread0
  25868     684   22084   48636    bdfc thread1
  27452     700   24680   52832    ce60 thread2
  26136     688   42704   69528   10f98 thread_gdb
  27212     692   34916   62820    f564 timeslice
  52728     700  123332  176760   2b278 tm_basic
</pre>
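<p>To put the two listings side by side, the Python sketch below computes the ZPU/ARM ratio of the text column for three of the tests. The (text, data, bss) values are copied from the two tables above; the choice of tests and the helper name are arbitrary.</p>

```python
# (text, data, bss) per test, copied from the zpu-elf-size and
# arm-elf-size listings above.
ZPU = {
    "bin_sem0": (15761, 1504, 12060),
    "dhrystone": (35947, 1564, 22512),
    "tm_basic": (37313, 1512, 422380),
}
ARM = {
    "bin_sem0": (25204, 692, 16976),
    "dhrystone": (44180, 752, 27416),
    "tm_basic": (52728, 700, 123332),
}


def text_ratio(name):
    """ZPU text size divided by ARM text size for one test program."""
    return ZPU[name][0] / ARM[name][0]


for name in sorted(ZPU):
    print("%-10s ZPU text is %.0f%% of ARM text" % (name, 100 * text_ratio(name)))
```

<p>For these three programs the ZPU text segment is roughly 60% to 80% of the ARM one. Note that the data column goes the other way: it is larger on the ZPU in all three cases.</p>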
1883 <a name="memorymap"/>
1884 <h2>Phi memory map</h2>
1885 TODO This probably belongs in the refdesign section. For now leaving it here because zealot refers to it. Not sure what else uses it.
The ZPU architecture does not define a memory map as such, but the GCC + libgloss + eCos HAL libraries use the
memory map below. "Phi" is just a three letter word for the particular memory layout below that came about
while developing the ZPU.
1891 <TABLE WIDTH=604 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7 CELLSPACING=0 STYLE="page-break-after: avoid">
1892 <COL WIDTH=85>
1893 <COL WIDTH=42>
1894 <COL WIDTH=136>
1895 <COL WIDTH=283>
1896 <TR VALIGN=TOP>
1897 <TD WIDTH=85>
1898 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Address</B></FONT></FONT></P>
1899 </TD>
1900 <TD WIDTH=42>
1901 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Type</B></FONT></FONT></P>
1902 </TD>
1903 <TD WIDTH=136>
1904 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Name</B></FONT></FONT></P>
1905 </TD>
1906 <TD WIDTH=283>
1907 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Description</B></FONT></FONT></P>
1908 </TD>
1909 </TR>
1911 <TR VALIGN=TOP>
1912 <TD WIDTH=85>
1913 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0000</FONT></FONT></P>
1914 </TD>
1915 <TD WIDTH=42>
1916 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1917 </TD>
1918 <TD WIDTH=136>
1919 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
1920 enable</FONT></FONT></P>
1921 </TD>
1922 <TD WIDTH=283>
1923 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1924 [31:1] Not used</FONT></FONT></P>
1925 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1926 [0] Enable ZPU operations</FONT></FONT></P>
1927 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 ZPU
1928 is held in Idle mode</FONT></FONT></P>
1929 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 ZPU
1930 running</FONT></FONT></P>
1931 </TD>
1932 </TR>
1935 <TR VALIGN=TOP>
1936 <TD WIDTH=85>
1937 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0004</FONT></FONT></P>
1938 </TD>
1939 <TD WIDTH=42>
1940 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1941 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1942 </TD>
1943 <TD WIDTH=136>
1944 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">GPIO data</FONT></FONT></P>
1945 </TD>
1946 <TD WIDTH=283>
1947 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] input data 31:0</FONT></FONT></P>
1948 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] output data 31:0</FONT></FONT></P>
1949 </TD>
1950 </TR>
1952 <TR VALIGN=TOP>
1953 <TD WIDTH=85>
1954 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0008</FONT></FONT></P>
1955 </TD>
1956 <TD WIDTH=42>
1957 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1958 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1959 </TD>
1960 <TD WIDTH=136>
1961 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">GPIO direction</FONT></FONT></P>
1962 </TD>
1963 <TD WIDTH=283>
1964 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] data direction 31:0</FONT></FONT></P>
1965 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0 output</FONT></FONT></P>
1966 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">1 input (default)</FONT></FONT></P>
1967 </TD>
1968 </TR>
1970 <TR VALIGN=TOP>
1971 <TD WIDTH=85>
1972 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A000C</FONT></FONT></P>
1973 </TD>
1974 <TD WIDTH=42>
1975 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1976 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1977 </TD>
1978 <TD WIDTH=136>
1979 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
1980 Debug channel / UART to ARM7 TX</FONT></FONT></P>
1981 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
1982 ZPU side</B></FONT></FONT></P>
1983 </TD>
1984 <TD WIDTH=283>
1985 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1986 [31:9] Not used</FONT></FONT></P>
1987 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1988 [8] TX buffer ready (valid on ready)</FONT></FONT></P>
1989 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 TX
1990 buffer not ready (full)</FONT></FONT></P>
1991 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 TX
1992 buffer ready</FONT></FONT></P>
1993 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1994 [7:0] TX byte (valid on write)</FONT></FONT></P>
1995 </TD>
1996 </TR>
1997 <TR VALIGN=TOP>
1998 <TD WIDTH=85>
1999 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0010</FONT></FONT></P>
2000 </TD>
2001 <TD WIDTH=42>
2002 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2003 </TD>
2004 <TD WIDTH=136>
2005 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
2006 Debug channel / UART to ARM7 RX</FONT></FONT></P>
2007 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
2008 ZPU side</B></FONT></FONT></P>
2009 </TD>
2010 <TD WIDTH=283>
2011 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2012 [31:9] Not used</FONT></FONT></P>
2013 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2014 [8] RX buffer data valid</FONT></FONT></P>
2015 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 RX
2016 buffer not valid</FONT></FONT></P>
2017 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 RX
2018 buffer valid</FONT></FONT></P>
2019 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2020 [7:0] RX byte (when valid)</FONT></FONT></P>
2021 </TD>
2022 </TR>
2023 <TR VALIGN=TOP>
2024 <TD WIDTH=85>
2025 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0014</FONT></FONT></P>
2026 </TD>
2027 <TD WIDTH=42>
2028 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
2029 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2030 </TD>
2031 <TD WIDTH=136>
2032 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(1)</FONT></FONT></P>
2033 </TD>
2034 <TD WIDTH=283>
2035 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2036 [0] Reset counter (valid for write)</FONT></FONT></P>
2037 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2038 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Reset
2039 counter</FONT></FONT></P>
2040 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2041 [1] Sample counter (valid for write)</FONT></FONT></P>
2042 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2043 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Sample
2044 counter</FONT></FONT></P>
2045 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2046 [31:0] Counter bit 31:0</FONT></FONT></P>
2047 </TD>
2048 </TR>
2049 <TR VALIGN=TOP>
2050 <TD WIDTH=85>
2051 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0018</FONT></FONT></P>
2052 </TD>
2053 <TD WIDTH=42>
2054 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2055 </TD>
2056 <TD WIDTH=136>
2057 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(2)</FONT></FONT></P>
2058 </TD>
2059 <TD WIDTH=283>
2060 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2061 [31:0] Counter bit 63:32</FONT></FONT></P>
2062 </TD>
2063 </TR>
2064 <TR VALIGN=TOP>
2065 <TD WIDTH=85>
2066 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0020</FONT></FONT></P>
2067 </TD>
2068 <TD WIDTH=42>
2069 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
2070 / Write</FONT></FONT></P>
2071 </TD>
2072 <TD WIDTH=136>
2073 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Global_Interrupt_mask</FONT></FONT></P>
2074 </TD>
2075 <TD WIDTH=283>
2076 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2077 [31:1] Not used</FONT></FONT></P>
2078 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2079 [0] Global intr. Mask</FONT></FONT></P>
2080 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupts
2081 enabled</FONT></FONT></P>
2082 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupts
2083 disabled</FONT></FONT></P>
2084 </TD>
2085 </TR>
2086 <TR VALIGN=TOP>
2087 <TD WIDTH=85>
2088 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0024</FONT></FONT></P>
2089 </TD>
2090 <TD WIDTH=42>
2091 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2092 </TD>
2093 <TD WIDTH=136>
2094 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_INTERRUPT_ENABLE</FONT></FONT></P>
2095 </TD>
2096 <TD WIDTH=283>
2097 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2098 [31:1] Not used</FONT></FONT></P>
2099 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2100 [0] Debug channel / UART RX interrupt enable</FONT></FONT></P>
2101 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupt
2102 disable</FONT></FONT></P>
2103 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2104 enable</FONT></FONT></P>
2105 </TD>
2106 </TR>
2107 <TR VALIGN=TOP>
2108 <TD WIDTH=85>
2109 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0028</FONT></FONT></P>
2110 </TD>
2111 <TD WIDTH=42>
<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
2113 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2114 </TD>
2115 <TD WIDTH=136>
2116 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_interrupt</FONT></FONT></P>
2117 </TD>
2118 <TD WIDTH=283>
2119 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2120 [31:1] Not used</FONT></FONT></P>
2121 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2122 [0] Debug channel / UART RX interrupt pending (Read)</FONT></FONT></P>
2123 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 No
2124 interrupt pending</FONT></FONT></P>
2125 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2126 pending</FONT></FONT></P>
2127 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2128 [0] Clear UART interrupt (Write)</FONT></FONT></P>
2129 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2130 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2131 cleared</FONT></FONT></P>
2132 </TD>
2133 </TR>
2134 <TR VALIGN=TOP>
2135 <TD WIDTH=85>
2136 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A002C</FONT></FONT></P>
2137 </TD>
2138 <TD WIDTH=42>
2139 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2140 </TD>
2141 <TD WIDTH=136>
2142 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Interrupt_enable</FONT></FONT></P>
2143 </TD>
2144 <TD WIDTH=283>
2145 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2146 [31:1] Not used</FONT></FONT></P>
2147 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2148 [0] Timer interrupt enable</FONT></FONT></P>
2149 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupt
2150 disable</FONT></FONT></P>
2151 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2152 enable</FONT></FONT></P>
2153 </TD>
2154 </TR>
2155 <TR VALIGN=TOP>
2156 <TD WIDTH=85>
2157 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0030</FONT></FONT></P>
2158 </TD>
2159 <TD WIDTH=42>
2160 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
2161 /</FONT></FONT></P>
2162 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2163 </TD>
2164 <TD WIDTH=136>
2165 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_interrupt</FONT></FONT></P>
2166 </TD>
2167 <TD WIDTH=283>
2168 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2169 [31:2] Not used</FONT></FONT></P>
2170 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2171 [0] Timer interrupt pending (Read)</FONT></FONT></P>
2172 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 No
2173 interrupt pending</FONT></FONT></P>
2174 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2175 pending</FONT></FONT></P>
2176 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2177 [1] Reset Timer counter (Write)</FONT></FONT></P>
2178 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2179 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Timer
2180 counter reset</FONT></FONT></P>
2181 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2182 [0] Clear Timer interrupt (Write)</FONT></FONT></P>
2183 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2184 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2185 cleared</FONT></FONT></P>
2186 </TD>
2187 </TR>
2188 <TR VALIGN=TOP>
2189 <TD WIDTH=85>
2190 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0034</FONT></FONT></P>
2191 </TD>
2192 <TD WIDTH=42>
2193 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2194 </TD>
2195 <TD WIDTH=136>
2196 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Period</FONT></FONT></P>
2197 </TD>
2198 <TD WIDTH=283>
2199 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2200 [31:0] Interrupt period (write)</FONT></FONT></P>
2201 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> Number
2202 of clock cycles</FONT></FONT></P>
2203 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> between
2204 timer interrupts</FONT></FONT></P>
<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
</B>The timer starts at the Timer_Period value, counts <B>down</B>
to zero, and then generates an interrupt.</FONT></FONT></P>
2208 </TD>
2209 </TR>
2210 <TR VALIGN=TOP>
2211 <TD WIDTH=85>
<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0038</FONT></FONT></P>
2213 </TD>
2214 <TD WIDTH=42>
2215 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2216 </TD>
2217 <TD WIDTH=136>
2218 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Counter</FONT></FONT></P>
2219 </TD>
2220 <TD WIDTH=283>
2221 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2222 [31:0] Timer counter (read)</FONT></FONT></P>
2223 <P LANG="en-US" CLASS="western"><BR>
2224 </P>
2225 </TD>
2226 </TR>
2299 </TABLE>
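<p>
As an illustration of how the debug channel / UART registers in the table above might be polled from C, here is a minimal sketch. It is not taken from the ZPU tree: the macro and function names are made up for this example, and the register block base of 0x080A0000 is an assumption inferred from the addresses in the table. Only the offsets and bit positions come from the table itself. The base is passed in as a pointer so the same code can be exercised against a plain array on a host.

```c
#include <stdint.h>

/* Word indices relative to an ASSUMED register base of 0x080A0000. */
#define REG_UART_TX (0x0C / 4)   /* 0x080A000C, ZPU side TX */
#define REG_UART_RX (0x10 / 4)   /* 0x080A0010, ZPU side RX */

#define UART_TX_READY (1u << 8)  /* bit 8: 1 = TX buffer ready  */
#define UART_RX_VALID (1u << 8)  /* bit 8: 1 = RX buffer valid  */

/* Hypothetical helper: send one byte if the TX buffer is ready.
 * Returns 1 if the byte was written, 0 if the buffer was full. */
static int uart_try_putc(volatile uint32_t *regs, uint8_t c)
{
    if ((regs[REG_UART_TX] & UART_TX_READY) == 0)
        return 0;
    regs[REG_UART_TX] = c;       /* bits [7:0] carry the TX byte */
    return 1;
}

/* Hypothetical helper: fetch one received byte.
 * Returns the byte (0..255), or -1 if no valid data is present. */
static int uart_try_getc(volatile uint32_t *regs)
{
    uint32_t v = regs[REG_UART_RX];  /* one read: flag and data together */
    if ((v & UART_RX_VALID) == 0)
        return -1;
    return (int)(v & 0xFFu);
}
```

On the target, the pointer would presumably be something like (volatile uint32_t *)0x080A0000. The RX register is read only once per call because the table does not say whether reading it consumes the byte.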
2301 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
2303 <a name="todo"/>
2304 <h1>TODO</h1>
2306 <a name="todolist"/>
2307 <h2>TODO list</h2>
2308 <ul>
<li>fix the TODOs in this doc that are just doc fixes
2310 <li>organize the TODO list by priority and assign responsibility... if there are takers.
2311 <li>converge on a single IO for core implementations.
2312 <li>fill in performance table for Altera and Lattice.
<li>re-org CVS to make it easy to keep appropriate SW, RTL (Verilog and VHDL), scripts, and verification stuff together, with separation of tools, core, common, and ref designs.
2314 <li>provide FPGA scripts.
2315 <li>provide HDL regression environment.
2316 <li>RAM model contribution needed. What is in opencore/common is not adequate.
2317 <li>make wishbone bridge re-usable with all cores
2318 <li>explicit example with UART from opencores in the above ref designs.
2319 <li>discussion of tools needed and choose some to be supported by project. Need to deal with cygwin vs linux, VHDL vs verilog, open vs closed.... plus language support in simulators is sometimes lacking.
2320 <li>setup.sh script needs to detect linux/cygwin, and should have install path option.
2321 <li>shaping up the www.opencores.org pages.
2322 <li>BSD and GPL licenses in the appropriate places.
<li>Currently there are some pages at <A HREF="http://www.zylin.com/zpu.htm">http://www.zylin.com/zpu.htm</A> that explain the ZPU. According to OpenCores policy this information should be moved to www.opencores.org. Patches to do so gratefully accepted!
2324 <li>eCos HAL could be less RAM hungry
2325 <li>Needs GDB stub support in eCos
<li>Could do with a Verilog implementation (ca. 600 lines to translate)
2327 <li>Make little endian throughout. Currently instructions are stored big endian, loadb and storeb are big endian, but the data bus is treated as little endian. Creates some problems in type conversion.
2328 </ul>
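<p>
To make the endianness item in the list above concrete, the following sketch models what big endian byte access means for a memory organized as 32 bit words. It is only an illustration: the function names are hypothetical and the code is not taken from any ZPU implementation or from the toolchain.

```c
#include <stdint.h>

/* Big endian byte read from a word-organized memory:
 * byte address 0 selects bits [31:24], byte address 3 selects bits [7:0]. */
static uint8_t loadb_be(const uint32_t *mem, uint32_t addr)
{
    uint32_t word = mem[addr >> 2];
    unsigned shift = (3u - (addr & 3u)) * 8u;
    return (uint8_t)(word >> shift);
}

/* Big endian byte write: replace one byte lane, keep the other three. */
static void storeb_be(uint32_t *mem, uint32_t addr, uint8_t value)
{
    unsigned shift = (3u - (addr & 3u)) * 8u;
    uint32_t mask = 0xFFu << shift;
    mem[addr >> 2] = (mem[addr >> 2] & ~mask) | ((uint32_t)value << shift);
}
```

A little endian version would use a shift of (addr &amp; 3) * 8 instead, which is where type conversion trouble can arise if the two conventions are mixed.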
2330 <a name="repository"/>
2331 <h2>Repository Re-org</h2>
I am proposing the following structure for the repository. It roughly follows the way I've organized this document, with separation of core, common, and three SOC ref designs. New users go straight to the SOC that best matches their needs.
<pre>
zpu/bin          # scripts and toolchain? Want toolchain installed with project. Tidier when working in multi-user / multi-project environment
zpu/doc          #
zpu/core/rtl     # RTL for the various core implementations.
zpu/core/sw      # crt0.s ?
zpu/common/rtl   # Re-use RTL such as RAM and UART
zpu/common/sim   # Re-use RTL and tools for regression testing
zpu/common/sw    # ?
zpu/soc/minimal  # Three levels of ref designs described above
       /basic
       /board
zpu/soc/*/rtl    # top level, arbiter, etc
zpu/soc/*/sw     # helloworld, dmips, etc. makefile/ROMS
zpu/soc/*/sim    # regression test area. makefile/scripts
zpu/soc/*/fpga   # syn and par area. makefile/scripts
zpu/tools        # zip/tarball of tool chains, simulator
</pre>
Not sure where eCos fits.
2352 <a name="nextgen"/>
2353 <h2>Next generation ZPU</h2>
Based on feedback, here is a list reflecting a tenuous "consensus" on the next generation
of the ZPU, along with some tentative ideas on implementation.
2356 <h3>Goals</h3>
2357 <ol>
2358 <li>Reduce minimum code size footprint, i.e. BRAM code overhead. Non-trivial
2359 usable applications in 4kBytes of BRAM (single BRAM block).
2360 <li>Reduce minimum FPGA logic footprint by 20% or more. Goal &lt;300 LUT for
2361 32 bit ZPU
<li>Weed out unnecessary ZPU variations and merge useful
features into a few recommended ZPU implementations.
<li>Will someone be willing to contribute a heavily pipelined ZPU?
Performance goal of 10 DMIPS w/DRAM &amp; cache.
This ZPU could run a TCP/IP stack with enough performance to compete
with stripped-down ARM7-type systems.
2368 </ol>
2369 <h2>GCC changes</h2>
2370 The GCC changes planned are 100% backwards compatible with default
2371 options. However, a raft of options will be added to disable
2372 functionality so as to allow study and experimentation with the
2373 ZPU architecture.
2374 <ol>
<li>Add options that allow defining a single entry point for all unknown instructions. Precisely
how unknown instructions are handled will be defined by the HDL implementation.
Currently the GCC backend places relatively strict limitations on how unknown/emulated
instructions are handled. This will allow HDL implementations to have
sparser instruction set support. It can also allow sparse implementations
of emulated instructions, which is especially important for reducing the minimum
BRAM requirements of small applications.
2382 <li>GCC needs 4 "hard" registers. These are today mapped to memory. GCC
2383 will allow specifying what address to use or alternatively not to use
2384 memory mapped hard registers at all.
2385 <li>Strip away unused instructions from GCC and add options to GCC for not
2386 emitting more advanced instructions. This will e.g. convert MULT/DIV into
2387 function calls to libgcc and thus make it easier to determine that
2388 microcode is not needed.
2389 </ol>
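<p>
To illustrate the last item, here is a sketch of the kind of routine a multiply turns into when the compiler emits a library call instead of a MULT instruction. It is a plain shift-and-add multiply written for this document, in the spirit of libgcc helpers such as __mulsi3; the name soft_mul32 is hypothetical and this is not the actual libgcc source.

```c
#include <stdint.h>

/* Shift-and-add multiply using only add, shift and test operations.
 * The result is the low 32 bits of a * b, which is the same for
 * signed and unsigned operands in two's complement. */
static uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the current partial product */
        a <<= 1;           /* next bit weighs twice as much   */
        b >>= 1;
    }
    return result;
}
```

The loop runs at most 32 times and needs no multiplier hardware, so a core that never sees a MULT opcode does not need microcode or emulation code for it.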
2391 <a name="float"/>
2392 <h1>Floating point support</h1>
2393 The ZPU does not currently have floating point support. Feedback
2394 from users indicates that single precision floating point support for
addition, multiplication and float-to-integer conversion would
2396 be useful for small ZPU programs that sit in a tight control
2397 loop. Essentially the ZPU is then measuring something, doing a
2398 few calculations and then modifying the control signal.
<p>
Such control loops can be written in fixed point math, but that
adds to the engineering effort, reduces the clarity of the software
implementation, and will probably perform worse than
a hardware floating point version.
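<p>
For comparison, this is roughly what the fixed point alternative looks like. The sketch below uses a Q16.16 format and a single proportional control step; the format, the type name q16_16 and the function names are all assumptions chosen for illustration, not part of any ZPU software.

```c
#include <stdint.h>

typedef int32_t q16_16;          /* 16 integer bits, 16 fraction bits */
#define Q16_ONE 65536            /* 1.0 in Q16.16 */

static q16_16 q16_from_int(int32_t x)
{
    return (q16_16)(x * Q16_ONE);
}

/* Multiply two Q16.16 values. The 64 bit intermediate keeps the
 * full product; dividing by Q16_ONE rescales it back to Q16.16. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}

/* One proportional control step: output = gain * (setpoint - measured). */
static q16_16 control_step(q16_16 setpoint, q16_16 measured, q16_16 gain)
{
    return q16_mul(gain, setpoint - measured);
}
```

Every multiply needs explicit rescaling and the programmer has to track range and precision by hand, which is the extra engineering effort mentioned above.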
2404 <h2>Pipelined floating point module</h2>
2405 Design needs to be nailed down.
2406 <b>Goals:</b>
2407 <ul>
2408 <li> 32 bit single precision floating point
2409 <li> FADD => add two floats
2410 <li> FMULT => multiply two floats
2411 <li> FINT => convert float to int
2412 </ul>
The problem is divided as follows:
<ol>
<li>One top level VHDL module for each of the operations above.
<li>Integration into ZPUs is a separate problem that will not be
addressed in this project.
<li>Add a memory mapped coprocessor interface to the above. This
yields an example of a coprocessor which can be used for any
custom calculations and allows interest to be gauged.
</ol>
2424 Throughput:
2426 <ol>
2427 <li>pipelined design where throughput is one operation per cycle
2428 with a fixed number of cycles delay.
2429 <li>there is no flow control or enable signal.
2430 </ol>
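<p>
The throughput requirement above can be pictured with a small behavioral model in C. This is only a sketch of the timing behavior (one operation accepted per clock, result available a fixed number of clocks later, no enable or flow control), not a proposal for the VHDL. The latency of 4 and all names are assumptions, since the design has not been nailed down.

```c
#include <stdint.h>

#define FADD_LATENCY 4           /* ASSUMED number of pipeline stages */

typedef struct {
    float stage[FADD_LATENCY];   /* stage[0] is the newest result */
} fadd_pipe;

/* Advance the model by one clock. A new operand pair is accepted on
 * every call, and the value returned is the sum of the operands that
 * were presented FADD_LATENCY calls earlier. */
static float fadd_clock(fadd_pipe *p, float a, float b)
{
    float out = p->stage[FADD_LATENCY - 1];
    for (int i = FADD_LATENCY - 1; i > 0; i--)
        p->stage[i] = p->stage[i - 1];
    p->stage[0] = a + b;
    return out;
}
```

Because there is no flow control, the caller is responsible for counting clocks and matching each result to the operands it submitted.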
<p>
GCC support is not hard, but modifying GCC should be considered only after
interest in this feature beyond a coprocessor has been gauged.
2437 <h2>VHDL module interface</h2>
2439 Patches anyone???
2441 </body>
</html>