* Improve cycle time; attack the current critical path, which is in
  ME. First try switching from a tag-sequential cache to the more
  conventional way select (using a 32-bit 4:1 mux). This gives us
  another stage to do the load alignment and sign-extension.
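A rough behavioral model (Python, not the Verilog) of what the way-select path and the freed-up alignment stage compute. It assumes 4 ways (matching the 4:1 mux), big-endian byte lanes and a word-wide cache data path. `way_select`, `sign_extend` and `align_load` are hypothetical names for illustration only.

```python
# Sketch only: parallel 4-way tag compare + 32-bit 4:1 way-select mux,
# then MIPS-style load alignment and sign-extension in the next stage.
# Assumes big-endian byte lanes; names are hypothetical.

def way_select(tags, data, addr_tag):
    """Compare every way's tag at once and pick the hitting way's word.

    tags, data: per-way lists (4 entries each). Returns (hit, word).
    """
    hits = [t == addr_tag for t in tags]     # one comparator per way
    assert sum(hits) <= 1, "a tag may hit in at most one way"
    for way, hit in enumerate(hits):         # models the one-hot 4:1 mux
        if hit:
            return True, data[way]
    return False, 0


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def align_load(word, addr, size, signed):
    """Extract a byte (size=1), halfword (2) or word (4) from a 32-bit word.

    Big-endian: byte offset 0 is the most significant byte.
    Returns the 32-bit register value as an unsigned int.
    """
    offset = addr & 3
    assert offset % size == 0, "misaligned access"
    shift = (4 - size - offset) * 8
    value = (word >> shift) & ((1 << (8 * size)) - 1)
    if signed:
        value = sign_extend(value, 8 * size) & 0xFFFFFFFF
    return value
```

For example, `align_load(0x80FF1234, 0x100, 1, True)` picks the top byte `0x80` and sign-extends it to `0xFFFFFF80`. Splitting the work this way keeps the tag compare and mux in ME and moves the shifter out of the critical path.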
* Implement D$ flushing!
* Implement a slew of performance counters, most specifically, count
  - waiting for mult hazards
  - waiting for div hazards
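A minimal sketch of per-cause stall counting, assuming one update per clock and a single stall cause per cycle. The cause strings and the `PerfCounters` class are hypothetical; only the mult/div hazard causes come from the list above.

```python
from collections import Counter

class PerfCounters:
    """Toy model: a cycle counter plus one stall counter per cause."""

    def __init__(self):
        self.cycles = 0
        self.stalls = Counter()   # cause name -> cycles spent waiting

    def tick(self, stall_cause=None):
        """Call once per clock; stall_cause says why the pipe is waiting."""
        self.cycles += 1
        if stall_cause is not None:
            self.stalls[stall_cause] += 1
```

Feeding it `None, "mult", "mult", "div", None` gives 5 cycles, 2 of them waiting on mult and 1 on div.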
* Implemented DIV and DIVU
* Added a 16-bit async sram controller for LPRP.
* sram_ctrl.v: burst reads can now be issued back-to-back, fully
  saturating the memory.
* Majorly redid the memory protocol. It now supports pipelining and
  burst reads. To simplify, *all* reads are burst reads of a fixed
  burst length (currently 4) and *all* writes are single, non-burst.
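A toy Python model of the simplified protocol: every read becomes a fixed-length burst, every write is a single word, and requests can be queued back-to-back so that one data word returns per cycle. The burst order is an assumption (aligned to the burst size, incrementing linearly), as are the `MemoryModel` and `burst_addresses` names.

```python
from collections import deque

BURST_LENGTH = 4   # "currently 4"
WORD_BYTES = 4

def burst_addresses(addr):
    """Word addresses fetched for a read of addr (assumed burst-aligned, linear)."""
    span = BURST_LENGTH * WORD_BYTES
    base = addr & ~(span - 1)
    return [base + i * WORD_BYTES for i in range(BURST_LENGTH)]


class MemoryModel:
    """Pipelined memory: requests are accepted back-to-back and read
    data is returned in order, one word per cycle()."""

    def __init__(self):
        self.mem = {}
        self.pending = deque()   # word addresses still to be returned

    def request(self, addr, write=False, data=None):
        if write:
            self.mem[addr & ~3] = data            # single, non-burst write
        else:
            self.pending.extend(burst_addresses(addr))

    def cycle(self):
        """Return (address, word) for the next read beat, or None if idle."""
        if not self.pending:
            return None
        a = self.pending.popleft()
        return a, self.mem.get(a, 0)
```

Because a second read can be queued while the first burst is still streaming, the data bus never idles between bursts. This is the "fully saturating" case from the sram_ctrl.v entry.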
* Rewrote and chopped off early history. George Orwell would have been
* Separate the memory bus from the peripheral bus.
  Things get interesting with peripherals that need DMA-like access
  to memory, such as video interfaces.
  A consequence would be that Flash, etc. would be on the peripheral
  bus, so if we wish to cache this, we'd have to distinguish IO from
* Find a way to preload the RTL simulated SRAM (loading the kernel
  with the bootloader under RTL simulation takes *WAY* too long).
* Once a sufficient testing structure is in place, start replacing the
  pipeline with a high-performance one. [DONE]
* Fix the arrays that can't be inferred as RAM. Notably the register
* Restart the pipeline instead of stalling [DONE]
* Forward results instead of restarting [DONE]
* Move configuration parameters (such as cache size, etc.) to a global
  configuration file. [DONE]
* Wizard-generated RAM blocks (to know exactly what I get) [DONE]
* Extend the forwarding in DE and use dual-port memory w/o bypassing
* Handle uncached loads (uncached stores work as a side effect of the
  1. Move the x_res mux up to just after the data cache out and make
     sure that x_res falls through the shifter network when not
  2. Add another memory event (uncached_load), and make sure it doesn't
     accidentally write to the data cache array or tags.
  3. Make uncached loads kill the pipe and set up the uncached_load
  4. When the data for uncached_load comes in, thread it through the
     shifter network and restart the next instruction.
  5. Test by making all loads uncachable.
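Steps 2-4 can be sketched as a two-state controller. It deliberately holds no cache state, so an uncached load cannot touch the tag or data arrays (step 2). The model rests on several assumptions: big-endian byte lanes, restarting at `pc + 4` (branch delay slots are ignored), and hypothetical names throughout (`UncachedLoadFSM`, `issue`, `data_arrives`).

```python
from enum import Enum, auto

class State(Enum):
    RUN = auto()
    WAIT_UNCACHED = auto()

class UncachedLoadFSM:
    """Hypothetical model of the uncached_load memory event."""

    def __init__(self):
        self.state = State.RUN
        self.pending = None      # (pc, addr, size, signed, rd)
        self.regs = [0] * 32     # register file stand-in

    def issue(self, pc, addr, size, signed, rd):
        """Step 3: kill the pipe and remember the outstanding load."""
        assert self.state is State.RUN
        self.pending = (pc, addr, size, signed, rd)
        self.state = State.WAIT_UNCACHED
        return "kill"

    def data_arrives(self, word):
        """Step 4: run the word through the alignment/sign-extension
        ("shifter network"), write back, and return the restart PC."""
        assert self.state is State.WAIT_UNCACHED
        pc, addr, size, signed, rd = self.pending
        shift = (4 - size - (addr & 3)) * 8          # big-endian lanes
        value = (word >> shift) & ((1 << 8 * size) - 1)
        if signed and value >> (8 * size - 1):
            value |= (0xFFFFFFFF << 8 * size) & 0xFFFFFFFF
        if rd != 0:                                  # $zero stays 0
            self.regs[rd] = value
        self.pending = None
        self.state = State.RUN
        return pc + 4                                # next instruction
```

Step 5 then amounts to routing every load through `issue`/`data_arrives` instead of the cache path.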
* Restart on load-use hazards. [DONE]
* Restart branches whose delay slots get delayed (i.e., due to cache
* Fix the current strange behavior of tinymon. The fault points to a
  serial port problem. Prefer a software workaround, as the
  peripherals are going to get an overhaul later anyway. [DONE - it
  was two actual bugs in the pipeline]
* Found a new home: git://repo.or.cz/yari.git
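The "Forward results instead of restarting" and "Restart on load-use hazards" items come down to one operand-selection decision in DE. Here is a sketch that assumes a classic five-stage DE/EX/ME/WB split, where load data only exists once the load reaches ME and WB is already covered by the register file. `InFlight` and `read_operand` are hypothetical names, not YARI's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InFlight:
    """An older instruction in EX or ME that may write a register."""
    rd: int                    # destination register (0 = none)
    result: Optional[int]      # value to forward, if already computed
    is_load: bool = False

def read_operand(reg, regfile, ex, me):
    """Return (value, restart) for source register reg, read in DE."""
    if reg == 0:
        return 0, False                 # $zero is never forwarded
    if ex is not None and ex.rd == reg:
        if ex.is_load:                  # data not back until ME
            return None, True           # load-use hazard: restart
        return ex.result, False         # forward EX result (youngest wins)
    if me is not None and me.rd == reg:
        return me.result, False         # forward from ME, incl. load data
    return regfile[reg], False
```

Checking EX before ME makes the most recent writer win. A load in EX is the only case that forwarding cannot cover, so that case alone triggers a restart.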
* Matching IO behaviour like I do now is unsustainable. As a general
  principle, while cosimulating, higher-level models could take the IO
  events from the lower levels (which can ultimately be the running