/* Interconnect support

   TODO:
   - Demultiplexing; connecting one master to several targets

     The interesting issue here arises because all my targets use
     variable-length pipelined reads, thus multiple outstanding requests
     can result in data arriving back out of order, or even colliding in
     the same cycle.  This can be solved, albeit with some effort, with
     a FIFO that remembers the order in which data is expected back and
     an overflow FIFO to save data that's not expected yet.

     I'll delay working on this until I start actually having multiple
     outstanding read requests to multiple targets (currently I only
     have either multiple targets [Core -> memory] or multiple requests
     for a single target [Core & framebuffer -> SRAM]).

   - Burst support (may be useful once I start supporting the SDRAM).
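
   Sketch for the demultiplexing item above (an untested fragment
   under stated assumptions): one master, two targets A and B, reads
   only.  All names here (select_b, data_valid_a, data_valid_b,
   read_data_a, read_data_b, ORDER_LG2) are hypothetical.  It shows
   only the order FIFO; the overflow FIFO that would park replies
   arriving ahead of their turn is left out, and the FIFO is assumed
   large enough never to fill.

      reg                 reply_from_b[(1 << ORDER_LG2) - 1:0];
      reg [ORDER_LG2-1:0] order_wp = 0, order_rp = 0;

      // The target at the head of the order FIFO owns the reply bus.
      wire        pending    = order_rp != order_wp;
      wire        expect_b   = reply_from_b[order_rp];
      wire        data_valid = pending & (expect_b ? data_valid_b : data_valid_a);
      wire [31:0] read_data  = expect_b ? read_data_b : read_data_a;

      always @(posedge clock) begin
         // Remember which target each accepted read went to.
         if (transfer_request & ~wait_request & ~wren) begin
            reply_from_b[order_wp] <= select_b;
            order_wp               <= order_wp + 1;
         end

         // Head reply delivered; expect the next one in issue order.
         if (data_valid)
            order_rp <= order_rp + 1;
      end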

   Intro
   ~~~~~

   I've rejected WISHBONE as being too low performance.  Altera's Avalon
   represents the gold standard for me, but it's _very_ general and
   would take much effort to replicate.

   I only need:
   - all reads are pipelined
   - multi master support
   - burst support


   Basics
   ~~~~~~

   The master raises transfer_request to initiate a transfer.  The slave
   asynchronously raises wait_request when it's not ready.  The master is
   expected to hold the request and associated parameters stable until
   wait_request goes low.

   Example: a master-side state machine issuing two requests (no burst)
   will look like:

      always @(posedge clock) case (state)
      S1: begin
         ... <parameters> ... <= .... params for 1st req ....
         transfer_request <= 1;
         state <= S2;
      end

      S2: if (~wait_request) begin
         ... <parameters> ... <= .... params for 2nd req ....
         state <= S3;
      end

      S3: if (~wait_request) begin
         transfer_request <= 0;
         ....
      end
      endcase


   Pipelined Reading
   ~~~~~~~~~~~~~~~~~

   Read requests are just like write requests, except that one or more
   cycles later the replies arrive, in the order issued, each raising
   data_valid for one cycle.

   Extending the above example, we need a separate state machine
   collecting the data, as we can't assume anything about how long we
   will have to wait in S2.

      always @(posedge clock) begin
         if (data_valid)
            if (waiting_for_1st) begin
               first <= port_read_data;
               waiting_for_1st <= 0;
            end else begin
               second <= port_read_data;
               waiting_for_2nd <= 0;
            end

         case (state)
         S1: begin
            ... <parameters> ... <= .... params for 1st req ....
            transfer_request <= 1;
            waiting_for_1st <= 1;
            state <= S2;
         end

         S2: if (~wait_request) begin
            ... <parameters> ... <= .... params for 2nd req ....
            waiting_for_2nd <= 1;
            state <= S3;
         end

         S3: if (~wait_request) begin
            transfer_request <= 0;
            state <= S4;
         end

         S4: if (~waiting_for_1st & ~waiting_for_2nd) begin
            ... process data
         end
         endcase
      end

   In many cases burst transfers can replace the need for multiple
   issues.


   Multi Master
   ~~~~~~~~~~~~

   I really like the share-based approach to arbitration that Avalon
   has.  Let's see if we can replicate it.

   Constants
     A_SHARES, B_SHARES

     if (transfer_request_a & (shares_left_for_a != 0 | ~transfer_request_b))
        go ahead and let A get access
        if (shares_left_for_a == 0)
           shares_left_for_a = A_SHARES
        else
           --shares_left_for_a
     else if (transfer_request_b & (shares_left_for_b != 0 | ~transfer_request_a))
        go ahead and let B get access
        if (shares_left_for_b == 0)
           shares_left_for_b = B_SHARES
        else
           --shares_left_for_b
     else // no grant: neither requests, or both have run out of shares
        shares_left_for_a = A_SHARES
        shares_left_for_b = B_SHARES
        do nothing else

   Phew, not too complicated?  Just wait until we throw burst support into the mix.

   Also, I failed to show how to route the read data back to the
   right master.  A simple 1-bit FIFO of length A_SHARES + B_SHARES
   (??) should be sufficient.  (Careful about the latency of the FIFO
   itself.)  However, the arbitration must know which requests will
   result in a reply.  (Should all requests simply result in a reply,
   to simplify?)


   Burst support
   ~~~~~~~~~~~~~

   I haven't found documentation for how Avalon handles this, but it
   seems that simply extending the basic strategy with a burst count
   should suffice.

   Example: let's show the client side for a change (as the master looks
   almost exactly the same as for a single request).  This simple client
   can only handle one outstanding request at a time, but it does support
   bursts.

      wire wait_request = count != 0;

      always @(posedge clock) begin
         pipeline[N:1] <= pipeline[N-1:0];
         pipeline[0] <= 0;
         if (count) begin
            addr <= addr + 1;
            pipeline[0] <= 1;
            count <= count - 1;
         end else if (transfer_request) begin
            count <= burst_length;
            addr <= request_addr;
         end
      end

   (Of course, counting down to -1 and using the sign bit would be
   better.)
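
   A minimal sketch of that sign-bit variant, assuming burst_length >= 1
   and a hypothetical parameter CNT_MSB chosen so that burst_length - 1
   never sets bit CNT_MSB.  The counter idles at -1 (all ones), so its
   top bit doubles as the idle flag and the != 0 comparator goes away:

      reg [CNT_MSB:0] count = {(CNT_MSB+1){1'b1}};   // -1: idle
      wire            busy  = ~count[CNT_MSB];       // sign bit clear: beats left
      wire            wait_request = busy;

      always @(posedge clock) begin
         pipeline[N:1] <= pipeline[N-1:0];
         pipeline[0] <= 0;
         if (busy) begin
            addr <= addr + 1;
            pipeline[0] <= 1;
            count <= count - 1;          // 0 - 1 wraps to -1: idle again
         end else if (transfer_request) begin
            count <= burst_length - 1;   // runs burst_length-1 .. 0
            addr <= request_addr;
         end
      end

   The number of beats is unchanged: the client is busy for exactly
   burst_length cycles, as in the version above.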


   The only extension needed for the arbitration is knowing about the
   burst size.
*/

module arbitration
  (input         clock

   // Master port 1
  ,input         transfer_request1
  ,input  [31:0] address1
  ,input         wren1
  ,input  [31:0] wrdata1
  ,input  [ 3:0] wrmask1
  ,output        wait_request1
  ,output        read_data_valid1
  ,output [31:0] read_data1

   // Master port 2
  ,input         transfer_request2
  ,input  [31:0] address2
  ,input         wren2
  ,input  [31:0] wrdata2
  ,input  [ 3:0] wrmask2
  ,output        wait_request2
  ,output        read_data_valid2
  ,output [31:0] read_data2

   // Target port
  ,output        transfer_request
  ,output [31:0] address
  ,output        wren
  ,output [31:0] wrdata
  ,output [ 3:0] wrmask
  ,input         wait_request
  ,input         read_data_valid
  ,input  [31:0] read_data
  );

// My preference, but the others could be provided also as alternative
// arbitration modules.
`define SHARE_BASED 1
`define AVG_RATIO_MODEL 1

   /*
    * Data routing FIFO.  Size must cover all potential outstanding
    * transactions.
    */
   parameter FIFO_SIZE_LG2 = 4;
   parameter debug = 0;

   reg                      data_for_1[(1 << FIFO_SIZE_LG2) - 1:0];
   reg  [FIFO_SIZE_LG2-1:0] rp = 0;
   reg  [FIFO_SIZE_LG2-1:0] wp = 0;

   wire [FIFO_SIZE_LG2-1:0] wp_next = wp + 1;
   wire [FIFO_SIZE_LG2-1:0] rp_next = rp + 1;

   assign transfer_request = transfer_request1 | transfer_request2;
   assign read_data1 = read_data;
   assign read_data2 = read_data;
   assign read_data_valid1 = read_data_valid &  data_for_1[rp];
   assign read_data_valid2 = read_data_valid & ~data_for_1[rp];

   wire en1 = transfer_request1 & ~wait_request1;
   wire en2 = transfer_request2 & ~wait_request2;
   assign address = en1 ? address1 : address2;
   assign wren    = en1 ? wren1    : wren2;
   assign wrdata  = en1 ? wrdata1  : wrdata2;
   assign wrmask  = en1 ? wrmask1  : wrmask2;

   always @(posedge clock) begin
      // Remember which master each accepted read belongs to.
      if ((en1 | en2) & ~wren) begin
         data_for_1[wp] <= en1;
         wp <= wp_next;
         // begin/end keeps the else paired with the outer if
         if (wp_next == rp) begin
            if(debug)$display("%05d ARB: FIFO OVERFLOW! wp: %d rp: %d", $time, wp_next, rp);
         end else begin
            if(debug)$display("%05d ARB: FIFO remembered a read req wp: %d rp: %d", $time, wp_next, rp);
         end
      end

      // Each reply pops one routing entry.
      if (read_data_valid) begin
         rp <= rp_next;
         if (rp == wp) begin
            if(debug)$display("%05d ARB: FIFO UNDERFLOW! wp: %d rp: %d", $time, wp, rp_next);
         end else begin
            if(debug)$display("%05d ARB: FIFO routed read data wp: %d rp: %d", $time, wp_next, rp);
         end
      end
   end

`ifdef UNFAIR
   /*
    * Quick and unfair arbitration.  Master 1 can quite easily starve
    * out master 2.
    */
   assign wait_request2 = wait_request | transfer_request1;
   assign wait_request1 = wait_request;
`endif

`ifdef ROUND_ROBIN
   /*
    * Very simple round robin
    */
   reg current_master = 0;
   assign wait_request1 = wait_request | transfer_request2 & current_master;
   assign wait_request2 = wait_request | transfer_request1 & ~current_master;

   always @(posedge clock)
      if (transfer_request1 & transfer_request2 & ~wait_request)
         current_master <= ~current_master;
`endif

`ifdef SHARE_BASED
   /*
    * Share based
    */
   parameter SHARES_1 = 5; // > 0
   parameter SHARES_2 = 10; // > 0
   parameter LIKE_AVALON = 1;

   parameter OVERFLOW_BIT = 6;

   reg current_master = 0;
   reg [OVERFLOW_BIT:0] countdown = SHARES_1 - 2;
   assign wait_request1 = wait_request | transfer_request2 & current_master;
   assign wait_request2 = wait_request | transfer_request1 & ~current_master;

   reg [31:0] count1 = 1, count2 = 1;

   always @(posedge clock) begin
      if (transfer_request1 | transfer_request2)
         if(debug)
            $display("%05d ARB: Req %d/%d Arbit %d/%d W:%d %d (shares left %d, cumulative ratio %f)",
                     $time,
                     transfer_request1, transfer_request2,
                     transfer_request1 & ~wait_request1,
                     transfer_request2 & ~wait_request2,
                     wren1, wren2,
                     countdown + 2,
                     1.0 * count1 / count2);

      /* statistics */
      count1 <= count1 + (transfer_request1 & ~wait_request1);
      count2 <= count2 + (transfer_request2 & ~wait_request2);

`ifdef AVG_RATIO_MODEL
      /* The arbitration is only relevant when two masters try to
       * initiate at the same time.  We swap priorities when the
       * current master runs out of shares.
       *
       * Notice, unlike Avalon, a master does not forfeit its shares if
       * it temporarily skips a request.  IMO this leads to better QoS
       * for a master that initiates requests less frequently.
       *
       * In this model, the arbitration tries to approximate a
       * SHARES_1 : SHARES_2 ratio for Master 1 and Master 2
       * transactions (as much as the available requests will allow).
       */

      if (~wait_request) begin
         if (transfer_request1 | transfer_request2) begin
            countdown <= countdown - 1;
            if (countdown[OVERFLOW_BIT]) begin
               current_master <= ~current_master;
               countdown <= (current_master ? SHARES_1 - 2 : SHARES_2 - 2);
            end
         end
      end
`endif // `ifdef AVG_RATIO_MODEL

`ifdef WORST_CASE_LATENCY_MODEL
      /*
       * This version tries to be more like Avalon.
       *
       * In this model SHARES_1 essentially defines the worst case time
       * that Master 2 can wait, and likewise SHARES_2 for Master 1.
       */
      if (~wait_request) begin
         if (transfer_request1 & transfer_request2) begin
            countdown <= countdown - 1;
            if (countdown[OVERFLOW_BIT]) begin
               if(debug)$display("Swap priority to master %d", 2 - current_master);
               current_master <= ~current_master;
               countdown <= (current_master ? SHARES_1 - 2 : SHARES_2 - 2);
            end
         end else if (transfer_request1 & current_master) begin
            if(debug)$display("Master 2 forfeits its remaining %d shares", countdown + 2);
            current_master <= 0;
            countdown <= SHARES_1 - 3;
         end else if (transfer_request2 & ~current_master) begin
            if(debug)$display("Master 1 forfeits its remaining %d shares", countdown + 2);
            current_master <= 1;
            countdown <= SHARES_2 - 3;
         end
      end
`endif // `ifdef WORST_CASE_LATENCY_MODEL
   end
`endif // `ifdef SHARE_BASED
endmodule


`ifdef TESTING
module master
  (input         clock
  ,output        transfer_request
  ,input         wait_request
  ,input  [31:0] read_data
  ,input         read_data_valid
  );

   parameter ID = 0;
   parameter WAIT_STATES = 0;

   reg [31:0] state = 0;
   reg        got1 = 0;
   reg [31:0] no1, no2;

   reg        transfer_request = 0;

   always @(posedge clock) begin
      case (state)
      0:begin
         if(0)$display("%05d Master%1d state 0", $time, ID);
         transfer_request <= 1;
         state <= 1;
      end

      1:if (~wait_request) begin
         state <= 2;
         if(0)$display("%05d Master%1d scored", $time, ID);
      end

      2:if (~wait_request) begin
         if(0)$display("%05d Master%1d scored again", $time, ID);
         if (WAIT_STATES == 0)
            state <= 1;
         else begin
            transfer_request <= 0;
            state <= (WAIT_STATES <= 1) ? 0 : 3;
         end
      end

      3: state <= (WAIT_STATES <= 2) ? 0 : 4;

      4: state <= 0;
      endcase

      if (read_data_valid) begin
         got1 <= ~got1;
         if (got1) begin
            no2 <= read_data;
            if(0)$display("%05d Master%1d got data for no2: %x", $time, ID, read_data);
         end else begin
            no1 <= read_data;
            if(0)$display("%05d Master%1d got data for no1: %x", $time, ID, read_data);
         end
      end
   end
endmodule

/*
 * Trivial target that returns serial numbers.  To make it interesting
 * it takes two cycles (thus one wait state) to accept a request when
 * NOWAIT is 0, and delivers the reply LATENCY (default five) cycles
 * later.
 */
module target
  (input         clock
  ,input         transfer_request
  ,output        wait_request
  ,output [31:0] read_data
  ,output        read_data_valid
  );

   parameter NOWAIT = 1;
   parameter LATENCY = 5; // 1..5, bounded by the depth of read_data_pipeline

   reg        got_request;
   reg [32:0] read_data_pipeline [4:0];
   reg [31:0] serial_no = 0;

   assign wait_request = transfer_request & ~got_request;
   assign {read_data_valid,read_data} = read_data_pipeline[LATENCY-1];

   always @(posedge clock) begin
      serial_no <= serial_no + 1;
      read_data_pipeline[0] <= 33'd0;
      read_data_pipeline[1] <= read_data_pipeline[0];
      read_data_pipeline[2] <= read_data_pipeline[1];
      read_data_pipeline[3] <= read_data_pipeline[2];
      read_data_pipeline[4] <= read_data_pipeline[3];

      got_request <= NOWAIT | transfer_request & ~got_request;

      if (got_request & transfer_request) begin
         read_data_pipeline[0] <= {1'd1, serial_no};
         if(0)$display("%05d Target got a request", $time);
      end
   end
endmodule

/* Multi master */
module main();
   reg         clock = 1;
   always # 5 clock = ~clock;

   wire        transfer_request, wait_request, read_data_valid;
   wire [31:0] read_data;
   wire        transfer_request1, wait_request1, read_data_valid1;
   wire [31:0] read_data1;
   wire        transfer_request2, wait_request2, read_data_valid2;
   wire [31:0] read_data2;

   // Connected by name: the positional order used for master and
   // target below does not match the port list of module arbitration.
   // The test masters only issue reads, so the write-side inputs are
   // tied off and the unused target-side outputs are left open.
   arbitration arbitration_inst
     (.clock(clock)

     ,.transfer_request1(transfer_request1)
     ,.address1(32'd0)
     ,.wren1(1'b0)
     ,.wrdata1(32'd0)
     ,.wrmask1(4'd0)
     ,.wait_request1(wait_request1)
     ,.read_data_valid1(read_data_valid1)
     ,.read_data1(read_data1)

     ,.transfer_request2(transfer_request2)
     ,.address2(32'd0)
     ,.wren2(1'b0)
     ,.wrdata2(32'd0)
     ,.wrmask2(4'd0)
     ,.wait_request2(wait_request2)
     ,.read_data_valid2(read_data_valid2)
     ,.read_data2(read_data2)

     ,.transfer_request(transfer_request)
     ,.address()
     ,.wren()
     ,.wrdata()
     ,.wrmask()
     ,.wait_request(wait_request)
     ,.read_data_valid(read_data_valid)
     ,.read_data(read_data)
     );

   target target_inst
     (clock
     ,transfer_request
     ,wait_request
     ,read_data
     ,read_data_valid
     );

   master #(1, 0) master1
     (clock
     ,transfer_request1
     ,wait_request1
     ,read_data1
     ,read_data_valid1
     );

   master #(2, 2) master2
     (clock
     ,transfer_request2
     ,wait_request2
     ,read_data2
     ,read_data_valid2
     );

   initial #400000 $finish;
endmodule

`ifdef MAIN_SIMPLE
module main_simple();
   reg         clock = 1;
   always # 5 clock = ~clock;

   wire        transfer_request, wait_request, read_data_valid;
   wire [31:0] read_data;

   target target_inst
     (clock
     ,transfer_request
     ,wait_request
     ,read_data
     ,read_data_valid
     );

   master #(1) master_inst
     (clock
     ,transfer_request
     ,wait_request
     ,read_data
     ,read_data_valid
     );
endmodule
`endif

`endif