`self.gpio_csn = Const(17)`
but in `top_st7789.v`: `lcd_clk = wifi_gpio17;`
@daveshah1 I'm trying to understand the speed of certain circuits. In a write-back cache I have to maintain a 'dirty bit' for each cache line.
If I implement it as:

```verilog
reg [SETS-1:0] dirty [0:WAYS-1];
dirty[way][set] <= dirty[way][set] | wr;
```

I see about 14 ns of delay, Fmax ~70 MHz.
Using a one-dimensional wide vector seems a bit better:

```verilog
reg [WAYS*SETS-1:0] dirty;
dirty[{way, set}] <= dirty[{way, set}] | wr;
```

which reports about 10 ns, Fmax ~100 MHz.
Using four separate 16-bit vectors is better again (1 of 4 shown):

```verilog
reg [SETS-1:0] dirty0;
if (way == 0) dirty0[set] <= dirty0[set] | wr;
```

which reports about 8 ns, Fmax ~125 MHz.
These are all nextpnr reports for an 85F as part of a larger circuit, not an isolated example. Are these outcomes in line with your expectations, and is there an even faster way to express the same logic?
@emard Thanks for pointing that out; I had not considered the routing aspect yet. So far I had assumed that only the last form would be recognized as synthesizable to a physical DPR16X4C block (i.e., using the LUT configuration bits as distributed RAM), but my understanding of Yosys/nextpnr is too limited to be sure about this.
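For what it's worth, the per-way form above can be generalized without writing out `dirty0`..`dirty3` by hand. A minimal sketch, assuming the same `SETS`, `WAYS`, `way`, `set`, and `wr` names as in the snippets above (and a `clk` signal); whether each instance actually maps to a DPR16X4C is up to the tools, so this is an illustration, not a verified mapping:

```verilog
// One separate SETS-wide register per way, via a generate loop.
// Each instance is independent, so the synthesizer can treat it as a
// small LUT-RAM-sized structure rather than one wide indexed vector.
genvar w;
generate
  for (w = 0; w < WAYS; w = w + 1) begin : dirty_way
    reg [SETS-1:0] dirty;
    always @(posedge clk)
      if (way == w)
        dirty[set] <= dirty[set] | wr;
  end
endgenerate
```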
I don't have a lot of time right now, but I'm working through variations to make the cache design as timing-independent as possible, so that it 'just works' with various bus designs. In the case of Oberon this is a little complicated: the RISC5 CPU has several (bus) multiplexers after its registered signals, so it takes until well into the clock cycle before its bus signals are stable. It does not help that the "memory not ready" input indirectly also drives the address multiplexer, leading to a logic loop in a straightforward asynchronous cache-hit circuit.
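One standard way to break such a loop is to register the hit decision, so that the not-ready signal is a clean flop output rather than a combinational function of the address it is itself gating. A sketch only, with hypothetical signal names (`hit_r`, `tag_ram_out`, `addr_tag`, `valid`) that are not from the actual design, and at the cost of one cycle of latency on the hit path:

```verilog
// Register the tag comparison; downstream logic (including the
// "memory not ready" signal) then sees a stable, registered hit_r
// instead of feeding back combinationally into the address mux.
always @(posedge clk)
  hit_r <= (tag_ram_out == addr_tag) && valid;
```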
The current, working Next186-based Oberon design solves this by gating the Oberonstation system clock, which has its own issues, as we discovered a few weeks ago.