I have done a new single-cycle RV32I CPU: https://twitter.com/splinedrive/status/1577734635307651073
It should be the base for a pipelined version next.
Hi all - after a long pause I am thinking about FPGA projects again. I have not had much hobby time in the interim, but what time I had went into working with System III Unix and the Sipeed RISC-V board (see here). That is working now.
For that I am using the Plan 9 C compilers by Ken Thompson and Rob Pike; the RISC-V backend was done by Richard Miller. It compiles both RV32 and RV64 code, and the compilers are small enough (~200KB) that they can easily run native. They can be found here: https://gitlab.com/pnru/riscv-kencc
It is actually a family of compilers that can also target x86, ARM and M68K (and several more).
I was looking at ways to run SysIII on the ULX3S as well and came across the RVSoC project, which can be found here: https://www.arch.cs.titech.ac.jp/wk/rvsoc/doku.php
It is only some 5,000 lines of plain Verilog and is Linux ('buildroot') capable - but undoubtedly it will be slow, because I don't think it has a memory cache. The CPU is RV32IMAC with a 32-bit MMU.
Has anybody tried to get RVSoC running on a ULX3S board?
I was thinking about adding the cache that I did for Oberon and never quite got working. How has Yosys/NextPNR improved over the past two years in this area (large, dual-port block RAM usage)? Pre-pandemic it was 'being worked on', as I remember.
I noticed that Orangecrab is no longer doing nightly builds. Is YosysHQ now the preferred source for those? (I mean https://github.com/YosysHQ/oss-cad-suite-build).
@e2kgh: Thanks for the feedback.
I only started looking around yesterday evening. Yeah, I agree that "12 stage multicycle" doesn't read well. What I find cool is that the CPU is just 1,200 lines and the MMU 800 lines. Plain Verilog code of that simplicity is easy to read. (There seems to be a more traditional 5 stage pipelined cousin, the RVCoreP, but I have not looked at that one).
My context for this is the CPUs of the early '80s, and I think that this CPU may be competitive with a VAX, an M68010 or a Bellmac-32 -- all running at about 10 MHz at the time. However, I've only just read the paper and had a quick glance at the code; maybe I will be disappointed when I look deeper.
The @splinedrive implementation looks cool too. From the .sv extensions I think it is SystemVerilog? Does Yosys cover that now? I may be able to get around not having the atomic instructions, but having the compressed ones is important to me (with these, the generated code is similar in size to the ancient VAX, etc. binaries, which gives me more room for compare & contrast). Also, I need a paged MMU for my projects.
I would have to revisit my old Oberon code, but I think my issue was different; maybe the fix gives me another path to try.
I'm using a single-clock, single-port BRAM for the cache. Maybe I went that way because I could not get dual-ported working; I don't remember at this remove.
To compensate I used multiplexers and partial writes to simulate a 16b x 8K port and a 32b x 4K port (https://gitlab.com/pnru/ulx3s-misc/-/blob/master/oberon_pnr/cache.v#L142-165). This never worked: no matter how much I pipelined or tweaked clocks, it would always have one unreliable nibble (4 bits). Always a nibble, although which one varied from attempt to attempt.
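For anyone trying to picture the two-views trick, here is a rough behavioural model in Python. It is not a translation of cache.v: the class and method names are made up, and which half-word sits in the low 16 bits is an assumption. It only shows the idea of one 32b x 4K array, a mux on reads for the 16-bit view, and a masked (partial) write.

```python
class SinglePortRam:
    """Hypothetical model: one 32b x 4K array seen as 32b x 4K or 16b x 8K."""

    WORDS = 4096

    def __init__(self):
        self.mem = [0] * self.WORDS

    def read32(self, addr):
        # 32-bit view: 12-bit word address
        return self.mem[addr % self.WORDS]

    def write32(self, addr, value):
        self.mem[addr % self.WORDS] = value & 0xFFFFFFFF

    def read16(self, addr):
        # 16-bit view: 13-bit address; bit 0 drives the read multiplexer
        word = self.mem[(addr >> 1) % self.WORDS]
        shift = 16 * (addr & 1)  # assumption: even address = low half
        return (word >> shift) & 0xFFFF

    def write16(self, addr, value):
        # partial write: only the selected half of the word changes
        index = (addr >> 1) % self.WORDS
        shift = 16 * (addr & 1)
        mask = 0xFFFF << shift
        kept = self.mem[index] & ~mask & 0xFFFFFFFF
        self.mem[index] = kept | ((value & 0xFFFF) << shift)


ram = SinglePortRam()
ram.write32(5, 0xAABBCCDD)
ram.write16(11, 0x1234)  # upper half of word 5
```

In hardware the masked write would be done with write enables per byte or nibble lane rather than a read-modify-write, which is where the per-nibble timing comes in.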
Either I never quite understood what the real timing constraint was, or NextPNR would always route one nibble with impossibly long lines. More likely the first.
Maybe I should give a simple dual-ported design another go with the current Yosys.
Looked a bit more at the RVSoC source. It does have a 32b x 1K direct-mapped cache, with a 4-word (128-bit) cache line. It is a bit hard to follow, as it was generated and then wrapped to match it to the rest. The way it is implemented requires more BRAM than there is on a ULX3S.
Running at 100MHz, it is claimed to be about 7 times faster than a 1980 VAX (i.e. a model 11/780), and similar to a 486DX (but unclear at what clock). It is also claimed that Buildroot Linux boots in 12 seconds from a ramdisk.
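To make those cache numbers concrete, a small sketch of how a direct-mapped cache of that size would split an address. This assumes byte addresses and 32-bit physical addresses; the actual tag width and field layout in RVSoC are not something I have checked, and the names are made up.

```python
LINE_BYTES = 16                          # 4 words x 32 bits = 128-bit line
CACHE_BYTES = 1024 * 4                   # 32b x 1K = 4 KiB of data
NUM_LINES = CACHE_BYTES // LINE_BYTES    # 256 lines
OFFSET_BITS = 4                          # log2(LINE_BYTES)
INDEX_BITS = 8                           # log2(NUM_LINES)


def split_address(addr):
    """Split a byte address into (tag, line index, byte offset in line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x80001234)
```

Two addresses that are exactly 4 KiB apart get the same index and different tags, so in a direct-mapped cache they evict each other.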
@e2kgh: Did not try that. I only work with Yosys/ECP5, I do not have the vendor tools installed for any vendor.
I did spend a lot more time reading the code yesterday. It makes a bit more sense now. The main CPU operates in 8-12 clocks per instruction (mostly 8, more for load/store/atomic etc.; div and mul take >30 clocks and pause the main state engine whilst doing the calc). The MMU is integrated in the main CPU state engine. One could say it is like a Linux-capable picoRV (but not a derivative, of course).
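Back-of-envelope for what 8-12 clocks per instruction means at the 100MHz mentioned earlier. This is just clock divided by CPI; it ignores the mul/div stalls and any memory wait states, so treat it as an upper bound, not a measurement.

```python
def mips(clock_hz, clocks_per_instruction):
    """Peak million instructions per second for a fixed CPI."""
    return clock_hz / clocks_per_instruction / 1e6


best_case = mips(100e6, 8)    # most instructions take 8 clocks
worst_case = mips(100e6, 12)  # load/store/atomic path
```

So roughly 8 to 12.5 MIPS peak for the common instructions.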
Also tried to compile the C code for its I/O controller with the Plan 9 tool chain: that worked without issue. The code is slightly larger than with gcc (~5%), both are around 3KB.
Next weekend I will probably try to run a "hello world" in Icarus, I think. Porting to Yosys/ULX3S comes after that.
Actually, @michaelengel 's port of xv6 to RV32 would be a nice first target for this SoC:
Quote from 9 Oct 2020 on this Gitter:
@lawrie I ported xv6 to RV32 - https://github.com/michaelengel/xv6-rv32
I have already ported regular xv6 to the Plan 9 tool chain, so repeating that for the 32-bit version should not be a lot of work. The cool bit is that stock xv6 expects a virtio disk, and this SoC presents a virtio disk.
Ah, stop, I misread the code. In fact, it uses a RAM disk at address 0x90000000 (main.c:367)
@michaelengel: yes, it uses a RAM disk, which is okay for a test setup. I was thinking that a simple SPI interface to the SD card could be added (in the style of Oberon's RISC5 SPI controller: https://gitlab.com/pnru/ulx3s-misc/-/blob/master/oberon_pnr/SPI.v) and that main.c could be modified to use that interface instead of a RAM disk :^)
With that out of the way, the full 32MB SDRAM on the ULX3S board can be allocated to Linux.
I'm wondering if adding a real block dev interface for xv6 (and then mount functionality and perhaps different file systems) is worth the effort...
@michaelengel: I already have 64-bit SysIII Unix running on the D1 board. Kernel is ~60KB, and it works with the SD card for disk. Richard's compiler runs native, and it can self-compile. Kernel is about 7,000 lines, excluding the SD Card driver which is ~1,000 lines (from the Allwinner SDK).