betocool-prog
@betocool-prog
I see what you mean... I found out about the additional clock by reading the Digilent forums and checking the schematic to confirm. I actually managed to get the XIP bootloader running on the A7 using regular SPI; it turns out the commands for reading over x1 lines are the same. Different commands set it to x2 or x4, but I avoided them for the sake of simplicity.
I haven't uploaded the A7 XIP bootloader to Github because I have not yet made another application for it, I'm busy with the DE0-Nano board at the moment.
And yes, to answer the question, I've had success with the A7. There's a lot of development going on there in the open-source communities too (LiteX, F4PGA and the like). Unfortunately it's all Verilog or Python (Migen) based, and I wanted to get started quickly rather than learn a new way of doing things. I got the NEORV32 running on an A7 in a matter of minutes. It took me longer to set up the project than anything else.
stnolting
@stnolting
Hey @betocool-prog ! Sorry for the late answer... So far I had no problems with Quartus inferring BRAM for the NEORV32 FIFO component - but most of the time I use the "asynchronous" mode (= data written to the FIFO appears at its output automatically in the next cycle).
@marker5a I am not sure if this is really relevant, but I know from some other FPGA (I think the Lattice ice40 UltraPlus) that you need to set a constraint or tick a box in the device menu to be able to use the configuration port (where the bitstream flash is attached) as "general purpose IO" for connecting the NEORV32 SPI and/or XIP signals.
betocool-prog
@betocool-prog
Hey @stnolting, no problem! I kind of figured it out, but I will suss out a simple just-FIFO instance as VHD. I'm curious to see what all those registers are that the compiler tool puts around the FIFO module. I did get it to work eventually; it seems to be very picky in terms of signals and syntax.
stnolting
@stnolting
🙌 some news: the NEORV32 processor now provides an optional 1-Wire controller (-> https://github.com/stnolting/neorv32/discussions/403)
Mohammed Alshomrany
@MohammedAlshomrany

Hi stnolting, first of all, I would like to thank you for your great contributions to the RISC-V community. I am really impressed by your work and really want to contribute to this project to make it a great reference for researchers to play with RISC-V. So, I am interested in changing the neorv32 microarchitecture by making it single-cycle with 3 or 5 pipeline stages rather than the current 2-stage multi-cycle design. In terms of complexity, I'm curious about the feasibility of these kinds of changes. Would it be easier to start from scratch or from your design, considering the possibility of changing the microarchitecture later?

Again, thank you for your great work.

stnolting
@stnolting
Hey Mohammed! Thank you very much for the nice words :)
Changing the general CPU architecture is quite complex I think... The whole CPU was designed explicitly for the current 2-stage multi-cycle architecture. I tried to address some of my design considerations in the datasheet's "rationale" section (-> https://stnolting.github.io/neorv32/#_a_multi_cycle_architecture).
stnolting
@stnolting
Put simply, the design aims to be as small as possible. That's why I have chosen the multi-cycle approach. Changing the CPU into an all-pipelined design is quite complex and I think a complete rebuild from scratch would be easier than changing the current setup. However, I have also thought about this idea... Maybe one could add - without a complete re-design - some "partial pipelining" (for example just for ALU operations).
Anyway, you are welcome to make proposals and to improve the design. ;)
Mohammed Alshomrany
@MohammedAlshomrany
Hey @stnolting, From my understanding, your design strategy of keeping the CPU small, simple, and FPGA-friendly while supporting many RISC-V ISA extensions and adding common peripherals to be a full-scale microcontroller-like SoC platform is a brilliant idea. As you said, it is quite complex to make changes to the CPU. So, here is a new challenge: how can we improve the performance while maintaining the design strategy? One thing you mentioned is partial pipelining for ALU operations. So, could we start a new dissertation that aims for others to put ideas and implement them to increase processor performance?
I mean new discussion, not "dissertation". ;)
stnolting
@stnolting
I wouldn't call it brilliant. I just thought that many (most?) people could relate to this strategy. But thank you anyway! :)
Furthermore, I am a big fan of the tiny Lattice ice40 UltraPlus FPGAs and I wanted the processor to fit into them 😅
Adding pipelining for ALU operations might not be (too) complicated. But it would have a lot of "side effects" that need to be taken care of. For example: right now the CPU front end (= the instruction fetch) takes at least 2 cycles for fetching a new 32-bit instruction. If the ALU only requires one cycle to complete operations (when using pipelining) there would be no benefit, as the CPU would idle for another cycle until a new instruction word is available...
Of course this changes when using the C ISA extension, where 2 compressed instructions can be fetched within two instruction-fetch cycles. But still...
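A back-of-the-envelope model of the fetch/ALU argument above, as an illustrative sketch only. It assumes that fetch and execute overlap perfectly, that the slower of the two sets the pace, and that an un-pipelined ALU operation takes 2 cycles (that last figure is an assumption for the sake of the example, not a datasheet value). The function name is made up for this sketch.

```python
from fractions import Fraction


def cycles_per_instruction(fetch_cycles_per_word, instrs_per_word, execute_cycles):
    """Steady-state cycles per instruction in an idealized two-stage model.

    Assumption: fetch and execute run in parallel and the slower one
    determines the rate; branches, loads and bus conflicts are ignored.
    """
    fetch_per_instr = Fraction(fetch_cycles_per_word, instrs_per_word)
    return max(fetch_per_instr, Fraction(execute_cycles))


# 32-bit instructions: one instruction per 2-cycle fetch.
print(cycles_per_instruction(2, 1, execute_cycles=2))  # prints 2 (assumed 2-cycle ALU)
print(cycles_per_instruction(2, 1, execute_cycles=1))  # prints 2 (1-cycle ALU, fetch-bound)

# Compressed instructions: two 16-bit instructions per 2-cycle fetch.
print(cycles_per_instruction(2, 2, execute_cycles=1))  # prints 1
```

Under these assumptions a 1-cycle ALU only pays off once the front end can deliver one instruction per cycle, which is the compressed-instruction case.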
I have a list of features I would like to implement one day. One thing is a "memory crossbar". Right now, the CPU's instruction and data bus interfaces are multiplexed into a single processor bus. Obviously, there is high traffic on this bus. I tried to relax that by implementing a CPU-internal instruction prefetch buffer and a "prioritizing bus mux" that gives higher priority to data accesses. Anyway, having a real crossbar and two processor busses (one for instructions, one for data) might improve performance.
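A toy model of the "prioritizing bus mux" idea versus two separate busses, as a sketch under simplifying assumptions: one request per interface per cycle, no multi-cycle transfers, and a lost instruction request is simply counted instead of being retried. The function names and the request trace are hypothetical and not taken from the NEORV32 sources.

```python
def grant(instr_req, data_req):
    """Decide who gets the single shared bus this cycle (data has priority)."""
    if data_req:
        return "data"
    if instr_req:
        return "instr"
    return None


def count_fetch_conflicts(trace, separate_buses=False):
    """Count cycles in which an instruction fetch request is not served.

    trace is a list of (instr_req, data_req) booleans, one pair per cycle.
    With separate busses every request is served, so nothing is counted.
    """
    conflicts = 0
    for instr_req, data_req in trace:
        if separate_buses:
            continue  # each interface owns its bus, no arbitration needed
        if instr_req and grant(instr_req, data_req) != "instr":
            conflicts += 1
    return conflicts


# Made-up request pattern: (instruction request, data request) per cycle.
trace = [(True, False), (True, True), (False, True), (True, True)]
print(count_fetch_conflicts(trace))                       # prints 2
print(count_fetch_conflicts(trace, separate_buses=True))  # prints 0
```

In this simplified view the instruction prefetch buffer is what hides the lost fetch cycles; the model only shows where the shared bus forces the fetch to wait.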
stnolting
@stnolting
But before implementing something like that it would be good to benchmark the current bottlenecks. The CPU's HPMs (hardware performance monitors) are a good thing to start with, because you can track bus traffic, pipeline stalls, etc.
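A small sketch of how raw counter readings could be turned into numbers that point at a bottleneck. Everything here is hypothetical: the parameter names do not correspond to specific NEORV32 HPM events (the actual event list is in the datasheet), and the sample readings are invented placeholder values, not measurements.

```python
def summarize_hpm(cycles, instructions, fetch_wait_cycles, load_store_wait_cycles):
    """Derive simple ratios from raw counter values read after a benchmark run."""
    if cycles <= 0 or instructions <= 0:
        raise ValueError("cycles and instructions must be positive")
    return {
        "cpi": cycles / instructions,                           # average cycles per instruction
        "fetch_wait_share": fetch_wait_cycles / cycles,         # share of time waiting for instructions
        "load_store_wait_share": load_store_wait_cycles / cycles,  # share of time waiting for data
    }


# Invented placeholder readings, for illustration only.
sample = summarize_hpm(
    cycles=1000,
    instructions=250,
    fetch_wait_cycles=300,
    load_store_wait_cycles=100,
)
print(sample)
```

Comparing the two wait shares gives a first hint whether the front end or the data path is the better place to start optimizing.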
Mohammed Alshomrany
@MohammedAlshomrany
Thank you for your assistance, @stnolting. As you suggested, I will start by benchmarking the current bottlenecks, and if I identify any interesting results, I will share them with you.
stnolting
@stnolting
👍👍👍
Ahmed Charles
@acharles:matrix.org
[m]
Just curious, is there any interest in implementing RV64I?
stnolting
@stnolting
Hey! Yes, I am planning to add rv64 support some day. But before that, I need to "generalize" the whole system a little bit more (i.e. use data_width_c constant when defining native-sized data signals).
Ahmed Charles
@acharles:matrix.org
[m]
Ah, so PRs to do that will be accepted, generally speaking?
I was thinking of making data_width_c a generic parameter (I originally called it CPU_XLEN due to the spec using XLEN). There are places in the package that use data_width_c in a way that isn't easy to convert to a generic, so not sure how to handle those.
betocool-prog
@betocool-prog
What's the benefit of having a 64-bit architecture, when we're talking about bare metal? Surely 32-bit is enough? Having said that, I'm all for trying it out :wink:
Just out of curiosity, is there a Linux example for the Neorv32?
I should not get ahead of myself... I'm still working on my DAC for the Nano board...
Ahmed Charles
@acharles:matrix.org
[m]
My interest in neorv32 is mostly because it's the only core in VHDL. It also seems to have good docs, though I haven't really figured out how to run it in practice on an OrangeCrab I recently got. (I loaded the bitstream but I'm not sure how to 'test' it.)
stnolting
@stnolting
@acharles:matrix.org Sure, PRs are always welcome! :wink:
At first, I think it is better to keep the whole system a plain 32-bit setup. Speaking about the CPU core, we could add initial support for 64-bit instructions, but supporting 64-bit data access is a little bit more complex as we would need to update the whole SoC including all peripherals.
@betocool-prog I agree, even 8-bit is enough for many embedded applications (toasters and coffee machines) :D
What do you mean by "Linux example"? You cannot run Linux (natively) on the NEORV32 as it lacks supervisor mode and an MMU.
stnolting
@stnolting
@acharles:matrix.org The easiest thing to check would be to use the default built-in bootloader. Just connect two UART signals (RX and TX) and you should be able to talk to the processor - even without all the more sophisticated peripherals.
Here is the link to the project's user guide showing how to interact with the bootloader (uploading a program): https://stnolting.github.io/neorv32/ug/#_uploading_and_starting_of_a_binary_executable_image_via_uart
stnolting
@stnolting
@acharles:matrix.org I have started to clean up the processor's data width configuration in stnolting/neorv32#417.
The code from the PR will provide a generic XLEN to configure the CPU data width selecting either rv32 ISA or rv64 ISA.
Adapting the core's data path is not really complex and that might already work with the 64-bit configuration. However, the instruction decoding (and all the CSR stuff) will take more time (and more PRs :sweat_smile: ) before there is a first experimental support of the rv64 ISA.
Ahmed Charles
@acharles:matrix.org
[m]
Cool. I'll take a look at instruction decoding perhaps.
On the OrangeCrab test, I suppose the complexity is that it's only connected to USB and testing UART directly would require an external chip. Luckily I have a tigard, so I can use that easily for UART to USB.
Ahmed Charles
@acharles:matrix.org
[m]
I guess the ideal OrangeCrab 'getting started' guide would set up a blink demo that runs on load, just to prove that it works. Unfortunately, I'm not familiar enough to even get to that point. :/
stnolting
@stnolting
We already have something like that :wink: Just use this https://github.com/stnolting/neorv32/blob/main/rtl/test_setups/neorv32_test_setup_approm.vhd as top entity for your board and connect at least one LED to gpio_o.
This setup requires the default IMEM initialization file (neorv32_application_image.vhd) from the repository as it is pre-initialized with the "blink_led" example program (-> https://github.com/stnolting/neorv32/tree/main/sw/example/blink_led).
Ahmed Charles
@acharles:matrix.org
[m]
Thanks, I'll try that.
Ahmed Charles
@acharles:matrix.org
[m]
I'm looking at neorv32_cpu_control.vhd and I'm noticing that some of the generic parameters are tested using generate and some using regular if. Is that difference significant?
stnolting
@stnolting
if ... generate is mainly used to conditionally instantiate submodules (like the C compressed instructions decoder) and to do all the wiring of those optional modules (for example all the HPM counter stuff).
Ahmed Charles
@acharles:matrix.org
[m]
I noticed that it's not used for some of them, like CPU_EXTENSION_RISCV_C on line 525. Perhaps it can't be used here because it's inside a process?
stnolting
@stnolting
Oh, right. Maybe it would be cleaner to have two processes here and select the right one using if generate. But I wanted to save some lines of code :wink:
However, synthesis and simulation work like a charm. At some point, Quartus might inform/warn you with something like "... evaluates to a constant" but that is just fine.