    splinedrive
    @splinedrive
    When we can order ulx4m?
    Paul Ruiz
    @pnru_gitlab

    Of course, I can test on my boards, just prepare the bitstream.

    Bit streams here:

    https://gitlab.com/r1809/rvsoc/-/tree/main/binary

    emard
    @emard
    HI! On 12F and 85F I get prompt, can dump mem and 't' shows this
    > t
    Mem test:
    End run
    Goran Mahovlic
    @goran-mahovlic
    @splinedrive ulx4m is still not available to order, but good news is that I have ordered PCB panels for new LS and LD versions
    splinedrive
    @splinedrive
    @goran-mahovlic I subscribed to the funding list! Why does the DDR variant have a 45F, not an 85F?
    Goran Mahovlic
    @goran-mahovlic
    The 45F is just for now, as I only got a few of those and it is cheaper if I make a mistake in the design :)
    As for CS I think I will go with 12F with SDRAM and 100Mb Ethernet, and 85F UM or 85F UM5G on DDR3 version
    splinedrive
    @splinedrive
    understood :)
    nice
    Paul Ruiz
    @pnru_gitlab
    @emard Thank you for the test. That output means that mem test did not encounter any errors. It would seem that the memory controller is now working fine. When I get time I will revisit the Oberon design.
    emard
    @emard
    @pnru_gitlab ooh yes, if Oberon can be made a bit more stable that would be nice. Even the Diamond build is rather on the edge of the chip's capability. Maybe we can also try to make the RAM driver dual-ported somehow so that graphics run from SDRAM, not BRAM
    Paul Ruiz
    @pnru_gitlab
    @emard The redesigned cache already uses dual ported ram now, so it becomes a lot easier to run the video from SDRAM (it can multiplex into the SDRAM controller directly, instead of being mixed up with the CPU data path). However, first I want to get the Riscv system ready for tinkering by Michael Engel and his students. And maybe FER students?
    emard
    @emard
    @pnru_gitlab FER students, it would be a miracle if they used the boards for their own projects instead of just passing exams. If a FER student is reading this, please join. 200 students per year got boards and we don't see many contributions, if any. Marko has tried to "convince" them by providing exam benefits and success was very "limited" I'd say :)
    Paul Ruiz
    @pnru_gitlab

    @emard @kost: I've added support for the fujprog / f32c packet upload of a binary file to the RVSoc system. It now works at low (115K2) and high (1M) speeds, but not at very high (3M) speeds. Did 3Mb/s work in other contexts?

    Also, it only works if the terminal speed and the upload speed are set to the same value. I think that switching the speed on the ftdi uart may introduce noise characters on the line, and the upload protocol has no way to re-sync after such noise that I could see. Maybe I am missing something?

    However, it seems that f32c also does not support split speeds (see https://github.com/f32c/f32c/blob/master/src/boot/sio/binboot.c). Maybe that never worked?
    On the other hand running both the terminal and the upload at 1M is a minor nuisance, just a lot of options on the command line....

    Paul Ruiz
    @pnru_gitlab
    Also, support for a RISCV header seems unfinished in fujprog, see here: https://github.com/kost/fujprog/blob/master/fujprog.c#L3427-L3434
    Any point in submitting a patch?
    emard
    @emard
    @pnru_gitlab great that you got it working!! 3Mbps worked with f32c fpgarduino MIPS and RISC-V, with somewhat more CRC errors than 115200 or 1M. Most f32c binaries are around 5-20KB; the longest was BASIC at 300KB, that's what we have tested. The protocol itself is simplistic and maybe somewhere the retry cycle fails because of 1 extra byte in an unflushed buffer. I agree the retry logic in fujprog may be buggy or need rework/rewrite. How about trying Python code for the upload? f32c should support a command to enter a faster speed after the prompt, negotiating at the default 115200 and then switching to 3M. All this is a simple implementation, and in practice I sometimes must reset or reload the f32c bitstream to upload again.
    I guess a patch should be welcome :) If we break f32c nobody will notice, as Marko still uses ujprog without the f, I think :)
    emard
    @emard
    f32c has a few versions of the bootloader, from a minimalistic one for low-LUT platforms to a multi-stage luxury edition which can change the baudrate https://github.com/f32c/f32c/blob/master/src/boot/rom/loader.c#L185
    Paul Ruiz
    @pnru_gitlab

    @emard Thanks for the feedback! The verilog loader to match fujprog is here:
    https://gitlab.com/r1809/rvsoc/-/blob/42fcea43/src/loader.v#L147-212

    I'm currently using that with this command line:
    fujprog -t -e tst.bin -x 1000000 -b 1000000 sys.bit1
    and that is a practical workflow for me: the bit file gets loaded to board, immediately uploads the test program and drops into a terminal for testing. At the moment it only seems to work when the upload and terminal speed are the same, but maybe that is because there is an error in my verilog for the upload receiver. It would be nice if I could drop the -x and -b options and just use the defaults.

    In its current form it reliably uploads even large files (9MB) without error at an acceptable speed (at 1Mb/s that takes 90 s).
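
A rough check of the 90 s figure above, as a minimal sketch. It assumes 8N1 UART framing (10 line bits per payload byte) and ignores any packet or CRC overhead of the upload protocol; the function name is made up for illustration.

```python
def upload_seconds(size_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Estimate raw UART transfer time.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data bits + stop bit).
    Protocol overhead (headers, CRC, retries) is deliberately not modelled.
    """
    return size_bytes * bits_per_byte / baud


# 9 MB file at the three speeds discussed in the chat
for baud in (115_200, 1_000_000, 3_000_000):
    print(baud, round(upload_seconds(9_000_000, baud), 1), "s")
```

Under these assumptions 1 Mb/s gives the 90 s quoted above, and 3 Mb/s would at best cut that to a third before any retries.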

    If 115K2 + 3M works for the f32c, it must be possible to get it to work for RVSoc as well.

    emard
    @emard
    @pnru_gitlab well, I'd say if 1Mbps makes a 1.5 min upload acceptable for testing, it is just ok. 3Mbps has a higher error rate, so CRC block retries will lose some speed. For larger and faster uploads I may recommend using the US2 DFU bootloader (we have ready binaries https://github.com/emard/ulx3s-bin/tree/master/fpga/dfu) to write to SPI FLASH, and assuming RISCV, with some additional effort, can preload RAM from flash on boot. DFU uploads a binary by first reading the flash and writing only the differences, so if you debug and change 1 byte in the code, it will be written in a few seconds and tested much faster
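
The "read flash first, write only differences" idea can be sketched as below. This is only an illustration of the general technique, not the DFU bootloader's actual code; the 4096-byte sector size and the function name are assumptions.

```python
from typing import List


def changed_sectors(old: bytes, new: bytes, sector_size: int = 4096) -> List[int]:
    """Return the indices of sectors whose contents differ between two images.

    sector_size=4096 is an assumed erase-sector size, not taken from the chat.
    """
    count = (max(len(old), len(new)) + sector_size - 1) // sector_size
    changed = []
    for i in range(count):
        lo, hi = i * sector_size, (i + 1) * sector_size
        if old[lo:hi] != new[lo:hi]:
            changed.append(i)
    return changed


# Simulated flash contents and a new image with a single byte modified
old_image = bytes(64 * 1024)
new_image = bytearray(old_image)
new_image[5000] = 0xFF  # offset 5000 falls in sector 1 (bytes 4096..8191)

print(changed_sectors(old_image, bytes(new_image)))
```

Only the sectors in the returned list would need an erase and rewrite, which is why a one-byte code change can be flashed in seconds.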
    emard
    @emard
    I think the DFU bootloader project itself already has scripts to embed the executable portion of the code into BRAM preloads for the RISCV-USB it uses; when I recompile its C code the bitstream doesn't need to be recompiled and the upload is really fast. The DFU source is "hidden" here with my unmergeable (dirty) changes to adapt it for ULX3S https://github.com/emard/had2019-playground
    Paul Ruiz
    @pnru_gitlab

    @emard @lawrie I've completed my baseline work on the RVSoc system. Code is here:
    https://gitlab.com/r1809

    It successfully runs xv6. I've also tried running Linux again using the binaries provided by the original project from the Uni of Tokyo, but it seems to get stuck after it switches from the Berkeley Boot Loader into Linux itself. Maybe the issue is the different memory layout, I'm not interested enough in Linux to dive into this and compile from source.

    In a quick Dhrystone measurement I get to about 2 DMips performance, which is less than the 3 DMips that I was expecting based on the claims in the paper (~7 x 40MHz/104MHz = ~3). Not sure how that compares with VexRiscv (I assume that one will be well above 3 DMips on the ULX3S).

    All in all I am quite impressed with this project. Who would have thought that one could run Linux on top of just 4000 lines of plain Verilog, consuming less than half of a 12F (ie. 25F) chip?

    emard
    @emard
    Fantastic results! 2 DMips/MHz is a very good speed! We at f32c get about 1.5-1.8 DMips/MHz
    Paul Ruiz
    @pnru_gitlab

    I meant 2 DMips at 40MHz. This makes sense: an instruction takes 12 clocks (+ cache misses), so 40MHz / 12 = 3.3 million instructions per second. Taking into account cache misses and normalised (CISC) instructions, the reported 2 to 3 million instructions per second seems to be the right magnitude.

    The f32c appears to be 30...35 times faster, doing 1.5 to 1.8 instructions per clock. Maybe I am totally misunderstanding Dhrystones and DMips?
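
The numbers in this exchange can be re-derived with a few lines of arithmetic. This is a sketch only: it uses the figures quoted in the chat (40MHz clock, 12 clocks per instruction, ~7 DMips at 104MHz from the paper, 1.5 to 1.8 DMips/MHz for the f32c) and assumes, for comparison, that the f32c also runs at 40MHz.

```python
CLOCK_MHZ = 40.0            # RVSoc clock on the ULX3S, from the chat
CLOCKS_PER_INSTRUCTION = 12  # from the chat, cache misses not included

# Peak instruction rate if every instruction takes exactly 12 clocks
rvsoc_peak_mips = CLOCK_MHZ / CLOCKS_PER_INSTRUCTION

# Paper's claim scaled from 104MHz down to 40MHz
paper_scaled_dmips = 7 * CLOCK_MHZ / 104

# f32c at the same clock (assumption), relative to the measured 2 DMips
measured_dmips = 2.0
f32c_ratio_low = 1.5 * CLOCK_MHZ / measured_dmips
f32c_ratio_high = 1.8 * CLOCK_MHZ / measured_dmips

print(f"peak: {rvsoc_peak_mips:.2f} MIPS")
print(f"paper, scaled: {paper_scaled_dmips:.2f} DMips")
print(f"f32c vs RVSoc: {f32c_ratio_low:.0f}x to {f32c_ratio_high:.0f}x")
```

The 30x to 36x range matches the "30...35 times faster" estimate above, given the same-clock assumption.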

    Alastair M. Robinson
    @robinsonb5
    f32c is impressively fast, so don't feel bad if you're not matching it! 2 DMIPS at 40MHz (assuming you're running from zero-wait-state block RAM) would put you in a similar performance bracket to a small ZPU configuration.
    (I can get about 13 DMIPS out of EightThirtyTwo @ 100MHz running from BRAM, and about 10 DMIPS from SDRAM, depending on how busy the video's keeping the RAM.)
    Paul Ruiz
    @pnru_gitlab

    I'm not feeling bad, also because it is not my design :^) I just ported it. I read up on Dhrystones and there are some interesting insights (for me, maybe everybody here already knows).

    The f32c is a scalar design, so should do 1 DMips/Mhz if its instructions are about equally efficient as those of the original VAX. In practical terms, I think that is true for both MIPS and Riscv. Compilers got better since then and that accounts for some 30-40% of DMips 'inflation'. Also running from zero-wait ram instead of slow ram plus a small cache accounts for some 30% of improvement. I now understand the 1.5-1.8 result for the f32c. No put-down intended, I can see that the f32c is fast.

    I'm running my Dhrystone tests with the kencc compiler (which is more -Os than -O3) and a simple library. This skews my measurement versus what is normally used. Compensating for that and using the 1.3 number that the f32c page reports for usage with a cache and SDRAM, I get back to the 1 DMips/Mhz number on a like-for-like basis. This is 20 times faster.

    Now, comparing a scalar CPU with a CPU using 12 clocks I would have expected the difference to be 12x not 20x. This difference is the same order of magnitude as the 2 DMips versus expected 3 DMips that I mentioned earlier. This is worth further investigation.

    Maybe the choice to use a 256 byte cache line was not clever. Maybe I should reduce the cache line to 64 bytes, double the number of sets from 16 to 32 and split into I and D caches. This can be fitted into the cache design with relative ease.
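
A quick sanity check of the proposed cache change, as a sketch. It assumes total capacity is line size times number of sets times ways, with one way per set; the associativity is not stated in the chat, and the class name is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    line_bytes: int
    sets: int
    ways: int = 1  # assumption: associativity is not given in the chat

    @property
    def total_bytes(self) -> int:
        # Capacity = bytes per line * number of sets * lines per set
        return self.line_bytes * self.sets * self.ways


current = CacheGeometry(line_bytes=256, sets=16)   # unified cache today
proposed = CacheGeometry(line_bytes=64, sets=32)   # one of the split caches

print("current unified:", current.total_bytes, "bytes")
print("proposed I + D: ", 2 * proposed.total_bytes, "bytes")
```

Under that assumption the split I and D caches together use the same amount of RAM as the current unified cache, which fits the remark that the change can be made with relative ease.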

    As a final note, I'm interested in this SoC as a target for early 80's Unix and 2 DMips is twice as fast as the original VAX, twice as fast as the 3B2 (what SysV was developed on) per core, and equally fast as a 68020 or 68030 running at 15Mhz. The design is already at the mid-80's speed that I am looking for.

    Alastair M. Robinson
    @robinsonb5
    Hmmm, kencc - that's one I must investigate - thanks! (I'm a big fan of alternative lightweight toolchains - something small enough to be self-hosting.) But yes, the quality of both the compiler and the standard library has a big impact on Dhrystone - one of the criticisms it frequently attracts - but it's not necessarily a bad thing; you just have to understand that it's not a measure purely of CPU speed. It's measuring the speed of a particular system and environment when tasked with a typical early 80s workload.
    Paul Ruiz
    @pnru_gitlab
    Kencc is the plan 9 compiler suite. The RV32 and RV64 versions are by Richard Miller. You can find them here https://gitlab.com/pnru/riscv-kencc Presentation here https://youtu.be/LHJqdXGb0uc
    Paul Ruiz
    @pnru_gitlab
    My Unix System III port to the Allwinner D1 SoC uses this as its self hosted compiler. I'm aiming to get this running on RVSoc as well.
    Alastair M. Robinson
    @robinsonb5
    Awesome, thanks, I'll check it out.
    emard
    @emard
    @pnru_gitlab seems you're right, f32c can execute near 2 instructions per clock; it was my imagination that this Unix-capable RISC-V could be faster. But it's not all about speed: the compatibility to run complex code like a kernel is proof that everything inside is well done and works
    Valentin Plotkin
    @plotkin1996
    Greetings everyone. Sorry if it's not the right place to ask, but is there any way to purchase the board from EEA/Norway now? Crowdsupply refuses to ship because "this item is not CE marked and cannot be shipped to your country." Which is a bit absurd, because wasn't the board designed in Croatia in the first place?
    Goran Mahovlic
    @goran-mahovlic
    Hi, I will check tomorrow as I think I have a few more boards that I can sell to a company or over envox.eu
    Well yes, the EU has new rules that are the same for small and big, and ULX3S is unfortunately still not certified.
    All boards are made in Croatia, then shipped to Mouser, and then blocked from the EU market.
    Ties Stuij
    @stuij
    I tried to buy from Mouser living in the UK but unfortunately that doesn't work either as I'm not a business.
    Goran Mahovlic
    @goran-mahovlic
    Yes, unfortunately Mouser also blocks selling to the UK, as the UK is enforcing the new UK CERT. But I will check today if I have a few ULX3S for Envox.
    Valentin Plotkin
    @plotkin1996
    Thanks.
    Goran Mahovlic
    @goran-mahovlic
    @plotkin1996 @stuij this will not last long as currently I have only 4 available >> https://www.envox.eu/product/ulx3s/
    Ties Stuij
    @stuij
    Hmm, first of all I can't beat the reCAPTCHA. Every time I type in my credit card details, the reCAPTCHA tells me it expired. Secondly, it seems like UK customers can still only be companies: "need a Buyers from the UK need to provide a valid VAT number at checkout."
    Double hmm,.. seems I can do direct payment. No VAT number has been asked. Let's see how far that brings us.
    Goran Mahovlic
    @goran-mahovlic
    I know they changed the reCAPTCHA recently, but I am not sure about VAT for the UK. Maybe you can check directly at info@envox.hr
    Valentin Plotkin
    @plotkin1996
    Thank you so much!
    Goran Mahovlic
    @goran-mahovlic
    No problem, I will ship your board tomorrow.
    Ties Stuij
    @stuij
    yea thanks regardless :)
    Peter
    @chiefdome:matrix.org
    Hi