    cheater
    @cheater
    oh :(
    Tom Smeding
    @tomsmeding
    it seems the nf makes a massive difference...
    aaaaand it gets killed by the oom killer
    cheater
    @cheater
    yeah nf makes a massive difference
    the results look more normal
    @tomsmeding try lowering dbPartNum
    regarding the oom
    say from 500k to 200k
    or 100k
    Tom Smeding
    @tomsmeding
    my gtx 1050 mobile is 2.1x as fast as my i7-7700HQ for dbPartNum = 10000
    seems to work :)
    tomsmeding @tomsmeding will be making dinner
    cheater
    @cheater
    yeah :)
    Jonathan Fraser
    @JonathanFraser
    image.png
    Any ideas? Trying to get this spun up on the new GPU support for WSL2.
    Jonathan Fraser
    @JonathanFraser
    Realized I had the 11.3 toolkit installed; downgraded to 10.2, but now there are linking issues
    image.png
    Jonathan Fraser
    @JonathanFraser
    Tried setting LD_LIBRARY_PATH but no joy. Any way to see the full list of ldflags in use?
    Jonathan Fraser
    @JonathanFraser
    So it looks like the problem is that it's only using the CUDA toolkit directory as a library location, and that doesn't contain a libcuda.so by default (there is a stub implementation in ./stub/libcuda.so)
    if I symlink the system libcuda into the toolkit directory, things seem to work
    Trevor L. McDonell
    @tmcdonell
    hey @JonathanFraser, long time no see! how are things?
    I don’t have much experience on Windows, but glad to see you seem to have it working. There is a PR for CUDA 11 support still waiting for me to merge and release. Not sure if that would have avoided the problem for you, I’ll try and get around to that soon anyway
    Trevor L. McDonell
    @tmcdonell
    @JonathanFraser btw I uploaded a new version of the cuda bindings which works with cuda-11
    Callan McGill
    @Boarders
    Completely random question (not a feature request, just curious!): how feasible would it be to use something like vulkan for the accelerate backend? Is that language just way too far away from accelerate at present / is cuda offering specific features, or is it just a case of "wow that is a lot of elbow grease"?
    Trevor L. McDonell
    @tmcdonell
    just a lot of elbow grease
    vulkan uses llvm internally (just like cuda), so what we'd really want to do is munge the llvm we generate so that the vulkan driver accepts it, and just use the vulkan api as a coordination layer (launching kernels, allocating device memory, etc.)
    (saying "munge the llvm we generate" seriously understates the work required, but still...)
    there are plans to support more devices/backends, but it's a lot of elbow grease ):
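    (Purely as an illustration of what "coordination layer" means here, and not any real Accelerate or Vulkan API, a hypothetical sketch of the pieces such a layer would have to provide: accepting the generated code, allocating device memory, and launching kernels.)

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical sketch only; none of these names exist in Accelerate or Vulkan.
-- A coordination layer in the sense above just has to accept the code we
-- generate, manage device memory, and launch kernels on the device.
class CoordinationLayer ctx where
  type Kernel ctx
  type Buffer ctx

  compileModule :: ctx -> GeneratedIR -> IO (Kernel ctx)   -- e.g. LLVM IR or SPIR-V
  allocateBytes :: ctx -> Int -> IO (Buffer ctx)
  launchKernel  :: ctx -> Kernel ctx -> [Buffer ctx] -> (Int, Int, Int) -> IO ()

-- Stand-in for whatever the code generator produces.
data GeneratedIR = GeneratedIR
```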
    Callan McGill
    @Boarders
    ok cool! Nice to know (if only for my curiosity). Accelerate is such a cool project and it is amazing what you have accomplished imo
    Trevor L. McDonell
    @tmcdonell
    thanks! it's always super nice to hear from users ^_^
    Troels Henriksen
    @athas
    Vulkan compute was seriously complicated last I looked at it. Conceptually doable, but the boilerplate is apocalyptic.
    Vulkan is mainly for graphics, and heavily stratified so that it is implementable on very basic devices that cannot do general-purpose computation. You have to enable an enormous number of "extensions" to get access to things like a standard notion of memory.
    One thing we ran into is that Vulkan did not support "bitcasts" (without extensions that were not universally supported), which is necessary if you want to reuse memory for different types.
    But this was two years ago, and it was pretty clear that things would get better... eventually.
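    (For context, a "bitcast" reinterprets a value's bits as another type of the same width, without any numeric conversion. In Accelerate's surface language this looks roughly like the sketch below; floatBits is just an illustrative name.)

```haskell
import qualified Data.Array.Accelerate as A
import Data.Word (Word32)

-- Reinterpret the bits of each 32-bit Float as a Word32, element-wise,
-- without changing the underlying representation.
floatBits :: A.Acc (A.Vector Float) -> A.Acc (A.Vector Word32)
floatBits = A.map A.bitcast
```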
    Trevor L. McDonell
    @tmcdonell
    ah interesting! maybe it's better these days, but yeah I'm not sure it would be the best target to go after (not that I have the time...)
    Johannes Maier
    @kenranunderscore
    Hi everyone! I'm trying to get the accelerate-examples to run, preferably with the docker variant found in the top-level directory. To successfully perform a stack build I had to recreate the symlink to libcuda.so.1 that had been removed again during the Docker stages. Am I doing something wrong, given that I even need libcuda inside the container?
    Johannes Maier
    @kenranunderscore
    Oh, and when I try to run one of the examples inside the container, I'm seeing /opt/accelerate-examples/.stack-work/install/x86_64-linux/15ab62d7d6829823d56e91b6a792b2fece77c32bb61dfca3848225d46a7ad9b4/8.10.2/bin/accelerate-canny: error while loading shared libraries: libffi.so.7: cannot open shared object file: No such file or directory, which makes me suspect even more that I'm simply using the Docker setup the wrong way. Any help is greatly appreciated!
    Johannes Maier
    @kenranunderscore
    In the meantime I've read the README in accelerate-llvm, where I learned that I should probably use NVIDIA-Docker. Now I've tried to get the examples to run locally (I have the CUDA toolkit installed on Ubuntu, as well as the other dependencies) but I'm still getting errors. Will try again sometime later this week with more questions :)
    Trevor L. McDonell
    @tmcdonell

    Hi @kenranunderscore. Yes, to use the docker images you’ll probably want to run via nvidia-docker, which should take care of the libcuda.so.1 error.

    Feel free to post any errors you run into, I’ll do my best to respond although might not be too quick as I’m on leave right now. You might have better luck with the version of accelerate-llvm from GitHub by the way, as that should work with the latest ghc/llvm/CUDA

    Charles Durham
    @fabricatedmath
    @tmcdonell Hey Trevor! Been away from using Accelerate for a while. Back at it and really liking all the changes. The I1, I2 patterns are by themselves a huge time savings! However, I'm trying to generate curands on device through FFI and get that device ptr managed by Accelerate. I'm following this stack overflow: https://stackoverflow.com/questions/48526958/how-to-use-a-cuda-deviceptr-as-an-accelerate-array and it seems like the API has entirely changed. How should I go about doing this now?
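    (For readers who haven't met them, the I1/I2 patterns mentioned above build and match array shapes directly as Exp values, instead of the older lift (Z :. i :. j) style. A minimal sketch; transposeM and swap are illustrative names, not from the chat.)

```haskell
import Data.Array.Accelerate as A

-- Transpose a matrix, using the I2 pattern both to build the result shape
-- and to destructure the index inside backpermute.
transposeM :: Elt e => Acc (Matrix e) -> Acc (Matrix e)
transposeM xs = A.backpermute (swap (A.shape xs)) swap xs
  where
    swap :: Exp DIM2 -> Exp DIM2
    swap (I2 i j) = I2 j i
```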
    Trevor L. McDonell
    @tmcdonell
    hey @fabricatedmath, long time no see, how have you been? yes, lots of changes, I’ll try and whip up an equivalent example for you with the new api shortly
    Charles Durham
    @fabricatedmath

    @tmcdonell Good! Keeping busy. Chasing a toddler around also now. How have you been?

    After I posted my question I did see that a lot of the modules are still importable (I thought they were hidden from import because they were hidden from haddock). Is accelerate-fft the gold standard now for a CUDA FFI?

    Trevor L. McDonell
    @tmcdonell
    @fabricatedmath whoa, congratulations! I bet a toddler will indeed keep you busy (: I’m okay, was mostly just busy with teaching before the summer break, taking a bit of time to recover
    yep accelerate-fft is a good one to look at, and also accelerate-blas, which has a few more examples
    Charles Durham
    @fabricatedmath

    @tmcdonell Hey, thanks! I seem to have gotten it all working based on those two templates, thanks for the help.

    I'm also trying to add a native implementation using MWC randoms, wrapping the call to randomArray into a "Par Native (Future (Vector Float))", but the Array types don't match up. How do I convert the Array from Data.Array.Accelerate into the one from Data.Array.Accelerate.Representation.Array?

    Trevor L. McDonell
    @tmcdonell
    oh I forgot to mention, I have this random number generator which might work for you? https://github.com/tmcdonell/sfc-random-accelerate
    the interface is pretty minimal at the moment…
    can you share the (non-working) code? might be easier for me to see
    Charles Durham
    @fabricatedmath

    I'll take a look, thanks.

    Hey, yeah, here's a gist https://gist.github.com/fabricatedmath/90e64295975288161c0bdbf8f5662692.

    The PTX part compiles and runs just fine; the native one gives the error shown in the gist

    Charles Durham
    @fabricatedmath
    sorry, forgot to tag @tmcdonell, but please don't mistake this for urgency; really appreciate the help