    Trevor L. McDonell
    @tmcdonell
    yes, I was just about to say that. sorry I didn't get your message in time!
    Troels Henriksen
    @athas
    Do you know if Accelerate does something particularly fancy to the nbody example when compiled with the llvm-cpu backend? It is much faster than I would expect (runtime does not seem to scale quadratically with n).
    Trevor L. McDonell
    @tmcdonell
    no, there's no special code path for the cpu backend
    Troels Henriksen
    @athas
    Does Accelerate do the equivalent of a C compiler's -ffast-math?
    Trevor L. McDonell
    @tmcdonell
    I haven't looked at the generated code in a while (possibly never for the cpu backend, that was implemented when we were still generating CUDA!)
    yes, it does do that
    Troels Henriksen
    @athas
    Oh, cool, that makes sense.
    I'm asking because I have a student who is finishing up a thesis on a multicore backend for Futhark, and I am helping him benchmark Accelerate to compare to a more mature backend. Performance is mostly identical for compute-bound programs, but sometimes Accelerate is way faster on pretty straightforward code (like nbody), which goes away if I recompile the Futhark-generated code with -ffast-math.
    Have you had any trouble in practice with using -ffast-math? I have been too paranoid to use it.
    Trevor L. McDonell
    @tmcdonell
    it came up once before, which is why these compensated sum functions exist (which effectively disable -ffast-math)
    *which also
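    For context: compensated (Kahan) summation carries a running error term alongside the total, which is exactly the kind of code that fast-math reassociation would optimise away. A minimal sketch in plain Haskell, as an illustration of the technique rather than Accelerate's actual implementation:

      -- Kahan (compensated) summation: track the rounding error of each
      -- addition in c and feed it back into the next step.
      kahanSum :: [Float] -> Float
      kahanSum = fst . foldl step (0, 0)
        where
          step (s, c) x =
            let y  = x - c         -- apply the previous correction
                s' = s + y         -- low-order bits of y may be lost here
                c' = (s' - s) - y  -- recover the lost bits
            in (s', c')

    A fast-math compiler is free to simplify ((s + y) - s) - y to zero, which is why functions like this must opt out of those flags.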
    Troels Henriksen
    @athas
    Ah, cool! I also considered whether one could exploit the fact that if a user asks for parallel summation of floats, then they tacitly claim that float addition is associative, and thus clearly they don't mind losing a bit of accuracy.
    But it also looks like -ffast-math does stuff like use CPU instructions for e.g. square roots, rather than calling the math library. I'm less sure how to handle that.
    Trevor L. McDonell
    @tmcdonell
    yeah, I'm a bit in two minds about it as well, but as you said there is a sort of tacit agreement here. at least in LLVM -ffast-math is an alias for a few different options, so you could choose to enable only the ones you are comfortable with
    (and on a per-instruction basis)
    Trevor L. McDonell
    @tmcdonell
    @SlavMFM not sure if you are still in the channel, but out of curiosity, what OS(s) are you running? it helps with planning where to spend development effort, e.g. if we want to start on an AMD target
    Slaus Blinnikov
    @SlavMFM
    @tmcdonell oh, a bit embarrassing ^^, I hope not to draw attention away from other important directions! I have Ubuntu Linux, but the distro doesn't matter I guess, since I had to update the kernel to v5.4, which was the only way to get OpenCL working: https://askubuntu.com/questions/1209725/how-to-get-opencl-support-for-navi10-gpus-from-amd/1211465#1211465
    Trevor L. McDonell
    @tmcdonell
    good to know, thank you! (:
    Callan McGill
    @Boarders
    If I wanted to work with a vector of an arbitrary but statically known size in Accelerate, how would I do that? For example, the k-means example sticks to tuples; how would one work with vectors of arbitrary known size (even if just up to the tuple size Accelerate supports)?
    Trevor L. McDonell
    @tmcdonell
    Hi @Boarders! If I get what you mean, we don't have anything special to support that sort of thing. I think you'll need to define a type class covering the types you are interested in, and then parameterise your accelerate functions over that; basically what you'd do in regular Haskell.
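    A minimal sketch of that approach, with hypothetical class and method names (nothing here is part of Accelerate's API beyond Exp, Elt and the T2 pattern):

      import Data.Array.Accelerate

      -- Hypothetical class abstracting over the fixed-size vector
      -- representations (here: tuples) that the program needs.
      class Elt v => FixedVec v where
        vzipWith :: (Exp Float -> Exp Float -> Exp Float) -> Exp v -> Exp v -> Exp v
        vfold    :: (Exp Float -> Exp Float -> Exp Float) -> Exp v -> Exp Float

      instance FixedVec (Float, Float) where
        vzipWith f (T2 a b) (T2 c d) = T2 (f a c) (f b d)
        vfold    f (T2 a b)          = f a b

      -- Accelerate functions are then parameterised over the class,
      -- e.g. squared Euclidean distance for k-means:
      distSq :: FixedVec v => Exp v -> Exp v -> Exp Float
      distSq u v = vfold (+) (vzipWith (\a b -> (a - b) * (a - b)) u v)

    Each fixed size you care about gets its own instance, and distSq then works uniformly across all of them.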
    Callan McGill
    @Boarders
    Cool, thank you
    Hugh Sipière
    @hgsipiere
    hi, how does this project compare to futhark?
    the main differences i see are that Accelerate is an EDSL rather than having its own external file parser, and that futhark supports amd gpus
    other than that, it isn't so clear?
    Trevor L. McDonell
    @tmcdonell
    yes, that's basically it. the projects have broadly similar goals: they are languages for computation on data-parallel arrays. if you are working in Haskell I expect Accelerate will be easier. since futhark is a standalone compiler it is perhaps easier to use from a different language (but I'm not exactly sure which ones). accelerate supports multicore cpus and Nvidia gpus; futhark supports OpenCL, but their cpu backend is still in development (not sure of the status, it might be complete).
    but as you say, beyond that the details are a bit murky on exactly what features each language supports (or what gpu features/instructions, etc.; I've never done a detailed comparison myself)
    if you had something specific in mind I might be able to advise if/how well that would look in Accelerate. sometimes @athas is here and he would be able to comment on Futhark
    Hugh Sipière
    @hgsipiere
    it would be nice if it supported AMD GPUs though I assume that's a lot of work
    would it be a small job, like a quick pull request, or a major backend?
    Hugh Sipière
    @hgsipiere
    i'd probably be doing a lot of linear algebra/numerical methods. a GPU is usually quite good for matrices. with Futhark I'd be writing my own linear algebra functions for the GPU, but with Accelerate I can use a CPU BLAS library
    i'm not really sure what is best, i'm a maths student you see, not comp sci haha
    Hugh Sipière
    @hgsipiere
    ^^ i just figured out the answer, use both so dw about that
    Troels Henriksen
    @athas
    Accelerate is much easier to use if the rest of your code is in Haskell. Futhark is probably easier to use from other languages, although I do recall reading a paper about an Accelerate FFI. There are also significant differences in language capabilities, but the practical integration concerns are likely more important in most cases.
    Trevor L. McDonell
    @tmcdonell
    with accelerate there are (some) bindings to BLAS libraries on the GPU, so that is your best bet. currently they are limited, but it's easy to add more (just let us know which ones you need)
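    For illustration, a matrix-vector product through those bindings might look like the following sketch (assuming the accelerate-blas package and its hmatrix-style (#>) operator; check the package docs for the exact API):

      import Data.Array.Accelerate
      import Data.Array.Accelerate.Numeric.LinearAlgebra   -- from accelerate-blas

      -- Matrix-vector multiply; (#>) mirrors the hmatrix operator and
      -- dispatches to a native BLAS where the backend supports it.
      matVec :: Acc (Matrix Float) -> Acc (Vector Float) -> Acc (Vector Float)
      matVec m v = m #> v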
    HugoPeters1024
    @HugoPeters1024

    Hi, quick question if that's okay: I'm trying out integration with gloss to display a texture generated by accelerate. As a minimal example I tried the following

      import Data.Array.Accelerate

      genPicture :: Array DIM2 Word32
      genPicture = fromList (Z:.640:.480) $ repeat 255

    but the resulting window just shows the background color, as if the picture were invisible

    I assumed that the RGBA values were encoded as 4 bytes in the Word32, so I expected 255 to produce a black, fully opaque image (0x000000ff)
    Trevor L. McDonell
    @tmcdonell
    yes that sounds correct. I guess you are using it together with bitmapOfArray?
    HugoPeters1024
    @HugoPeters1024
    yes exactly
    Trevor L. McDonell
    @tmcdonell
    that's... odd... let me check
    wait, no, you're on a little-endian system (I assume), so that is actually 0xff000000
    HugoPeters1024
    @HugoPeters1024
    ah, that's it! thanks for the swift response!
    Trevor L. McDonell
    @tmcdonell
    no problem! for a moment there I thought I had totally broken something ^^"
    *black is actually 0xff000000
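    To make the byte order concrete: on a little-endian machine the low byte of a Word32 is stored first, so packing red into the low byte lays the bytes out in memory as R, G, B, A, which matches the correction above. A small sketch with a hypothetical helper:

      import Data.Bits (shiftL, (.|.))
      import Data.Word (Word32, Word8)

      -- On little-endian, the low byte of the Word32 is stored first,
      -- so this puts R,G,B,A in that order in memory.
      packRGBA :: Word8 -> Word8 -> Word8 -> Word8 -> Word32
      packRGBA r g b a =
            fromIntegral r
        .|. (fromIntegral g `shiftL` 8)
        .|. (fromIntegral b `shiftL` 16)
        .|. (fromIntegral a `shiftL` 24)

      -- packRGBA 0 0 0 255 == 0xff000000  (opaque black, as above)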
    HugoPeters1024
    @HugoPeters1024
    still got some weird stretching going on but I think that's a gloss problem
    Trevor L. McDonell
    @tmcdonell
    btw there are some utilities for this in colour-accelerate
    HugoPeters1024
    @HugoPeters1024
    great! is it Christmas already? :)
    Trevor L. McDonell
    @tmcdonell
    accelerate-io-JuicyPixels might also be useful to you
    HugoPeters1024
    @HugoPeters1024
    I'm trying to work through your PhD thesis to better understand this tool, and I was wondering something about the limitations of data parallelism. Namely, you describe that operations should not depend on each other (or at least not in a way that can't be resolved with a tree-like execution of sorts). This makes sense, since concurrent writes are a bit of a no-no: interleaved reads would be non-deterministic. However, I was wondering if it is still possible to write double-buffered transformations that perform multiple reads on the same input (a blur kernel, for example).
    Trevor L. McDonell
    @tmcdonell
    yes that should be fine
    the difficulty is when the computation performed in parallel spawns further parallel work
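    This read-many, write-once pattern is what Accelerate's stencil combinator expresses directly; for instance, a 3x3 box blur might be sketched as follows (assuming the clamp boundary condition from recent Accelerate versions):

      import Data.Array.Accelerate

      -- Each output element reads nine neighbours of the input; the input
      -- itself is never written, so there are no concurrent-write hazards.
      blur :: Acc (Array DIM2 Float) -> Acc (Array DIM2 Float)
      blur = stencil box clamp
        where
          box :: Stencil3x3 Float -> Exp Float
          box ((a,b,c),(d,e,f),(g,h,i)) = (a+b+c+d+e+f+g+h+i) / 9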