    Trevor L. McDonell
    @tmcdonell
    @athas yeah, I'm not too hopeful about using ROCm as a target actually, at least until it's more fully baked
    Trevor L. McDonell
    @tmcdonell
    making sure there are no implicit conversions, for example (OpenCL int /= Haskell int, as I'm sure you are familiar with recently), and dealing with aggregate types was a source of bugs, I remember. that was all a long time ago, maybe I'm not such a bad programmer anymore. probably other things too but I don't remember. It's all things which can be solved with extensive testing, but (at the time) I couldn't reflect the C types into the Haskell type system to catch all those bugs for me, so they kept slipping in...
    (actually I used to silently try to use 32-bit ints on the GPU whenever I could (loop counters and such) because they are so much faster, but eventually abandoned that because it just isn't robust)
    Troels Henriksen
    @athas
    Yes, the implicit conversions are super annoying, but I think I eventually got rid of those by using a phantom-typed expression language that I then transform to C.
    Some could still sneak in, I guess. My main problem with generating C code has been supporting complex control flow, e.g. jumping out of multiple loops. That requires goto in C, and works fine with NVIDIA, but often triggers compiler bugs on AMD.
    Trevor L. McDonell
    @tmcdonell
    @athas ah yes you are right, that was a problem too; that impedance mismatch going from HS expressions to C statements
    statusfailed
    @statusfailed_gitlab
    Is there a way to evaluate an Exp a? Usually when I type in an expression of that type at the REPL, it shows me a value, but sometimes I get a big pretty-printed expression
    Trevor L. McDonell
    @tmcdonell
    the only way to evaluate things is with run (and its variants), which all evaluate array expressions. but you can create a scalar (one-element) array with unit
    statusfailed
    @statusfailed_gitlab
    Ah cool, ok!
    Trevor L. McDonell
    @tmcdonell
    what you are seeing is the Show instance for Exp (functions and expressions), and I guess the simplifier is able to reduce it down to a single value in some cases. There is a Show instance for Acc (functions and expressions) which does the same thing too.
    Robbert van der Helm
    @robbert-vdh
    @statusfailed_gitlab I use this in my tests:
    evalExp :: Elt a => Exp a -> a
    evalExp e = head . A.toList $ run (unit e)
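    For completeness, a self-contained version of that helper might look like this (a sketch: it uses the reference interpreter backend, but run from accelerate-llvm-native should work the same way):

```haskell
import qualified Data.Array.Accelerate             as A
import qualified Data.Array.Accelerate.Interpreter as I

-- Evaluate a scalar expression by wrapping it in a one-element
-- array with unit, running it, and pulling the result back out.
evalExp :: A.Elt a => A.Exp a -> a
evalExp e = head . A.toList $ I.run (A.unit e)

main :: IO ()
main = print (evalExp (A.constant (40 :: Int) + 2))
```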
    statusfailed
    @statusfailed_gitlab
    Ah nice :-)
    I will steal that; I also want to write unit tests for my expressions :D
    @tmcdonell actually the simplifier seems really clever; I have only run into a couple of cases where it's not able to reduce to a single value
    this particular one has a 'coerce' at the top, maybe that's why?
    (in fact, for a long time I thought the Show instance for Exp a was actually evaluating the expression, not just pretty-printing it)
    Troels Henriksen
    @athas
    What is the easiest way to compile accelerate-examples without any CUDA stuff? Setting the llvm-ptx flag to false does not seem to do the trick.
    Troels Henriksen
    @athas
    I figured it out: an llvm-ptx: false flag on both accelerate-examples and accelerate-fft.
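    (For anyone searching later: with a stack-based build that corresponds to a stack.yaml fragment roughly like the following; cabal has an analogous flags syntax.)

```yaml
flags:
  accelerate-examples:
    llvm-ptx: false
  accelerate-fft:
    llvm-ptx: false
```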
    Trevor L. McDonell
    @tmcdonell
    yes, I was just about to say that. sorry I didn't get your message in time!
    Troels Henriksen
    @athas
    Do you know if Accelerate does something particularly fancy to the nbody example when compiled with the llvm-cpu backend? It is much faster than I would expect (runtime does not seem to scale quadratically with n).
    Trevor L. McDonell
    @tmcdonell
    no, there's no special code path for the cpu backend
    Troels Henriksen
    @athas
    Does Accelerate do the equivalent of a C compiler's -ffast-math?
    Trevor L. McDonell
    @tmcdonell
    I haven't looked at the generated code in a while (possibly never for the cpu backend, that was implemented when we were still generating CUDA!)
    yes, it does do that
    Troels Henriksen
    @athas
    Oh, cool, that makes sense.
    I'm asking because I have a student who is finishing up a thesis on a multicore backend for Futhark, and I am helping him benchmark Accelerate to compare to a more mature backend. Performance is mostly identical for compute-bound programs, but sometimes Accelerate is way faster on pretty straightforward code (like nbody), which goes away if I recompile the Futhark-generated code with -ffast-math.
    Have you had any trouble in practice with using -ffast-math? I have been too paranoid to use it.
    Trevor L. McDonell
    @tmcdonell
    it came up once before, which is why these compensated sum functions exist (which effectively disable -ffast-math)
    *which also
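    As an aside, the idea behind compensated summation can be sketched in a few lines of plain Haskell (this is textbook Kahan summation, not Accelerate's actual implementation):

```haskell
-- Kahan (compensated) summation: carry a running correction term
-- so the low-order bits lost in each addition are fed back into
-- the next one. Note that fast-math-style reassociation would
-- simplify (s' - s) - y to 0 and destroy the correction, which is
-- why such code has to opt out of -ffast-math.
kahanSum :: [Double] -> Double
kahanSum = go 0 0
  where
    go s _ []     = s
    go s c (x:xs) =
      let y  = x - c          -- apply the correction from the last step
          s' = s + y          -- low-order bits of y may be lost here
          c' = (s' - s) - y   -- recover the lost bits
      in  go s' c' xs
```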
    Troels Henriksen
    @athas
    Ah, cool! I also considered whether one could exploit the fact that if a user asks for parallel summation of floats, then they tacitly claim that float addition is associative, and thus clearly they don't mind losing a bit of accuracy.
    But it also looks like -ffast-math does stuff like use CPU instructions for e.g. square roots, rather than calling the math library. I'm less sure how to handle that.
    Trevor L. McDonell
    @tmcdonell
    yeah, I'm a bit in two minds about it as well, but as you said there is a sort of tacit agreement here. at least in LLVM -ffast-math is an alias for a few different options, so you could choose to enable only the ones you are comfortable with
    (and on a per-instruction basis)
    Trevor L. McDonell
    @tmcdonell
    @SlavMFM not sure if you are still in the channel, but out of curiosity what OS(s) are you running? it helps with planning where to spend development effort etc., if we want to start on an AMD target
    Slaus Blinnikov
    @SlavMFM
    @tmcdonell oh, a bit embarrassing ^^, I hope not to draw attention away from other important directions! I have Ubuntu Linux, but distro doesn't matter I guess, because I had to update kernel to v.5.4 because it was the only way to get OpenCL working: https://askubuntu.com/questions/1209725/how-to-get-opencl-support-for-navi10-gpus-from-amd/1211465#1211465 .
    Trevor L. McDonell
    @tmcdonell
    good to know, thank you! (:
    Callan McGill
    @Boarders
    If I wanted to work with a vector of an arbitrary but statically known size with accelerate how would I do that? For example in the k-means example it sticks with doing it for tuples but how would one work with arbitrary known sized vectors (even if just up to the tuple size accelerate supports)?
    Trevor L. McDonell
    @tmcdonell
    Hi @Boarders! If I get what you mean, we don't have anything special to support that sort of thing. I think you'll need to define a type class covering the types you are interested in, and then your Accelerate functions are parameterised over that; what you'd do in regular Haskell, basically.
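    A sketch of that class-based approach (the names Point, sqDist, and nearest are made up for illustration; T2/T3 are Accelerate's tuple pattern synonyms):

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Array.Accelerate as A

-- A hypothetical class covering the point sizes you care about,
-- with your Accelerate code written against the class.
class Elt p => Point p where
  sqDist :: Exp p -> Exp p -> Exp Float

instance Point (Float, Float) where
  sqDist (T2 x1 y1) (T2 x2 y2) =
    (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2)

instance Point (Float, Float, Float) where
  sqDist (T3 x1 y1 z1) (T3 x2 y2 z2) =
    (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2) + (z1-z2)*(z1-z2)

-- Any function written against the class then works at every
-- size you have an instance for:
nearest :: Point p => Exp p -> Acc (Vector p) -> Exp Float
nearest q = the . A.minimum . A.map (sqDist q)
```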
    Callan McGill
    @Boarders
    Cool, thank you
    Hugh Sipière
    @hgsipiere
    hi, how does this project compare to futhark?
    the main differences i see are that Accelerate is an EDSL whereas futhark is a standalone language with its own compiler, and that futhark supports amd gpus
    other than that, it isn't so clear?
    Trevor L. McDonell
    @tmcdonell
    yes, that's basically it. the projects have broadly similar goals: they are languages for computation on data-parallel arrays. if you are working in Haskell I expect Accelerate will be easiest. since futhark is a standalone compiler it is perhaps easier to use it from a different language (but I'm not exactly sure which ones). accelerate supports multicore cpus and Nvidia gpus; futhark supports OpenCL, but their cpu backend is still in development (not sure of the status, it might be complete).
    but as you say, beyond that the details are a bit murky on exactly what features the language supports (or what gpu features/instructions, etc... I myself have never done a detailed comparison)
    if you had something specific in mind I might be able to advise if/how well that would look in Accelerate. sometimes @athas is here and he would be able to comment on Futhark
    Hugh Sipière
    @hgsipiere
    it would be nice if it supported AMD GPUs though I assume that's a lot of work
    would it be a small job, like a quick pull request, or a major backend?
    Hugh Sipière
    @hgsipiere
    i'd probably be doing a lot of linear algebra/numerical methods. A GPU is usually quite good for matrices, with Futhark I'd be writing my own linear algebra functions for the GPU but with Accelerate I can use a CPU BLAS library
    i'm not really sure what is best, i'm a maths student you see not comp sci haha
    Hugh Sipière
    @hgsipiere
    ^^ i just figured out the answer, use both so dw about that