    Trevor L. McDonell
    @tmcdonell
    @athas ah yes you are right, that was a problem too; that impedance mismatch going from HS expressions to C statements
    statusfailed
    @statusfailed_gitlab
    Is there a way to evaluate an Exp a? Usually when I type in an expression of that type at the REPL, it shows me a value, but sometimes I get a big pretty-printed expression
    Trevor L. McDonell
    @tmcdonell
    the only way to evaluate things is with run (and its variants), which all evaluate array expressions. but you can create a scalar (one-element) array with unit
    statusfailed
    @statusfailed_gitlab
    Ah cool, ok!
    Trevor L. McDonell
    @tmcdonell
    what you are seeing is the show instance for Exp (functions and expressions), and I guess the simplifier is able to reduce it down to a single value in some cases. There is a show instance for Acc (functions and expressions) which does the same thing too.
    Robbert van der Helm
    @robbert-vdh
    @statusfailed_gitlab I use this in my tests:
    -- assuming: import qualified Data.Array.Accelerate as A
    --           plus a backend's run, e.g. from Data.Array.Accelerate.LLVM.Native
    evalExp :: Elt a => Exp a -> a
    evalExp e = head . A.toList $ run (unit e)
    statusfailed
    @statusfailed_gitlab
    Ah nice :-)
    I will steal that- I also want to write unit tests for my expressions :D
    @tmcdonell actually the simplifier seems really clever- I have only run into a couple cases where it's not able to reduce into a single value
    this particular one has a 'coerce' at the top, maybe that's why?
    (in fact, for a long time I thought the Show instance for Exp a was actually evaluating the expression, not just pretty-printing it)
    Troels Henriksen
    @athas
    What is the easiest way to compile accelerate-examples without any CUDA stuff? Setting the llvm-ptx flag to false does not seem to do the trick.
    Troels Henriksen
    @athas
    I figured it out: an llvm-ptx: false flag on both accelerate-examples and accelerate-fft.
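The `llvm-ptx: false` spelling above is stack's flag syntax; a sketch of where those flags would live in a stack.yaml (package names as given in the message; with cabal-install the equivalent is a `flags: -llvm-ptx` line in a cabal.project `package` stanza):

```yaml
# stack.yaml (sketch)
flags:
  accelerate-examples:
    llvm-ptx: false
  accelerate-fft:
    llvm-ptx: false
```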
    Trevor L. McDonell
    @tmcdonell
    yes, I was just about to say that. sorry I didn't get your message in time!
    Troels Henriksen
    @athas
    Do you know if Accelerate does something particularly fancy to the nbody example when compiled with the llvm-cpu backend? It is much faster than I would expect (runtime does not seem to scale quadratically with n).
    Trevor L. McDonell
    @tmcdonell
    no, there's no special code path for the cpu backend
    Troels Henriksen
    @athas
    Does Accelerate do the equivalent of a C compiler's -ffast-math?
    Trevor L. McDonell
    @tmcdonell
    I haven't looked at the generated code in a while (possibly never for the cpu backend, that was implemented when we were still generating CUDA!)
    yes, it does do that
    Troels Henriksen
    @athas
    Oh, cool, that makes sense.
    I'm asking because I have a student who is finishing up a thesis on a multicore backend for Futhark, and I am helping him benchmark Accelerate to compare to a more mature backend. Performance is mostly identical for compute-bound programs, but sometimes Accelerate is way faster on pretty straightforward code (like nbody), which goes away if I recompile the Futhark-generated code with -ffast-math.
    Have you had any trouble in practice with using -ffast-math? I have been too paranoid to use it.
    Trevor L. McDonell
    @tmcdonell
    it came up once before, which is why these compensated sum functions exist (which effectively disable -ffast-math)
    *which also
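The idea behind a compensated (Kahan) sum, which fast-math reassociation would defeat, can be sketched in plain Haskell (illustrative only, not Accelerate's actual helpers; a reassociating compiler would simplify `(t - s) - y` to zero, which is why such code must opt out of fast-math):

```haskell
-- Kahan (compensated) summation: carry a running correction term `c`
-- holding the low-order bits lost when adding a small value to a big sum.
kahanSum :: [Double] -> Double
kahanSum = go 0 0
  where
    go s _ []     = s
    go s c (x:xs) =
      let y  = x - c        -- apply the correction from the previous step
          t  = s + y        -- big + small: low-order bits of y are lost here
          c' = (t - s) - y  -- algebraically zero; numerically, the lost bits
      in go t c' xs
```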
    Troels Henriksen
    @athas
    Ah, cool! I also considered whether one could exploit the fact that if a user asks for parallel summation of floats, then they tacitly claim that float addition is commutative, and thus clearly they don't mind losing a bit of accuracy.
    But it also looks like -ffast-math does stuff like use CPU instructions for e.g. square roots, rather than calling the math library. I'm less sure how to handle that.
    Trevor L. McDonell
    @tmcdonell
    yeah, I'm a bit in two minds about it as well, but as you said there is a sort of tacit agreement here. at least in LLVM -ffast-math is an alias for a few different options, so you could choose to enable only the ones you are comfortable with
    (and on a per-instruction basis)
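For reference, in LLVM IR the umbrella `fast` flag decomposes into individual fast-math flags that can be attached per instruction (flag names from the LLVM language reference; a sketch, not Accelerate's actual generated code):

```llvm
; 'fast' implies all of: nnan ninf nsz arcp contract afn reassoc
%sum1 = fadd fast float %a, %b          ; all fast-math assumptions enabled
%sum2 = fadd reassoc nsz float %a, %b   ; only reassociation + ignore signed zeros
```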
    Trevor L. McDonell
    @tmcdonell
    @SlavMFM not sure if you are still in the channel, but out of curiosity what OS(s) are you running? it helps planning where to spend development effort etc., if we want to start on an AMD target
    Slaus Blinnikov
    @SlavMFM
    @tmcdonell oh, a bit embarrassing ^^, I hope not to draw attention away from other important directions! I have Ubuntu Linux, but distro doesn't matter I guess, because I had to update kernel to v.5.4 because it was the only way to get OpenCL working: https://askubuntu.com/questions/1209725/how-to-get-opencl-support-for-navi10-gpus-from-amd/1211465#1211465 .
    Trevor L. McDonell
    @tmcdonell
    good to know, thank you! (:
    Callan McGill
    @Boarders
    If I wanted to work with a vector of an arbitrary but statically known size with accelerate how would I do that? For example in the k-means example it sticks with doing it for tuples but how would one work with arbitrary known sized vectors (even if just up to the tuple size accelerate supports)?
    Trevor L. McDonell
    @tmcdonell
    Hi @Boarders! If I get what you mean, we don't have anything special to support that sort of thing. I think you'll need to define a type class covering the types you are interested in, and then your accelerate functions are parameterised over that; basically what you'd do in regular Haskell.
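A hedged sketch of that pattern in plain Haskell (no Accelerate imports; the names `FixedVec`, `V2`, `V3`, and `dist2` are made up for illustration): define a class over the fixed-size types you care about, then write the numeric code once against the class.

```haskell
-- A class covering the statically-sized "vector" types of interest.
class FixedVec v where
  vzip  :: (Double -> Double -> Double) -> v -> v -> v
  vfold :: (Double -> Double -> Double) -> Double -> v -> Double

newtype V2 = V2 (Double, Double)
newtype V3 = V3 (Double, Double, Double)

instance FixedVec V2 where
  vzip f (V2 (a, b)) (V2 (c, d)) = V2 (f a c, f b d)
  vfold f z (V2 (a, b))          = f (f z a) b

instance FixedVec V3 where
  vzip f (V3 (a, b, c)) (V3 (d, e, g)) = V3 (f a d, f b e, f c g)
  vfold f z (V3 (a, b, c))             = f (f (f z a) b) c

-- Written once, usable at every instance (e.g. k-means distances).
dist2 :: FixedVec v => v -> v -> Double
dist2 x y = vfold (+) 0 (vzip (\a b -> (a - b) * (a - b)) x y)
```

In Accelerate the method and result types would involve `Exp`, but the shape of the solution is the same: the class picks the arity, the algorithm is parameterised over the class.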
    Callan McGill
    @Boarders
    Cool, thank you
    Hugh Sipière
    @hgsipiere
    hi, how does this project compare to futhark?
    the main difference i see is that Accelerate is an EDSL compared to having its own external file parser or that futhark supports amd gpus
    other than that, it isn't so clear?
    Trevor L. McDonell
    @tmcdonell
    yes, that's basically it. the projects have broadly similar goals: they are languages for computation on data-parallel arrays. if you are working in Haskell I expect Accelerate will be easiest. since futhark is a standalone compiler it is perhaps easier to use from a different language (though I'm not sure which ones). accelerate supports multicore cpus and Nvidia gpus; futhark supports OpenCL, but their cpu backend is still in development (not sure of the status, it might be complete).
    but as you say, beyond that the details are a bit murky on exactly what features each language supports (or what gpu features/instructions, etc.; I myself have never done a detailed comparison)
    if you had something specific in mind I might be able to advise if/how well that would look in Accelerate. sometimes @athas is here and he would be able to comment on Futhark
    Hugh Sipière
    @hgsipiere
    it would be nice if it supported AMD GPUs though I assume that's a lot of work
    would it be a small job, like a quick pull request or major backend?
    Hugh Sipière
    @hgsipiere
    i'd probably be doing a lot of linear algebra/numerical methods. A GPU is usually quite good for matrices, with Futhark I'd be writing my own linear algebra functions for the GPU but with Accelerate I can use a CPU BLAS library
    i'm not really sure what is best, i'm a maths student you see not comp sci haha
    Hugh Sipière
    @hgsipiere
    ^^ i just figured out the answer, use both so dw about that
    Troels Henriksen
    @athas
    Accelerate is much easier to use if the rest of your code is in Haskell. Futhark is probably easier to use from other languages, although I do recall reading a paper about an Accelerate FFI. There are also significant differences in language capabilities, but these practical integration concerns are likely more important in most cases.
    Trevor L. McDonell
    @tmcdonell
    with accelerate there are (some) bindings to BLAS libraries on the GPU, so that is your best bet. currently they are limited, but it's easy to add more (just let us know which ones you need)
    HugoPeters1024
    @HugoPeters1024

    Hi, quick question if that's okay: I'm trying out integration with gloss to display a texture generated by accelerate. As a minimal example I tried the following

      genPicture :: Array DIM2 Word32
      genPicture = fromList (Z:.640:.480) $ repeat 255

    but the resulting window remains the same as the background color, as if the picture is invisible.
    I assumed that the RGBA values were encoded as 4 bytes in the Word32, so I expected 255 to produce a black, fully opaque image (0x000000ff)
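That byte layout can be sketched in plain Haskell with just `Data.Bits` (the name `packRGBA` is made up for illustration, and this assumes the 0xRRGGBBAA packing described in the message; the byte order gloss actually expects depends on its bitmap format):

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32, Word8)

-- Pack four channel bytes into one Word32 as 0xRRGGBBAA,
-- so packRGBA 0 0 0 255 == 0x000000FF (opaque black).
packRGBA :: Word8 -> Word8 -> Word8 -> Word8 -> Word32
packRGBA r g b a =
      (fromIntegral r `shiftL` 24)
  .|. (fromIntegral g `shiftL` 16)
  .|. (fromIntegral b `shiftL` 8)
  .|.  fromIntegral a
```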
    Trevor L. McDonell
    @tmcdonell
    yes that sounds correct. I guess you are using it together with bitmapOfArray?