    Pepijn de Vos
    @pepijndevos
    I thought the deal was that full flattening is incredibly inefficient, so Futhark does partial flattening to obtain a reasonable trade-off between sequential code and communication overhead. Curious to understand why recursion is hard and what can be done etc.
    Troels Henriksen
    @athas
    @pepijndevos That is correct, but we'll soon be enabling a new compilation scheme in which multiple increasingly-flattened versions of the code are produced, and the best one picked at run-time. My idea is to employ full flattening as the final version that is picked if nothing else works.
    While I want to have that to handle things that Futhark already permits (in particular irregular parallelism), it will not be particularly efficient.
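
A minimal sketch of the "several versions, pick one at run-time" idea described above. This is not Futhark's actual mechanism: the version names, the thresholds, and the rule of comparing them against the size of the outermost parallel dimension are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical table of code versions, ordered from least to most flattened.
# Each entry is (version name, minimum outer parallelism for it to pay off).
# The numbers are invented; a real compiler would tune or measure them.
VERSIONS: List[Tuple[str, int]] = [
    ("outer-parallel, inner-sequential", 1 << 16),
    ("two levels parallel", 1 << 8),
]

# The last resort when no cheaper version has enough parallelism to exploit.
FALLBACK = "fully flattened"


def pick_version(outer_size: int) -> str:
    """Return the first (least flattened) version whose threshold is met."""
    for name, threshold in VERSIONS:
        if outer_size >= threshold:
            return name
    return FALLBACK


if __name__ == "__main__":
    for n in (100_000, 1_000, 10):
        print(n, "->", pick_version(n))
```

The ordering matters: the less-flattened versions are tried first because they have less overhead, and the fully flattened version is only reached when nothing else applies.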
    Pepijn de Vos
    @pepijndevos
    And why does only full flattening support recursion?
    Troels Henriksen
    @athas
    It's the only scheme I have seen that supports recursion without having some other significant restrictions.
    For example, I think Harlan supports recursion by simply emulating a stack on the GPU. As I recall, the cost is that parallel operations inside recursive calls are simply sequentialised.
    (Also, the stack operations will be incredibly slow.)
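
To make the stack-emulation idea concrete, here is a small sketch of replacing recursion with an explicit stack. It is not taken from Harlan; the tree type and function names are hypothetical, and the point is only that the work inside each "call" ends up being processed one item at a time.

```python
from typing import List, Union

# Hypothetical nested structure: a tree is either a leaf (int) or a list of trees.
Tree = Union[int, List["Tree"]]


def tree_sum_recursive(t: Tree) -> int:
    """Reference version that relies on the language's own call stack."""
    if isinstance(t, int):
        return t
    return sum(tree_sum_recursive(child) for child in t)


def tree_sum_with_stack(t: Tree) -> int:
    """Same result, with the call stack replaced by an explicit list."""
    total = 0
    stack: List[Tree] = [t]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            total += node
        else:
            # Children are pushed and later visited one by one, so any
            # parallelism among them is lost in this scheme.
            stack.extend(node)
    return total


if __name__ == "__main__":
    tree = [1, [2, 3], [[4], 5]]
    print(tree_sum_recursive(tree), tree_sum_with_stack(tree))
```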
    Full flattening turns the recursion inside out, so to speak, so you get full parallelisation and nice memory access patterns. The downside is that while the memory is accessed decently enough, it will be accessed a lot.
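
A rough sketch of the flat representation that full flattening is usually built on: an irregular nested array is stored as one flat data array plus the length of each segment, and nested operations become operations over the whole flat array. The function names are hypothetical and the loops are sequential Python, so this only shows the data layout, not the parallel execution or what the Futhark compiler would generate.

```python
from typing import List, Sequence, Tuple


def flatten(nested: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return (flat data, segment lengths) for an irregular nested array."""
    data = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return data, lengths


def flat_map_add(data: List[int], k: int) -> List[int]:
    """A nested 'map (map (+k))' becomes a single map over the flat data."""
    return [x + k for x in data]


def segmented_sum(data: List[int], lengths: List[int]) -> List[int]:
    """Sum each segment of the flat data.

    On a GPU this would be one segmented reduction over the whole array;
    here it is a plain loop that walks the segments in order.
    """
    sums: List[int] = []
    start = 0
    for n in lengths:
        sums.append(sum(data[start:start + n]))
        start += n
    return sums


if __name__ == "__main__":
    nested = [[1, 2, 3], [], [4, 5]]
    data, lengths = flatten(nested)
    print(data, lengths)
    print(segmented_sum(flat_map_add(data, 10), lengths))
```

Every step reads or writes the entire flat array, which is one way to see the trade-off mentioned above: the access pattern is regular, but there is a lot of memory traffic.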
    Orestis
    @omalaspinas_twitter
    Hello. Very nice work. I was wondering whether there is any plan to extend Futhark to, say, multi-threaded C (something like POSIX) or even MPI for distributed memory?
    Troels Henriksen
    @athas
    @omalaspinas_twitter MPI/distribution is probably rather long-term for various reasons, but I hope to start work on a multicore-CPU backend this fall.
    The OpenCL currently generated by the Futhark compiler runs OK on a CPU OpenCL implementation too, although there are still some GPU-isms in the optimisation pipeline.
    I do still want a multicore backend that does not depend on OpenCL, though.
    Orestis
    @omalaspinas_twitter
    Great. Would there be a way to contribute?
    Troels Henriksen
    @athas
    Right now it's still a research-level project, so any contributions would have to be very fundamental, at the level of "what should the IR even look like?", or "what kind of runtime system should be targeted (OpenMP or a custom scheduler)?"
    You're welcome to chip in, but it's going to be real time-consuming to answer such questions.
    Once the foundations are laid, hopefully smaller and more well-defined issues will open up.
    Orestis
    @omalaspinas_twitter
    Actually, I was thinking about the possibility of applying for a grant to work on the topic (either shared memory or, even better, MPI stuff). I don't know if you would be interested. Can I contact you by e-mail?
    Troels Henriksen
    @athas
    nrootconauto
    @nrootconauto
    BOOL arrays,are the optimized
    jakehehrlich
    @jakehehrlich
    I'm curious as to how Futhark is compiled. Is it compiled to C, or is it compiled to assembly and PTX in a single binary?
    Troels Henriksen
    @athas
    @nrootconauto what?
    @jakehehrlich it generates either C or Python, with embedded GPU kernels.
    jakehehrlich
    @jakehehrlich
    And the embedded GPU kernels are always just marked with "__kernel" I suppose?
    Troels Henriksen
    @athas
    For OpenCL, sure.
    jakehehrlich
    @jakehehrlich
    What are the other options? The documentation only seems to mention OpenCL
    Troels Henriksen
    @athas
    They are embedded as a string that is compiled at run-time.
    jakehehrlich
    @jakehehrlich
    Oh woah...where does the compiler live? In the GPU driver?
    Troels Henriksen
    @athas
    In the OpenCL driver. It's technically distinct from the GPU driver itself (runs in userspace), but it is typically part of the same overall package.
    jakehehrlich
    @jakehehrlich
    I had no idea OpenCL worked that way. What are the Futhark OpenCL alternatives? The code, but not the documentation, suggests CUDA and Vulkan?
    Troels Henriksen
    @athas
    You mean non-OpenCL backends for Futhark? Only CUDA and sequential C. The Vulkan backend is in practice not useful (and in a separate, unmerged branch for now).
    jakehehrlich
    @jakehehrlich
    How does one invoke futhark to get the source out rather than having it generate an .so?
    Troels Henriksen
    @athas
    Which source? The host-level source or the GPU kernels?
    jakehehrlich
    @jakehehrlich
    Both? I'm interested in the generated code and how I can drive the steps to get to a final binary.
    Troels Henriksen
    @athas
    When you compile a Futhark program foo.fut with futhark opencl, a foo.c file will be left in the current directory. This file will also contain the GPU kernels as a string, but it's more convenient to extract them by running ./foo --dump-kernels foo.cl.
    jakehehrlich
    @jakehehrlich
    Is there a way to get futhark to only spit out the source?
    Troels Henriksen
    @athas
    Sort of. Pass --library.
    But then you won't get the source for the input parsing machinery. On the other hand, you'll get a .h file!
    jakehehrlich
    @jakehehrlich
    That gives a .h file, a .c file without input parsing...and doesn't give a .so?
    I was under the impression that that produced a .so
    Troels Henriksen
    @athas
    It does not. The .so is something you'll have to put together yourself.
    It's too system-dependent, so the Futhark compiler doesn't even try.
    jakehehrlich
    @jakehehrlich
    Perfect. That's what I was hoping to hear
    Troels Henriksen
    @athas
    It's assumed that if you want to use a Futhark program as a library, you already have a build system for whatever application you're really writing, and you probably have opinions on how compiling and linking should be done.
    (E.g. typically I don't compile to an .so, I compile to an .o and link directly.)
    jakehehrlich
    @jakehehrlich
    Yeah the instructions made it sound like using an .so was the only way but I'd generally want to statically link
    I certainly have opinions on how to compile and link lol
    Only one translation unit is output, correct?
    Troels Henriksen
    @athas
    That is correct.