    Snektron
    @snektron:matrix.org
    [m]
    Apparently that extension was promoted to core-optional in OpenCL 1.2, so that's why Mesa is giving me trouble. It looks like all that is needed is a pragma in the OpenCL code, which is still accepted for backwards compatibility: https://www.khronos.org/registry/OpenCL//sdk/2.2/docs/man/html/cl_khr_fp64.html
    Let me check if my device supports it, need to turn on my pc one sec
    Troels Henriksen
    @athas
    That pragma should be inserted by the compiler.
    Snektron
    @snektron:matrix.org
    [m]
    hm, mesa does report the device supporting cl_khr_fp64
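
For reference, a minimal sketch of how such a check might look on the host side. It assumes the space-separated extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS; the helper name has_extension is hypothetical and not part of OpenCL or Futhark. Matching whole tokens avoids a false positive on a longer name that merely contains cl_khr_fp64.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `name` appears as a whole
   space-separated token in `extensions`, 0 otherwise. */
int has_extension(const char *extensions, const char *name) {
    size_t len = strlen(name);
    const char *p = extensions;
    if (len == 0) {
        return 0;
    }
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or after a space... */
        int starts = (p == extensions) || (p[-1] == ' ');
        /* ...and end at the end of the string or before a space. */
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) {
            return 1;
        }
        p += len;
    }
    return 0;
}
```

As the next message points out, a driver reporting the extension does not guarantee that kernels using it will actually work.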
    Troels Henriksen
    @athas
    Mesa has lied to me before. For example, it claimed to support OpenCL, but whenever I tried to run a kernel, my system would reboot.
    Snektron
    @snektron:matrix.org
    [m]
    wow
    doing a quick search in the generated c code after compiling with futhark opencl does not yield any cl_khr_fp64
    Troels Henriksen
    @athas
    Interesting. That would imply that the compiler doesn't think your program uses f64, but still includes parts of the supporting code. That's a very likely bug to creep in.
    So first, does your program use f64?
    Ah, I see something.
    We mistakenly defined f32.sgn with double precision. I'll fix it.
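
For illustration, a sketch of how this kind of mistake can look in C-like code. This is a hypothetical sign function, not the code Futhark actually generates. In both C and OpenCL C an unsuffixed literal such as 0.0 has type double, so comparing a float against it promotes the operand to double, which on OpenCL requires fp64 support.

```c
/* Hypothetical sign function for 32-bit floats; not Futhark's actual code. */

/* Problematic variant: the literals 0.0 are doubles, so x is promoted
   to double for each comparison. */
float sgn_f32_promoting(float x) {
    return (float)((x > 0.0) - (x < 0.0));
}

/* Single-precision variant: the f suffix keeps the comparisons in float. */
float sgn_f32(float x) {
    return (float)((x > 0.0f) - (x < 0.0f));
}
```

Both variants return the same values; the difference is only in the type used for the intermediate comparison.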
    Snektron
    @snektron:matrix.org
    [m]
    Interestingly, my program now crashes with out-of-bounds when running under mesa, but not when running under amdgpu-pro or any other backend
    I suppose that might be attributed to mesa shenanigans
    Troels Henriksen
    @athas
    That's really odd.
    Okay, I made some fixes to the double precision stuff.
    I can't explain what might be going wrong with the out-of-bounds accesses.
    Snektron
    @snektron:matrix.org
    [m]
    Hm, it's still complaining about doubles. Was there a way to dump the OpenCL source code to a file?
    Troels Henriksen
    @athas
    --dump-opencl kernels.cl
    Snektron
    @snektron:matrix.org
    [m]
    Ah, whoops, i hadn't compiled in the most recent commit, sorry about that. It seems to run now (with the same bounds problems, but i'll look into those later). Thanks :)
    Troels Henriksen
    @athas
    Those bounds issues are nasty, but it is likely to be a bug in the Mesa compiler. AMD also had a bug where they miscompiled my gotos by mistakenly thinking they were loops.
    You can try putting #[unsafe] in front and see if that makes it run properly.
    OpenCL interestingly does not have any restrictions on goto, which is probably a mistake. There's no way a GPU can handle non-reducible control flow. But the only use of goto in the generated code is to break out of nested loops, which is reducible, so it really ought to work.
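
For illustration, a minimal C sketch of the pattern described above: a goto used only to leave two nested loops at once. The function find_first and its parameters are hypothetical and are not taken from the code Futhark generates. The jump only goes forward to a single exit point after the loops, so the control flow stays reducible.

```c
/* Hypothetical example: find the first cell equal to `target` in a
   row-major rows x cols grid. Returns 1 and writes the position on
   success, 0 if the value is absent. */
int find_first(const int *grid, int rows, int cols, int target,
               int *out_row, int *out_col) {
    int found = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                *out_row = r;
                *out_col = c;
                found = 1;
                goto done; /* leave both loops at once */
            }
        }
    }
done:
    return found;
}
```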
    Snektron
    @snektron:matrix.org
    [m]
    Unsafe does seem to work. Strange, because it does print some wild index that isn't supposed to happen at all.
    It's a shame OpenCL implementations are plagued by these kinds of bugs. Unfortunately I have also encountered implementations that don't adhere to the alignment rules for struct members, along with some other nasty bugs.
    Troels Henriksen
    @athas
    If the Mesa compiler miscompiles the code, it'll probably jump past the code that computes the index.
    Snektron
    @snektron:matrix.org
    [m]
    I guess as long as the other platforms work it's not a critical problem.
    Troels Henriksen
    @athas
    I think it's disappointing that AMD's code quality is so relatively poor. I have hope it will improve, for two reasons: 0) generally increased funding due to the massive success of Ryzen. 1) The large AMD-based supercomputers they are contracted to support.
    ROCm is undergoing very active development these days to support these supercomputers. Of course, active development doesn't necessarily imply a reduction in bugs...
    Snektron
    @snektron:matrix.org
    [m]
    Note that Mesa is not by AMD so AMD isn't really to blame here i think
    Troels Henriksen
    @athas
    I can't really blame Mesa for their bugs. They are short-staffed and focused on graphics, and their graphics implementation is very good indeed.
    Snektron
    @snektron:matrix.org
    [m]
    In fact, the amdgpu-pro implementation doesn't have these problems. I just think it's sad that AMD doesn't upstream their improvements; it's the least they could do, really.
    Troels Henriksen
    @athas
    I think AMD has more or less ditched their own OpenGL/Vulkan stack and just use Mesa on Linux.
    Snektron
    @snektron:matrix.org
    [m]
    Anyway
    For my current iteration of the parser generator stuff, i generate a module that a parser module accepts. That module defines some other modules in turn, like which type is used to represent productions, for example.
    The thing is though, that i only use that module structure because i'd like to use those types to index some generated arrays, so i pretty much only use to_i64
    Do you think there's a way around that, or should i just take the L and keep the module structure?
    (the types are unsigned, which is why i can't use them as indices already)
    Troels Henriksen
    @athas
    What is the problem?
    You don't need a module just to carry a single type.
    Snektron
    @snektron:matrix.org
    [m]
    I'd like to call to_i64 on the type so i can use it as an array index
    Troels Henriksen
    @athas
    Your top module could just include
    type t
    val to_i64 : t -> i64
    No need for a module.
    Or if you want to abstract it further:
    type t
    val index [n] 'a : [n]a -> t -> a
    Snektron
    @snektron:matrix.org
    [m]
    I'd like to not have to have that module at all, and just say "use this value as index"
    For signed values, this is already possible, but i have some unsigned values as well
    Troels Henriksen
    @athas
    I think you need to show me some code. I have a hard time visualising this.
    Snektron
    @snektron:matrix.org
    [m]
    While thinking of an example i realized one needs to have a module anyway to restrict a type parameter

    I was originally thinking of something like

    let index [n] 't 'i (ts: [n]t) (index: i) = ts[i]

    but that won't compile regardless

    Gusten Theodor Isfeldt
    @Gusten_Isfeldt_gitlab

    @Gusten_Isfeldt_gitlab Also, be careful in general. I never finished implementing proper synchronisation for accumulator operators that cannot be implemented with hardware atomics (basically, primitive operators).

    @athas would this manifest as incorrect results, and if so, would the errors in general be small, like missed values in the accumulation?

    Troels Henriksen
    @athas
    @Gusten_Isfeldt_gitlab Yes, that seems like a likely outcome.