a language for fast, portable data-parallel computation
Apr 13 2018 08:20
According to Scott Gray here, you still have to drop down to assembly to access this kind of stuff with CUDA anyway:
Apr 13 2018 14:16
For those particular things there are Halide scheduling directives. The challenging thing that remains is getting register allocation right. I can't seem to write an sgemm at cuBLAS's block size without using more registers than it does.
Apr 13 2018 16:44
Note also that NVIDIA is seeking to fix this state of affairs with CUTLASS.
I get the sense that it acts as a bunch of performance unit tests for their compiler people. They want it to be possible to get peak performance from CUDA C.
Apr 13 2018 18:19
Anyone in the core Halide dev team planning on speaking at CppCon 2018?
Apr 13 2018 18:29
Not that I've heard.
Apr 13 2018 18:50
Windows buildbots are down (no surprise) but also not visible on network (eep).
Apr 13 2018 20:56
Yeah, CUTLASS looked quite interesting. I haven't played with it yet, though.