These are chat archives for halide/Halide

Oct 2017
Steven Johnson
Oct 19 2017 18:13
Yet another timeout on Travis this morning.
(This is after ccache was enabled, assuming of course that I enabled it correctly)
Time to check the docs
Steven Johnson
Oct 19 2017 18:18
Looks like we may just have finally crept up to the limits: standard timeout for any job is 50 mins, and the timeout failures I’ve seen are all in the 48+ mins range. Maybe we just have to split our jobs into smaller pieces or remove some tests from the Travis part.
Steven Johnson
Oct 19 2017 18:41
FYI, according to the docs, looks like the Ubuntu Trusty setup we use promises 2 cores; seems like we should use -j2 with make. Is there a reason we haven’t done so? (Correctness?)
Shoaib Kamil
Oct 19 2017 19:31
Are you talking about the build phase or the test phase?
Steven Johnson
Oct 19 2017 20:20
Entire job is limited to 50 mins (build+test)
And now the non-Travis buildbots seem to be down (502 Bad Gateway)
Marcos Slomp
Oct 19 2017 20:47
Andrew Adams
Oct 19 2017 20:49
Looks like the process crashed or something
restarting it worked
Haven't seen that before
Steven Johnson
Oct 19 2017 20:50
@abadams : waaay back in 2015, you disabled Travis builds of CMake with HALIDE_SHARED_LIBRARY=0 with the comment "Don't build as a static library with cmake. It risks exceeding the travis memory limit." Any idea why this was happening? Not sure I see why CMake-with-static-library would be more memory-intensive than Make-with-static-library.
Andrew Adams
Oct 19 2017 20:51
The makefile static library building process is incremental. It slowly adds object files
I'm guessing cmake isn't?
It was OOMing
Steven Johnson
Oct 19 2017 20:52
The edit history makes it a little confusing, because HALIDE_SHARED_LIBRARY=1 is the default for CMake builds anyway.
Andrew Adams
Oct 19 2017 21:37
I figured out how to implement register shuffles without boiling an ocean
(in cuda)
warp shuffles really
It gives a modest improvement to cuda mat mul: from 25% slower than cublas down to 21% slower than cublas
Unfortunately it interacts badly with vectorization for now, which can be important on cuda for reducing the number of memory transactions
So I need to work on that
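For readers unfamiliar with warp shuffles, here is a minimal CPU-side sketch in Python of what a shuffle-down reduction does across the 32 lanes of a warp. It only models the semantics of CUDA's shuffle-down intrinsic (each lane reads a register from the lane `delta` positions above it, with no trip through shared memory); it is not Halide's implementation, and the function names `shfl_down` and `warp_reduce_sum` are hypothetical.

```python
WARP_SIZE = 32  # number of lanes (threads) in a CUDA warp


def shfl_down(values, delta):
    """Model of a shuffle-down: lane i reads the register of lane i + delta.

    Lanes whose source lane falls outside the warp keep their own value.
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Sum one value per lane using log2(WARP_SIZE) shuffle steps.

    After the last step, lane 0 holds the sum of all lanes; the other
    lanes hold partial results that are not meaningful.
    """
    assert len(values) == WARP_SIZE
    regs = list(values)
    delta = WARP_SIZE // 2
    while delta >= 1:
        shifted = shfl_down(regs, delta)
        regs = [a + b for a, b in zip(regs, shifted)]
        delta //= 2
    return regs[0]


if __name__ == "__main__":
    print(warp_reduce_sum(list(range(WARP_SIZE))))
```

The point of the pattern is that the five exchange steps happen register to register, which is why it can help inner loops such as matrix multiply.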
Steven Johnson
Oct 19 2017 22:23
For grins, I took a look at how long our correctness tests take to run. Here’s the graph:
(huh, can I insert photos into Gitter?)
Anyway, on my laptop: most were <1s, a handful were more than a few seconds, but one was > 100s. Any guesses?
It’s cascaded_filters, and almost all of its time is spent in LLVM rather than Halide. (The deliberately-complicated pipeline we hand to LLVM apparently makes it very unhappy.)
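A rough sketch of how per-test timings like these could be collected, assuming each test is a standalone command. The helper `time_commands` and the demo commands are hypothetical; this is not the script used for the numbers above.

```python
import subprocess
import sys
import time


def time_commands(commands):
    """Run each command and return (name, seconds) pairs, slowest first.

    `commands` maps a test name to its argv list. Output is discarded and
    a failing exit code is ignored, since only wall-clock time is measured.
    """
    timings = []
    for name, argv in commands.items():
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Stand-in commands; replace with the real test binaries.
    demo = {
        "fast": [sys.executable, "-c", "pass"],
        "slow": [sys.executable, "-c", "import time; time.sleep(0.5)"],
    }
    for name, secs in time_commands(demo):
        print(f"{name}: {secs:.2f}s")
```

Sorting slowest first makes outliers easy to spot when deciding which tests to split out of a time-limited CI job.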
Dillon Sharlet
Oct 19 2017 23:02
It might be worth trying it on older LLVMs, maybe it was a regression and we should report it
Steven Johnson
Oct 19 2017 23:03
good thought
Steven Johnson
Oct 19 2017 23:14
Nope: llvm-trunk & 5.0 are about the same for this. 4.0 is even slower.
Zalman Stern
Oct 19 2017 23:23
Very cool on warp shuffles