These are chat archives for halide/Halide

20th
Nov 2017
ronlieb
@ronlieb
Nov 20 2017 16:27
Please point me at an example or two of using define_extern to call external code (such as fftw) thx
and
ronlieb
@ronlieb
Nov 20 2017 16:31
thx
Steven Johnson
@steven-johnson
Nov 20 2017 17:11
FYI: I have merged changes that (should) fix the spurious Travis failures we’ve been getting; you may want to sync failing PRs to head and retry.
Steven Johnson
@steven-johnson
Nov 20 2017 18:25
(I’m taking the liberty of bringing a few pending PRs up to date with master for this reason)
Dillon Sharlet
@dsharletg
Nov 20 2017 20:46
so I've found that we can make the tests perhaps significantly faster by using -O0 to build the tests
the problem is, I get a bizarre error whenever the test contains an IRVisitor/IRMutator about the vtable being missing
(for that class)
(linker error)
to give a sense of what we might get, a simple test like correctness_argmax goes from 5 seconds to 3.5 seconds
Steven Johnson
@steven-johnson
Nov 20 2017 20:48
the VTable this is basically the EXPORT nonsense biting us
I think the orthodoxy is that gcc/clang says you should only EXPORT entire classes… trying to EXPORT specific methods (as we do) is hit or miss
(MSVC didn’t have this issue but that’s a whole different ball of worms)
Dillon Sharlet
@dsharletg
Nov 20 2017 20:49
ah, can we export both the class and the methods?
Steven Johnson
@steven-johnson
Nov 20 2017 20:49
I’m starting to think we should just ditch the partial EXPORT stuff entirely and just export all public symbols
yes
class __attribute((visibility(“default”))) MyClass { …}
IIRC, the EXPORT macro was a necessary evil for Windows in the past, but since then, MSVC has added an “export everything” flag to more easily mimic Unixy behavior
Dillon Sharlet
@dsharletg
Nov 20 2017 20:51
FWIW I do think we should keep EXPORT and be selective about what we export
if the fix for this issue is just exporting IRVisitor/IRMutator, that's easy
(trying it right now)
I always preferred windows handling of DLL export/import stuff
it's harder to get right
but cleaner
hmm, exporting IRVisitor doesn't help
hmm, adding EXPORT to IRVisitor and removing it from the individual methods did work!
Steven Johnson
@steven-johnson
Nov 20 2017 20:55
…does it work for clang + gcc + xcode + msvc?
Dillon Sharlet
@dsharletg
Nov 20 2017 20:55
not sure, I think I'm only testing it with gcc
Steven Johnson
@steven-johnson
Nov 20 2017 20:55
…with all optimization levels?
Dillon Sharlet
@dsharletg
Nov 20 2017 20:55
I will push a branch and see what the build bots say once I make sure everything else is working
first I'm going to see if this even makes a real dent in the test time
Andrew Adams
@abadams
Nov 20 2017 20:58
There are now a variety of hacky research projects that want to use the internals of Halide. The EXPORT thing just gets in their way
Dillon Sharlet
@dsharletg
Nov 20 2017 20:59
for hacky stuff, there's always just linking to the static library, right?
Andrew Adams
@abadams
Nov 20 2017 20:59
hrm, true
but I question the value of the EXPORT thing
It doesn't provide much safety, if any, or seem to reduce library sizes much
Dillon Sharlet
@dsharletg
Nov 20 2017 21:00
so the thing I was going to try next to improve build/test time, is removing all the internal stuff from halide.h
Andrew Adams
@abadams
Nov 20 2017 21:00
All it seems to do is clutter the code and annoy us
Dillon Sharlet
@dsharletg
Nov 20 2017 21:00
and for tests that need those internals, include them manually (e.g. #include <Simplify.h>)
Steven Johnson
@steven-johnson
Nov 20 2017 21:00
If we could use it to hide all of LLVM while exposing Halide, that would be a selling point. Maybe.
Andrew Adams
@abadams
Nov 20 2017 21:01
If including Halide.h is an issue
we should just configure the build to use a PCH for it
IIRC, it's trivial to do
Dillon Sharlet
@dsharletg
Nov 20 2017 21:02
I don't know yet if it is or not, not sure
I just feel like it takes way longer to build the simple corretness tests than it should
it seems to take 3-5 seconds to compile one page correctness tests
and often <<1 second to run the compiled test
Andrew Adams
@abadams
Nov 20 2017 21:03
I'd try precompiling the header before trimming Halide.h. I suspect turning on PCH is easier
ImageStack uses PCH and it sped up the build quite a lot
Dillon Sharlet
@dsharletg
Nov 20 2017 21:23
test_correctness with -O3 = 22 min, with -O0 = 16 min
half that might be simd_op_check :)
I think this might make a significant dent on the build bots
especially if we can apply it to more tests, going to try that soon
Steven Johnson
@steven-johnson
Nov 20 2017 21:24
+1
Zalman Stern
@zvookin
Nov 20 2017 22:19
Yeah, optimizing most tests seems silly
That said, I think we have at least one that does some performance stuff
I.e. comparing time to run C loops
Andrew Adams
@abadams
Nov 20 2017 22:20
For me argmax takes 3.1s at O3, 1.95s at O0, 1.97s with O3+PCH, and 1.0s with O0+PCH
So looks like you get a 1s speedup from O3->O1, and an independent 1s speedup for turning on PCH
Zalman Stern
@zvookin
Nov 20 2017 22:21
Tasty tasty low hanging fruit
Andrew Adams
@abadams
Nov 20 2017 22:21
Given that we mash everything into a single huge header, precompiling it is obvious in hindsight
Zalman Stern
@zvookin
Nov 20 2017 22:21
The fact that precompiled headers are still a thing you turn on is kind of silly
Andrew Adams
@abadams
Nov 20 2017 22:22
Note that all the buildbots use ccache, so this should only be an issue when Halide.h changes
Zalman Stern
@zvookin
Nov 20 2017 22:22
?
I'd expect any test .cpp changing to incur the full cost
Andrew Adams
@abadams
Nov 20 2017 22:22
Yeah, or that
But if you don't use ccache, you get the overhead of compiling every test from scratch from every trivial change inside a Halide cpp file
Zalman Stern
@zvookin
Nov 20 2017 22:23
That's why I said this should just be in the compiler
ccache should extend into the PCH mechanism
If the input bits didn't change, the PCH didn't change
Andrew Adams
@abadams
Nov 20 2017 22:23
Isn't it the role of a good build system to cache everything it can? Rather than the compiler?
Not sure if it does PCH
bazel just does this, right?
Zalman Stern
@zvookin
Nov 20 2017 22:24
Interesting question
I hope so
I guess I was saying that the compiler needs to provide some help on the PCH
Andrew Adams
@abadams
Nov 20 2017 22:25
true, you need to at least expose it as a build step that can be cached
Gotta run to a meeting. @dsharletg either turn on PCH in your PR, or I'll do it in a follow-up.
Zalman Stern
@zvookin
Nov 20 2017 22:26
I could blame cmake instead of the compiler if you prefer :-)
Dillon Sharlet
@dsharletg
Nov 20 2017 22:29
@abadams I think they should be separate PRs anyways and it sounds like you already did the work?
Andrew Adams
@abadams
Nov 20 2017 22:30
It's nearly a one-liner
Happy to do it as a separate PR
Dillon Sharlet
@dsharletg
Nov 20 2017 23:17
Looks like -O0 speeds up ARM test building by almost 2x
332 = -O0
Andrew Adams
@abadams
Nov 20 2017 23:17
cool
But IMO we should leave it at O3 for the performance tests
To avoid it as a future pitfall
Dillon Sharlet
@dsharletg
Nov 20 2017 23:48
done