These are chat archives for halide/Halide

10th
Nov 2017
Andrew Adams
@abadams
Nov 10 2017 01:23
Are there no tests for prefetch? I have a change that speeds it up a ton, but I can't find any way to see if it's correct
It's used in the hexagon benchmarks, but there's no validation of correctness there, and a broken prefetch wouldn't necessarily cause a validation failure anyway
@dsharletg @psuriana how do we test prefetch?
Andrew Adams
@abadams
Nov 10 2017 01:37
and when I ask it to fetch nine elements with a stride of 1, reduce prefetch dimensions decides to load two whole cache lines
Is that WAI? I guess it's possible than 9 elements could span two cache lines
Steven Johnson
@steven-johnson
Nov 10 2017 16:09
I also have been unable to get python bindings buoldable on local machines
The python bindings definitely need live
Love, that is
Is it possible to write Generators with them? That seems like a nice to have fit.
But right now I just want to be able to build them.
Patricia Suriana
@psuriana
Nov 10 2017 17:20
@abadams we don't really have a test for prefetch. I just eyeballed the IR to check if it's reasonable.
@abadams what is WAI?
Andrew Adams
@abadams
Nov 10 2017 17:21
Fetching two cache lines to retrieve 9 elements
Dillon Sharlet
@dsharletg
Nov 10 2017 17:21
I test prefetching with hexagon_matmul
Andrew Adams
@abadams
Nov 10 2017 17:21
I concluded that it is
Patricia Suriana
@psuriana
Nov 10 2017 17:21
@abadams yeah, it's a bit tricky how to test prefetch. Maybe I should just make an easy synthetic test case and verify it prefetched the expected region? What do you think?
Dillon Sharlet
@dsharletg
Nov 10 2017 17:22
hexagon_matmul has both validation and it checks performance
but only on device obviously
Andrew Adams
@abadams
Nov 10 2017 17:22
Not useful to me here :)
Patricia Suriana
@psuriana
Nov 10 2017 17:22
Is this prefetch on hexagon or CPU?
Andrew Adams
@abadams
Nov 10 2017 17:22
or on the buildbots, more importantly
I don't actually want to use prefetch, I just want to make it take less time during lowering and have confidence that I'm not breaking it
Dillon Sharlet
@dsharletg
Nov 10 2017 17:22
well, we probably should be running that app on the CPU and simulator at least :)
I will test your PR
but yeah we should also consider a synthetic test that checks the prefetched area
Patricia Suriana
@psuriana
Nov 10 2017 17:23
I will make up some synthetic test case for prefetch
Andrew Adams
@abadams
Nov 10 2017 17:23
A custom lowering pass that rewrites the prefetch calls to a regular old extern call to a function inside the test might be easy to do
Patricia Suriana
@psuriana
Nov 10 2017 17:24
@abadams for the prefetching 2 cache line things, is it on hexagon or CPU?
Andrew Adams
@abadams
Nov 10 2017 17:24
(via Func::add_custom_lowering_pass)
CPU
Patricia Suriana
@psuriana
Nov 10 2017 17:25
for CPU, the prefetch is lowered to loop of single prefetch using the __builtin_prefetch
Andrew Adams
@abadams
Nov 10 2017 17:26
It was just prefetching more than I expected, but I guess it's conservative
Patricia Suriana
@psuriana
Nov 10 2017 17:27
seems like __builtin_prefetch reads 1 cache line
do you check the IR? Is it prefetching the right size?
Andrew Adams
@abadams
Nov 10 2017 17:28
It was injecting a loop of size two
so two calls to __builtin_prefetch
each loads like 64 bytes
so it was prefetching 128 bytes
Each iteration of the loop loads 36 bytes, so I was surprised
but I guess that 36 bytes could span two cache lines
Patricia Suriana
@psuriana
Nov 10 2017 17:29
Yeah it's possible I guess depending where the address starts. Do you have an easy test case to reproduce this which I can take a look at?
Dillon Sharlet
@dsharletg
Nov 10 2017 17:33
hexagon_matmul looks fine with the prefetch PR
Andrew Adams
@abadams
Nov 10 2017 17:35
Don't worry about the two-cache-line thing. I was just confused at first, but it makes sense to me now.
Dillon Sharlet
@dsharletg
Nov 10 2017 17:42
What do you all think about adding a "Comment" IR node? The use case I have in mind is that I want to add a comment containing the lowered Halide IR that is behind an offloaded device API call
when you compile_to_lowered_stmt for a pipeline that has offload steps, you still see the IR that got compiled to device
right now you just see a call to halide_cuda_run (or whatever) and a refernece to a binary blob
basically compile_to_lowered_stmt right now is useless for pipelines that do anything significant on a device
Andrew Adams
@abadams
Nov 10 2017 17:44
IMO it should be a module-level node
so not really an IR node
A special type of Buffer?
Or you want it situated at the site where the kernel was lifted out?
Evaluate(StringImm()) would do that
Hrm, maybe
might not pretty-print right
If we want this to print to HTML, it should be another LoweredFunc in the module
even if it was already compiled, and is just there for reference
Dillon Sharlet
@dsharletg
Nov 10 2017 17:47
doesn't need to be pretty-print
Andrew Adams
@abadams
Nov 10 2017 17:47
Anyway, for string comments I've used Evaluate(StringImm) and it works fine
Dillon Sharlet
@dsharletg
Nov 10 2017 17:47
OK I will do that
Andrew Adams
@abadams
Nov 10 2017 17:47
I think you want this to print to HTML correctly
Dillon Sharlet
@dsharletg
Nov 10 2017 17:47
I do want it situated at the call site
Andrew Adams
@abadams
Nov 10 2017 17:47
I don't think the StringImm trick is a good idea for anything non-trivial
Dillon Sharlet
@dsharletg
Nov 10 2017 17:47
I mean, that would be nice :) But I don't see how to realistically do it without a mess
(supporting HTML that is)
Andrew Adams
@abadams
Nov 10 2017 17:48
Have a list of Stmts in the module called device_function_bodies
that get printed whenever you dump a module
or a list of LoweredFuncs
Dillon Sharlet
@dsharletg
Nov 10 2017 17:49
Hmm, so they wouldn't appear at the site of the offload call like I was going to do, but I guess that's not a big deal
Andrew Adams
@abadams
Nov 10 2017 17:50
You could also add a DeadCode IR node that holds a Stmt. Visitors and mutators by default would not enter it. It's frozen and meaningless. But the printer would enter it.
Stmt and an explanatory string, I guess
It might do things like use lifted temporaries that no longer exist, which would be confusing
Zalman Stern
@zvookin
Nov 10 2017 18:09
We've talked about a feature to write various device code variants to files for debugging.
Andrew Adams
@abadams
Nov 10 2017 20:13

Does it work to do something like:

define SAVE_FOO FOO

define FOO BAR

...

define FOO SAVE_FOO

undef SAVE_FOO

ugh, markdown strikes again
(this would be for making a user_assert macro defined before Halide.h not make things explode)
more generally:
ifdef FOO
define SAVE_FOO FOO
endif
define FOO BAR
(use FOO)
undef FOO
ifdef SAVE_FOO
define FOO SAVE_FOO
undef SAVE_FOO
Trying to push the existing definition of some macros onto a stack at the start of Halide.h, and pop them off at the end
Andrew Adams
@abadams
Nov 10 2017 20:20
We wouldn't need these macros at all if I could set FILE and LINE as default parameter values and have them apply at the call site. I think g++ has an extension for this, but I can't find it.
builtin_LINE(), builtin_FILE(), etc
Andrew Adams
@abadams
Nov 10 2017 20:29
(but clang doesn't support them)
Zalman Stern
@zvookin
Nov 10 2017 21:05
I think part of the reflection work is going to introduce a portable version of the builtin_LINE() that can be used as default args or similar.
But I don't think it made it into C++17
I built OpenCV with Halide support. Just requires adding exactly the right incantation to the cmake line.
We should likely bundle the Halide python support into a pip installable package.
OpenCV does not have one yet.
Andrew Adams
@abadams
Nov 10 2017 21:07
There was an attempt at that by jrk's brother
Zalman Stern
@zvookin
Nov 10 2017 21:07
Also might be worth adding direct OpenCV interop to the python support.
Ala the numpy stuff we have now
Andrew Adams
@abadams
Nov 10 2017 21:08
but I had a lot of trouble just getting it compile, and it could only compile single-threaded
so it slowed down the buildbots quite a bit
Zalman Stern
@zvookin
Nov 10 2017 21:08
IS it on a branch somewhere?
Andrew Adams
@abadams
Nov 10 2017 21:08
The PR may still be open. I think the user is minrk
Oh wow, the cmakebuild was ignoring HalideFooter.h
so all those macros were leaking into user code
Zalman Stern
@zvookin
Nov 10 2017 21:12
Joy.
Andrew Adams
@abadams
Nov 10 2017 21:12
Yeah, the prefixing trick means I can delete that file anyway
Hrm... this doesn't work for EXPORT, which ends up in Halide.h
ugh, so do internal_assert et al
(in inlined functions)
Zalman Stern
@zvookin
Nov 10 2017 21:17
yep
Andrew Adams
@abadams
Nov 10 2017 21:17
I guess for the ones in headers we can just qualify them
qualifying EXPORT is kinda annoying
Zalman Stern
@zvookin
Nov 10 2017 21:17
IT becomes very difficult to reason about expansion
I'm not going to worry about EXPORT too much
I mean I guess we should rename it to HALIDE_EXPORT
Andrew Adams
@abadams
Nov 10 2017 21:18
I'm increasingly thinking we should just always export all symbols
given that people are trying to reuse our compiler internals in ad-hoc research projects
Zalman Stern
@zvookin
Nov 10 2017 21:19
Yeah, maybe
I thought there were some edge cases where we wanted to avoid doing so. E.g. symbols from the C++ libary or if we'd linked in our runtime?
Andrew Adams
@abadams
Nov 10 2017 21:20
I don't think so - we used to just export everything on linux
Zalman Stern
@zvookin
Nov 10 2017 21:20
I.e. switching to default hidden had some advantage
Andrew Adams
@abadams
Nov 10 2017 21:20
only started hiding things so we'd discovered lack-of-export bugs without having to run the windows buildbot
Zalman Stern
@zvookin
Nov 10 2017 21:20
Maybe it was llvm
and the GPU drivers that include llvm
on some mobile platform
can't remember
Linking Halide itself in on mobile seems like a no no anyway
Zalman Stern
@zvookin
Nov 10 2017 21:28
I have an IRMutator2::mutate(const Expr &expr) method being called with an undefined Expr argument. Is that allowed?
Andrew Adams
@abadams
Nov 10 2017 21:29
Yes, it's supposed to return an undefined Expr
Zalman Stern
@zvookin
Nov 10 2017 21:29
Ah. ok
Zalman Stern
@zvookin
Nov 10 2017 22:21
So this crop issue is a perfect illustration on why we need (at least) optional bounds checking on the Buffer indexing operators.
I ran into the same thing writing the test for device cropping.
And valgrind/asan/etc. won't catch it
(It may be possible to extend the Buffer code to do the right thing with asan etc. but I doubt it will be faster than just bounds checking.)
Andrew Adams
@abadams
Nov 10 2017 22:23
Adding bounds checks really makes them for writing tests only. They'll be too slow to ever use for anything.
because it will disable autovectorization of code that uses them
Zalman Stern
@zvookin
Nov 10 2017 22:24
It can be optional, but they're a huge issue for test coverage now
Andrew Adams
@abadams
Nov 10 2017 22:24
That file uses assert
should it toggle on NDEBUG?
or I guess the bounds check could just be in an assert
Zalman Stern
@zvookin
Nov 10 2017 22:25
I'll figure out why it isn't on, but it was not firing and I did not see any assertions in the path.
Andrew Adams
@abadams
Nov 10 2017 22:25
I mean that file uses assert elsewhere
Zalman Stern
@zvookin
Nov 10 2017 22:25
Yeah.
Andrew Adams
@abadams
Nov 10 2017 22:25
so it's natural to use it for this
and that lets you turn it off for fast code
Zalman Stern
@zvookin
Nov 10 2017 22:25
Ok, I'll add it
Andrew Adams
@abadams
Nov 10 2017 22:26
I would time a serial make test_correctness with and without the change
just in case
(I doubt it will hurt - the images used in the tests are tiny)
Zalman Stern
@zvookin
Nov 10 2017 22:26
ok