These are chat archives for halide/Halide

26 Apr 2018
Kern Handa
@kernhanda
Apr 26 2018 04:18
How does Halide figure out which features are available on a CPU? For example, is armv7s automatically detected or do you have to add it explicitly in some manner (env variable or API call) every time?
I'm using Halide on armv6l, and have to turn no_neon on
wondering if there's a way to add intelligence about the armv6l architecture directly into Halide
Steven Johnson
@steven-johnson
Apr 26 2018 04:20
look for halide_get_cpu_features
Kern Handa
@kernhanda
Apr 26 2018 04:21
those are known features, though, no? not active features?
Steven Johnson
@steven-johnson
Apr 26 2018 05:58
halide_get_cpu_features() detects and returns a mask of features that are available on the current CPU
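To make the "known" versus "available" distinction concrete, here is a minimal sketch, assuming detection yields two bitmasks: one for the features the detector can test for, and one for the features it actually found. The enum values, the `FeatureMasks` struct, and `can_run` are invented for illustration and are not Halide's runtime definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for illustration only; these are NOT Halide's
// real feature indices.
enum Feature : int { kNeon = 0, kOther = 1, kFeatureCount = 2 };

// Assumed shape of a detection result:
//   known     = features the detector is able to test for
//   available = features it actually found on the current CPU
struct FeatureMasks {
    std::uint64_t known = 0;
    std::uint64_t available = 0;
};

// Convert a feature index into a single-bit mask.
inline std::uint64_t bit(Feature f) {
    assert(f >= 0 && f < kFeatureCount);
    return std::uint64_t{1} << f;
}

// Code built for `required` can run if every required feature that the
// detector knows how to check is also reported as available.
inline bool can_run(const FeatureMasks &cpu, std::uint64_t required) {
    const std::uint64_t checkable = required & cpu.known;
    return (checkable & ~cpu.available) == 0;
}
```

Under this model a feature that is known but not available makes `can_run` return false, while a feature the detector cannot test for is simply not checked.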
Kern Handa
@kernhanda
Apr 26 2018 05:59
right. i'm trying to go beyond "available features" to "enabled features"
Steven Johnson
@steven-johnson
Apr 26 2018 06:00
Not sure I understand
Kern Handa
@kernhanda
Apr 26 2018 06:01
as a related aside, i'm seeing "Illegal instruction" being returned for a target(arm-32-linux-jit-no_asserts-no_bounds_query-no_neon) but not for target(arm-32-linux-no_asserts-no_bounds_query-no_neon-no_runtime)
as the only differences are the no_runtime and jit features, is this likely to be a Halide issue or an LLVM one?
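One way to double-check that the two printed target strings really differ only in `jit` and `no_runtime` is to split them on `-` and compare the token sets. This is a plain string-level sketch; `target_tokens` and `only_in` are hypothetical helpers, not Halide functions.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a target string such as "arm-32-linux-no_neon" on '-'.
inline std::set<std::string> target_tokens(const std::string &target) {
    assert(!target.empty());
    std::set<std::string> tokens;
    std::istringstream in(target);
    std::string tok;
    while (std::getline(in, tok, '-')) {
        if (!tok.empty()) tokens.insert(tok);
    }
    return tokens;
}

// Tokens that appear in `a` but not in `b`, in sorted order.
inline std::vector<std::string> only_in(const std::string &a,
                                        const std::string &b) {
    const std::set<std::string> sa = target_tokens(a);
    const std::set<std::string> sb = target_tokens(b);
    std::vector<std::string> out;
    for (const std::string &t : sa) {
        if (sb.count(t) == 0) out.push_back(t);
    }
    return out;
}
```

Calling `only_in` in both directions on the two strings quoted above gives the tokens unique to each target.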
Steven Johnson
@steven-johnson
Apr 26 2018 06:02
…setting “jit” in a feature string is probably a bad idea :-)
Kern Handa
@kernhanda
Apr 26 2018 06:02
i'm not
that's just the result of printing it
Steven Johnson
@steven-johnson
Apr 26 2018 06:02
ah.
hm
offhand I don’t know
Kern Handa
@kernhanda
Apr 26 2018 06:02
the jit comes from calling get_jit_target_from_environment
Steven Johnson
@steven-johnson
Apr 26 2018 06:04
alas, it’s late here and I have to log off — I will try to respond tomorrow if no one else does first
Kern Handa
@kernhanda
Apr 26 2018 06:04
no worries. night!
as far as what i meant earlier re: enabling features -- right now, when i do something like get_jit_target_from_environment, halide returns arm-32-linux. would modifying halide_get_cpu_features still be the way to go if i wanted it to return the equivalent of arm-32-linux-no_neon for a particular arm architecture?
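Until something like this lives inside Halide, the same effect could be approximated outside it by deriving the target string from the machine name. A minimal sketch, assuming the machine name comes from something like `uname -m`; `target_for_machine` is a hypothetical helper and not a Halide API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Halide: append "no_neon" to a base target
// string when the machine name indicates an ARMv6 core. ARMv6 predates NEON,
// so any machine name starting with "armv6" is treated as lacking it.
inline std::string target_for_machine(const std::string &base,
                                      const std::string &machine) {
    assert(!base.empty());
    const bool lacks_neon = machine.rfind("armv6", 0) == 0;  // prefix check
    const bool already_set = base.find("no_neon") != std::string::npos;
    if (lacks_neon && !already_set) {
        return base + "-no_neon";
    }
    return base;
}
```

The resulting string could then be exported through the `HL_JIT_TARGET` environment variable, which, as far as I know, is what `get_jit_target_from_environment` reads.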
Igor Nazarenko
@inazarenko
Apr 26 2018 16:20
is there any way to disable loop partitioning in the schedule?
Zalman Stern
@zvookin
Apr 26 2018 16:22
I don't see a way to do so
Pranav Bhandarkar
@pranavb-ca
Apr 26 2018 16:22
Are you using BoundaryConditions?
Zalman Stern
@zvookin
Apr 26 2018 16:22
Originally it was mostly around likely()
so if you didn't have likely() in place, it wouldn't happen
Igor Nazarenko
@inazarenko
Apr 26 2018 16:22
yes and yes
Pranav Bhandarkar
@pranavb-ca
Apr 26 2018 16:22
So, I have had success in the past by replacing BoundaryConditions with explicit clamps
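For readers unfamiliar with the trick: in Halide terms this presumably means indexing the input as `input(clamp(x, 0, width - 1))` rather than wrapping it with something like `BoundaryConditions::repeat_edge`. The plain C++ sketch below shows only the index arithmetic, assuming a repeat-edge style boundary; `load_clamped` and `sum3` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Read data[x] with the index clamped into [0, size - 1], so out-of-range
// reads repeat the nearest edge value.
inline int load_clamped(const std::vector<int> &data, int x) {
    assert(!data.empty());
    const int last = static_cast<int>(data.size()) - 1;
    return data[std::min(std::max(x, 0), last)];
}

// 3-tap sum over a 1D signal, using the explicit clamp at every tap.
inline std::vector<int> sum3(const std::vector<int> &in) {
    std::vector<int> out(in.size());
    for (int x = 0; x < static_cast<int>(in.size()); ++x) {
        out[x] = load_clamped(in, x - 1) + load_clamped(in, x) +
                 load_clamped(in, x + 1);
    }
    return out;
}
```

The clamp runs on every iteration here, which is the cost that loop partitioning tries to remove from the interior of the loop.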
Zalman Stern
@zvookin
Apr 26 2018 16:23
But I see a few other things that seem to be tied into it now
Igor Nazarenko
@inazarenko
Apr 26 2018 16:23
yep, that'd work
Zalman Stern
@zvookin
Apr 26 2018 16:23
Is this for a specific compilation path where maybe the lowering needs to be modified?
Igor Nazarenko
@inazarenko
Apr 26 2018 16:23
just seems like a decision that would make more sense in scheduling than in the algorithm itself
Zalman Stern
@zvookin
Apr 26 2018 16:24
Is the problem that partitioning is being done badly or that you want to do something else for special reasons?
Igor Nazarenko
@inazarenko
Apr 26 2018 16:24
too much partitioning of loops that are already very large
Zalman Stern
@zvookin
Apr 26 2018 16:24
Part of the idea of BoundaryConditions was to make it more knowledgeable about the backend
Shoaib Kamil
@shoaibkamil
Apr 26 2018 16:24
@kernhanda Can you see if the same issues occur without no_asserts-no_bounds_query in the targets?
Zalman Stern
@zvookin
Apr 26 2018 16:25
Yeah, that is a fairly standard issue and we should figure out a better approach.
It was somewhat a surprise how hard loop partitioning actually is
But it's been a fair while since it has been worked on
Also, the idea was BoundaryConditions would only be used at the input and output of larger pipelines
There may be some other use cases where they are used on a slow path and one doesn't want the likely to be there.
Unfortunately, exposing this via e.g. parameters to the BoundaryConditions functions starts to make them complex enough that it is easier to just write the clamps by hand.
Some form of scheduling would be cool, but that requires having a handle on the internally generated Funcs the BoundaryConditions stuff made. (Plus knowing to do it.)
I wonder if some sort of higher level code size guidance scheduling directives might be doable.
Zalman Stern
@zvookin
Apr 26 2018 16:33
I guess the shorter answer is that it's supposed to be schedulable via likely(), but BoundaryConditions tries to make all the details go away, and that doesn't work.
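For context on the code size concern, here is a hand-written illustration of what partitioning a loop amounts to: the single loop with clamped reads is split into a prologue, a clamp-free steady state, and an epilogue, so the loop body appears three times. This is a sketch in plain C++ with made-up names, not the output of Halide's lowering.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Edge-repeating read: the index is clamped into [0, size - 1].
inline int clamped(const std::vector<int> &in, int x) {
    const int last = static_cast<int>(in.size()) - 1;
    return in[std::min(std::max(x, 0), last)];
}

// Unpartitioned: one loop, clamps evaluated on every iteration.
inline std::vector<int> sum3_single_loop(const std::vector<int> &in) {
    assert(!in.empty());
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    for (int x = 0; x < n; ++x) {
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    return out;
}

// Partitioned by hand: only the first and last iterations need clamps.
inline std::vector<int> sum3_partitioned(const std::vector<int> &in) {
    assert(!in.empty());
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    const int lo = std::min(1, n);       // first x with x - 1 in range
    const int hi = std::max(lo, n - 1);  // first x with x + 1 out of range
    for (int x = 0; x < lo; ++x) {       // prologue: needs clamps
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    for (int x = lo; x < hi; ++x) {      // steady state: no clamps
        out[x] = in[x - 1] + in[x] + in[x + 1];
    }
    for (int x = hi; x < n; ++x) {       // epilogue: needs clamps
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    return out;
}
```

Both functions compute the same values; the partitioned one trades extra code for a cheaper interior loop, which is the trade-off a schedule-level or code-size directive would have to control.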
Kern Handa
@kernhanda
Apr 26 2018 17:34
@shoaibkamil, thanks. looking into that