These are chat archives for halide/Halide

22nd
Nov 2017
Suyog
@suyogsarda_twitter
Nov 22 2017 14:47
@dsharletg i have a small pull request for removing max check for vlut as discussed yesterday pranavb-ca/Halide#6 (i had to push to pranavb-ca branch as i am not added as contributor to Halide repo). Can you please check that? Thanks
Dillon Sharlet
@dsharletg
Nov 22 2017 16:16
Anyone can open a PR to Halide
looks good
Andrew Adams
@abadams
Nov 22 2017 16:18
Steven's away, right?
The llvm 3.9 builders are still running on trunk (and failing, unsurprisingly)
So I guess whatever he did to turn them off didn't stick
Not urgent - we can just let them fail
They don't run on PRs
The cmake buildbots aren't running the correctness tests
So we only have build-breakage coverage on windows
Andrew Adams
@abadams
Nov 22 2017 16:27
I opened an issue
Suyog
@suyogsarda_twitter
Nov 22 2017 16:30

@dsharletg Thanks. There is one more check of max<256 for vlut @ https://github.com/pranavb-ca/Halide/blob/52c06813eec1f6cb6c0ca65b97f871cd15948db9/src/CodeGen_Hexagon.cpp#L910

Should that be removed as well?

Dillon Sharlet
@dsharletg
Nov 22 2017 16:32
looks like it
Suyog
@suyogsarda_twitter
Nov 22 2017 16:34
Will remove it too then :)
Zalman Stern
@zvookin
Nov 22 2017 17:09
Is the alignment fix failing on Windows or just not tested? I’m out today. But mostly sitting around waiting right now.
Reading the async/semaphore comment now.
Andrew Adams
@abadams
Nov 22 2017 17:11
The win64 buildbot hadn't run yet
The win32 had, so it's almost certainly fine, I just wanted to alert you that the magic number of successful builds to wait for is 13
I think I've figured out how to do it with semaphores
So that comment is a little stale
Zalman Stern
@zvookin
Nov 22 2017 17:12
Ok. Happy to discuss or review the design.
Generally the main advantage to f semaphores is one side is always non-blocking.
Our situation is perhaps more complex as a piece of storage has to move back and forth across the two sides and this both will need to block.
Andrew Adams
@abadams
Nov 22 2017 17:15
A semaphore handles blocking the consumer nicely
But blocking the producer really wants a monotonic counter, not a semaphore
I'm emulating one with an int stored on each side that tracks the current value of the counter
Turns out this int doesn't need to be shared. Both sides are capable of updating it and then doing a semaphore acquire or release based on the amount it changed by
Actually that's a sloppy way of saying what's going on. The producer tracks the head of the region it has written to, and the consumer tracks the largest address that won't clobber stuff it still needs due to wrap-around in the circular buffer
The producers int doesn't get too far ahead of the consumer's via synchronization on the semaphore
Everything the consumer bumps its counter, it releases that much from the semaphore
and every time the producer bumps its counter, it acquires that much from the semaphore first
Zalman Stern
@zvookin
Nov 22 2017 17:18
I like that explanation. It sounds like a good start on the right invariants.
Zalman Stern
@zvookin
Nov 22 2017 18:17
Not pressing, but if anyone has thoughts on the strict float PR, especially suggestions for testing, I’d love to hear them. Perhaps the first design decision to evaluate is the AllowStrictFloat and ForceStrictFloat target flags. Using per-instruction fast-math flags requires turning off the global fast-math options. (If those options are on globally, all instructions get them implicitly. There’s no way for an instruction to turn off the global option.) I felt the difference between having the global flags on and trying to put the flags on all the instructions instead was a big enough difference to require turning it on with a target flag. I anticipate a lot of people using AllowStrictFloat in their base, always on, flag set.
ronlieb
@ronlieb
Nov 22 2017 18:18
i created halide/Halide#2563 on Suyog's behalf, which merges his changes into a branch of Halide master.
Andrew Adams
@abadams
Nov 22 2017 18:18
AllowStrictFloat with no use of the intrinsic doesn't seem like something anyone wants - Halide does arbitrary reassociation but we stop llvm from doing it?
Zalman Stern
@zvookin
Nov 22 2017 18:20
Ideally AllowStrictFloat with no use of the intrinsic is the same as what we have now.
Andrew Adams
@abadams
Nov 22 2017 18:20
Would it be simpler if we didn't have AllowStrictFloat, and any use of strict_fp in the module just turned off the llvm global flags?
Wait, I thought AllowStrictFloat turned off llvm global flags?
Zalman Stern
@zvookin
Nov 22 2017 18:21
Maybe. It gives simpler usability than having the target flag.
AllowStrictFloat turns off the global flags, but they should be applied to all the instructions they affect.
So it should be the same. In practice there will almost be some difference because that’s how reality is.
E.g. anything from .ll in the runtime.
Andrew Adams
@abadams
Nov 22 2017 18:23
oh I see - I misunderstood
Zalman Stern
@zvookin
Nov 22 2017 18:23
But really we’d like the explicit control there.
Andrew Adams
@abadams
Nov 22 2017 18:24
Yeah, I'd just have the current behavior of AllowStrictFloat done by detecting the existence of a strict_fp intrinsic
Otherwise I can imagine people wanting to turn on and off the target flag while debugging
which seems like a nuisance
Zalman Stern
@zvookin
Nov 22 2017 18:25
I was forced to change the way the compiler does this and made the call to continue to provide the current behavior via target flag control. We could just make it the default or do a pass looking for the intrinsic.
Debugging is a motivation for the flag. It allows characterizing the difference between global fast math and per instruction flags.
(With now strict float on play.)
I could leave the flag in and automatically use that mode if the intrinsic is used.
Andrew Adams
@abadams
Nov 22 2017 18:27
I vote for either doing a pass, or spending some time benchmarking enough things to convince ourselves that it's OK to just have this as the default
Zalman Stern
@zvookin
Nov 22 2017 18:27
But sounds like killing the flag is probably the way to go.
Ok.
I was thinking we could switch the default and kill the flag later, but it’s tractable to evaluate beforehand.
Dillon and I are going to make sure the intrinsic and/or ForceStrictFloat flag meet the use cases it was done for.
(“Beforehand” above meaning before merging the PR.)
Zalman Stern
@zvookin
Nov 22 2017 18:38
I’m still a bit unsure re: llvm’s level of effect. I will try to craft some more test cases that show a difference at the llvm level. So far the tests show much larger effects from our simplifier and different order of computation due to scheduling changes.
Andrew Adams
@abadams
Nov 22 2017 18:38
I'd check the fft
We know that's very sensitive to llvm making different fp decisions
Zalman Stern
@zvookin
Nov 22 2017 18:39
That’s a great idea I hadn’t thought of.
Both for performance and accuracy.
Dillon Sharlet
@dsharletg
Nov 22 2017 20:38
another good example for fast-math tests might be a naive DFT, which would be a one-liner reduction so easy to write into a test
Dillon Sharlet
@dsharletg
Nov 22 2017 22:29
I'm looking at the build bots more to see if we can speed anything up
I noticed that the mac build: https://buildbot.halide-lang.org/master/#/builders/57/builds/148/steps/14/logs/stdio took 20+ minutes, but the log itself says it only took 1 min 40 s
Andrew Adams
@abadams
Nov 22 2017 22:43
The performance step waits to acquire an exclusive lock on the machine
It means that it had to hang around for 20 mins before the other builder running on the machine reached a point where it could pause
Dillon Sharlet
@dsharletg
Nov 22 2017 22:44
I see, so not something to fix...
Andrew Adams
@abadams
Nov 22 2017 22:44
yeah
the metal tests legitimately look really slow. 1442 seconds
Wonder if one of them is the long pole there
wait...
Why did that test take 30 mins to compile llvm?
It's for a fixed llvm version
should have been cached
That's worth investigating
Maybe those steps should explicitly be skipped on non-trunk builds if llvm-config/clang exists in the expected place.
Right now it relies on svn up to not touch any files, and for the cmake command to recognize that