These are chat archives for halide/Halide

8th
Dec 2017
Dillon Sharlet
@dsharletg
Dec 08 2017 00:03
Anyone want to take a look at halide/Halide#2607 ? I think we were all already OK with it but I had to redo it after the clean
Steven Johnson
@steven-johnson
Dec 08 2017 00:04
If this is the same, LGTM
Dillon Sharlet
@dsharletg
Dec 08 2017 00:04
it's a lot smaller since Andrew happened to make many of the same changes in another PR
Dillon Sharlet
@dsharletg
Dec 08 2017 00:27
just saying LGTM isn't going to turn it green :)
Steven Johnson
@steven-johnson
Dec 08 2017 00:27
woops
forgot about the new Review Required Overlords. One Sec
Steven Johnson
@steven-johnson
Dec 08 2017 15:39
I'm coming in to MTV today to try sorting the GPU
Steven Johnson
@steven-johnson
Dec 08 2017 16:45
Anyone know what the story is with the “correctness_interleave_x” errors happening? Simulator? LLVM? Injection?
hexagon-specific
Dillon Sharlet
@dsharletg
Dec 08 2017 16:46
hmm is that now?
new?
Steven Johnson
@steven-johnson
Dec 08 2017 16:47
I saw something similar a day or two ago but it seemed transitory
the two builds between there were tooling failures
Steven Johnson
@steven-johnson
Dec 08 2017 16:51
Hmm, I’ll just restart the build(s) in question then
Dillon Sharlet
@dsharletg
Dec 08 2017 16:52
?
it's definitely failing now
I looked at the changes from 1124-1127
that's probably the one that broke it
we didn't submit any affecting changes in those builds
Steven Johnson
@steven-johnson
Dec 08 2017 16:53
Oh, I misunderstood
Wait, that’s just a flat LLVM build failure
Dillon Sharlet
@dsharletg
Dec 08 2017 16:54
I know
this test passed at 1125
err
1124
failed at 1127
the two intervening builds were unrelated failures
so the test could have started failing on any build between 1125-1127
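The narrowing above can be sketched as follows: builds that failed for unrelated reasons say nothing about the test, so the suspects are everything after the last known pass up to the first known failure. This is only an illustration; `Result`, `Build`, and `suspect_range` are hypothetical names, not part of Buildbot or Halide.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Outcome of one test in one build. Unknown means the build broke for an
// unrelated reason (e.g. tooling), so the test never produced a result.
enum class Result { Pass, Fail, Unknown };

struct Build {
    int number;
    Result test;
};

// Given builds in ascending order, return {first, last} build numbers that
// could have introduced the failure: everything after the last known pass,
// up to and including the first known failure that follows it.
// Returns {-1, -1} if no failure follows a pass.
inline std::pair<int, int> suspect_range(const std::vector<Build> &builds) {
    int last_pass = -1;
    for (int i = 0; i < static_cast<int>(builds.size()); i++) {
        if (builds[i].test == Result::Pass) {
            last_pass = i;
        } else if (builds[i].test == Result::Fail && last_pass >= 0) {
            assert(last_pass + 1 <= i);
            return {builds[last_pass + 1].number, builds[i].number};
        }
    }
    return {-1, -1};
}
```

With 1124 passing, 1125 and 1126 unknown, and 1127 failing, the suspects are 1125 through 1127, which matches the range given above.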
Steven Johnson
@steven-johnson
Dec 08 2017 16:54
alright, so we have to figure out the LLVM injection point
Dillon Sharlet
@dsharletg
Dec 08 2017 16:54
1126 and 1127 don't have any relevant changes
1125 does have Hexagon LLVM changes
including to vdelta
which is exactly what correctness_interleave_x tests
@ronlieb can you take a look?
ronlieb
@ronlieb
Dec 08 2017 17:15
yep, we shall look, thx for the heads up
Dillon Sharlet
@dsharletg
Dec 08 2017 18:11
I think the apps need a basic command line argument parser, the one-off hacky parsing is not going to cut it for what I'm working on
anyone have any suggestions?
last time I needed this, I also needed to run it on a LEGO robot, so it had zero dependencies
Andrew Adams
@abadams
Dec 08 2017 18:13
I think getopt is the standard
Steven Johnson
@steven-johnson
Dec 08 2017 18:14
getopt isn’t available on osx IIRC
Andrew Adams
@abadams
Dec 08 2017 18:14
ah
and it's pretty gross
Dillon Sharlet
@dsharletg
Dec 08 2017 18:14
it also is pretty terrible
Andrew Adams
@abadams
Dec 08 2017 18:14
uses globals
Steven Johnson
@steven-johnson
Dec 08 2017 18:14
let’s use ABSL! (/ducks)
@abadams : swapping out cards now; I take it the Gigabyte (GTX750 I think?) on the desk was the proposed replacement?
Dillon Sharlet
@dsharletg
Dec 08 2017 18:15
I'm half serious about just grabbing the one I wrote for robots
Steven Johnson
@steven-johnson
Dec 08 2017 18:15
Drop a PR and let's take a look
Andrew Adams
@abadams
Dec 08 2017 18:15
Where's the generator argument-parsing code
Maybe it could be generalized
Steven Johnson
@steven-johnson
Dec 08 2017 18:16
Generator.cpp
generate_filter_main
Andrew Adams
@abadams
Dec 08 2017 18:16
Looks like it's the usual sort of loop + if
nothing fancy there
Steven Johnson
@steven-johnson
Dec 08 2017 18:16
yeah
Andrew Adams
@abadams
Dec 08 2017 18:19
Looks like you use llvm's CommandLine library?
Dillon Sharlet
@dsharletg
Dec 08 2017 18:19
it just looks like it :) that was obviously the inspiration for it
I think it has some nice properties
Andrew Adams
@abadams
Dec 08 2017 18:19
Where is your cl.h?
Andrew Adams
@abadams
Dec 08 2017 18:20
It's a single-header file?
Dillon Sharlet
@dsharletg
Dec 08 2017 18:20
heh had to make sure I credited LLVM
Andrew Adams
@abadams
Dec 08 2017 18:21
We could dump it in tools I suppose
Dillon Sharlet
@dsharletg
Dec 08 2017 18:21
we could make it header only if we wanted to drop the ability to add command line flags to arbitrary cpp files
which I don't think we need
Andrew Adams
@abadams
Dec 08 2017 18:23
We don't?
The apps don't link to anything
header-only seems like a big plus for them
Dillon Sharlet
@dsharletg
Dec 08 2017 18:24
I agree, I don't think we need the ability to add command line flags anywhere
just main
which makes it possible to do this header only
there's gotta be a library that does this already somewhere
that isn't bad
https://github.com/jarro2783/cxxopts seems to be the thing people like
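For reference, a minimal sketch of the header-only idea, assuming flags are only ever registered from main. `FlagParser`, `add_int`, and `add_string` are hypothetical names made up for illustration; this is not the robot parser mentioned above, not cxxopts, and not the code in Generator.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <map>
#include <string>

// Hypothetical header-only flag parser. Every flag is registered by the
// caller (typically main), so there are no globals and nothing to link.
class FlagParser {
public:
    // Bind "--name <value>" to an int owned by the caller.
    void add_int(const std::string &name, int *target) {
        assert(target != nullptr);
        setters_["--" + name] = [target](const std::string &v) {
            *target = std::atoi(v.c_str());  // lenient: non-numeric text becomes 0
        };
    }

    // Bind "--name <value>" to a string owned by the caller.
    void add_string(const std::string &name, std::string *target) {
        assert(target != nullptr);
        setters_["--" + name] = [target](const std::string &v) { *target = v; };
    }

    // The usual loop + if. Returns false on an unknown flag or a flag
    // that is missing its value.
    bool parse(int argc, const char *const *argv) {
        for (int i = 1; i < argc; i++) {
            auto it = setters_.find(argv[i]);
            if (it == setters_.end() || i + 1 >= argc) {
                return false;
            }
            it->second(argv[++i]);
        }
        return true;
    }

private:
    std::map<std::string, std::function<void(const std::string &)>> setters_;
};
```

An app would declare its variables in main, call `add_int("iterations", &iterations)` and similar for each flag, then call `parse(argc, argv)` and print usage if it returns false. Keeping registration in main is what allows the whole thing to live in one header.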
Steven Johnson
@steven-johnson
Dec 08 2017 18:28
adding command-line flags to arbitrary cpp files is an antipattern IMHO
@abadams — of course the new card is HDMI and the monitor we’re using is DisplayPort and I can’t find adapters :-) will just try chromoting in for now
Andrew Adams
@abadams
Dec 08 2017 18:29
The 750 doesn't have a displayport output?
surprising
Steven Johnson
@steven-johnson
Dec 08 2017 18:29
dual HDMI + dual DVI
Dillon Sharlet
@dsharletg
Dec 08 2017 18:29
I think plumbing random debug parameters all the way from main into the guts of a program is an antipattern :)
Andrew Adams
@abadams
Dec 08 2017 18:29
Yeah, just use environment variables prefixed with HL_ !
Heh
Steven Johnson
@steven-johnson
Dec 08 2017 18:32
If we have guts that deep inside our Halide apps then we have lots of problems going on :-)
Dillon Sharlet
@dsharletg
Dec 08 2017 18:34
I don't think we need this for halide apps
Steven Johnson
@steven-johnson
Dec 08 2017 18:38
ok, restarted WinBot2 with new card. Let’s see.
Is there a simple Buildbot view to see the pending build requests? (I have at least one PR that has passed Travis but hasn’t even started any buildbot tests; looks like I have to spelunk thru the workers to see which is building what.)
Steven Johnson
@steven-johnson
Dec 08 2017 18:45
no wonder: something is amiss with master. investigating.
yeah, this is not good: master is refusing to restart, and twistd.log is completely empty.
Andrew Adams
@abadams
Dec 08 2017 18:48
buildbot restart master?
Steven Johnson
@steven-johnson
Dec 08 2017 18:48
start
will try restart
Andrew Adams
@abadams
Dec 08 2017 18:48
ah
no, restart sucks
it has a builtin timeout that is too short
Steven Johnson
@steven-johnson
Dec 08 2017 18:49
ugh
Andrew Adams
@abadams
Dec 08 2017 18:49
you want to stop then start
Steven Johnson
@steven-johnson
Dec 08 2017 18:49
rebooting the vm now
yes, it was stopped
damn these buildbots are finicky
Andrew Adams
@abadams
Dec 08 2017 18:49
yep
Steven Johnson
@steven-johnson
Dec 08 2017 18:51
ok a reboot + start fixed it. No workers showing up yet.
ouch, will I need to restart all the workers? I had assumed they’d auto-reconnect
Andrew Adams
@abadams
Dec 08 2017 18:52
they will
Steven Johnson
@steven-johnson
Dec 08 2017 18:53
ok
Andrew Adams
@abadams
Dec 08 2017 18:53
but I think they use exponential backoff
so the longer the master was off, the longer before they reconnect
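A rough sketch of why a longer outage means a longer wait, assuming a plain doubling schedule. The initial delay, factor, and cap below are illustrative assumptions, not values taken from Buildbot, and `Backoff` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical reconnect schedule for a worker that lost its master.
struct Backoff {
    double initial_seconds = 1.0;  // assumed, not Buildbot's actual value
    double factor = 2.0;           // assumed
    double max_seconds = 300.0;    // assumed cap on a single delay

    // Delay before retry number `attempt` (0-based).
    double delay(int attempt) const {
        assert(attempt >= 0);
        double d = initial_seconds;
        for (int i = 0; i < attempt && d < max_seconds; i++) {
            d *= factor;
        }
        return std::min(d, max_seconds);
    }

    // If the master is down for `outage_seconds` starting when the worker
    // lost it, how long after the master returns does the next retry fire?
    double wait_after_recovery(double outage_seconds) const {
        double t = 0.0;  // time of the most recently scheduled retry
        for (int attempt = 0;; attempt++) {
            t += delay(attempt);
            if (t >= outage_seconds) {
                return t - outage_seconds;
            }
        }
    }
};
```

Under these assumed numbers, retries land at 1, 3, 7, 15, 31, 63, 127 seconds and so on, so the gap between retries grows with the length of the outage, and a worker can sit idle for a while after the master is already back.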
Steven Johnson
@steven-johnson
Dec 08 2017 18:56
ok, workers appear to be working again
Steven Johnson
@steven-johnson
Dec 08 2017 19:11
ok, now the master appears to still be working (based on twistd.log) but the status web page is no longer loading. whee!
Andrew Adams
@abadams
Dec 08 2017 19:11
Did nginx come back up?
The build master serves http over a local port
Steven Johnson
@steven-johnson
Dec 08 2017 19:12
Don’t know, but it was working initially
Andrew Adams
@abadams
Dec 08 2017 19:12
There's an nginx proxy that wraps it in https
Steven Johnson
@steven-johnson
Dec 08 2017 19:12
I’ll check
…and now it’s back, for no apparent reason
Steven Johnson
@steven-johnson
Dec 08 2017 19:23
interestingly, all the workers are now back up, but some don’t seem to be getting anything scheduled onto them.
eg last job for arm32-1 was 9 hours ago
Steven Johnson
@steven-johnson
Dec 08 2017 19:34
I’m not certain yet, but the buildbot page seems to go down if you try to access Console View: it never loads, everything else fails, and it takes a few minutes for everything to come back up.
Andrew Adams
@abadams
Dec 08 2017 19:54
can confirm
must be doing something very cpu intensive
Steven Johnson
@steven-johnson
Dec 08 2017 20:35
maybe crashing and restarting?
ronlieb
@ronlieb
Dec 08 2017 21:00
FYI: Krzysztof (our llvm person) is looking into the interleave failure. no ETA at this point, but he is quick
Zalman Stern
@zvookin
Dec 08 2017 21:00
@abadams @slomp @shoaibkamil In order for Metal device buffers to support cropping, we're going to have to make the device field into a pointer to a structure that holds the mtl_buffer and an offset. (Metal supports passing the offset in arguments to kernels, but it doesn't support making one buffer from another with the offset built-in.) First, can someone else who knows Metal double-check my thinking here? Secondly, this is a breaking change to some code which uses the device field of the buffer directly. It also requires strict wrap/unwrap discipline. It isn't strictly necessary to do this, but it is pretty useful to support cropping without trashing the device allocation.
(Just learned one can edit messages after posting them. Wow.)
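A sketch of the proposed layout, to make the wrap/unwrap discipline concrete. Everything here is a simplified stand-in: `DeviceHandle`, `Buffer`, `wrap`, `crop`, and `unwrap` are hypothetical names, the Metal buffer is represented by an opaque `void *`, and none of this is the actual Halide runtime code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical structure the device field would point to: the underlying
// Metal buffer plus a byte offset into it.
struct DeviceHandle {
    void *mtl_buffer;  // opaque here; an id<MTLBuffer> in real Metal code
    uint64_t offset;   // byte offset of this view within the allocation
};

// Simplified stand-in for a Halide buffer; only the device field matters.
struct Buffer {
    uint64_t device = 0;
};

inline DeviceHandle *handle_of(const Buffer &buf) {
    return reinterpret_cast<DeviceHandle *>(buf.device);
}

// Wrap: allocate a handle and store a pointer to it in the device field.
inline void wrap(Buffer *buf, void *mtl_buffer, uint64_t offset = 0) {
    assert(buf->device == 0);  // must not already be wrapped
    buf->device = reinterpret_cast<uint64_t>(new DeviceHandle{mtl_buffer, offset});
}

// Crop: the new view shares the allocation and only changes the offset,
// which would later be passed to the kernel as an argument.
inline void crop(const Buffer &src, Buffer *dst, uint64_t extra_offset_bytes) {
    const DeviceHandle *h = handle_of(src);
    wrap(dst, h->mtl_buffer, h->offset + extra_offset_bytes);
}

// Unwrap: free only the handle and hand back the underlying buffer.
// The device allocation itself is left untouched.
inline void *unwrap(Buffer *buf) {
    DeviceHandle *h = handle_of(*buf);
    void *mtl = h->mtl_buffer;
    delete h;
    buf->device = 0;
    return mtl;
}
```

Code that reads the device field directly would break under this scheme, since the field now holds a pointer to the handle rather than the buffer itself. That is the breaking change described above, and it is why every wrap needs a matching unwrap.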
Marcos Slomp
@slomp
Dec 08 2017 21:11
@zvookin I like this extra flexibility; in some ways, we can think of this structure as a "buffer view" and having it can be pretty handy in case we need to extend or add metadata to it in the future.
As far as the custom metal runtime that @shoaibkamil implemented for our needs, we can revisit and adapt it accordingly.
I think it is reasonable to assume that whoever is manipulating device-level pointers directly is aware that these things are brittle and prone to change, and that maintenance of that code will be required as Halide progresses.
Zalman Stern
@zvookin
Dec 08 2017 21:46
Ok, I'll make the change.
We probably should have just left them all pointers, but so it goes.
Steven Johnson
@steven-johnson
Dec 08 2017 22:22
@abadams : well, the bad news is: after swapping out the cards, WinBot2 is still failing in exactly the same obscure way.
adding insult to injury, some workers are steadfastly refusing to schedule themselves still (e.g. arm32-1)
I’m baffled at the moment