Andrew Adams
@abadams
so it would just work
Zalman Stern
@zvookin
That is what I was highlighting.
The only issue I see with this design is that the overhead of the thread may be too high to use for very lightweight hardware synchronization mechanisms. Other than that, I don't see a lot of reason to do the customized lowering.
I need to make a couple more changes to the hexagon DMA
Will try to do so today.
The test only calls buffer_copy, which is mostly as it should be.
Dillon Sharlet
@dsharletg
So BTW regarding hexagon offloading, I've been thinking we simply punt on that for now
and only target standalone
anything that we get working on standalone can be made to work with offloading without solving any "hard" problems like async + storage folding, it just might involve a lot of plumbing and infrastructure
Steven Johnson
@steven-johnson
re: the windows buildbots, proposed fix is out there.
Zalman Stern
@zvookin
I'll have to consider the implications, but I think the current stuff just works if the DMA things are scheduled inside an offloaded thing.
Dillon Sharlet
@dsharletg
I think there might be some hiccups with the device interface
that will need to get plumbed over via offloading
and I don't think that will happen transparently right now
it might be easy to make it work though
Zalman Stern
@zvookin
yeah, that's small boogs territory.
I guess I'm expecting it will have to work with offload very early on to have a useful test.
Andrew Adams
@abadams
@dsharletg the host->device case also works, but there's no benefit for cuda because the version without async already manages to overlap the cpu compute and copies in a subtle way.
Confused me for a while.
CPU compute -> synchronous copy -> async kernel launch -> next batch of CPU compute (overlapped with GPU kernel launch) -> synchronous copy (stalls until kernel launch is done) ->
Wait, so I guess the CPU compute is hidden under the GPU compute
not the copy
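A rough way to check the ordering described above is to simulate it. This is only a sketch under simplifying assumptions: the per-batch costs are hypothetical numbers, the kernel launch itself is treated as free, and the synchronous copy is modeled as waiting for the previous kernel to finish. `BatchCosts`, `serial_time`, and `pipelined_time` are made-up names for illustration, not anything from Halide or CUDA.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-batch costs in arbitrary time units.
struct BatchCosts {
    double cpu_compute;  // CPU work that produces the data to upload
    double copy;         // synchronous host->device copy
    double gpu_kernel;   // GPU kernel running asynchronously after launch
};

// Wall time if every step ran back to back with no overlap at all.
double serial_time(const BatchCosts &c, int batches) {
    return batches * (c.cpu_compute + c.copy + c.gpu_kernel);
}

// Wall time for: CPU compute -> sync copy -> async kernel launch -> repeat,
// where each copy stalls until the previously launched kernel is done.
double pipelined_time(const BatchCosts &c, int batches) {
    double cpu_now = 0.0;   // time at which the CPU thread is free
    double gpu_free = 0.0;  // time at which the last launched kernel finishes
    for (int i = 0; i < batches; i++) {
        cpu_now += c.cpu_compute;               // CPU compute for this batch
        cpu_now = std::max(cpu_now, gpu_free);  // copy waits for the previous kernel
        cpu_now += c.copy;                      // the copy itself
        gpu_free = cpu_now + c.gpu_kernel;      // launch returns immediately
    }
    return std::max(cpu_now, gpu_free);
}
```

With costs of 3 (CPU), 1 (copy), and 5 (kernel) over 4 batches, the serial total is 36 and the pipelined total is 27: after the first batch, each CPU compute step fits entirely inside the previous kernel's run time, so only the copy and the kernel remain on the critical path.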
Dillon Sharlet
@dsharletg
That's great news!
Steven Johnson
@steven-johnson
I’m restarting the buildbot master now
Steven Johnson
@steven-johnson
On the recent issue of exported symbols varying between opt levels: it looks like CMake added a feature in 3.4 that attempts to auto-build a .def file for you on Windows, with the net effect of (mostly) acting like the gcc-ish default of “export all symbols”: https://blog.kitware.com/create-dlls-on-windows-without-declspec-using-new-cmake-export-all-feature/
I haven’t tried it (and we are talking about CMake here so who knows)...
Steven Johnson
@steven-johnson
We explicitly forbid using ‘.’ in a Func name since we use that as a separator internally, but we don’t seem to have a similar constraint on Var name. Deliberate or accidental?
Andrew Adams
@abadams
Var names are not uniqued either
Accidental I think
Zalman Stern
@zvookin
Var names are not uniqued by design
They're value types
Steven Johnson
@steven-johnson
Right
Andrew Adams
@abadams
Lack of '.' enforcement is the accidental thing
Steven Johnson
@steven-johnson
Just idly wondering if more constraints on the names allowed would give us more flexibility in the future. (e.g. GeneratorParam names are limited to C-style identifier rules, with additional constraints on underscore usage). Probably overthinking it.
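For illustration, the kind of name check being discussed might look like the sketch below. It is not Halide's actual validation code: `contains_separator` and `is_c_identifier` are hypothetical helpers, and the extra underscore constraints mentioned for GeneratorParam names are deliberately not modeled because the chat does not spell them out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if the name contains '.', the character described above as the
// internal separator.
bool contains_separator(const std::string &name) {
    return name.find('.') != std::string::npos;
}

// True if the name follows plain C identifier rules:
// a letter or underscore, followed by letters, digits, or underscores.
bool is_c_identifier(const std::string &name) {
    if (name.empty()) return false;
    unsigned char first = static_cast<unsigned char>(name[0]);
    if (!(std::isalpha(first) || first == '_')) return false;
    for (char ch : name) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (!(std::isalnum(u) || u == '_')) return false;
    }
    return true;
}
```

Under these rules a name like `x.y` fails both checks, while `_tmp1` passes; any name that passes `is_c_identifier` is automatically free of the separator.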
Re: the windows buildbots: I updated the scripts and did a buildbot stop and start, but builds completing since then still seem to be using the old, broken windows testing approach. I wonder, do the workers queue up the commands on the worker (and thus this could be just stale builds completing)? Investigating...
Steven Johnson
@steven-johnson
Hmm, this is odd: I stopped buildbot again; when restarting, it is now failing with “could not find buildbot-www; is it installed?” which is something I haven’t seen before. @abadams, is it wise/unwise to restart the entire buildbot VM when updating?
Steven Johnson
@steven-johnson
logout, log back in, now starting it is telling me I need a txrequests package installed. Oy.
Just gonna reboot the VM.
Nope. Still busticated.
Steven Johnson
@steven-johnson
bah: chmod is not my friend
chmod’ing stuff to my user seems to have healed it, per comments in @abadams's document. Sadly, the failure modes were obscure and unrelated enough that I didn’t think to try that
ronlieb
@ronlieb
Hi Folks, I am seeing a failure building camera_pipe after the most recent commit.
make: *** No rule to make target `bin/Demosaic.o', needed by `bin/process'.  Stop.
Dillon Sharlet
@dsharletg
just pushed what should fix it
ronlieb
@ronlieb
it did, thx
Suyog
@suyogsarda_twitter

Hi All, I was looking at issue 2317 (halide/Halide#2317), where input.dim(0).set_min(0) was resulting in slower code on CPU. Further digging into the code and some experiments showed that the slowness is due only to input.dim(0).set_min(0), not to input.dim(1).set_min(0).

In the codegen, I see some checks and asserts for "halide_buffer_is_bounds_query", and these are always inserted on the CPU side. Even if the schedule is offloaded to Hexagon, the asserts are still inserted in the CPU code. Hence the slowness is always observed with the CPU schedule, but not with Hexagon.

Q - For schedules offloaded to Hexagon, if the asserts are still on the CPU side, why isn't the slowness observed? I assume the time we measure includes the CPU-to-Hexagon-and-back offload time too. Any idea?

Andrew Adams
@abadams
It'd have to be a really small pipeline for that assert to matter - e.g. processing an 8x8 image
It's an inlined comparison of two of the input buffer fields to zero - should be perfectly branch-predicted too
The no_asserts and no_bounds_query target flags (e.g. appending -no_asserts-no_bounds_query to the target string) turn off all that code, so you can try those for testing.
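To make the "comparison of two of the input buffer fields to zero" concrete, here is a minimal sketch. It assumes the two fields are the host pointer and the device handle, and it uses a simplified stand-in struct (`MockBuffer`, a hypothetical name) rather than the real runtime buffer type, so treat it as an illustration of the cost of the check and not as the runtime's code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in holding only the two fields the check reads.
// Assumption: a bounds query is signalled by both being zero/null.
struct MockBuffer {
    uint64_t device = 0;      // opaque device handle, 0 if none
    uint8_t *host = nullptr;  // host data pointer, null if none
};

// Two loads and two compares against zero; small enough to inline and
// easy for a branch predictor when buffers are always real.
inline bool is_bounds_query(const MockBuffer &buf) {
    return buf.host == nullptr && buf.device == 0;
}
```

A check of this shape runs once per pipeline call per buffer, which is why it would only be expected to show up in timings for very small images.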
Suyog
@suyogsarda_twitter
Thanks, I will try that. However, the effect is observed for every test case, and the image sizes are large enough. Also, in the generated code, the CPU versions differ only in those asserts plus a bunch of mov instructions at the end of the computations for that function, while for schedules on Hexagon the generated code is exactly the same. Hence my guess that those asserts are what causes the slowness in the CPU code.