Zalman Stern
@zvookin
I need to make a couple more changes to the hexagon DMA
Will try to do so today.
The test only calls buffer_copy, which is mostly as it should be.
Dillon Sharlet
@dsharletg
So BTW regarding hexagon offloading, I've been thinking we simply punt on that for now
and only target standalone
anything that we get working on standalone can be made to work with offloading without solving any "hard" problems like async + storage folding, it just might involve a lot of plumbing and infrastructure
Steven Johnson
@steven-johnson
re: the windows buildbots, proposed fix is out there.
Zalman Stern
@zvookin
I'll have to consider the implications, but I think the current stuff just works if the DMA things are scheduled inside an offloaded thing.
Dillon Sharlet
@dsharletg
I think there might be some hiccups with the device interface
that will need to get plumbed over via offloading
and I don't think that will happen transparently right now
it might be easy to make it work though
Zalman Stern
@zvookin
yeah, that's small boogs territory.
I guess I'm expecting it will have to work with offload very early on to have a useful test.
Andrew Adams
@abadams
@dsharletg the host->device case also works, but there's no benefit for cuda because the version without async already manages to overlap the cpu compute and copies in a subtle way.
Confused me for a while.
CPU compute -> synchronous copy -> async kernel launch -> next batch of CPU compute (overlapped with GPU kernel launch) -> synchronous copy (stalls until kernel launch is done) ->
Wait, so I guess the CPU compute is hidden under the GPU compute
not the copy
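[Editor's sketch] The overlap described above can be put into a rough back-of-the-envelope model. This is only an illustration under stated assumptions: the `BatchCost` struct, the function names, and the idea that each stage has a fixed per-batch cost are all hypothetical and do not come from Halide or CUDA. It assumes the synchronous copy cannot start until the previous kernel has finished, so in steady state the CPU compute hides under the kernel and only the copy stays exposed.

```cpp
#include <algorithm>

// Hypothetical per-batch costs, in arbitrary time units.
struct BatchCost {
    double cpu;     // CPU compute for one batch
    double copy;    // synchronous host->device copy
    double kernel;  // GPU kernel execution
};

// Fully serialized baseline: each stage waits for the previous one.
inline double serial_period(const BatchCost &c) {
    return c.cpu + c.copy + c.kernel;
}

// Non-async schedule as described in the chat: the kernel launch is
// asynchronous, so the next batch of CPU compute runs while the kernel
// executes; the following synchronous copy stalls until the kernel is done.
// Steady-state time per batch is the copy plus the longer of the two
// overlapped stages.
inline double overlapped_period(const BatchCost &c) {
    return c.copy + std::max(c.cpu, c.kernel);
}
```

With cpu = 4, copy = 1, kernel = 3, the serialized period is 8 and the overlapped period is 5, which is why adding async on top may show no further benefit in this case.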
Dillon Sharlet
@dsharletg
That's great news!
Steven Johnson
@steven-johnson
I’m restarting the buildbot master now
Steven Johnson
@steven-johnson
On the recent issue of exported symbols varying between opt levels: it looks like CMake added a feature in 3.4 that attempts to auto-build a .def file for you on Windows, with the net effect of (mostly) acting like the gcc-ish default of “export all symbols”: https://blog.kitware.com/create-dlls-on-windows-without-declspec-using-new-cmake-export-all-feature/
I haven’t tried it (and we are talking about CMake here so who knows)...
Steven Johnson
@steven-johnson
We explicitly forbid using ‘.’ in a Func name since we use that as a separator internally, but we don’t seem to have a similar constraint on Var name. Deliberate or accidental?
Andrew Adams
@abadams
Var names are not uniqued either
Accidental I think
Zalman Stern
@zvookin
Var names are not uniqued by design
They're value types
Steven Johnson
@steven-johnson
Right
Andrew Adams
@abadams
Lack of '.' enforcement is the accidental thing
Steven Johnson
@steven-johnson
Just idly wondering if more constraints on the names allowed would give us more flexibility in the future. (e.g. GeneratorParam names are limited to C-style identifier rules, with additional constraints on underscore usage). Probably overthinking it.
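[Editor's sketch] To make the naming discussion concrete, here is a minimal standalone validator. It is not Halide's actual check: `is_c_identifier` implements the plain C identifier rule, and `is_strict_name` adds two underscore restrictions (no leading underscore, no double underscore) that are assumptions chosen for illustration, since the chat does not say what the extra underscore constraints are.

```cpp
#include <string>

// ASCII-only character classes (avoids locale-dependent <cctype> behavior).
inline bool is_ascii_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
inline bool is_ascii_digit(char c) {
    return c >= '0' && c <= '9';
}

// Plain C-style identifier: [A-Za-z_][A-Za-z0-9_]*
// Note that this rejects '.', the character used internally as a separator.
inline bool is_c_identifier(const std::string &name) {
    if (name.empty()) return false;
    if (!is_ascii_alpha(name[0]) && name[0] != '_') return false;
    for (char c : name) {
        if (!is_ascii_alpha(c) && !is_ascii_digit(c) && c != '_') return false;
    }
    return true;
}

// Hypothetical stricter policy (an assumption, not Halide's rule):
// a C identifier with no leading underscore and no "__" anywhere.
inline bool is_strict_name(const std::string &name) {
    if (!is_c_identifier(name)) return false;
    if (name[0] == '_') return false;
    return name.find("__") == std::string::npos;
}
```

Under these rules `x_outer` passes both checks, `x.outer` fails both, and `_x` or `a__b` pass only the plain C identifier check.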
Re: the windows buildbots: I updated the scripts and did a buildbot stop and start, but builds completing since then still seem to be using the old, broken windows testing approach. I wonder, do the workers queue up the commands on the worker (and thus this could be just stale builds completing)? Investigating...
Steven Johnson
@steven-johnson
Hmm, this is odd: I stopped buildbot again; when restarting, it is now failing with "could not find buildbot-www; is it installed?" which is something I haven't seen before. @abadams, is it wise/unwise to restart the entire buildbot VM when updating?
Steven Johnson
@steven-johnson
logout, log back in, now starting it is telling me I need a txrequests package installed. Oy.
Just gonna reboot the VM.
Nope. Still busticated.
Steven Johnson
@steven-johnson
bah: chmod is not my friend
chmod’ing stuff to my user seems to have healed it, per comments in @abadams document — sadly, the failure modes were obscure and unrelated enough that I didn’t think to try that
ronlieb
@ronlieb
Hi Folks, i am seeing a failure building camera_pipe after the most recent commit.
make: *** No rule to make target `bin/Demosaic.o', needed by `bin/process'. Stop.
Dillon Sharlet
@dsharletg
just pushed what should fix it
ronlieb
@ronlieb
it did, thx
Suyog
@suyogsarda_twitter

Hi All, I was looking at issue 2317 (halide/Halide#2317), where input.dim(0).set_min(0) was resulting in slower code on CPU. Further digging into the code and some experiments showed that the slowness is only due to input.dim(0).set_min(0) and not due to input.dim(1).set_min(0).

In the codegen, i see some checks and asserts for "halide_buffer_is_bounds_query" and these are inserted on CPU side always. Even if the schedule is offloaded to Hexagon, the asserts are always inserted in CPU code. Hence the slowness is always observed on CPU schedule, but not on Hexagon.

Q - For schedules offloaded to Hexagon, even if the asserts are on CPU side, why isn't slowness observed? I assume we are measuring time which involves the CPU to Hexagon and back offload time too. Any idea?

Andrew Adams
@abadams
It'd have to be a really small pipeline for that assert to matter - e.g. processing an 8x8 image
It's an inlined comparison of two of the input buffer fields to zero - should be perfectly branch-predicted too
The no_asserts and no_bounds_query target flags (i.e. a target string ending in -no_asserts-no_bounds_query) turn off all that code, so you can try those for testing.
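[Editor's sketch] For readers unfamiliar with the check being discussed, the following shows roughly what "an inlined comparison of two of the input buffer fields to zero" could look like. `MiniBuffer` is a hypothetical, stripped-down stand-in, not the real runtime struct, and the choice of the host pointer and device handle as the two fields is an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for a runtime buffer descriptor. Only the two
// fields assumed to take part in the check are modeled here.
struct MiniBuffer {
    std::uint64_t device = 0;      // opaque device handle, 0 if none
    std::uint8_t *host = nullptr;  // host data pointer, null if none
};

// A buffer with neither host nor device storage is treated as a request
// to infer bounds rather than to run the pipeline. This is two compares
// against zero, so it should be cheap and well predicted.
inline bool is_bounds_query(const MiniBuffer &b) {
    return b.host == nullptr && b.device == 0;
}
```

A check of this shape costs a couple of instructions per pipeline call, which is why it would normally only be measurable on very small inputs.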
Suyog
@suyogsarda_twitter
Thanks, I will try that. However, the effect is observed for every test case (and the images are large enough). Also, in the generated code, the CPU code differs only in those asserts, accompanied by a bunch of mov instructions at the end of the computations for that function, while for schedules on Hexagon the generated code is exactly the same. Hence my guess that those asserts cause the slowness in the CPU code.
Andrew Adams
@abadams
@steven-johnson windows builds are now all failing tests (now that we're running them). Can't tell if it's real, or a build config issue. I see cmake reporting steps as failed, but don't see any indication of what failed.
It's not doing something dumb like assuming the word "error" in the output means a failure, is it? I recall that being an issue
Steven Johnson
@steven-johnson
Well, that's progress I guess... Will investigate when I get in today. It's almost certainly a build config issue unless something has injected a platform specific failure in the last week or so as these targets worked correctly on my local box.