Dillon Sharlet
@dsharletg
and only target standalone
anything that we get working on standalone can be made to work with offloading without solving any "hard" problems like async + storage folding, it just might involve a lot of plumbing and infrastructure
Steven Johnson
@steven-johnson
re: the windows buildbots, proposed fix is out there.
Zalman Stern
@zvookin
I'll have to consider the implications, but I think the current stuff just works if the DMA things are scheduled inside an offloaded thing.
Dillon Sharlet
@dsharletg
I think there might be some hiccups with the device interface
that will need to get plumbed over via offloading
and I don't think that will happen transparently right now
it might be easy to make it work though
Zalman Stern
@zvookin
yeah, that's small boogs territory.
I guess I'm expecting it will have to work with offload very early on to have a useful test.
Andrew Adams
@abadams
@dsharletg the host->device case also works, but there's no benefit for cuda because the version without async already manages to overlap the cpu compute and copies in a subtle way.
Confused me for a while.
CPU compute -> synchronous copy -> async kernel launch -> next batch of CPU compute (overlapped with GPU kernel launch) -> synchronous copy (stalls until kernel launch is done) ->
Wait, so I guess the CPU compute is hidden under the GPU compute
not the copy
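A minimal sketch of the timeline described above, as a toy model rather than a measurement of real CUDA behavior. The function names and the time units are hypothetical; the only rules encoded are the ones from the message: the copy blocks the host and stalls until the previous kernel is done, and the kernel launch returns immediately.

```python
def pipeline_time(batches, cpu, copy, kernel):
    """Wall-clock time for: CPU compute -> sync copy -> async kernel launch, repeated."""
    host = 0.0      # time at which the host thread is next free
    gpu_free = 0.0  # time at which the GPU finishes its most recent kernel
    for _ in range(batches):
        host += cpu                 # CPU compute for this batch
        host = max(host, gpu_free)  # synchronous copy stalls until the previous kernel is done
        host += copy                # the copy itself blocks the host
        gpu_free = host + kernel    # async launch: the host continues immediately
    return max(host, gpu_free)


def serial_time(batches, cpu, copy, kernel):
    """Baseline with no overlap at all, for comparison."""
    return batches * (cpu + copy + kernel)


if __name__ == "__main__":
    print(pipeline_time(3, cpu=2, copy=1, kernel=4))  # 17.0
    print(serial_time(3, cpu=2, copy=1, kernel=4))    # 21
```

In this model the saving over the serial baseline comes from CPU compute running while the previous kernel is still executing. The copy is never overlapped with anything, which matches the correction in the last two messages.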
Dillon Sharlet
@dsharletg
That's great news!
Steven Johnson
@steven-johnson
I’m restarting the buildbot master now
Steven Johnson
@steven-johnson
On the recent issue of exported symbols varying between opt levels: it looks like CMake added a feature in 3.4 that attempts to auto-build a .def file for you on Windows, with the net effect of (mostly) acting like the gcc-ish default of “export all symbols”: https://blog.kitware.com/create-dlls-on-windows-without-declspec-using-new-cmake-export-all-feature/
I haven’t tried it (and we are talking about CMake here so who knows)...
Steven Johnson
@steven-johnson
We explicitly forbid using ‘.’ in a Func name since we use that as a separator internally, but we don’t seem to have a similar constraint on Var name. Deliberate or accidental?
Andrew Adams
@abadams
Var names are not uniqued either
Accidental I think
Zalman Stern
@zvookin
Var names are not uniqued by design
They're value types
Steven Johnson
@steven-johnson
Right
Andrew Adams
@abadams
Lack of '.' enforcement is the accidental thing
Steven Johnson
@steven-johnson
Just idly wondering if more constraints on the names allowed would give us more flexibility in the future. (e.g. GeneratorParam names are limited to C-style identifier rules, with additional constraints on underscore usage). Probably overthinking it.
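A rough sketch of what such name constraints could look like, written in Python for brevity rather than taken from the Halide sources. The function names are hypothetical, and the specific underscore rules (no leading underscore, no double underscore) are an assumption, since the message only says "additional constraints on underscore usage".

```python
import re

# C-style identifier: a letter or underscore, then letters, digits, or underscores.
_C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_func_name(name: str) -> bool:
    """'.' is rejected because it is used internally as a separator (per the chat)."""
    return bool(name) and "." not in name


def is_valid_param_name(name: str) -> bool:
    """C-style identifier plus ASSUMED underscore rules: no leading '_', no '__'."""
    if not _C_IDENTIFIER.fullmatch(name):
        return False
    return not name.startswith("_") and "__" not in name


if __name__ == "__main__":
    for candidate in ["blur_x", "f.x", "tile_size", "_hidden", "a__b"]:
        print(candidate, is_valid_func_name(candidate), is_valid_param_name(candidate))
```

Applying the same '.' check to Var names would close the gap noted above.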
Re: the windows buildbots: I updated the scripts and did a buildbot stop and start, but builds completing since then still seem to be using the old, broken windows testing approach. I wonder, do the workers queue up the commands on the worker (and thus this could be just stale builds completing)? Investigating...
Steven Johnson
@steven-johnson
Hmm, this is odd: I stopped buildbot again; when restarting, it is now failing with "could not find buildbot-www; is it installed?" which is something I haven’t seen before. @abadams, is it wise/unwise to restart the entire buildbot VM when updating?
Steven Johnson
@steven-johnson
logout, log back in, now starting it is telling me I need a txrequests package installed. Oy.
Just gonna reboot the VM.
Nope. Still busticated.
Steven Johnson
@steven-johnson
bah: chmod is not my friend
chmod’ing stuff to my user seems to have healed it, per the comments in @abadams's document. Sadly, the failure modes were obscure and unrelated enough that I didn’t think to try that
ronlieb
@ronlieb
Hi folks, I am seeing a failure building camera_pipe after the most recent commit.
make: *** No rule to make target 'bin/Demosaic.o', needed by 'bin/process'. Stop.
Dillon Sharlet
@dsharletg
just pushed what should fix it
ronlieb
@ronlieb
it did, thx
Suyog
@suyogsarda_twitter

Hi all, I was looking at issue 2317 (halide/Halide#2317), where input.dim(0).set_min(0) was resulting in slower code on CPU. Further digging into the code and some experiments showed that the slowness is due only to input.dim(0).set_min(0) and not to input.dim(1).set_min(0).

In the codegen, I see some checks and asserts for "halide_buffer_is_bounds_query", and these are always inserted on the CPU side. Even if the schedule is offloaded to Hexagon, the asserts are still inserted in the CPU code. Hence the slowness is always observed with a CPU schedule, but not on Hexagon.

Q: For schedules offloaded to Hexagon, why isn't the slowness observed even though the asserts are on the CPU side? I assume the time we measure includes the CPU-to-Hexagon offload round trip too. Any ideas?

Andrew Adams
@abadams
It'd have to be a really small pipeline for that assert to matter - e.g. processing an 8x8 image
It's an inlined comparison of two of the input buffer fields to zero - should be perfectly branch-predicted too
The no_asserts and no_bounds_query target flags turn off all that code, so you can try those for testing.
Suyog
@suyogsarda_twitter
Thanks, I will try that. However, the effect shows up in every test case, and the images are large enough. Also, in the generated code, the CPU version differs only in those asserts plus a bunch of mov instructions at the end of that function's computation, while for schedules on Hexagon the generated code is exactly the same. Hence my guess that those asserts cause the slowness on CPU.
Andrew Adams
@abadams
@steven-johnson windows builds are now all failing tests (now that we're running them). Can't tell if it's real, or a build config issue. I see cmake reporting steps as failed, but don't see any indication of what failed.
It's not doing something dumb like assuming the word "error" in the output means a failure is it? I recall that being an issue
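To illustrate the failure mode being asked about, here is a small sketch contrasting the two ways a harness might decide a step failed. It is not taken from the buildbot or CMake configuration; the helper names are hypothetical and the "test" is just a Python one-liner that prints a success message containing the word "errors".

```python
import subprocess
import sys


def failed_by_keyword(output: str) -> bool:
    """Fragile heuristic: treat any mention of 'error' in the log as a failure."""
    return "error" in output.lower()


def failed_by_exit_code(returncode: int) -> bool:
    """More robust: trust the process exit status."""
    return returncode != 0


def run_step(argv):
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    # A passing step whose output happens to contain the word "errors".
    code, out = run_step([sys.executable, "-c", "print('0 errors, all tests passed')"])
    print("keyword heuristic says failed:", failed_by_keyword(out))
    print("exit code says failed:", failed_by_exit_code(code))
```

A step like this passes by exit code but is flagged by the keyword heuristic, which is the kind of false failure described above.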
Steven Johnson
@steven-johnson
Well, that's progress I guess... Will investigate when I get in today. It's almost certainly a build config issue unless something has injected a platform specific failure in the last week or so as these targets worked correctly on my local box.
Suyog
@suyogsarda_twitter
@abadams arm-64-android-no_asserts-no_bounds_query didn't produce faster code. The problem seems to be something else then.
Steven Johnson
@steven-johnson
looking at windows failure log now — there are only a handful of failures but the errors aren’t really explicated, gonna have to try to replicate individually by remoting into one of the windows buildbots. (My personal Windows box is down again and I’m in different office today, yay windows)
(given that it’s only a handful, it’s possible these are legit errors that have crept in over the past two weeks of non-testing rather than config stuff)
Steven Johnson
@steven-johnson
@abadams: are the two Windows buildbots interchangeable from a build target standpoint?