steven-johnson
@steven-johnson:matrix.org
[m]
well, give a lot of credit to the WebAssembly team for the LLVM backend we use... :-)
steven-johnson
@steven-johnson:matrix.org
[m]
Looks like there's an LLVM top-of-tree failure on some of the bots -- I'll get to it after lunch
Svenn-Arne Dragly
@dragly

I am working on "serializing" a Python object with Expr members to a Halide Func. In the process, I end up with a function that has a large number of explicit definitions in one dimension. Unfortunately, I am not able to get those calculated efficiently, i.e. once per value, while sharing potentially pre-calculated values. In particular, this code:

import numpy as np
from halide import *

row, col = Var("row"), Var("col")

f = Func("f")
f[row, col] = 0.0
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)

g = Func("g")
g[row, col] = f[row, col] + 42.0

g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize(2, 4)))

Leads to the following generated code:

  for (g.s0.col, g.min.1, g.extent.1) {
  ...
   for (g.s0.row, g.min.0, g.extent.0) {
    allocate f[float32 * 1 * (max(t6, 3) + 1)]
    produce f {
     f[t7] = 0.000000f
     f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
     f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
     f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
     f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
    }
    consume f {
     g[g.s0.row + t12] = f[t7] + 42.000000f
    }

Unfortunately, Halide does not notice that only one value of f is needed, and calculates all of f for each g. I guess this is expected.

Calling f.compute_root() helps reduce the number of calculations, but results in code with four separate loops over row instead. This is problematic in my actual use case, because it no longer automatically shares values that can be pre-calculated (such as the sqrt above).

Is there a way to get Halide to calculate f for each explicitly set col in one loop over row?
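To make the two loop nests concrete, here is a plain-Python sketch (no Halide calls) that roughly models the loop structure of the lowered code above versus the one being asked for: a single pass over row that computes the shared sqrt(row*row) once and then writes every explicitly defined column. The names naive, shared and OFFSETS are hypothetical, chosen only for illustration; this models loop structure and work counts, it is not a Halide schedule.

```python
import math

# Hypothetical illustration: per-column constants from the f definitions above.
OFFSETS = [1.0, 2.0, 3.0, 4.0]


def naive(rows, cols):
    """Roughly mimics the lowered code: all of f is recomputed for every g point."""
    sqrt_calls = 0
    g = [[0.0] * cols for _ in range(rows)]
    for col in range(cols):
        for row in range(rows):
            f = [0.0] * len(OFFSETS)  # f realized for a single point of g
            for c, off in enumerate(OFFSETS):
                sqrt_calls += 1
                f[c] = off + math.sqrt(row * row)
            g[row][col] = (f[col] if col < len(OFFSETS) else 0.0) + 42.0
    return g, sqrt_calls


def shared(rows, cols):
    """Desired structure: one loop over row, shared sqrt computed once per row."""
    sqrt_calls = 0
    g = [[0.0] * cols for _ in range(rows)]
    for row in range(rows):
        s = math.sqrt(row * row)  # common subexpression, hoisted out of the col loop
        sqrt_calls += 1
        for col in range(cols):
            f = OFFSETS[col] + s if col < len(OFFSETS) else 0.0
            g[row][col] = f + 42.0
    return g, sqrt_calls


g1, n1 = naive(2, 4)
g2, n2 = shared(2, 4)
print(g1 == g2, n1, n2)
```

Both versions produce the same values; the difference is only how many times the shared term is evaluated (rows * cols * 4 versus rows).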

Ashish Uthama
@ashishUthama

Upgrading from Halide 12 to Halide 14 (tip), I'm running into a lot of:

Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.

Right now, it looks like something related to tile() with the TailStrategy omitted (i.e. the default, Auto). Does this ring a bell? (Will dig more in a bit.)

Did some defaults change?
@dsharletg - is this likely related to halide/Halide#6020? (I'll explore by adding explicit tail strategies to the code.)
Dillon Sharlet
@dsharletg
That shouldn't happen if you weren't explicitly using PredicateLoads or PredicateStores
Ashish Uthama
@ashishUthama
I am not. I'll try to create a repro and file an issue.
The source code worked as-is on Halide 12.
Ashish Uthama
@ashishUthama
@alexreinking:matrix.org - thanks for the comment on the Windows build. We build Halide from source, but now that there is a regular versioned release cadence, I am considering just using the released binaries.
Alex Reinking
@alexreinking:matrix.org
[m]
Sure! I'd still like to understand why your build was failing, though :)
Ashish Uthama
@ashishUthama
The downloaded libraries appear to be significantly larger than what I build locally, ~150MB vs ~45MB. Would you have any thoughts on why that may be?
Alex Reinking
@alexreinking:matrix.org
[m]
It could be a difference in how we're building LLVM or what targets are enabled?
Ashish Uthama
@ashishUthama
I would like to, too... but I'm not sure how to proceed :( I checked the definition in VS and it correctly opened up that header.
These are our cmake flags:
CMAKE_HALIDE_OPTIONS:= \
    -DLLVM_DIR=${LLVM_ROOT}/release/lib/cmake/llvm \
    -DCLANG=${CLANG} \
    -DWARNINGS_AS_ERRORS=OFF \
    -DWITH_PYTHON_BINDINGS=OFF \
    -DWITH_TEST_AUTO_SCHEDULE=ON \
    -DWITH_TEST_CORRECTNESS=OFF \
    -DWITH_TEST_ERROR=ON \
    -DWITH_TEST_WARNING=ON \
    -DWITH_TEST_PERFORMANCE=ON \
    -DWITH_TEST_OPENGL=OFF \
    -DWITH_TEST_GENERATOR=ON \
    -DWITH_APPS=OFF \
    -DWITH_TUTORIALS=OFF \
    -DWITH_DOCS=OFF \
    -DWITH_UTILS=OFF .
Oh, LLVM. OK, that's likely it. I'll double-check; it might be that our in-house LLVM build has only a limited set of targets enabled.
Alex Reinking
@alexreinking:matrix.org
[m]
That would make sense. Our binaries contain all supported backends
Ashish Uthama
@ashishUthama
Yes, that explains it. Thanks! And thanks for the push to regular versioned releases. Much appreciated!
Alex Reinking
@alexreinking:matrix.org
[m]
Totally! Hopefully we'll be able to get the release latency (after LLVM) even tighter for the next releases :)
Derek Gerstmann
@derek-gerstmann
@alexreinking:matrix.org Hiya! I'm looking at backporting PR #6405 from master to create a v13.0.1 release. Should I create a "backports/13.x" branch and merge things in there, and then push the results into "releases/13.x"? Just trying to match what Andrew and I are seeing in the repo to follow conventions. Any suggestions?
Alex Reinking
@alexreinking:matrix.org
[m]
I use backports/N.x for staging changes to release/N.x. When it's ready, I open a PR with release/N.x as the target branch. There is CI set up for this scenario. Be sure to include a commit that bumps the version to 13.0.1.
release/N.x is (or ought to be) protected (like master), so you can't push to it
Derek Gerstmann
@derek-gerstmann
Ahh ... okay cool! Does the "release/N.x" branch itself get created automatically?
Alex Reinking
@alexreinking:matrix.org
[m]
No. When creating a new major release, we fork it off of master.
Derek Gerstmann
@derek-gerstmann
Makes sense! Okay, I'll work on getting things merged! Thanks!
Alex Reinking
@alexreinking:matrix.org
[m]
Also, I prefer not to squash commits from backports/N.x into release/N.x. I think the cherry-pick history is valuable (as are any separate or additional patches needed to backport correctly), and so is keeping the version-number bump in its own commit.
No problem! Happy to share the release responsibility :)
Derek Gerstmann
@derek-gerstmann
Cool. Yeah, I'm the same. I rarely squash commits unless there's a really good reason to.
Sweet! Happy to help! :)
Alex Reinking
@alexreinking:matrix.org
[m]
I bring it up because the repository default PR-merge mode is to squash
Derek Gerstmann
@derek-gerstmann
Oooh, good to know! I'll turn it off for the release PR. Thanks for the heads up!
Alex Reinking
@alexreinking:matrix.org
[m]
Of course!
Ashish Uthama
@ashishUthama
@alexreinking:matrix.org - is it reasonable to ask for the LICENSE file to be included in the downloads?
Alex Reinking
@alexreinking:matrix.org
[m]
I think it is... I would add it to the share/doc/Halide folder (at least on Linux), along with the other READMEs.
Ashish Uthama
@ashishUthama
I'll create an issue and try to make the change.
Alex Reinking
@alexreinking:matrix.org
[m]
Sure... I can review, but I'm traveling through 12/5, and pushing for the 11/19 PLDI deadline, so my bandwidth is limited
Ashish Uthama
@ashishUthama
no hurry!
Nikola Smiljanić
@popizdeh
Can someone please explain this message: "Loop over output.s0.x has extent output.extent.0. Can only vectorize loops over a constant extent > 1."? Let's say we're dealing with floats and SSE. I don't get why the loop over x can't simply loop over 1/4 of the extent and process 4 float values at a time (let's ignore the case where the extent is not divisible by 4). Do I need to split x into constant-size chunks in order to get vectorization working?
Zalman Stern
@zvookin
How are you calling vectorize? Typically one does f.vectorize(x, 4) which provides the split as part of a single directive. If one writes f.vectorize(x) it means the extent must be constant and known and the vectorization amount is the complete extent.
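A plain-Python model of what the split inside f.vectorize(x, 4) does to the loop over x may help: x becomes an outer loop xo with extent ceil(extent / 4) and an inner loop xi with constant extent 4, and it is that constant-extent inner loop that gets vectorized. When the extent is not a multiple of 4, a tail strategy decides what the last chunk does. The sketch models two of Halide's strategies, ShiftInwards (which, as far as I know, is what the default TailStrategy::Auto picks for pure definitions) and GuardWithIf. The helper names are hypothetical, and this models index coverage only, not the generated code.

```python
def split_shift_inwards(extent, factor):
    """Split [0, extent) into chunks of exactly `factor` indices (ShiftInwards-style)."""
    if extent < factor:
        raise ValueError("extent must be at least the split factor")
    num_outer = -(-extent // factor)  # ceil(extent / factor): extent of xo
    chunks = []
    for xo in range(num_outer):
        # Clamp the start so the last chunk ends exactly at extent; it may
        # overlap the previous chunk and recompute a few points.
        base = min(xo * factor, extent - factor)
        chunks.append([base + xi for xi in range(factor)])  # xi has constant extent
    return chunks


def split_guard_with_if(extent, factor):
    """Split [0, extent) into chunks of `factor`, dropping out-of-range lanes."""
    num_outer = -(-extent // factor)
    chunks = []
    for xo in range(num_outer):
        start = xo * factor
        chunks.append([x for x in range(start, start + factor) if x < extent])
    return chunks


print(split_shift_inwards(10, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [6, 7, 8, 9]]
print(split_guard_with_if(10, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

ShiftInwards keeps every chunk a full vector at the cost of recomputing a few points, which is safe for pure definitions; GuardWithIf never recomputes but leaves a partial final chunk.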
steven-johnson
@steven-johnson:matrix.org
[m]
mac-buildbot-1 is going DOWN for a long-overdue OS upgrade (since I am actually physically in front of it). Back up soon-ish I hope.
Derek Gerstmann
@derek-gerstmann
FYI -- Halide v13.0.1 has been released: https://github.com/halide/Halide/releases/tag/v13.0.1
Vlad Levenfeld
@vladl-innopeaktech_gitlab
To call some AOT-generated code from a Hexagon binary (running on the simulator), what do I need to link my Hexagon binary against? I am getting some error messages about undefined symbols (halide_string_to_string, halide_msan_annotate_memory_is_initialized, a couple of others), but libHalide.so and libHalide.a are both x86_64 libs.
(I am getting the error messages when I run the simulation)
Or is there perhaps a way to statically link those missing functions when I run the AOT generator?
shoaibkamil
@shoaibkamil:matrix.org
[m]
It sounds like you're missing a runtime perhaps? What was the target for the AOT generated code?
Vlad Levenfeld
@vladl-innopeaktech_gitlab
hexagon-32-qurt-hvx_128-hvx_v66-no_asserts-no_bounds_query-enable_llvm_loop_opt
I can see some DSP libs in Hexagon SDK's copy of Halide, but I don't seem to be generating these libs when I am building Halide from source
Vlad Levenfeld
@vladl-innopeaktech_gitlab
this might be relevant
$ objdump -t build/host/ResizeNearestNeighbor.a | grep halide_
00000000 l    df *ABS*  00000000 halide_buffer_t.cpp
00000000         *UND*  00000000 halide_error
00000000         *UND*  00000000 halide_msan_annotate_memory_is_initialized
00000000  w    F .text.halide_qurt_hvx_lock     000000b0 halide_qurt_hvx_lock
00000000  w    F .text.halide_qurt_hvx_unlock   000000ac halide_qurt_hvx_unlock
00000000  w    F .text.halide_qurt_hvx_unlock_as_destructor     00000008 halide_qurt_hvx_unlock_as_destructor
00000000         *UND*  00000000 halide_string_to_string
00000000  w    F .text.halide_vtcm_free 00000008 halide_vtcm_free
00000000  w    F .text.halide_vtcm_malloc       0000000c halide_vtcm_malloc
That ResizeNearestNeighbor.a is the AOT generation result. It already has references to these functions at this stage.
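To narrow down what the Hexagon link is missing, it can help to list exactly which halide_* symbols the archive references but does not define. Below is a minimal sketch, assuming `objdump -t` output in the format shown above; the helper name undefined_symbols is hypothetical, and the sample text is trimmed from the output in this message. It only reports what is undefined; it says nothing about which library is supposed to provide those symbols.

```python
# Hypothetical helper: list halide_* symbols that an object/archive references
# but does not define, parsed from the text printed by `objdump -t`.
SAMPLE = """\
00000000 l    df *ABS*  00000000 halide_buffer_t.cpp
00000000         *UND*  00000000 halide_error
00000000         *UND*  00000000 halide_msan_annotate_memory_is_initialized
00000000  w    F .text.halide_qurt_hvx_lock     000000b0 halide_qurt_hvx_lock
00000000         *UND*  00000000 halide_string_to_string
00000000  w    F .text.halide_vtcm_free 00000008 halide_vtcm_free
"""


def undefined_symbols(objdump_text, prefix="halide_"):
    """Return sorted names of *UND* symbols whose names start with prefix."""
    names = set()
    for line in objdump_text.splitlines():
        fields = line.split()
        # Undefined symbols show *UND* in the section column; the name is last.
        if "*UND*" in fields and fields[-1].startswith(prefix):
            names.add(fields[-1])
    return sorted(names)


print(undefined_symbols(SAMPLE))
```

On a real archive, the text could come from subprocess.run(["objdump", "-t", "build/host/ResizeNearestNeighbor.a"], capture_output=True, text=True).stdout instead of the hard-coded sample.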