Dennis van den Berg
@dpvdberg

I'm trying to schedule my program on the GPU, and the Halide profiler is telling me:

average threads used: 0.900000
heap allocations: 0  peak heap usage: 0 bytes
  halide_malloc:         0.000ms   (0%)    threads: 0.000
  halide_free:           0.000ms   (0%)    threads: 0.000
  endiannessSwapWordOut: 1.015ms   (100%)  threads: 0.900

This thread usage of 0.9 is worrying me. I checked, and the debug info (after enabling it) shows me
halide_opencl_run (user_context: 0x0, entry: _kernel_endiannessSwapWordOut_s0_wordIdx___block_id_x, blocks: 4313x1x1, threads: 4x1x1, ...
So it is running on multiple blocks and threads. What does this 0.9 thread utilization tell me?
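No reply is recorded here, but for intuition: a metric like "average threads used" is typically a time-weighted average of active threads over the measured span, so a value below 1.0 can simply mean the span includes time where no worker thread was active. A stdlib-only sketch with hypothetical sample data (not the Halide profiler's actual implementation):

```python
# Hypothetical profiler samples: (duration_ms, active_threads).
# If 10% of the measured time has zero active threads, the
# time-weighted average dips below 1.0 even though the kernel
# itself launched many GPU threads.
samples = [(0.9, 1), (0.1, 0)]

total_time = sum(dt for dt, _ in samples)
avg_threads = sum(dt * n for dt, n in samples) / total_time
print(f"average threads used: {avg_threads:.6f}")  # → 0.900000
```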

Xuanda Yang
@TH3CHARLie
Does Halide support any Nvidia GPU with capability == 8.6? (e.g. RTX 30{70-90})
From CodeGen_PTX_Dev::mcpu() in CodeGen_PTX_Dev.cpp, it supports up to sm80. Has anyone tried running GPU codegen on RTX cards?
Alex Reinking
@alexreinking:matrix.org
[m]
Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!
Volodymyr Kysenko
@vksnk:matrix.org
[m]
I might be wrong, but I thought that when you compile for, say, sm80, this means 8.0 is the minimum capability it is expected to run on, so it should work on everything after that (like 8.6)?
That being said, if you compile for a later capability it might be able to optimize better.
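Volodymyr's reading matches the usual CUDA model, as I understand it: PTX generated for sm_80 can be JIT-compiled by the driver for any device of capability 8.0 or higher, so it runs on an 8.6 card but can't use 8.6-only features. A toy sketch of that forward-compatibility rule (hypothetical helper, not a Halide or CUDA API):

```python
def ptx_can_run(compiled_for: tuple, device_cc: tuple) -> bool:
    """PTX targeting capability X.Y can be JIT-compiled for any
    device whose compute capability is >= X.Y (forward compatible,
    never backward)."""
    return device_cc >= compiled_for

print(ptx_can_run((8, 0), (8, 6)))  # sm_80 PTX on an 8.6 card → True
print(ptx_can_run((8, 6), (8, 0)))  # sm_86 PTX on an 8.0 card → False
```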
ivangarcia44
@ivangarcia44
If, for a Halide Generator class, the "auto_schedule" argument is set to "false" when compiled, is it possible for the Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives are going to be used?
ivangarcia44
@ivangarcia44
Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto-vectorization going on somewhere in the generation pipeline?
Xuanda Yang
@TH3CHARLie

Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!

halide/Halide#6334 changing now!

steven-johnson
@steven-johnson:matrix.org
[m]
If for a Halide Generator class, when compiled, the "auto_schedule" argument is set to "false", is it possible for Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or it is guaranteed that no scheduling primitives are going to be used?

No scheduling primitives will be used unless you specify them.

1 reply

Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto vectorization going on somewhere in generation pipeline?

No: Halide assumes that SSE2 is present on all x86-64 architectures, and uses the XMM registers for scalar floating-point operations.

1 reply
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Is anyone aware of a 'simple' way to have autodiff in 2D?
ref/details:
https://github.com/halide/Halide/discussions/6347
aalan
@asouza_:matrix.org
[m]
Hello Soufiane KHIAT, if I am not mistaken you could already do what is being proposed with the region parameter and an auxiliary array (a VJP)
1 reply
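If it helps to make "auxiliary array (a VJP)" concrete: a vector-Jacobian product propagates an adjoint array backward through an op. A stdlib-only sketch for an elementwise 2D square, where the auxiliary array holds the incoming adjoints (illustrative only, unrelated to Halide's actual autodiff internals):

```python
# Forward: y[i][j] = x[i][j] ** 2 over a 2D grid.
# Backward (VJP): dx[i][j] = dy[i][j] * 2 * x[i][j],
# where dy is the auxiliary adjoint array.
def square_fwd(x):
    return [[v * v for v in row] for row in x]

def square_vjp(x, dy):
    return [[g * 2 * v for v, g in zip(xr, gr)]
            for xr, gr in zip(x, dy)]

x = [[1.0, 2.0], [3.0, 4.0]]
dy = [[1.0, 1.0], [1.0, 1.0]]  # adjoint of the output
print(square_fwd(x))           # [[1.0, 4.0], [9.0, 16.0]]
print(square_vjp(x, dy))       # [[2.0, 4.0], [6.0, 8.0]]
```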
Jonathan Ragan-Kelley
@jrk
I think @BachiLi is the right person for the autodiff question above!
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Do I need the Hexagon SDK in order to generate Hexagon code or can I do this with Halide master branch? I am trying to add some instrumentation to my generated code (maybe there's an easier way to do this)
Vlad Levenfeld
@vladl-innopeaktech_gitlab
And, if I can do it from master branch, how do I build the tools dir (GenGen.cpp and such)
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Nevermind, I was having some linking issues and just needed a sanity check. Sorry for the noise
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Another discussion: how could we implement a Halide version of "InsertKey", similar to std::unordered_map or std::set?
Idea: storing unique keys in a contiguous array.
https://github.com/halide/Halide/discussions/6373
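The "unique keys in a contiguous array" idea can be sketched with a side index mapping each key to its slot, so inserts are deduplicated and the key storage stays dense. A plain-Python sketch of the data structure being described (names are hypothetical):

```python
class ContiguousKeySet:
    """Stores each unique key once, packed into a dense array;
    insert_key returns the key's stable slot index."""
    def __init__(self):
        self.keys = []      # contiguous storage
        self.slot_of = {}   # key -> index into self.keys

    def insert_key(self, key) -> int:
        idx = self.slot_of.get(key)
        if idx is None:
            idx = len(self.keys)
            self.keys.append(key)
            self.slot_of[key] = idx
        return idx

s = ContiguousKeySet()
print(s.insert_key("a"), s.insert_key("b"), s.insert_key("a"))  # 0 1 0
print(s.keys)  # ['a', 'b']
```

The hard part in the linked discussion is doing this data-parallel inside a Halide pipeline (which needs atomics or prefix sums); this sketch only pins down the intended semantics.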
Alex Reinking
@alexreinking:matrix.org
[m]
@dsharletg: Are there any changes to the Hexagon backend that should make the release notes? Trying to push Halide 13 out the door.
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]

I have this Issue:

Condition failed: in.is_bounded()
Unbounded producer->consumer relationship: Vertices-> FaceNormal

When I try to read an array with a buffer of indices.

ref:
https://github.com/halide/Halide/issues/4108#issuecomment-956546487
halide/Halide#4108

Alex Reinking
@alexreinking:matrix.org
[m]
aalan
@asouza_:matrix.org
[m]
Congrats to all the team
Hello Alex Reinking, the link about Photoshop on the web is very interesting. Do you have any more information about the use of Halide on web projects or on Photoshop? Thanks
Alex Reinking
@alexreinking:matrix.org
[m]
I don't work for Adobe, so I'm sorry to say I do not
I know that @steven-johnson and @shoaibkamil have been involved in the WASM backend
shoaibkamil
@shoaibkamil:matrix.org
[m]
The backend was all Steven :). I don’t have more details to share other than those shared on the blog post linked from the release notes.
steven-johnson
@steven-johnson:matrix.org
[m]
well, give a lot of credit to the WebAssembly team for the LLVM backend we use... :-)
steven-johnson
@steven-johnson:matrix.org
[m]
Looks like there's an LLVM top-of-tree failure on some of the bots -- I'll get to it after lunch
Svenn-Arne Dragly
@dragly

I am working on "serializing" a Python object with Expr members to a Halide Func. In the process, I end up with a function that has a large number of explicit definitions in one dimension. Unfortunately, I am not able to make those be calculated in an efficient way: once for each value, while sharing potentially pre-calculated values. In particular, this code:

import numpy as np
from halide import *  # assuming the Python bindings; row, col are Vars

row, col = Var("row"), Var("col")

f = Func("f")
f[row, col] = 0.0
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)

g = Func("g")
g[row, col] = f[row, col] + 42.0

g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize(2, 4)))

Leads to the following generated code:

  for (g.s0.col, g.min.1, g.extent.1) {
  ...
   for (g.s0.row, g.min.0, g.extent.0) {
    allocate f[float32 * 1 * (max(t6, 3) + 1)]
    produce f {
     f[t7] = 0.000000f
     f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
     f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
     f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
     f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
    }
    consume f {
     g[g.s0.row + t12] = f[t7] + 42.000000f
    }

Unfortunately, Halide does not notice that only one value of f is needed, and calculates all of f for each g. I guess this is expected.

Calling f.compute_root() helps reduce the number of calculations, but results in code with four loops over row instead. This is problematic in my actual use case, because it no longer automatically shares values that can be pre-calculated (such as the sqrt above).

Is there a way to get Halide to calculate f for each explicitly set col in one loop over row?

6 replies
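The trade-off in the question can be seen with a toy evaluation-count model: computing f inline per point of g re-runs the whole produce block (all four explicit definitions) for each g element, while precomputing f (compute_root-style) runs each stage once per row but in separate loops. A stdlib sketch that just counts calls to the shared expensive term (illustrative, not Halide semantics):

```python
import math

ROWS, COLS = 2, 4
calls = {"n": 0}

def expensive(row):
    calls["n"] += 1
    return math.sqrt(row * row)

def produce_f_row(row):
    """All four explicit column definitions of f for one row,
    mirroring the lowered 'produce f' block in the question."""
    return [1.0 + expensive(row), 2.0 + expensive(row),
            3.0 + expensive(row), 4.0 + expensive(row)]

# Inline-style: the whole produce block runs per point of g.
calls["n"] = 0
g1 = [[produce_f_row(r)[c] + 42.0 for c in range(COLS)]
      for r in range(ROWS)]
inline_calls = calls["n"]   # ROWS * COLS * 4 = 32

# compute_root-style: produce each row of f once, then consume.
calls["n"] = 0
f_rows = [produce_f_row(r) for r in range(ROWS)]
g2 = [[f_rows[r][c] + 42.0 for c in range(COLS)] for r in range(ROWS)]
root_calls = calls["n"]     # ROWS * 4 = 8

assert g1 == g2
print(inline_calls, root_calls)  # 32 8
```

The question is effectively asking for a third option: the four stages fused into a single loop over row, so the shared sqrt is computed once per row without splitting g's loop nest.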
Ashish Uthama
@ashishUthama

upgrading from Halide 12 to Halide 14 (tip)
running into a lot of:

Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.

Right now, it looks like something related to tile() with the tail strategy omitted (i.e. the default Auto). Does this ring a bell? (will dig more in a bit)

Did some defaults change?
@dsharletg - likely related to halide/Halide#6020? (I'll explore by adding explicit tail strategies to the code)
Dillon Sharlet
@dsharletg
That shouldn't happen if you weren't explicitly using PredicateLoads or PredicateStores
Ashish Uthama
@ashishUthama
I am not, will try to create a repro and file an issue
Source code worked as-is on Halide 12
Ashish Uthama
@ashishUthama
@alexreinking:matrix.org - thanks for the comment on the Windows build. We build Halide from source, but with the regular versioned release cadence I am considering just using the released binaries.
Alex Reinking
@alexreinking:matrix.org
[m]
Sure! I'd still like to understand why your build was failing, though :)
Ashish Uthama
@ashishUthama
The downloaded libraries appear to be significantly larger than what I build locally (~150 MB vs ~45 MB). Would you have any thoughts on why that may be?
Alex Reinking
@alexreinking:matrix.org
[m]
It could be a difference in how we're building LLVM or what targets are enabled?
Ashish Uthama
@ashishUthama
I would like to too .. but not sure how to proceed :( I checked the definition in VS and it correctly opened up that header
These are our CMake flags:

CMAKE_HALIDE_OPTIONS := \
    -DLLVM_DIR=${LLVM_ROOT}/release/lib/cmake/llvm \
    -DCLANG=${CLANG} \
    -DWARNINGS_AS_ERRORS=OFF \
    -DWITH_PYTHON_BINDINGS=OFF \
    -DWITH_TEST_AUTO_SCHEDULE=ON \
    -DWITH_TEST_CORRECTNESS=OFF \
    -DWITH_TEST_ERROR=ON \
    -DWITH_TEST_WARNING=ON \
    -DWITH_TEST_PERFORMANCE=ON \
    -DWITH_TEST_OPENGL=OFF \
    -DWITH_TEST_GENERATOR=ON \
    -DWITH_APPS=OFF \
    -DWITH_TUTORIALS=OFF \
    -DWITH_DOCS=OFF \
    -DWITH_UTILS=OFF .
oh, llvm - ok, likely that. Will double-check; it might be that our in-house llvm build has limited targets.
Alex Reinking
@alexreinking:matrix.org
[m]
That would make sense. Our binaries contain all supported backends
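That size gap is consistent with a trimmed LLVM: the official binaries link an LLVM built with every backend, while a local LLVM restricted to one target yields a much smaller libHalide. If the in-house LLVM is the variable, the relevant knob at LLVM configure time is LLVM_TARGETS_TO_BUILD (illustrative invocation; paths and project list are assumptions):

```shell
# Configure a slim LLVM with only the X86 backend enabled, vs.
# -DLLVM_TARGETS_TO_BUILD="all" for the full set of backends
# that Halide's official binaries are built against.
cmake -S llvm -B build -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_ENABLE_PROJECTS="clang;lld" \
      -DLLVM_TARGETS_TO_BUILD="X86" \
      -DLLVM_ENABLE_ASSERTIONS=OFF
```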
Ashish Uthama
@ashishUthama
yes, that explains it. Thanks! And thanks for the push to regular versioned releases, much appreciated!
Alex Reinking
@alexreinking:matrix.org
[m]
Totally! Hopefully we'll be able to get the release latency (after LLVM) even tighter for the next releases :)
Derek Gerstmann
@derek-gerstmann
@alexreinking:matrix.org Hiya! I'm looking at backporting PR #6405 from master to create a v13.0.1 release. Should I create a "backports/13.x" branch and merge things in there, and then push the results into "releases/13.x"? Just trying to match what Andrew and I are seeing in the repo to follow conventions. Any suggestions?
Alex Reinking
@alexreinking:matrix.org
[m]
I use backports/N.x for staging changes to release/N.x. When it's ready, I open a PR with release/N.x as the target branch. There is CI set up for this scenario. Be sure to include a commit that bumps the version to 13.0.1.
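As git commands, the flow Alex describes might look like this (sketch; branch names and the PR number follow the messages above, the commit hash is a placeholder):

```shell
# Stage backports on a working branch cut from the release branch.
git checkout -b backports/13.x release/13.x

# Cherry-pick the change from master (hash is a placeholder).
git cherry-pick <commit-from-pr-6405>

# Bump the version to 13.0.1 in its own commit, push, then open a
# PR targeting release/13.x; CI is set up for that scenario.
git commit -am "Bump version to 13.0.1"
git push origin backports/13.x
```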