Andrew Adams
@abadams
Also, if it's bilinear resizing by a compile-time-known amount (e.g. 2x), you can do much much better than the resize app with custom code for it
The resize app is very generic. It works for any filter and any resize amount.
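(For reference, a minimal sketch of what such custom 2x code could look like, written as a fixed bilinear upsample and assuming a float-valued input Func named in; all names here are illustrative:)

Var x("x"), y("y");
Func upx("upx"), up("up");
// Interpolate in x, then in y, using the 1/4-3/4 weights of a 2x bilinear kernel.
upx(x, y) = 0.25f * in(x / 2 - 1 + 2 * (x % 2), y) + 0.75f * in(x / 2, y);
up(x, y) = 0.25f * upx(x, y / 2 - 1 + 2 * (y % 2)) + 0.75f * upx(x, y / 2);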
Nikola Smiljanić
@popizdeh
I'm having trouble with input buffer checking, is there a way to make the buffer "optional" or disable checking? To give you more info, I have Input<bool> lutEnabled and Input<Buffer<>> lut and I use f(x, y) = select(lutEnabled, lut(x, y), input(x, y));. My hope was that specializing for lutEnabled == false would produce code where I'm allowed to pass a nullptr for lut input buffer but that's not the case. Bounds checking is done as a top-level check which requires me to pass a valid buffer even when lutEnabled is false. Any ideas?
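(For context, the setup described above looks roughly like this inside a Generator - a sketch only; the buffer types and the specialize() call are illustrative rather than the actual code:)

class LutFilter : public Halide::Generator<LutFilter> {
public:
    Input<bool> lutEnabled{"lutEnabled"};
    Input<Buffer<float>> lut{"lut", 2};
    Input<Buffer<float>> input{"input", 2};
    Output<Buffer<float>> f{"f", 2};

    Var x{"x"}, y{"y"};

    void generate() {
        f(x, y) = select(lutEnabled, lut(x, y), input(x, y));
        // The hope: this specialization would drop the bounds requirement on lut
        // when lutEnabled is false, so a nullptr lut could be passed - but the
        // top-level buffer check still runs.
        f.specialize(!lutEnabled);
    }
};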
Ashish Uthama
@ashishUthama
Does Halide have a way to enforce at compile time that all buffers have a min of 0 in each dimension? (I assume it makes no difference to the generated code, but it would make the generated stmt slightly easier to read.)
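(If it helps, one related knob - sketched here under the assumption of a Generator-style buffer input named input - is to constrain each dimension's min to 0. As far as I know this lets the compiler assume min == 0 and adds a check at pipeline entry rather than enforcing anything at compile time:)

// Constrain every dimension of the (assumed) input buffer to start at 0.
for (int d = 0; d < input.dimensions(); d++) {
    input.dim(d).set_min(0);
}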
Tzu-Mao Li
@BachiLi

sorry if I asked this before, are there any plans to merge the GPU autoscheduler? Thanks!

@TH3CHARLie and Luke Anderson are looking into this, but I think they are happy to receive help

Ryan Stout
@ryanstout
@BachiLi thanks for the info. I would offer to help if I could, but my C++ skills are pretty lacking, so I'm not sure I would be of any help. (I'm using the python bindings) Thanks
Xuanda Yang
@TH3CHARLie
will update the progress in halide/Halide#5602 once I get my machine
Dennis van den Berg
@dpvdberg
I was wondering whether it is possible to schedule my entire pipeline based on a specialize() call. My goal is to do the following: I have some parameters to my algorithm and, for each parameter, I run the auto-scheduler to generate a schedule.h. I want to include each schedule and use each one according to the parameter value.
Dennis van den Berg
@dpvdberg
In fact, I want to just be able to select completely different schedules based on a parameter at runtime.
steven-johnson
@steven-johnson:matrix.org
[m]
Yes, you probably could use specialize() to accomplish that. But it might be simpler to manage if you just generate separate AOT-compiled filters (one per schedule) and move the selection into ordinary C++ code at the call site.
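(A minimal sketch of that call-site selection, assuming two separately AOT-compiled variants of the same pipeline; the header and function names here are hypothetical:)

#include "my_pipeline_schedule_a.h"  // hypothetical AOT output built with schedule A
#include "my_pipeline_schedule_b.h"  // hypothetical AOT output built with schedule B

int run_pipeline(int param, halide_buffer_t *in, halide_buffer_t *out) {
    // Pick the pre-compiled schedule in ordinary C++ based on the runtime parameter.
    return (param == 0) ? my_pipeline_schedule_a(in, out)
                        : my_pipeline_schedule_b(in, out);
}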
Dennis van den Berg
@dpvdberg

I'm trying to schedule my program on the GPU, and the Halide profiler is telling me:

average threads used: 0.900000
heap allocations: 0  peak heap usage: 0 bytes
  halide_malloc:         0.000ms   (0%)    threads: 0.000
  halide_free:           0.000ms   (0%)    threads: 0.000
  endiannessSwapWordOut: 1.015ms   (100%)  threads: 0.900

This thread usage of 0.9 is worrying me. I checked, and the debug info (after enabling it) shows me:
halide_opencl_run (user_context: 0x0, entry: _kernel_endiannessSwapWordOut_s0_wordIdx___block_id_x, blocks: 4313x1x1, threads: 4x1x1, ...
So it is running on multiple blocks and threads. What does this 0.9 thread utilization tell me?

Xuanda Yang
@TH3CHARLie
Does Halide support any Nvidia GPU with capability == 8.6? (e.g. RTX 30{70-90})
From CodeGen_PTX_Dev::mcpu() in CodeGen_PTX_Dev.cpp, it supports up to sm80. Has anyone tried running GPU codegen on RTX cards?
Alex Reinking
@alexreinking:matrix.org
[m]
Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!
Volodymyr Kysenko
@vksnk:matrix.org
[m]
I might be wrong, but I thought that when you compile for, say, sm80, it means that 8.0 is the minimum capability the code is expected to run on, so it should work on everything after that (like 8.6)?
That being said, if you compile for a later capability it might be able to optimize better.
ivangarcia44
@ivangarcia44
If for a Halide Generator class, when compiled, the "auto_schedule" argument is set to "false", is it possible for the Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives are going to be used?
ivangarcia44
@ivangarcia44
Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto vectorization going on somewhere in the generation pipeline?
Xuanda Yang
@TH3CHARLie

Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!

halide/Halide#6334 changing now!

steven-johnson
@steven-johnson:matrix.org
[m]
If for a Halide Generator class, when compiled, the "auto_schedule" argument is set to "false", is it possible for the Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives are going to be used?

No scheduling primitives will be used unless you specify them.

1 reply

Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto vectorization going on somewhere in the generation pipeline?

No: Halide assumes that SSE2 is present for all x86-64 architectures, and uses the XMM registers for scalar floating-point operations.

1 reply
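(In other words, with auto_schedule set to false the default lowering is plain nested serial loops; vectorization or parallelism only appears if you write it yourself. A minimal sketch, with illustrative names and a hypothetical float input Func named in:)

Var x("x"), y("y"), yo("yo"), yi("yi");
Func blur_x("blur_x");
blur_x(x, y) = (in(x - 1, y) + in(x, y) + in(x + 1, y)) / 3.0f;
// None of the following happens by default; each primitive is explicit.
blur_x.split(y, yo, yi, 16)
      .parallel(yo)
      .vectorize(x, 8);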
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Is anyone aware of a 'simple' way to have autodiff in 2D?
ref/details:
https://github.com/halide/Halide/discussions/6347
aalan
@asouza_:matrix.org
[m]
Hello Soufiane KHIAT, if I am not mistaken you could already do what is being proposed with the region parameter and an auxiliary array (a VJP)
1 reply
Jonathan Ragan-Kelley
@jrk
I think @BachiLi is the right person for the autodiff question above!
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Do I need the Hexagon SDK in order to generate Hexagon code or can I do this with Halide master branch? I am trying to add some instrumentation to my generated code (maybe there's an easier way to do this)
Vlad Levenfeld
@vladl-innopeaktech_gitlab
And, if I can do it from the master branch, how do I build the tools dir (GenGen.cpp and such)?
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Nevermind, I was having some linking issues and just needed a sanity check. Sorry for the noise
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Another discussion: how could we implement a Halide version of "InsertKey", similar to std::unordered_map or std::set?
Idea: storing unique keys in a contiguous array.
https://github.com/halide/Halide/discussions/6373
Alex Reinking
@alexreinking:matrix.org
[m]
@dsharletg: Are there any changes to the Hexagon backend that should make the release notes? Trying to push Halide 13 out the door.
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]

I have this Issue:

Condition failed: in.is_bounded()
Unbounded producer->consumer relationship: Vertices-> FaceNormal

When I try to read an array with a buffer of indices.

ref:
https://github.com/halide/Halide/issues/4108#issuecomment-956546487
halide/Halide#4108

Alex Reinking
@alexreinking:matrix.org
[m]
aalan
@asouza_:matrix.org
[m]
Congrats to all the team
Hello Alex Reinking, the link about Photoshop on the web is very interesting. Do you have any more information about the use of Halide in web projects or in Photoshop? Thanks
Alex Reinking
@alexreinking:matrix.org
[m]
I don't work for Adobe, so I'm sorry to say I do not
I know that @steven-johnson and @shoaibkamil have been involved in the WASM backend
shoaibkamil
@shoaibkamil:matrix.org
[m]
The backend was all Steven :). I don’t have more details to share other than those shared on the blog post linked from the release notes.
steven-johnson
@steven-johnson:matrix.org
[m]
well, give a lot of credit to the WebAssembly team for the LLVM backend we use... :-)
steven-johnson
@steven-johnson:matrix.org
[m]
Looks like there's an LLVM top-of-tree failure on some of the bots -- I'll get to it after lunch
steven-johnson
@steven-johnson:matrix.org
[m]
Svenn-Arne Dragly
@dragly

I am working on "serializing" a Python object with Expr members to a Halide Func. In the process, I end up having a function with a large number of explicit definitions in one dimension. Unfortunately, I am not able to make those be calculated in an efficient way - once for each value, while sharing potentially pre-calculated values. In particular, this code:

import numpy as np
from halide import *

row, col = Var("row"), Var("col")

f = Func("f")
f[row, col] = 0.0
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)

g = Func("g")
g[row, col] = f[row, col] + 42.0

g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize(2, 4)))

Leads to the following generated code:

  for (g.s0.col, g.min.1, g.extent.1) {
  ...
   for (g.s0.row, g.min.0, g.extent.0) {
    allocate f[float32 * 1 * (max(t6, 3) + 1)]
    produce f {
     f[t7] = 0.000000f
     f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
     f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
     f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
     f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
    }
    consume f {
     g[g.s0.row + t12] = f[t7] + 42.000000f
    }

Unfortunately, Halide does not notice that only one value of f is needed, and calculates all of f for each g. I guess this is expected.

Calling f.compute_root() helps reduce the number of calculations, but results in code with four loops over row instead. This is problematic in my actual use-case, because it no longer automatically shares values that can be pre-calculated (such as the sqrt above).

Is there a way to get Halide to calculate f for each explicitly set col in one loop over row?

6 replies
Ashish Uthama
@ashishUthama

upgrading from Halide 12 to Halide 14 (tip)
running into a lot of:

Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.

Right now, it looks like something related to tile() with the tail strategy omitted (i.e. the default Auto). Does this ring a bell? (Will dig more in a bit.)

Did some defaults change?
@dsharletg - likely related to halide/Halide#6020? (I'll explore by adding explicit tail strategies to the code)
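(For anyone comparing, an explicit tail strategy on a tiled schedule looks roughly like this - a sketch with illustrative Func/Var names, not code from the report above:)

Var x("x"), y("y"), xo("xo"), yo("yo"), xi("xi"), yi("yi");
// Pass a TailStrategy explicitly instead of relying on the default (TailStrategy::Auto);
// "output" here is a hypothetical Func.
output.tile(x, y, xo, yo, xi, yi, 64, 64, TailStrategy::GuardWithIf);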
Dillon Sharlet
@dsharletg
That shouldn't happen if you weren't explicitly using PredicateLoads or PredicateStores
Ashish Uthama
@ashishUthama
I am not, will try to create a repro and file an issue
Source code worked as-is on Halide 12
Ashish Uthama
@ashishUthama
@alexreinking:matrix.org - thanks for the comment on the Windows build. We build Halide from source, but with the regular versioned release cadence - I am considering just using the released binaries.
Alex Reinking
@alexreinking:matrix.org
[m]
Sure! I'd still like to understand why your build was failing, though :)