Alex Reinking
@alexreinking:matrix.org
[m]
Yikes!
aalan
@asouza_:matrix.org
[m]
Ooops
steven-johnson
@steven-johnson:matrix.org
[m]
Seems to be fixed now, see libeigen/eigen#2336 if you are curious
Abhishek Saxena
@AbyShk95
Hi all...
A question...
For the resize app: https://github.com/halide/Halide/tree/master/apps/resize, when I execute it for arm-64-android, I see the performance is quite slow compared to some libraries using SIMD (tested for linear). Specifically, I tried Ne10 bilinear: https://projectne10.github.io/Ne10/doc/group__IMG__RESIZE.html
Is this expected, or is the schedule pushed to git more suited to some other target?
Dillon Sharlet
@dsharletg
I think linear is probably a special case where writing code specifically for bilinear is going to be better
the approach used in the resize app makes more sense for bigger, more expensive kernels
Abhishek Saxena
@AbyShk95

I see... but actually the difference in my test was huge on Android... like 4x slower.
I actually tested the same on a desktop (i9) too, where the timings for the resize app were

planar    linear     uint8  0.50  time: 0.069222 ms
packed    linear     uint8  0.50  time: 0.092122 ms
Success!

and then I tried one of the Python SIMD libs, the lycon library: https://github.com/ethereon/lycon and the time for the same resize was 0.035506 ms
Hence I was a bit curious whether it's a schedule issue or whether it's expected...

Ryan Stout
@ryanstout
sorry if I asked this before, are there any plans to merge the gpu autoscheduler? Thanks!
Nikola Smiljanić
@popizdeh
Does the Output<Buffer<>> type need to be handled at generation time? Do I need to generate separately for uint8_t, uint16_t, and any other type I'd like to use?
Andrew Adams
@abadams
Yes, the type affects what instructions are selected, so it needs to be given at compile time as a generator param.
@AbyShk95 +1 to what Dillon said, but also: Are you making an image larger or smaller? It's very common for bilinear interpolation to be done incorrectly in a way that makes it much much faster when making images smaller.
Also, if it's bilinear resizing by a compile-time-known amount (e.g. 2x), you can do much much better than the resize app with custom code for it
The resize app is very generic. It works for any filter and any resize amount.
Nikola Smiljanić
@popizdeh
I'm having trouble with input buffer checking, is there a way to make the buffer "optional" or disable checking? To give you more info, I have Input<bool> lutEnabled and Input<Buffer<>> lut and I use f(x, y) = select(lutEnabled, lut(x, y), input(x, y));. My hope was that specializing for lutEnabled == false would produce code where I'm allowed to pass a nullptr for lut input buffer but that's not the case. Bounds checking is done as a top-level check which requires me to pass a valid buffer even when lutEnabled is false. Any ideas?
Ashish Uthama
@ashishUthama
Does Halide have a way to enforce at compile time that every buffer dimension has a min of 0? (I assume it makes no difference to the generated code, but it would make the generated stmt slightly easier to read.)
Tzu-Mao Li
@BachiLi

sorry if I asked this before, are there any plans to merge the gpu autoscheduler? Thanks!

@TH3CHARLie and Luke Anderson are looking into this, but I think they are happy to receive help

Ryan Stout
@ryanstout
@BachiLi thanks for the info. I would offer to help if I could, but my C++ skills are pretty lacking, so I'm not sure I would be of any help. (I'm using the python bindings) Thanks
Xuanda Yang
@TH3CHARLie
will update the progress in halide/Halide#5602 once I get my machine
Dennis van den Berg
@dpvdberg
I was wondering whether it is possible to schedule my entire pipeline based on a specialization() call. My goal is to do the following: I have some parameters to my algorithm and for each parameter, I run the auto-scheduler to generate a schedule.h. I want to include each schedule and use each one according to the parameter value.
Dennis van den Berg
@dpvdberg
In fact, I want to just be able to select completely different schedules based on a parameter at runtime.
steven-johnson
@steven-johnson:matrix.org
[m]
Yes, you probably could use specialize() to accomplish that. But it might be simpler to manage if you just generate separate AOT-compiled filters (one per schedule) and move the selection into ordinary C++ code at the call site.
Dennis van den Berg
@dpvdberg

I'm trying to schedule my program on the GPU, the halide profiler is telling me:

average threads used: 0.900000
heap allocations: 0  peak heap usage: 0 bytes
  halide_malloc:         0.000ms   (0%)    threads: 0.000
  halide_free:           0.000ms   (0%)    threads: 0.000
  endiannessSwapWordOut: 1.015ms   (100%)  threads: 0.900

This thread usage of 0.9 is worrying me. I checked and the debug info (after enabling) shows me
halide_opencl_run (user_context: 0x0, entry: _kernel_endiannessSwapWordOut_s0_wordIdx___block_id_x, blocks: 4313x1x1, threads: 4x1x1, ...
So it is running on multiple blocks and threads. What does this 0.9 thread utilization tell me?

Xuanda Yang
@TH3CHARLie
Does Halide support any Nvidia GPU with capability == 8.6 ? (e.g. RTX 30{70-90})
from CodeGen_PTX_Dev::mcpu() in CodeGen_PTX_Dev.cpp, it supports up to sm80. Has anyone tried running GPU codegen on RTX cards?
Alex Reinking
@alexreinking:matrix.org
[m]
Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!
Volodymyr Kysenko
@vksnk:matrix.org
[m]
I might be wrong, but I thought that when you compile for, say, sm80, that means 8.0 is the minimum capability it is expected to run on, so it should work on everything after that (like 8.6)?
that being said, if you compile for a later capability it might be able to optimize better
ivangarcia44
@ivangarcia44
If, for a Halide Generator class, the "auto_schedule" argument is set to "false" when compiled, is it possible for the Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives are going to be used?
ivangarcia44
@ivangarcia44
Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto-vectorization going on somewhere in the generation pipeline?
Xuanda Yang
@TH3CHARLie

Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!

halide/Halide#6334 changing now!

steven-johnson
@steven-johnson:matrix.org
[m]
If, for a Halide Generator class, the "auto_schedule" argument is set to "false" when compiled, is it possible for the Halide engine to use any default parallelization/scheduling technique (e.g., vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives are going to be used?

No scheduling primitives will be used unless you specify them.

Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto-vectorization going on somewhere in the generation pipeline?

No: Halide assumes that SSE2 is present on all x86-64 architectures, and uses the XMM registers for scalar floating-point operations.

Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Is anyone aware of a 'simple' way to have autodiff in 2D?
ref/details:
https://github.com/halide/Halide/discussions/6347
aalan
@asouza_:matrix.org
[m]
Hello Soufiane KHIAT, if I am not mistaken you could already do what is being proposed with the region parameter and an auxiliary array (a vjp)
Jonathan Ragan-Kelley
@jrk
I think @BachiLi is the right person for the autodiff question above!
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Do I need the Hexagon SDK in order to generate Hexagon code or can I do this with Halide master branch? I am trying to add some instrumentation to my generated code (maybe there's an easier way to do this)
Vlad Levenfeld
@vladl-innopeaktech_gitlab
And, if I can do it from master branch, how do I build the tools dir (GenGen.cpp and such)
Vlad Levenfeld
@vladl-innopeaktech_gitlab
Nevermind, I was having some linking issues and just needed a sanity check. Sorry for the noise
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]
Another discussion: how could we implement a Halide version of "InsertKey", similar to std::unordered_map or std::set?
Idea: storing unique keys in a contiguous array.
https://github.com/halide/Halide/discussions/6373
Alex Reinking
@alexreinking:matrix.org
[m]
@dsharletg: Are there any changes to the Hexagon backend that should make the release notes? Trying to push Halide 13 out the door.
Soufiane KHIAT
@soufiane.khiat:matrix.org
[m]

I have this Issue:

Condition failed: in.is_bounded()
Unbounded producer->consumer relationship: Vertices-> FaceNormal

When I try to read an array with a buffer of indices.

ref:
https://github.com/halide/Halide/issues/4108#issuecomment-956546487
halide/Halide#4108

Alex Reinking
@alexreinking:matrix.org
[m]
aalan
@asouza_:matrix.org
[m]
Congrats to all the team
Hello Alex Reinking, the link about Photoshop on the web is very interesting. Do you have any more information about the use of Halide in web projects or in Photoshop? Thanks
Alex Reinking
@alexreinking:matrix.org
[m]
I don't work for Adobe, so I'm sorry to say I do not
I know that @steven-johnson and @shoaibkamil have been involved in the WASM backend
shoaibkamil
@shoaibkamil:matrix.org
[m]
The backend was all Steven :). I don’t have more details to share other than those shared on the blog post linked from the release notes.
steven-johnson
@steven-johnson:matrix.org
[m]
well, give a lot of credit to the WebAssembly team for the LLVM backend we use... :-)