steven-johnson
@steven-johnson:matrix.org
[m]
You need to have libpng available on your system when you build Halide.
steven-johnson
@steven-johnson:matrix.org
[m]
Good morning all, buildbots are down due to a PG&E power outage last night; bringing them back up now.
steven-johnson
@steven-johnson:matrix.org
[m]
Should be good to go now.
David Ibbitson
@dibbitson

Hi -- is something like this possible:


template <typename T>
class my_generator : public Halide::Generator<my_generator<T>>
{
public:
    Input<Buffer<T>> input{"input", 2};
    Output<Buffer<T>> output{"output", 2};
};

The compilation errors tell me it's probably not, but I could be doing something wrong. Is it possible to achieve this another way?

aalan
@asouza_:matrix.org
[m]
hello @dibbitson, it is possible to use the template parameter T, but it is also necessary to call some of the base-class members directly and to simplify some of the calls with "using", for instance:
    template <typename T = float>
    class test : public Halide::Generator<test<T>>
    {
    public:
        typedef Halide::Generator<test<T>> Base;
        using Base::auto_schedule;
        template <typename T2>
        using Input = typename Base::template Input<T2>;
        template <typename T2>
        using Output = typename Base::template Output<T2>;

        Input<Halide::Buffer<T>> input{"input", 2};
        Output<Halide::Buffer<T>> output{"output", 2};
    };
David Ibbitson
@dibbitson
@asouza_:matrix.org thanks, that is very helpful. I just found out about GeneratorParam<Type>. Can that also be used to achieve a similar result?
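
(For reference, a rough and untested sketch of what the GeneratorParam<Type> route might look like; the param name "t" and the cast are illustrative, and depending on the Halide version you may also need input.type/output.type generator params:)

    class my_gen : public Halide::Generator<my_gen> {
    public:
        // the element type is chosen at generation time,
        // e.g. by passing t=uint16 to the generator
        GeneratorParam<Halide::Type> t{"t", Halide::UInt(8)};
        Input<Buffer<>> input{"input", 2};     // type resolved at generation time
        Output<Buffer<>> output{"output", 2};

        Var x{"x"}, y{"y"};

        void generate() {
            output(x, y) = Halide::cast(t, input(x, y));
        }
    };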
Ashish Uthama
@ashishUthama
All -- given a pipeline, is there any way I can get the type of the first input?

I want to use something like:

.split(x, x, xi, natural_vector_size(FIRSTINPUT.type()), TailStrategy::GuardWithIf)

but I only see APIs to get a Func (get_func), which does not have .type()

Andrew Adams
@abadams
my_func.value().type()?
but OutputImageParam also has a .type()
that's the base class for most input-buffery-type-things
Ashish Uthama
@ashishUthama

uniqueHalide.schedule.h:41:85: error: cannot call member function 'int Halide::Target::natural_vector_size(const Halide::Type&) const' without object
41 | .split(x, x, xi, ::Halide::Target::natural_vector_size(ulabel.value().type()), TailStrategy::GuardWithIf)
where
Func ulabel = pipeline.get_func(3);

I am trying to see if I can minimally change an autogenerated (simple-ish) schedule to be a bit more type-agnostic

not sure how to get the OutputImageParam from a pipeline... will search
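
(A hedged sketch of a fix for the error above: natural_vector_size() is a non-static member of Halide::Target, so it has to be called on a concrete Target object:)

    Halide::Target t = Halide::get_target_from_environment();  // or the target you compile for
    Halide::Func ulabel = pipeline.get_func(3);
    int vec = t.natural_vector_size(ulabel.value().type());
    // ... .split(x, x, xi, vec, Halide::TailStrategy::GuardWithIf)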
Suyog
@suyogsarda_twitter
@abadams @dsharletg any idea how we can check the sanity of an auto-generated schedule? By sanity, I mean compilation correctness. I think there is currently no mechanism to check it, since auto-generated schedules don't undergo any native compiler check. Any pointers on this problem would be really helpful.
Dillon Sharlet
@dsharletg
It's a bug if changing the schedule changes the result of the program (ignoring floating point reassociation changes). Of course that doesn't mean it can't happen... Whenever I need to investigate something like this, I usually write a test that runs two versions of the pipeline on the same input data, each with different schedules.
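
(A rough, untested sketch of such a test; make_test_input and build_with_schedule are hypothetical helpers that build the same algorithm under two different schedules:)

    Halide::Buffer<float> in = make_test_input(W, H);
    Halide::Buffer<float> a = build_with_schedule(in, 0).realize({W, H});
    Halide::Buffer<float> b = build_with_schedule(in, 1).realize({W, H});
    // element-wise comparison, with a small tolerance for FP reassociation
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            assert(std::abs(a(x, y) - b(x, y)) < 1e-5f);
        }
    }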
Chris Taylor
@catid
Getting a mysterious error in a new generator:
Unhandled exception: Error: Can't access output buffer of undefined Func.
Chris Taylor
@catid
Ah, it's because I tried to access the .width() property of an output buffer.
MotivaCG
@MotivaCG
Hi all!
I have a couple of questions about ONNX support, but I'm not sure if this is the right place.
First: I know that I can convert from ONNX to Halide's format, but does that mean I can pretrain a net, store it in ONNX, and run inference using Halide?
If so, what about performance?
Nikola Smiljanić
@popizdeh
Is there a way to constrain the parameter passed to Func::specialize? Say my Input<int> can only have values in [0, 3]; I'd like the last else in the generated code to handle the value 3, not the generic case...
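
(One possible approach, sketched here and untested: enumerate the legal values with specialize() and mark the generic fallback unreachable with Func::specialize_fail():)

    // cover each legal value of the param explicitly...
    f.specialize(param == 0);
    f.specialize(param == 1);
    f.specialize(param == 2);
    f.specialize(param == 3);
    // ...then replace the generic fallback with a runtime error
    f.specialize_fail("param must be in [0, 3]");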
steven-johnson
@steven-johnson:matrix.org
[m]
Hey all -- looks like the GitLab repo for libeigen has vanished (!), breaking our builds as well as many others'. I'm temporarily revving the buildbots to skip the relevant parts of Halide; they will be back up shortly. (I presume this will get fixed externally before too long...)
Alex Reinking
@alexreinking:matrix.org
[m]
Yikes!
aalan
@asouza_:matrix.org
[m]
Ooops
steven-johnson
@steven-johnson:matrix.org
[m]
Seems to be fixed now, see libeigen/eigen#2336 if you are curious
Abhishek G Saxena
@AbyShk95
Hi all... a question:
For the resize app: https://github.com/halide/Halide/tree/master/apps/resize, when I execute it for arm-64-android, I see the performance is quite slow compared to some SIMD libraries (tested for linear). Specifically, I tried Ne10 bilinear: https://projectne10.github.io/Ne10/doc/group__IMG__RESIZE.html
Is this expected, or is the schedule pushed to git more suited to some other target?
Dillon Sharlet
@dsharletg
I think linear is probably a special case where writing code specifically for bilinear is going to be better.
The approach used in the resize app makes more sense for bigger, more expensive kernels.
Abhishek G Saxena
@AbyShk95

I see... but the difference in my test on android was actually huge, like 4x slower.
I also tested the same thing on desktop (i9), where the timings for the resize app were

planar    linear     uint8  0.50  time: 0.069222 ms
packed    linear     uint8  0.50  time: 0.092122 ms
Success!

Then I tried a Python SIMD library, lycon: https://github.com/ethereon/lycon, and the time for the same resize was 0.035506 ms.
Hence I was a bit curious whether it's a schedule issue or whether it's expected...

Ryan Stout
@ryanstout
Sorry if I've asked this before: are there any plans to merge the GPU autoscheduler? Thanks!
Nikola Smiljanić
@popizdeh
Does the Output<Buffer<>> type need to be handled at generation time? Do I need to generate separately for uint8_t, uint16_t, and any other type I'd like to use?
Andrew Adams
@abadams
Yes, the type affects what instructions are selected, so it needs to be given at compile time as a generator param.
@AbyShk95 +1 to what Dillon said, but also: Are you making an image larger or smaller? It's very common for bilinear interpolation to be done incorrectly in a way that makes it much much faster when making images smaller.
Also, if it's bilinear resizing by a compile-time-known amount (e.g. 2x), you can do much much better than the resize app with custom code for it
The resize app is very generic. It works for any filter and any resize amount.
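
(For illustration, a hedged sketch of the compile-time-known 2x case: a correct 2x bilinear reduction is just an average of each 2x2 block, which vectorizes trivially; "in" is a hypothetical float input:)

    Halide::Var x("x"), y("y");
    Halide::Func down("down");
    // average each 2x2 block of the input
    down(x, y) = (in(2 * x, 2 * y) + in(2 * x + 1, 2 * y) +
                  in(2 * x, 2 * y + 1) + in(2 * x + 1, 2 * y + 1)) * 0.25f;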
Nikola Smiljanić
@popizdeh
I'm having trouble with input buffer checking; is there a way to make a buffer "optional" or to disable checking? To give you more info: I have Input<bool> lutEnabled and Input<Buffer<>> lut, and I use f(x, y) = select(lutEnabled, lut(x, y), input(x, y));. My hope was that specializing for lutEnabled == false would produce code that allows me to pass a nullptr for the lut input buffer, but that's not the case. Bounds checking is done as a top-level check, which requires me to pass a valid buffer even when lutEnabled is false. Any ideas?
Ashish Uthama
@ashishUthama
Does Halide have a way to enforce at compile time that all buffer dimensions have a min of 0? (I assume it makes no difference to the generated code, but it would make the generated stmt slightly easier to read.)
Tzu-Mao Li
@BachiLi

Sorry if I've asked this before: are there any plans to merge the GPU autoscheduler? Thanks!

@TH3CHARLie and Luke Anderson are looking into this, but I think they would be happy to receive help

Ryan Stout
@ryanstout
@BachiLi thanks for the info. I would offer to help if I could, but my C++ skills are pretty lacking, so I'm not sure I would be of any help. (I'm using the python bindings) Thanks
Xuanda Yang
@TH3CHARLie
will update the progress in halide/Halide#5602 once I get my machine
Dennis van den Berg
@dpvdberg
I was wondering whether it is possible to schedule my entire pipeline based on a specialize() call. My goal is the following: I have some parameters to my algorithm, and for each parameter value I run the auto-scheduler to generate a schedule.h. I want to include each schedule and use the appropriate one according to the parameter value.
Dennis van den Berg
@dpvdberg
In fact, I just want to be able to select completely different schedules based on a runtime parameter.
steven-johnson
@steven-johnson:matrix.org
[m]
Yes, you probably could use specialize() to accomplish that. But it might be simpler to manage if you just generate separate AOT-compiled filters (one per schedule) and move the selection into ordinary C++ code at the call site.
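
(A minimal sketch of that AOT route, with hypothetical filter names; each schedule is compiled to its own AOT function, and ordinary C++ picks between them:)

    // one AOT-compiled filter per schedule, selected at the call site
    extern "C" int filter_schedule_a(halide_buffer_t *in, halide_buffer_t *out);
    extern "C" int filter_schedule_b(halide_buffer_t *in, halide_buffer_t *out);

    int run(int param, halide_buffer_t *in, halide_buffer_t *out) {
        return (param == 0) ? filter_schedule_a(in, out)
                            : filter_schedule_b(in, out);
    }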
Dennis van den Berg
@dpvdberg

I'm trying to schedule my program on the GPU, and the Halide profiler is telling me:

average threads used: 0.900000
heap allocations: 0  peak heap usage: 0 bytes
  halide_malloc:         0.000ms   (0%)    threads: 0.000
  halide_free:           0.000ms   (0%)    threads: 0.000
  endiannessSwapWordOut: 1.015ms   (100%)  threads: 0.900

This thread usage of 0.9 worries me. I checked, and the debug output (after enabling it) shows
halide_opencl_run (user_context: 0x0, entry: _kernel_endiannessSwapWordOut_s0_wordIdx___block_id_x, blocks: 4313x1x1, threads: 4x1x1, ...
So it is running on multiple blocks and threads. What does this 0.9 thread utilization tell me?

Xuanda Yang
@TH3CHARLie
Does Halide support any Nvidia GPU with capability == 8.6 (e.g. RTX 30{70-90})?
From CodeGen_PTX_Dev::mcpu() in CodeGen_PTX_Dev.cpp, it only supports up to sm_80. Has anyone tried running GPU codegen on RTX 30-series cards?
Alex Reinking
@alexreinking:matrix.org
[m]
Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!
Volodymyr Kysenko
@vksnk:matrix.org
[m]
I might be wrong, but I thought that when you compile for, say, sm_80, it means 8.0 is the minimum capability the code is expected to run on, so it should work on everything after that (like 8.6)?
That being said, if you compile for a later capability, it might be able to optimize better.