Zalman Stern
@zvookin
add_executable("${NAME}.generator" "${NAME}_generator.cpp")
target_link_libraries("${NAME}.generator" PRIVATE Halide::Generator ${args_GEN_DEPS})
This means the test has to have a generator named the same as the test, which is a good practice, but I decided not to follow it in this case. If there is a strict reason, I can change the file name, but this seems unexpected.
Steven Johnson
@steven-johnson
I think it is purely one of Enforcing Best Practice.
Too Strongly, in this case
Zalman Stern
@zvookin
I'll just rename it
Steven Johnson
@steven-johnson
(we can definitely soften this if it's an issue, of course)
Rasim Akhunzyanov
@brotherofken
Hi.
Is it somehow possible to implement box blur in linear time using the sliding accumulator approach in Halide?
I mean the algorithm where you update the accumulator by subtracting/adding elements on the left/right of the current position, so the cost becomes independent of the kernel size. I spent some time thinking about it (and googling too), but can't come up with a solution.
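For reference, the sliding accumulator described in this question can be sketched in plain C++ (this is not Halide code, and it does not answer whether Halide can express it). The function name `sliding_box_sum` and the zero padding at the borders are assumptions made for illustration; the result is left unnormalized.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Box sum of radius r over a 1D signal, treating samples outside the
// signal as zero. The accumulator is updated incrementally, so the cost
// per output sample does not depend on r.
std::vector<int64_t> sliding_box_sum(const std::vector<int64_t> &in, int r) {
    assert(r >= 0);
    const int n = static_cast<int>(in.size());
    std::vector<int64_t> out(in.size(), 0);
    int64_t acc = 0;
    // Prime the accumulator with the window for x = 0: samples [0, r].
    for (int i = 0; i <= r && i < n; i++) acc += in[i];
    for (int x = 0; x < n; x++) {
        out[x] = acc;
        // Slide the window from [x - r, x + r] to [x + 1 - r, x + 1 + r].
        if (x + 1 + r < n) acc += in[x + 1 + r];
        if (x - r >= 0) acc -= in[x - r];
    }
    return out;
}
```

Dividing each output by `2 * r + 1` would give the blurred value away from the borders.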
aavbsouza
@aavbsouza
@brotherofken I believe you are looking for that implementation: https://github.com/halide/Halide/blob/master/apps/iir_blur/iir_blur_generator.cpp
the recursive implementation
Rasim Akhunzyanov
@brotherofken
@aavbsouza Thanks for the quick response! An IIR filter is a good alternative, but it's a slightly different thing.
To be more specific, I'm looking for a way to implement 'Algorithm 4' from [1]. [1] approximates the Gaussian filter by applying the box filter multiple times.
Also, I'm not sure, but doesn't IIR require four passes over an image, which might make it slightly slower?
[1] http://blog.ivank.net/fastest-gaussian-blur.html
aavbsouza
@aavbsouza
@brotherofken It's not the same, but the implementation appears to be similar with a different recursion. Maybe this link would be useful to you (https://github.com/dhale/jtk/blob/master/core/src/main/java/edu/mines/jtk/dsp/RecursiveGaussianFilter.java)
Rasim Akhunzyanov
@brotherofken
@aavbsouza I got the point, it's interesting. Thank you!
Andrew Adams
@abadams
In my experience the fastest Gaussian blur is the IIR from Young and Van Vliet
Last time I tried it was better than iterating a box.
Only really works cleanly with floats though.
For ints I think the fastest was convolving with the third derivative of a piecewise quadratic approximation to the Gaussian (the filter only has four taps spaced by zeros: something like [1 0 ... 0 -3 0 ... 0 3 0 ... 0 -1]), and then integrating three times. This is just a mathematical rearrangement of doing three passes of a sliding window box blur.
Neither approach parallelizes particularly well though. Lately I've just been using pyramids because it tiles so cleanly.
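The rearrangement mentioned above (four taps followed by three integrations being equivalent to three box passes) can be checked with a small plain C++ sketch. All names here are hypothetical, the box is causal and unnormalized, and samples before the start of the signal are assumed to be zero; this is an illustration of the identity, not the implementation referred to in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Signal = std::vector<int64_t>;

// Causal, unnormalized box sum of width w: y[n] = x[n] + ... + x[n-w+1],
// with samples before the start of the signal treated as zero.
Signal box_sum(const Signal &x, int w) {
    assert(w > 0);
    Signal y(x.size(), 0);
    for (std::size_t n = 0; n < x.size(); n++) {
        for (std::size_t k = 0; k < static_cast<std::size_t>(w) && k <= n; k++) {
            y[n] += x[n - k];
        }
    }
    return y;
}

// Reference: three passes of the box sum.
Signal three_box_passes(Signal x, int w) {
    for (int pass = 0; pass < 3; pass++) x = box_sum(x, w);
    return x;
}

// Running sum (discrete integration).
Signal integrate(Signal x) {
    for (std::size_t n = 1; n < x.size(); n++) x[n] += x[n - 1];
    return x;
}

// Four taps [1, -3, 3, -1] spaced w apart, followed by three integrations.
Signal four_taps_then_integrate(const Signal &x, int w) {
    assert(w > 0);
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(x.size());
    auto at = [&](std::ptrdiff_t i) -> int64_t {
        return i >= 0 ? x[static_cast<std::size_t>(i)] : 0;
    };
    Signal d(x.size(), 0);
    for (std::ptrdiff_t n = 0; n < count; n++) {
        d[static_cast<std::size_t>(n)] =
            at(n) - 3 * at(n - w) + 3 * at(n - 2 * w) - at(n - 3 * w);
    }
    return integrate(integrate(integrate(d)));
}
```

The taps come from expanding (1 - z^-w)^3 = 1 - 3z^-w + 3z^-2w - z^-3w, and each integration supplies one factor of 1 / (1 - z^-1). Dividing by w^3 would normalize the result.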
aavbsouza
@aavbsouza
Hello, in most examples using the Python bindings, the numpy arrays are created using Fortran order. Would it be possible to manually change the strides of the Halide buffers instead? Thanks
Zalman Stern
@zvookin
I believe the answer to the above Python question is "yes"
I.e. one can do that in the schedule
aavbsouza
@aavbsouza
@zvookin I could change them using dim().set_stride on an ImageParam. However, after these changes, is the fastest axis for the variables also changed automatically?
Svenn-Arne Dragly
@dragly
Does Halide support partial copying of buffers from host to device? For instance, if images (2D buffers) are available one at a time and should be copied to the device into a stack of images (3D buffer). Is there a way to efficiently copy one image at a time onto the GPU using the Halide::Runtime::Buffer interface?
Lev Yudalevich
@lyudalev
Hi,
Does it make any sense to do floodfill in Halide? If yes, how can such an algorithm be approached?
Svenn-Arne Dragly
@dragly

@lyudalev I looked a bit at flood fill myself when investigating dynamic programming in Halide. It seems like Halide is not a good match for defining such algorithms currently, but Halide can still be used to call an extern C function if you need this as part of a bigger pipeline.

That being said, there are some clever GPU-friendly ways to do algorithms like flood fill. You might be able to find such an implementation and translate it to Halide. Basically, the constraint that usually becomes a problem is that you cannot update any Func after using the result of it in another Func. If you manage to express an iterative algorithm as an RDom in a single Func, you are usually good to go.
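One iterative formulation of the kind hinted at here is flood fill by repeated dilation, sketched below in plain C++ (not Halide). The names `Grid` and `flood_fill_iterative`, the 4-connected neighborhood, and the stop-when-nothing-changes loop are assumptions chosen for illustration, not a specific published GPU algorithm.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Grid of 0/1 values stored row by row.
using Grid = std::vector<std::vector<int>>;

// Flood fill by repeated dilation. `open` marks cells that may be filled;
// the fill starts at (seed_x, seed_y). Each pass reads only the previous
// state and writes a new one, so every cell within a pass is independent.
Grid flood_fill_iterative(const Grid &open, int seed_x, int seed_y) {
    const int h = static_cast<int>(open.size());
    const int w = h > 0 ? static_cast<int>(open[0].size()) : 0;
    assert(seed_x >= 0 && seed_x < w && seed_y >= 0 && seed_y < h);
    Grid filled(h, std::vector<int>(w, 0));
    filled[seed_y][seed_x] = open[seed_y][seed_x];
    bool changed = true;
    while (changed) {
        changed = false;
        Grid next = filled;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (filled[y][x] || !open[y][x]) continue;
                // A cell becomes filled if any 4-connected neighbor is filled.
                const bool neighbor =
                    (x > 0 && filled[y][x - 1]) || (x + 1 < w && filled[y][x + 1]) ||
                    (y > 0 && filled[y - 1][x]) || (y + 1 < h && filled[y + 1][x]);
                if (neighbor) {
                    next[y][x] = 1;
                    changed = true;
                }
            }
        }
        filled = std::move(next);
    }
    return filled;
}
```

The number of passes grows with the longest path the fill has to travel, so this trades extra work for independence between cells within a pass.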

Andrew Adams
@abadams
@dragly it should work to crop the source and destination buffers (which just makes cropped views) and then do a buffer copy between them.
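The idea of copying between views rather than whole buffers can be illustrated with a host-only sketch in plain C++. `View2D`, `slice_of_stack`, and `copy_view` are hypothetical names and are not Halide's Buffer API; the sketch assumes a stack stored with x innermost and says nothing about device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A hypothetical non-owning 2D view; this is not Halide's Buffer type.
struct View2D {
    int *data;  // points at element (0, 0) of the view
    int width, height;
    std::ptrdiff_t stride_x, stride_y;
    int &at(int x, int y) const { return data[x * stride_x + y * stride_y]; }
};

// View of slice z of a width * height * depth stack stored with x innermost.
// No data is copied; the view aliases the stack's storage.
View2D slice_of_stack(std::vector<int> &stack, int width, int height, int z) {
    assert(width > 0 && height > 0 && z >= 0);
    const std::size_t slice_size =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(slice_size * (static_cast<std::size_t>(z) + 1) <= stack.size());
    return View2D{stack.data() + slice_size * static_cast<std::size_t>(z),
                  width, height, 1, width};
}

// Element-wise copy between two views of the same shape.
void copy_view(const View2D &src, const View2D &dst) {
    assert(src.width == dst.width && src.height == dst.height);
    for (int y = 0; y < src.height; y++) {
        for (int x = 0; x < src.width; x++) {
            dst.at(x, y) = src.at(x, y);
        }
    }
}
```

Writing through the slice view updates only that image's region of the stack, which is the property the cropped-view suggestion relies on.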
Tzu-Mao Li
@BachiLi
The following program outputs 0 for both the min and max of the input bounds. Is this a bug?
#include <Halide.h>
#include <iostream>
using namespace Halide;
int main(int argc, char *argv[]) {
    ImageParam in(Int(32), 1);
    Var x;
    Func f;
    f(x) = in(min(x + 1, in.width() - 1));
    f.infer_input_bounds({42});
    Buffer<int> in_buf = in.get();
    std::cerr << "in_buf.dim(0).min():" << in_buf.dim(0).min() << std::endl; // returns 0, expects 1
    std::cerr << "in_buf.dim(0).max():" << in_buf.dim(0).max() << std::endl; // returns 0, expects 41
}
Tzu-Mao Li
@BachiLi
It seems that infer_input_bounds uses an initial shape of size 0 for the bounds inference, so in.width() will return 0 during the bounds inference
but really what it should do is to return a symbolic value
Tzu-Mao Li
@BachiLi
opened an issue halide/Halide#5481
Abhishek G Saxena
@AbyShk95
Hi there...
I tried to use .hexagon() directive in the schedule of https://github.com/halide/Halide/blob/master/apps/resize/resize_generator.cpp
I get an internal error: Unsupported HVX type: float 32x4
Do I need to modify the schedule/algorithm to make it work for Hexagon?
Andrew Adams
@abadams
Yeah, hexagon doesn't have floating point support
You'd have to make the whole algorithm fixed-point instead
Zalman Stern
@zvookin
It does, but not vectors
Andrew Adams
@abadams
oh, I didn't realize it had scalar floats
Zalman Stern
@zvookin
I thought so
might be wrong
Division used to work that way
Andrew Adams
@abadams
For that algorithm the floating point work could be fairly minimal. Compute the kernels in float, then quantize them, outside the loop over pixels.
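A minimal plain C++ sketch of that split (not Halide, and not the resize app's actual kernels): weights are computed in float once per output column, quantized to fixed point, and the per-pixel loop uses integers only. The two-tap linear kernel, the 8 fractional bits, and the names `Taps`, `make_linear_taps`, and `resample_row` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Taps {
    int src;         // index of the left source sample
    int32_t w0, w1;  // fixed-point weights, 8 fractional bits, w0 + w1 == 256
};

// Float work happens here, once per output column, not once per pixel.
std::vector<Taps> make_linear_taps(int src_size, int dst_size) {
    assert(src_size >= 2 && dst_size >= 2);
    std::vector<Taps> taps(static_cast<std::size_t>(dst_size));
    const float scale =
        static_cast<float>(src_size - 1) / static_cast<float>(dst_size - 1);
    for (int x = 0; x < dst_size; x++) {
        const float pos = static_cast<float>(x) * scale;
        int src = static_cast<int>(std::floor(pos));
        if (src > src_size - 2) src = src_size - 2;  // keep src + 1 in bounds
        const float frac = pos - static_cast<float>(src);
        const int32_t w1 = static_cast<int32_t>(std::lround(frac * 256.0f));
        taps[static_cast<std::size_t>(x)] = Taps{src, 256 - w1, w1};
    }
    return taps;
}

// Integer-only inner loop over pixels.
std::vector<uint8_t> resample_row(const std::vector<uint8_t> &row,
                                  const std::vector<Taps> &taps) {
    std::vector<uint8_t> out(taps.size());
    for (std::size_t x = 0; x < taps.size(); x++) {
        const Taps &t = taps[x];
        const int32_t acc = t.w0 * row[static_cast<std::size_t>(t.src)] +
                            t.w1 * row[static_cast<std::size_t>(t.src) + 1];
        out[x] = static_cast<uint8_t>((acc + 128) >> 8);  // round to nearest
    }
    return out;
}
```

The same taps can be reused for every row of the image, so the float cost is amortized over all pixels.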
Zalman Stern
@zvookin
I think that is either fixed or shortly will be
If compiling for the standard ARM+Hexagon configuration, the float work can be done on ARM as well
Abhishek G Saxena
@AbyShk95
alright! thank you so much! this helps!
Svenn-Arne Dragly
@dragly
@abadams thanks for the tip on using cropped views to do partial copying! I will try that.
@abadams when you say "source and destination buffers", it sounds like you are referring to a specific function. Is that, for instance, copy_from? And if so, how does Halide handle the host/device buffer separation? I have struggled a bit to find this out from the documentation. If I, for instance, copy from buffer A to buffer B, is there a way to explicitly tell Halide that A should be on the host only and B on the device only?
Andrew Adams
@abadams
Hrm, actually it looks like the function I was thinking of isn't exposed in Halide::Buffer. I was thinking of halide_buffer_copy in HalideRuntime.h. It supports the src and dst buffers being on different devices. Combine that with halide_device_crop to get a reference to just one part of a device buffer.
Svenn-Arne Dragly
@dragly
I see. We might still be able to use that. I will look into it!
On another topic: Should it be possible to create a Runtime::Buffer(nullptr, {width, height}), then call device_wrap_native on it with the cl_mem returned by clCreateImage and then pass it as the output buffer to an AOT function? I get a halide_error_code_buffer_is_null, but that seems to be an indication of the Buffer itself being a nullptr. (Or perhaps that is what it is, since the Runtime::Buffer is some kind of thin wrapper?)
bishop77
@bishop7712_twitter
Dear Halide experts, would you please be so kind and have a look at a question I posted on SO, https://stackoverflow.com/questions/65041537/serial-data-update-with-halide ? I'm new to Halide and want to know if it's worth spending the effort. Many thanks.
Alex Reinking
@alexreinking
That example is a bit abstract. You could model f as an external func. The outermost loops (except the while-true) are easy in Halide. The i loop would be an RDom while the others would be pure.
You would probably prefer to move more of f's implementation into Halide, though