Mike Woodworth
@mikewoodworth
@shoaibkamil sorry, just saw your earlier post. Glad you got it sorted. For reference, we're building with Xcode 10.3.
Andrew Adams
@abadams
Something seems to have gone wonky with the build master's https certs: https://buildbot.halide-lang.org/master/#builders/11/builds/52
Steven Johnson
@steven-johnson
yay, on it
done
gilbo
@gilbo
@abadams I'm using the halide_buffer_t C struct to describe I/O buffers, using halide_dimension_t within that. If I give it strides such that the left-most index has the largest stride, and schedule using the autoscheduler, then I get errors. I don't want to change anything about how the algorithm code is written, just make it work with a particular pattern of stride values.
Andrew Adams
@abadams
There's a default assumption that the leftmost index has stride 1, because otherwise we could never vectorize anything densely. You can disable this on an input/output buffer on the Halide side with my_buf.dim(0).set_stride(Expr())
You probably then want to do my_buf.dim(2).set_stride(1) or some such
I'm not entirely confident the autoscheduler will handle this gracefully
gilbo
@gilbo
No sweat; I'm just using Fortran ordering in all my test code for now to avoid this issue
I suppose I could flip all the orders (left-right) when lowering to Halide and get everything to work out that way... Might be worth it if I want to package for release or something
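As an illustration of the stride discussion above, here is a minimal sketch in plain Python (not Halide code) that computes dense strides for both orderings and checks that flipping the index order left-to-right turns a C-ordered layout into one where the leftmost index has stride 1. The helper names `dense_strides` and `offset` are hypothetical and exist only for this example.

```python
def dense_strides(extents, leftmost_fastest=True):
    """Return per-dimension strides (in elements) for a densely packed buffer.

    extents[0] is the leftmost index. With leftmost_fastest=True the leftmost
    index gets stride 1 (Fortran / column-major order); otherwise the
    rightmost index gets stride 1 (C / row-major order).
    """
    dims = range(len(extents))
    order = dims if leftmost_fastest else reversed(dims)
    strides = [0] * len(extents)
    step = 1
    for d in order:
        strides[d] = step
        step *= extents[d]
    return strides


def offset(index, strides):
    """Flat element offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


extents = [4, 3, 2]
fortran = dense_strides(extents, leftmost_fastest=True)   # leftmost stride is 1
c_order = dense_strides(extents, leftmost_fastest=False)  # leftmost stride is largest

# Reversing the order of the indices gives a leftmost-stride-1 description
# of the same memory that the C-ordered layout describes.
flipped = dense_strides(list(reversed(extents)), leftmost_fastest=True)
same_element = offset((3, 2, 1), c_order) == offset((1, 2, 3), flipped)
```

Under these assumptions, the same memory can be described either way, which is why reversing the index order when lowering to Halide would sidestep the default stride-1 assumption on dimension 0.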
Alexander Root
@rootjalex
Is anyone familiar with the class Definition/struct DefinitionContents code who can explain what the predicate is, and/or what the values vector contains?
(for context, writing an IRMutator that seems to be running into trouble on the mutate calls to DefinitionContents)
aankit-ca
@aankit-ca
Is there a more recent version of random pipeline generator than the one in standalone_autoscheduler branch?
Andrew Adams
@abadams
no
In the category of "thanks I hate it"
Zalman Stern
@zvookin
Clearly we need a parameterized constexpr operator that takes an integer and if the integer is even, the type is const and if it is odd, the type is not const. Much cleaner than adding ununconstexpr to C++ 23.
Steven Johnson
@steven-johnson
I, what
Ryan Stout
@ryanstout
Is there a way to lazily load images in a pipeline? I'm implementing a few types of image stacking on a device that can't hold all of the photos in RAM at once. Currently I've got a version where I run photo A+B, then the output of that with C, etc. But I'm thinking that if there was a way to build it into a single pipeline, but not load C, etc. until it's needed, that would probably be faster (and cleaner).
any suggestions/tips?
Zalman Stern
@zvookin
define_extern is what you want.
Extern stage that loads what you need on demand
Ryan Stout
@ryanstout
@zvookin thanks. Unfortunately, I'm using the Python bindings, which I read don't support define_extern
Zalman Stern
@zvookin
Do they not support define_extern or do they not support writing the extern in Python?
I'm no expert on the Python realm, but I'd say if you can't use extern stages then this is not likely solvable in Halide.
Ryan Stout
@ryanstout
ok. I could port it if I had to, but was hoping there was a way to keep most of it in Python. https://github.com/halide/Halide/tree/master/python_bindings says "No mechanism is provided for supporting Func::define_extern", but there are a few .def("define_extern" calls in PyFunc.cpp
Zalman Stern
@zvookin
What you want is to call out to the extern stage that is written in C or C++. The stage is small and basically just a function that loads an image, or part of an image, based on the arguments.
Everything else should stay the same in Python
Ryan Stout
@ryanstout
ok
@zvookin thanks for the help
aavbsouza
@aavbsouza
Hello. Some classes have a default template parameter of 4 for the maximum number of variables. Would it be feasible to work with Halide with a greater number of variables (5, 6), as in 3D convolution or locally connected layers? Thanks
Ashish Uthama
@ashishUthama
@aavbsouza - did you mean dimension? (in https://halide-lang.org/docs/class_halide_1_1_runtime_1_1_buffer.html) - then yes.
Dongran Liu
@dzzhdzzh
Hey, I am very new to Halide and have a GPU question about it. How does it handle the boundary case if my kernel accesses both the left and right neighbors of some intermediate Halide Func?
Does it use the closest valid value, or does it just return zero if the access is out of bounds? I printed out the intermediate results, and it seems like sometimes it does the first and sometimes the second.
Thank you!
Andrew Adams
@abadams
@aavbsouza That default value of 4 isn't even a real limit. That's just the amount of space reserved inside the buffer class itself for the purpose of representing dimension metadata. If there are more than four dimensions it just allocates heap space instead to store the shape and strides. If you have more than four dimensions it's slightly more efficient to increase four to something larger, but probably not in any noticeable way.
@dzzhdzzh Halide only inserts boundary conditions if you explicitly ask for one using BoundaryConditions::repeat_edge or similar
If you don't, then the intermediate stage is just computed over a larger area to guarantee there are no out-of-bounds accesses.
Dongran Liu
@dzzhdzzh
@abadams So you are saying that the boundary value could be some garbage if I don't explicitly call BoundaryConditions? I only apply BoundaryConditions to my input image; after that, I don't apply it to any intermediate image I create.
Seems like it matches what I observed, the input processing result is good, I always get deterministic result.
Andrew Adams
@abadams
Nope, it should never be garbage
Say you have a 10x10 input, then a boundary condition, then a 3x3 blur, then another 3x3 blur
if you evaluate the second blur over 10x10
then Halide will evaluate the first blur over a 12x12 box
which accesses a 14x14 box of the input
which gets clamped by the boundary condition to 10x10
So the bounds computed just expand as you go back up the pipeline until you hit the boundary condition
but garbage values are never computed
If you don't add a boundary condition to the input, then Halide will throw an out of bounds error and refuse to run the pipeline
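The arithmetic in the example above can be sketched in plain Python (this is not Halide's bounds inference, just the same bookkeeping along one dimension). The function names `required_extents` and `repeat_edge` are hypothetical helpers for this illustration, and the clamp only mimics what a repeat-edge boundary condition does to coordinates.

```python
def required_extents(output_extent, stencil_widths):
    """Walk back up the pipeline from the output.

    stencil_widths lists the stencil width of each stage in pipeline order.
    Returns the extent required of each stage's producer, starting with the
    producer of the last stage and ending with the input.
    """
    extents = []
    extent = output_extent
    for width in reversed(stencil_widths):
        extent += width - 1  # a width-w stencil needs w - 1 extra samples
        extents.append(extent)
    return extents


def repeat_edge(x, extent):
    """Clamp a coordinate into [0, extent - 1], as a repeat-edge condition would."""
    return min(max(x, 0), extent - 1)


# Two 3-wide blurs evaluated over an output of extent 10.
first_blur_extent, input_extent = required_extents(10, [3, 3])

# The pipeline asks for input coordinates -2..11; after clamping, only the
# real coordinates 0..9 are ever read.
halo = (input_extent - 10) // 2
requested = range(-halo, 10 + halo)
touched = sorted({repeat_edge(x, 10) for x in requested})
```

The required region grows by two samples per 3-wide blur (10, then 12, then 14), and the clamp maps every requested coordinate back onto the real 10-sample input, so no stage reads a value that was never computed.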