Steven Johnson
@steven-johnson
pre-built binaries not ready yet though :-/
Steven Johnson
@steven-johnson
update: there are now prebuilts for several archs (including SPARC Solaris)... but none for x86 Linux? interesting
Shoaib Kamil
@shoaibkamil
Do the python bindings use "&" for both bitwise-and and boolean-and?
Steven Johnson
@steven-johnson
unfortunately yes
there isn't a way to overload "and" or "or" in Python
Shoaib Kamil
@shoaibkamil
Thanks, that's what I thought-- no easy way to overload
Yeah
Thanks for the quick answer
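The reason, in plain Python terms: `&` dispatches to a class's __and__ method, but `and` short-circuits through __bool__, which must return an actual bool, so a symbolic expression type can only overload the bitwise operator. A minimal sketch (the Expr class here is a toy stand-in, not Halide's):

```python
class Expr:
    """A toy symbolic expression, standing in for Halide's Expr."""
    def __and__(self, other):
        return Expr()          # '&' can be overloaded normally

    def __bool__(self):
        # 'and'/'or' force truth-testing, which a symbolic type can't support
        raise TypeError("cannot convert symbolic Expr to bool")

a, b = Expr(), Expr()
combined = a & b               # fine: calls __and__
try:
    a and b                    # 'and' calls __bool__, which raises
except TypeError as e:
    print(e)                   # prints "cannot convert symbolic Expr to bool"
```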
thalesvp
@thalesvp
Hi, quick question. I am looking at the tutorial (lesson 21) and it prints the execution time of the manual schedule and the auto schedule. But where can I find the copy-pasteable schedule?
Infinoid
@Infinoid
@thalesvp If you want to see the schedule itself, that's one of the things you can tell the generator to generate.
./lesson_21_generate -o . -f conv_layer -e static_library,schedule target=host auto_schedule=true
That should generate a file called conv_layer.schedule.h, so you can see the schedule it decided on.
thalesvp
@thalesvp
alright, tnx!
Alexander Bonin
@alex4o
Hello, I have the same problem as the person from SO: my actual computation on the GPU takes 4 ms, but copying the memory is inefficient and happens at every stage?
Infinoid
@Infinoid
The SO question mentions transfer timings but not buffer sizes. I wonder how big the data is
Infinoid
@Infinoid
What's the python equivalent of Halide::Input<Buffer<double>> thing{"thing", 2}? thing = hl.ImageParam(hl.Float(64), 2)?
(even if it's not an image)
Steven Johnson
@steven-johnson
That looks right
Infinoid
@Infinoid
I am trying to run python_bindings/apps/bilateral_grid.py on a GPU. Do I need to do anything special to use cuda or opencl with the python bindings?
I tried setting HL_TARGET=host-cuda-cuda_capability_61, which works for C++ apps. But the python version says:
RuntimeError: Error: Schedule for Func bilateral_grid requires <Default_GPU> but no compatible target feature is enabled in target x86-64-linux-avx-avx2-f16c-fma-jit-sse41-user_context
I get the same error when I set HL_TARGET=host-opencl (this also works on the C++ side). And also the same error if I add .with_feature(hl.TargetFeature.CUDA) to the end of the target object creation line.
Tzu-Mao Li
@BachiLi
you want to set HL_JIT_TARGET
with_feature should also work, depending on how you use it
Infinoid
@Infinoid
With HL_JIT_TARGET, it's as if I didn't set anything. target.has_gpu_feature() returns false, so the app decides to use a CPU schedule.
Oh, it works when I set both HL_TARGET and HL_JIT_TARGET. Strange that I should have to set both
Thanks for the tip!
.with_feature doesn't work on its own though.
Tzu-Mao Li
@BachiLi
Something is wrong, I don't know what happened
Steven Johnson
@steven-johnson
This sounds like a bug in our python, please file an issue.
Infinoid
@Infinoid
I've only started playing with this recently. What is the expected behavior?
Steven Johnson
@steven-johnson
It sounds like it's reading HL_TARGET when it should be reading HL_JIT_TARGET
Infinoid
@Infinoid
Ok, I'll open an issue for it
Infinoid
@Infinoid
I dug further, and I think the python bindings are fine; the bilateral_grid.py app just has some bugs. It should be calling hl.get_jit_target_from_environment, not hl.get_target_from_environment.
I have a patch that gets it running, but in GPU mode the output it generates is wrong. Looks rotated and interleaved
Infinoid
@Infinoid
PR #4805 is what I have so far.
Steven Johnson
@steven-johnson
So after another day of experimenting, I'm not sure GHA can scale to do the build+test that we need at present -- at least not unless they add a bit more capability to their API. I'm going to put it down for a bit and ponder other things.

The main issue is that there doesn't seem to be any way to throttle actions, unfortunately. (Either to reliably and quietly cancel running tasks, or to reliably defer a task for possible future execution.)

I've come up with at least one thought experiment that maybe could work, but it would be a Rube Goldberg construction of heinous awfulness, so, no.

Alexander Fröber
@alexanderfroeber
Hi, in my project I implemented a C++ function (one input buffer, one output buffer) that does some work on the GPU, and I used Func::define_extern to call it from Halide. At the end of the C++ function I copy the result to the device memory of the output buffer. This works fine if the function is not the last step of the pipeline, but not if it is: then Halide seems to expect the result in the host memory of the output buffer, and the C++ function has no way of knowing that. Can anyone tell me the rules for which memory of the output buffer to write to? Is it (1) always write to device memory unless host memory is defined, or (2) my choice, signaled by setting the dirty flags (but it seems to me that the flags are set/overwritten by Halide after the C++ function has returned)?
Alexander Fröber
@alexanderfroeber
... or (3) something else?
Zalman Stern
@zvookin
@alexanderfroeber Are you setting the device_api argument to define_extern to the GPU API that you are using?
Alexander Fröber
@alexanderfroeber
I set the device_api parameter to Halide::DeviceAPI::CUDA.
zendevil
@zendevil
Hi, are there any efficient open-source background removal pipelines written in halide?
zendevil
@zendevil
Like maybe some way to do image segmentation using colors.
like, is there an analogue of OpenCV's cv2.inRange(hsv, lower_hsv, higher_hsv)?
Infinoid
@Infinoid
Is it possible to compile an object with multiple outputs using Func.compile_to()? I tried including an output Func in the arguments list, but it doesn't like that.
I don't think the python bindings have Generator classes or explicit Input/Output members yet, which is how I would have done it in C++. I'm trying to use Func.compile_to() because that's what I see in the tutorials.
Volodymyr Kysenko
@vksnk
@Infinoid I've never tried it myself, but wrapping multiple functions into a Pipeline and then calling compile_to on it may work?
Infinoid
@Infinoid
@vksnk I'll take a look, thanks. I honestly don't know much about Pipelines, time to learn
@vksnk Nice, that built. Now I just have to figure out how to call it
Volodymyr Kysenko
@vksnk
In c++, I think, you could do something like:
Func f, g;
f(x)=x;
g(x)=2*x;
Pipeline p({f, g});
p.compile_to(...)
I guess it should be similar in python
Infinoid
@Infinoid
Yeah, that was perfect. hl.Pipeline([out1, out2]).compile_to(...)
Volodymyr Kysenko
@vksnk
great!
it depends on what you need to generate, but you can generate a header file along with your .o, and it will have the function declarations