Not as far as I know, but I recently acquired a 3090, so hopefully that will change soon!
halide/Halide#6334 changing now!
If, when a Halide Generator class is compiled, the "auto_schedule" argument is set to "false", is it possible for Halide to apply any default parallelization/scheduling techniques (e.g. vectorization, parallelization, tiling, loop reversal)? Or is it guaranteed that no scheduling primitives will be used?
No scheduling primitives will be used unless you specify them.
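For illustration (a minimal sketch using the Python bindings rather than a Generator; names are made up): with no scheduling calls, the lowered statement is just a plain serial loop nest, and vectorization/parallelism only show up once you ask for them.

import halide as hl

x, y = hl.Var("x"), hl.Var("y")

gradient = hl.Func("gradient")
gradient[x, y] = (hl.f32(x) + hl.f32(y)) * 0.5

# With no schedule calls, the lowered loop nest is plain serial loops.
gradient.compile_to_lowered_stmt("default_schedule.txt", [], hl.StmtOutputFormat.Text)

# Vectorization/parallelism only appear if explicitly requested, e.g.:
# gradient.vectorize(x, 8)
# gradient.parallel(y)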
Even with target=x86-64-linux-disable_llvm_loop_opt, I notice xmm* registers being used in the output assembly file (fileName.s). Does that mean there is auto-vectorization going on somewhere in the generation pipeline?
No: Halide assumes SSE2 is present on all x86-64 targets, and uses the XMM registers for scalar floating-point operations.
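As a side note, a quick check with the Python bindings (a sketch, not from the thread) shows that disable_llvm_loop_opt does not affect the instruction-set baseline: SSE2 is not even a selectable TargetFeature, since only SSE4.1 and later are opt-in on x86-64.

import halide as hl

# "disable_llvm_loop_opt" only disables LLVM's loop optimizations; it does not
# strip SSE2, which Halide treats as a baseline feature of every x86-64 target.
t = hl.Target("x86-64-linux-disable_llvm_loop_opt")
print(t.has_feature(hl.TargetFeature.SSE41))  # False unless requested explicitly
print(t.has_feature(hl.TargetFeature.AVX2))   # False unless requested explicitly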
I have this issue when I try to read an array using a buffer of indices:
Condition failed: in.is_bounded()
Unbounded producer->consumer relationship: Vertices -> FaceNormal
ref:
https://github.com/halide/Halide/issues/4108#issuecomment-956546487
halide/Halide#4108
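For context, this error usually means a data-dependent index: the consumer reads the producer at a position loaded from another buffer, so Halide cannot bound the producer's required region. A minimal sketch of the pattern and the usual remedy (clamping the index), with hypothetical names and buffers standing in for the real Vertices/FaceNormal pipeline:

import halide as hl

i = hl.Var("i")

# Hypothetical stand-ins for the real pipeline.
indices = hl.ImageParam(hl.Int(32), 1, "indices")      # face -> vertex index
vertices = hl.ImageParam(hl.Float(32), 1, "vertices")  # vertex data
num_vertices = hl.Param(hl.Int(32), "num_vertices")

face_normal = hl.Func("face_normal")

# Unbounded: Halide cannot tell how much of `vertices` this touches.
# face_normal[i] = vertices[indices[i]]

# Bounded: clamp the data-dependent index so the required region is known.
idx = hl.clamp(indices[i], 0, num_vertices - 1)
face_normal[i] = vertices[idx]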
I am working "serializing" a Python object with Expr members to a Halide Func. In the process, I end up having a function with a large number of explicit definitions in one dimension. Unfortunately, I am not able to make those be calculated in an efficient way - once for each value while and sharing potential pre-calcualted values. In particular, this code:
f = Func("f")
f[row, col] = 0.0
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)
g = Func("g")
g[row, col] = f[row, col] + 42.0
g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize(2, 4)))
Leads to the following generated code:
for (g.s0.col, g.min.1, g.extent.1) {
 ...
 for (g.s0.row, g.min.0, g.extent.0) {
  allocate f[float32 * 1 * (max(t6, 3) + 1)]
  produce f {
   f[t7] = 0.000000f
   f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
   f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
   f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
   f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
  }
  consume f {
   g[g.s0.row + t12] = f[t7] + 42.000000f
  }
Unfortunately, Halide does not notice that only one value of f is needed, and calculates all of f for each g. I guess this is expected.
Calling f.compute_root() helps reduce the number of calculations, but results in code with four loops over row instead. This is problematic in my actual use case, because it no longer automatically shares values that can be pre-calculated (such as the sqrt above).
Is there a way to get Halide to calculate f for each explicitly set col in one loop over row?
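No answer is recorded in the thread, but one possible direction (a hedged sketch, not an authoritative fix) is to fold the per-column definitions into a single pure definition using select() and hoist the shared term into its own Func. With f left inlined, only the value of f that g actually reads gets computed, and the shared sqrt can be scheduled once per row:

import halide as hl

row, col = hl.Var("row"), hl.Var("col")

s = hl.Func("s")                      # shared per-row term
s[row] = hl.sqrt(hl.f32(row * row))

f = hl.Func("f")                      # single pure definition; stays inlined into g
f[row, col] = hl.select(col == 0, 1.0 + s[row],
                        col == 1, 2.0 + s[row],
                        col == 2, 3.0 + s[row],
                        col == 3, 4.0 + s[row],
                        0.0)

g = hl.Func("g")
g[row, col] = f[row, col] + 42.0

s.compute_root()                      # sqrt evaluated once per row, in a single loop
g.compile_to_lowered_stmt("out.txt", [], hl.StmtOutputFormat.Text)

Here select resolves to a single value per point, so the per-column branches cost only an add each, while the expensive sqrt lives in s and is computed once per row. Whether this maps onto the real use case is for the poster to judge.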
Upgrading from Halide 12 to Halide 14 (tip), I am running into a lot of:
Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.
Right now, it looks like it is related to tile() with the tail strategy omitted (i.e. the default, Auto). Does this ring a bell? (I will dig more in a bit.)
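Not an answer from the thread, but if the default TailStrategy::Auto is indeed picking PredicateLoads/PredicateStores for that split, one thing to try (a hedged sketch in the Python bindings, with a hypothetical Func and tile sizes) is passing an explicit tail strategy to tile(), since GuardWithIf/RoundUp tails can still be split further:

import halide as hl

x, y = hl.Var("x"), hl.Var("y")
xo, yo, xi, yi = hl.Var("xo"), hl.Var("yo"), hl.Var("xi"), hl.Var("yi")

out = hl.Func("out")                  # hypothetical stand-in
out[x, y] = hl.f32(x + y)

# Explicit tail strategy instead of the default Auto; GuardWithIf (or RoundUp,
# where valid) produces loops that can be split again later.
out.tile(x, y, xo, yo, xi, yi, 8, 8, hl.TailStrategy.GuardWithIf)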
I use a backports/N.x branch for staging changes to release/N.x. When it's ready, I open a PR with release/N.x as the target branch. There is CI set up for this scenario. Be sure to include a commit that bumps the version to 13.0.1.
release/N.x is (or ought to be) protected (like master), so you can't push to it directly.
backports/N.x then gets merged into release/N.x. I think the cherry-picking history is valuable (as are any separate/additional patches necessary to correctly backport), as is keeping the version number bump separate.