I am working on "serializing" a Python object with Expr members to a Halide Func. In the process, I end up with a function that has a large number of explicit definitions in one dimension. Unfortunately, I am not able to get those calculated efficiently, i.e. once for each value, while sharing values that could be pre-calculated. In particular, this code:
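from halide import *  # Halide Python bindings: Func, Var, sqrt, StmtOutputFormat
import numpy as np

row, col = Var("row"), Var("col")

f = Func("f")
# Pure definition: default value for every column
f[row, col] = 0.0
# Update definitions: explicit values for columns 0..3
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)
g = Func("g")
g[row, col] = f[row, col] + 42.0
g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize(2, 4)))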
Leads to the following generated code:
for (g.s0.col, g.min.1, g.extent.1) {
  ...
  for (g.s0.row, g.min.0, g.extent.0) {
    allocate f[float32 * 1 * (max(t6, 3) + 1)]
    produce f {
      f[t7] = 0.000000f
      f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
      f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
      f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
      f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
    }
    consume f {
      g[g.s0.row + t12] = f[t7] + 42.000000f
    }
Unfortunately, Halide does not notice that only one value of f is needed, and calculates all of f for each g. I guess this is expected.
Calling f.compute_root() helps reduce the number of calculations, but results in code with four loops over row instead. This is problematic in my actual use-case, because it no longer automatically shares values that can be pre-calculated (such as the sqrt above).
Is there a way to get Halide to calculate f for each explicitly set col in one loop over row?
Upgrading from Halide 12 to Halide 14 (tip), I'm running into a lot of:
Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.
Right now, it looks like something related to tile() with the tail strategy omitted (i.e. the default, Auto). Does this ring a bell? (Will dig more in a bit.)
backports/N.x for staging changes to release/N.x. When it's ready, I open a PR with release/N.x as the target branch. There is CI set up for this scenario. Be sure to include a commit that bumps the version to 13.0.1.
release/N.x is (or ought to be) protected (like master), so you can't push to it
master.
backports/N.x into release/N.x. I think the cherry-picking history is valuable (as are any separate/additional patches necessary to correctly backport), as is keeping the version number bump separate.
share/doc/Halide folder (at least on Linux), along with the other READMEs.
Loop over output.s0.x has extent output.extent.0. Can only vectorize loops over a constant extent > 1. Let's say we're dealing with floats and SSE: I don't get why the loop over x can't simply loop over 1/4 of the extent and process 4 float values at a time (let's ignore the case where the extent is not divisible by 4). Do I need to split x into constant-size chunks in order to get vectorization working?
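For what it's worth, here is a rough plain-Python model of what "splitting x into constant-size chunks" means. It is not Halide code and not how Halide lowers anything internally: the names scale_scalar, scale_strip_mined and width are made up for illustration, and the scalar tail loop is just one assumed way to handle a leftover. The point is only that the outer loop keeps the runtime extent while the inner loop has a constant extent (4 here), and that inner loop is the one that can become a single vector operation. As far as I know, Halide's scheduling calls that take a factor, e.g. vectorize(x, 4), do this kind of split for you.

```python
def scale_scalar(src, k):
    """Reference version: one loop over a runtime extent."""
    return [k * v for v in src]


def scale_strip_mined(src, k, width=4):
    """Same result, but with the loop split into outer and inner parts.

    The outer loop has a runtime extent (n // width); the inner loop has
    the constant extent `width`, which is the part a vectorizer can turn
    into one SIMD operation.
    """
    n = len(src)
    out = [0.0] * n
    full_chunks = n // width

    for xo in range(full_chunks):  # runtime extent
        base = xo * width
        # Constant extent `width`: models one vector multiply over 4 lanes.
        lanes = [k * src[base + xi] for xi in range(width)]
        out[base:base + width] = lanes

    # Leftover elements when n is not a multiple of width (assumed handling).
    for x in range(full_chunks * width, n):
        out[x] = k * src[x]

    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(scale_strip_mined(data, 2.0) == scale_scalar(data, 2.0))
```

The error message in the question is about the first situation: a single loop whose extent is only known at run time has no constant-width inner loop to vectorize until it has been split.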
$ objdump -t build/host/ResizeNearestNeighbor.a | grep halide_
00000000 l df *ABS* 00000000 halide_buffer_t.cpp
00000000 *UND* 00000000 halide_error
00000000 *UND* 00000000 halide_msan_annotate_memory_is_initialized
00000000 w F .text.halide_qurt_hvx_lock 000000b0 halide_qurt_hvx_lock
00000000 w F .text.halide_qurt_hvx_unlock 000000ac halide_qurt_hvx_unlock
00000000 w F .text.halide_qurt_hvx_unlock_as_destructor 00000008 halide_qurt_hvx_unlock_as_destructor
00000000 *UND* 00000000 halide_string_to_string
00000000 w F .text.halide_vtcm_free 00000008 halide_vtcm_free
00000000 w F .text.halide_vtcm_malloc 0000000c halide_vtcm_malloc