I am working on "serializing" a Python object with `Expr` members into a Halide `Func`. In the process, I end up with a function that has a large number of explicit definitions in one dimension. Unfortunately, I can't get them computed efficiently, i.e. once for each value, while sharing values that could be pre-calculated. In particular, this code:
```python
import numpy as np
from halide import *

row, col = Var("row"), Var("col")

f = Func("f")
f[row, col] = 0.0
f[row, 0] = 1.0 + sqrt(row*row)
f[row, 1] = 2.0 + sqrt(row*row)
f[row, 2] = 3.0 + sqrt(row*row)
f[row, 3] = 4.0 + sqrt(row*row)
g = Func("g")
g[row, col] = f[row, col] + 42.0
g.compile_to_lowered_stmt("out.txt", [], StmtOutputFormat.Text)
print(np.asanyarray(g.realize([2, 4])))
```
Leads to the following generated code:
```
for (g.s0.col, g.min.1, g.extent.1) {
  ...
  for (g.s0.row, g.min.0, g.extent.0) {
    allocate f[float32 * 1 * (max(t6, 3) + 1)]
    produce f {
      f[t7] = 0.000000f
      f[t8] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 1.000000f
      f[t9] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 2.000000f
      f[t10] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 3.000000f
      f[t11] = (float32)sqrt_f32(float32((g.s0.row*g.s0.row))) + 4.000000f
    }
    consume f {
      g[g.s0.row + t12] = f[t7] + 42.000000f
    }
```
Unfortunately, Halide does not notice that only one value of `f` is needed, and calculates all of `f` for each point of `g`. I guess this is expected.
Calling `f.compute_root()` helps reduce the number of calculations, but results in code with four separate loops over `row` (one per update definition) instead. This is a problem in my actual use case, because the values that could be pre-calculated (such as the `sqrt` above) are no longer shared automatically.
Is there a way to get Halide to calculate `f` for each explicitly set `col` in one loop over `row`?
Upgrading from Halide 12 to Halide 14 (tip), I'm running into a lot of:
Unhandled exception: Error: Cannot split a loop variable resulting from a split using PredicateLoads or PredicateStores.
Right now, it looks like something related to `tile()` with the tail strategy omitted (i.e., the default, `Auto`). Does this ring a bell? (Will dig more in a bit.)
`backports/N.x` for staging changes to `release/N.x`. When it's ready, I open a PR with `release/N.x` as the target branch. There is CI set up for this scenario. Be sure to include a commit that bumps the version to `13.0.1`.
`master`.
`backports/N.x` into `release/N.x`. I think the cherry-picking history is valuable (as are any separate or additional patches needed to backport correctly), as is keeping the version-number bump separate.
`share/doc/Halide` folder (at least on Linux), along with the other READMEs.
`Loop over output.s0.x has extent output.extent.0. Can only vectorize loops over a constant extent > 1`. Let's say we're dealing with floats and SSE: I don't get why the loop over `x` can't simply iterate over 1/4 of the extent and process 4 float values at a time (let's ignore the case where the extent is not divisible by 4). Do I need to split `x` into constant-size chunks to get vectorization working?