Hi there. I would like to create a basic inheritance structure based on Halide::Generator, so as to avoid duplicated code.
The idea is that the base generator class should have a (virtual) function that is to be overridden by derived classes. Moreover, each derived class should have a specific input parameter, not available in the base class.
In normal C++ this is quite straightforward. My current Halide implementation is all in a single file, having the following classes:
class Base : public Halide::Generator<Base> {
public:
    Input<Buffer<float>> input{"input", 2};
    Output<Buffer<float>> output{"brighter", 2};
    Var x, y;
    virtual Func process(Func input);
    virtual void generate() {
        output = process(input);
        output.vectorize(x, 16).parallel(y);
    }
};
class DerivedGain : public Base {
public:
    Input<float> gain{"gain"};
    Func process(Func input) override {
        Func result("result");
        result(x, y) = input(x, y) * gain;
        return result;
    }
};
class DerivedOffset : public Base {
public:
    Input<float> offset{"offset"};
    Func process(Func input) override {
        Func result("result");
        result(x, y) = input(x, y) + offset;
        return result;
    }
};
Finally, I only register the two derived classes, since I have no direct interest in the Base one:
HALIDE_REGISTER_GENERATOR(DerivedGain, derived_gain)
HALIDE_REGISTER_GENERATOR(DerivedOffset, derived_offset)
But the build fails with an error suggesting that class Base is being instantiated (which I do not need to happen):
in function Base::Base():
my_generators.cpp:(.text._ZN4BaseC2Ev[_ZN4BaseC5Ev]+0x2f): undefined reference to vtable for Base
If, instead of using a virtual function, I implement it in the Base class like so:
class Base : public Halide::Generator<Base> {
public:
    Input<Buffer<float>> input{"input", 2};
    Output<Buffer<float>> output{"brighter", 2};
    Var x, y;
    // Func process(Func input);
    Func process(Func input) {
        Func result("result");
        result(x, y) = input(x, y);
        return result;
    }
    virtual void generate() {
        output = process(input);
        output.vectorize(x, 16).parallel(y);
    }
};
Then everything compiles, but the object and header files with the generated code have the wrong function signatures (noticeable as there are missing gain/offset parameters):
derived_gain.h:
int derived_gain(struct halide_buffer_t *_input_buffer, struct halide_buffer_t *_result_buffer);
derived_offset.h:
int derived_offset(struct halide_buffer_t *_input_buffer, struct halide_buffer_t *_result_buffer);
Therefore, I would like to know which mistake I am introducing in the class definitions and how to solve it.
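One possible reading of both symptoms, offered as a sketch rather than a confirmed diagnosis: Halide::Generator<T> is a CRTP base, so if its factory constructs a T, then registering a class derived from Base would still build a plain Base. That would explain both the missing gain/offset parameters and why the linker wants Base's vtable. Separately, a virtual function that is declared but neither defined nor marked "= 0" is a common cause of "undefined reference to vtable". The plain C++ below has no Halide dependency; GeneratorLike, create, name, and the helper functions are hypothetical stand-ins, not Halide APIs.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a CRTP framework base such as Halide::Generator<T>.
// Assumption: the factory constructs T, the template argument.
template <typename T>
struct GeneratorLike {
    static std::unique_ptr<T> create() { return std::make_unique<T>(); }
    virtual ~GeneratorLike() = default;
};

// Layout from the question: the CRTP argument is fixed to the base class.
struct BaseFixed : GeneratorLike<BaseFixed> {
    virtual std::string name() const { return "BaseFixed"; }
};
struct GainFixed : BaseFixed {
    std::string name() const override { return "GainFixed"; }
};

// Alternative layout: the base forwards the most derived type to the framework.
template <typename Derived>
struct BaseTemplated : GeneratorLike<Derived> {
    // Pure virtual, so there is no out-of-line definition left to forget.
    virtual std::string name() const = 0;
};
struct GainTemplated : BaseTemplated<GainTemplated> {
    std::string name() const override { return "GainTemplated"; }
};

// GainFixed inherits create() from GeneratorLike<BaseFixed>,
// so the object that gets built is a BaseFixed.
std::string made_by_fixed() {
    auto g = GainFixed::create();
    assert(g != nullptr);
    return g->name();
}

// Here create() comes from GeneratorLike<GainTemplated>.
std::string made_by_templated() {
    auto g = GainTemplated::create();
    assert(g != nullptr);
    return g->name();
}
```

In this sketch made_by_fixed() yields "BaseFixed" while made_by_templated() yields "GainTemplated". If the same mechanism applies to Halide, declaring the base as a template over the derived type (and making process pure virtual) might be worth trying.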
@popizdeh:
I was just calling vectorize(x) and getting the error. What I find confusing is that your example works, like how is this split factor useful? Documentation says "Split a dimension by the given factor, then vectorize the inner dimension.", but say I split x by a factor of 4, it still needs to know what the width of the image is to do the split, so why doesn't vectorize(x) work like a split with factor 1? It knows the image width! Why does it throw the error?
A split with factor 1 would result in an inner loop with a single iteration, and it doesn't really make sense to vectorize a single iteration. On the other hand, splitting with a static factor greater than 1 allows vectorization. Note that from the perspective of Halide, we generate code that vectorizes by the width of this inner loop; LLVM then lowers that to native instructions of native length. In addition, the code Halide generates doesn't statically contain the image width; rather, the image width is a parameter (contained in the Buffer struct).
If you look at the documentation for split() (https://github.com/halide/Halide/blob/a89041b9563352edfb5e6c8ce4a1de4c490b751f/src/Func.h#L1425), it says:
"Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1"
In that context, the documentation for the vectorize() overload that takes an integer factor is saying that vectorize(x, n) is equivalent to split(x, x_outer, x_inner, n).vectorize(x_inner)
split() is incorrect. The documentation of split() that I linked above says the factor is the number of inner iterations, not outer iterations.
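To make the inner/outer distinction concrete, here is a rough plain-loop model of what split(x, x_outer, x_inner, factor) means, written without Halide. The function names are hypothetical, and the sketch assumes the width is a multiple of the factor; how Halide handles other widths is not modeled.

```cpp
#include <cassert>
#include <vector>

// Plain-loop model of split(x, x_outer, x_inner, factor): the factor is the
// extent of the INNER loop, not the number of outer iterations.
std::vector<int> visit_order_after_split(int width, int factor) {
    assert(factor > 0 && width % factor == 0);
    std::vector<int> visited;
    for (int x_outer = 0; x_outer < width / factor; ++x_outer) {
        // In a real schedule the factor is a constant, so this inner loop has
        // a fixed trip count and can be turned into one vector operation.
        for (int x_inner = 0; x_inner < factor; ++x_inner) {
            visited.push_back(x_outer * factor + x_inner);
        }
    }
    return visited;
}

// Number of outer iterations; this is the part that depends on the runtime width.
int outer_extent(int width, int factor) {
    assert(factor > 0 && width % factor == 0);
    return width / factor;
}
```

With width 8 and factor 4, the inner loop runs 4 times per outer iteration and the outer loop runs twice. With factor 1 the inner loop has a single iteration, which leaves nothing to vectorize.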
compute_at/store_at block sizes, as well as vectorization and unrolling, the size of the block is what you care about.)
libhexagon_remote_skel.so and such in the src directory but not under build, and when I modify some code (like setting halide_print to segfault intentionally in the qurt and sim source) I don't see the modification either in the bindump or in the side effects during the simulation run. All I really want is to get halide_print to use the Hexagon FARF so I can get some debug output from Halide when I run it on the Hexagon simulator.
run_main_on_hexagon just plows ahead like it never even noticed they were gone (the output log even shows it dynamically loading some of these libs)
libhexagon_remote_skel.so and friends are checked in in binary form.
run_main_on_hexagon is still only in the Hexagon SDK, though, I'd imagine. Any idea why the changes I make in the Halide source aren't appearing in the simulator? It looks like libhexagon_remote_skel.so is the only library I need a copy of, but there are several others (hexagon_sim_remote, libsimqurt.a) that are built in Halide's src/runtime/hexagon_remote dir but whose absence doesn't trigger any linker errors
halide_malloc in binary files, nothing that I've built in Halide matches. But the Halide_TOOLS inside the Hexagon SDK does match. It looks like this symbol isn't being included in my build. My pre-build configuration command is cmake -B ./build/ -S . -G Ninja -DCMAKE_BUILD_TYPE=Release -DTARGET_WEBASSEMBLY=OFF -DWITH_TESTS=OFF -DWITH_TUTORIALS=OFF -DWITH_UTILS=OFF -DWITH_PYTHON_BINDINGS=OFF
factor
I read this as "my dimension is X, so width, split by a factor of 4 is then width / 4"; I'm not sure how to read it to get the behaviour you're explaining. If, on the other hand, the parameter was called "num_elements_in_inner_dimension", there would be little room for confusion! I'm not proposing this name, but trying to illustrate how a more explicit name (or more text in the docs) could clear up any confusion once and for all.
Halide::Runtime::Buffer that is only allocated on device? I have a few AOT-compiled functions that will run on device only, and it would be nice not to allocate their inputs and outputs on the host.
Halide is an open-source domain-specific language which iterates over up to four dimensions to apply computations.
There is no such restriction...
I see these in the stmt files:
let t7162 = ((t7077 - input.min.0) + t7170)
So I figured, adding something like this:
pipeline.add_requirement(input.dim(0).min()==0, "Min should be 0");
would simplify the stmt and hopefully make things faster.
While it did simplify the stmt, the performance was significantly slower. Any thoughts on why just adding that requirement that all buffers have 0 min might impact performance negatively?
Does anyone know if there is a way to map the Func fields in a Halide::Generator class to their corresponding Funcs in a compiled pipeline (as seen in the *schedule.h file dumped by the 2019 auto-scheduler from Andrew Adams)?
Does Halide provide an automated way of pulling this mapping or any other information that could help to derive the map?
For example, the auto-scheduler pipeline has the func1_1 Func local variable, which belongs to the func1 private field in the Halide::Generator class. Another less trivial mapping is with pipeline Func local variables that start with “repeatedge*”. I understand that these are mapped to Func’s fed by a BoundaryConditions expression in the Halide::Generator class, although I am not sure about this. Thanks
For the question above, here is the sample code. I am looking for a way to automatically map func1 from the HalideGenerator class to func1_1 in the scheduler pipeline below, and func2 to repeat_edge_1.
class HalideGenerator1 : public Halide::Generator<HalideGenerator1> {
public:
    ...
    void generate() {
        ...
        func2(x, y) = BoundaryConditions::constant_exterior(func1, 0)(x, y);
        ...
    }
    void schedule() {
        ...
    }
private:
    ...
    Func func1{"func1"};
    Func func2{"func2"};
    ...
};
inline void apply_schedule_HalideGeneratorName(
    ::Halide::Pipeline pipeline,
    ::Halide::Target target
) {
    using ::Halide::Func;
    ...
    Func func1_1 = pipeline.get_func(28);
    Func repeat_edge_1 = pipeline.get_func(27);
    ...
    func1_1
        .split(...)
        .vectorize(...)
        .compute_root()
        .parallel(...);
    ...
    repeat_edge_1
        .split(...)
        .vectorize(...)
        .compute_root()
        .parallel(...);
    ...
}