These are chat archives for halide/Halide

23rd
Apr 2018
Steven Johnson
@steven-johnson
Apr 23 20:08
looks like windows builds are all failing in cuda.cpp with
D:\build_bot\worker\win-32-distro-trunk\halide\src\runtime\cuda.cpp:108:5: error: misaligned or large atomic operation may incur significant performance penalty [-Werror,-Watomic-alignment]
interestingly, that section of code doesn’t appear to have changed recently. New warning, maybe? (Perhaps injected by LLVM config?)
Andrew Adams
@abadams
Apr 23 20:11
Hrm, that's an 8-byte type, so it's not large
and its stack alignment should be known
Steven Johnson
@steven-johnson
Apr 23 20:12
it does indeed seem odd
Andrew Adams
@abadams
Apr 23 20:12
Actually this is win-32
so it should be a 4-byte type
Steven Johnson
@steven-johnson
Apr 23 20:12
oh. damn.
ok, but why start failing now?
Andrew Adams
@abadams
Apr 23 20:13
New warning in trunk llvm I assume
Steven Johnson
@steven-johnson
Apr 23 20:13
and the win64 builds are failing with various missing symbols. hmm.
Andrew Adams
@abadams
Apr 23 20:15
Reading the .ll generated by clang 5, I see:
%call = tail call i32 @__atomic_load_4(i8* bitcast (%"struct.Halide::Runtime::Internal::Cuda::CUctx_st"** @_ZN6Halide7Runtime8Internal4Cuda7contextE to i8*), i32 2) #4
Which seems fine to me
32-bit atomic load from a global
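For readers following along, here is a minimal sketch of the kind of C++ that can lower to an atomic load of a global pointer like the one in the IR quoted above. It is not the actual code from cuda.cpp: the `context` global, `load_context`, and `store_context` are illustrative stand-ins, and it assumes a Clang or GCC toolchain that provides the `__atomic` builtins. The `i32 2` argument in the IR matches `__ATOMIC_ACQUIRE`, whose value is 2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opaque CUDA context type named in the IR.
struct CUctx_st;

// Hypothetical global, analogous to the `context` global the IR loads from.
static CUctx_st *context = nullptr;

// Atomically read the global pointer with acquire ordering.
// On a 32-bit target the pointer is 4 bytes wide, so a compiler that does not
// inline the operation may emit a call to the __atomic_load_4 library routine.
inline CUctx_st *load_context() {
    CUctx_st *result;
    __atomic_load(&context, &result, __ATOMIC_ACQUIRE);
    return result;
}

// Atomically publish a new pointer value with release ordering.
inline void store_context(CUctx_st *ctx) {
    __atomic_store(&context, &ctx, __ATOMIC_RELEASE);
}
```

On a 64-bit target the same source performs an 8-byte load instead, which is why the pointer width came up earlier in the conversation.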
Steven Johnson
@steven-johnson
Apr 23 20:16
lots of windows breakages in the week the bots were down, I guess
Steven Johnson
@steven-johnson
Apr 23 20:26
…same failure on linux-64
Andrew Adams
@abadams
Apr 23 20:35
So it doesn't understand the global is aligned. Maybe we can give it a hint.
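One way such an alignment hint might look is sketched below. This is an assumption about the shape of a possible fix, not what was actually done in Halide: `aligned_context`, `is_pointer_aligned`, and `load_aligned_context` are hypothetical names, and whether an explicit `alignas` would silence `-Watomic-alignment` in this case is untested here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opaque CUDA context type.
struct CUctx_st;

// Hypothetical global with its alignment stated explicitly, so the compiler
// does not have to infer that the pointer is naturally aligned.
alignas(sizeof(void *)) static CUctx_st *aligned_context = nullptr;

// Returns true if the address is a multiple of the pointer size.
inline bool is_pointer_aligned(const volatile void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(void *) == 0;
}

// Atomic acquire load from the explicitly aligned global.
inline CUctx_st *load_aligned_context() {
    assert(is_pointer_aligned(&aligned_context));
    return __atomic_load_n(&aligned_context, __ATOMIC_ACQUIRE);
}
```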
Steven Johnson
@steven-johnson
Apr 23 20:47
um, dunno. maybe? surely it would be emitted aligned properly.
Andrew Adams
@abadams
Apr 23 21:00
I just managed to repro. I'll check the .ll
Andrew Adams
@abadams
Apr 23 21:17
I don't think these atomics are necessary
The spinlock is a global acquire/release memory barrier in the same place as the second two atomics
and I don't see why the first atomic needs an acquire barrier
I'll open a PR and see what Zalman thinks
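The argument above can be illustrated with a simplified sketch: when every access to a global happens while a spinlock is held, the lock's acquire and release barriers already order those accesses, so plain loads and stores suffice inside the critical section. Everything below is hypothetical (`SpinGuard`, `context_lock`, `get_or_create_context`) and is not the Halide runtime code; in particular it takes the lock on every call, which may differ from how the real code structures its first check. It assumes the GCC/Clang `__sync` builtins.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque CUDA context type.
struct CUctx_st;

// Hypothetical lock word and the global it protects.
static volatile int context_lock = 0;
static CUctx_st *context = nullptr;

// Minimal scoped spinlock built on the __sync builtins.
class SpinGuard {
    volatile int *lock;

public:
    explicit SpinGuard(volatile int *l) : lock(l) {
        // __sync_lock_test_and_set acts as an acquire barrier.
        while (__sync_lock_test_and_set(lock, 1)) {
            // spin until the previous holder releases the lock
        }
    }
    ~SpinGuard() {
        // __sync_lock_release acts as a release barrier.
        __sync_lock_release(lock);
    }
};

// Returns the shared context, creating it on first use.
// The accesses to `context` are plain (non-atomic) because they only
// happen while the lock is held.
CUctx_st *get_or_create_context(CUctx_st *(*create)()) {
    SpinGuard guard(&context_lock);
    if (context == nullptr) {
        context = create();
    }
    return context;
}
```

A second call returns the same pointer without invoking `create` again, and the lock word is back to 0 once the guard goes out of scope.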
Zalman Stern
@zvookin
Apr 23 23:12
I'll look at the cuda thing in a bit.