auto uxv = ux[i];
ldmd2 -inline -O -enable-cross-module-inlining -release -boundscheck=off -I mir-algorithm/source/ -output-s matrix_copy.d
ldc2 --version
LDC - the LLVM D compiler (1.3.0git-a969bcf):
based on DMD v2.073.2 and LLVM 4.0.0
built with LDC - the LLVM D compiler (0.17.5git-64a274a)
Default target: x86_64-apple-darwin16.5.0
Host CPU: haswell
http://dlang.org - http://wiki.dlang.org/LDC
`opIndex` calls don't get inlined, which is expensive by itself and also prohibits all further optimization via alias analysis and ultimately vectorization.
`a[i][j]` style indexing improved the performance by about 2x, but it's still ~720 ms vs 30 ms.
`dub.selections.json` and update the mir-algorithm version up to ~>0.6.2 if you use dub.
"dflags-ldc":["-mcpu=native"]
`a[i][j]` style indexing now runs in 89 ms, compared to 31 ms from C++, so it's within 3x now.
foreach (i; 0..m)
{
    foreach (j; 0..n)
    {
        // use matrix1[i, j], matrix2[i, j], matrix3[i, j]
    }
}
foreach (i; 0..m)
{
    auto v1 = matrix1[i];
    auto v2 = matrix2[i];
    auto v3 = matrix3[i];
    foreach (j; 0..n)
    {
        // use v1[j], v2[j], v3[j]
    }
}
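To make the two loop shapes above concrete, here is a minimal self-contained sketch. It does not use mir: `Matrix` is a hypothetical row-major struct standing in for an ndslice, and `addIndexed` / `addByRows` are made-up names for illustration. It only shows that both forms compute the same result; whether the row-hoisted form is faster depends on the compiler and flags.

```d
// Hypothetical row-major matrix, standing in for a mir ndslice.
struct Matrix
{
    double[] data;
    size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new double[rows * cols];
        data[] = 0.0; // doubles default to NaN in D
    }

    // Element access: m[i, j]
    ref double opIndex(size_t i, size_t j)
    {
        return data[i * cols + j];
    }

    // Row access: m[i] yields a contiguous slice over row i
    double[] opIndex(size_t i)
    {
        return data[i * cols .. (i + 1) * cols];
    }
}

// Element-wise sum using [i, j] indexing inside the inner loop.
void addIndexed(ref Matrix a, ref Matrix b, ref Matrix c)
{
    foreach (i; 0 .. a.rows)
        foreach (j; 0 .. a.cols)
            c[i, j] = a[i, j] + b[i, j];
}

// Same computation with the row slices taken once per outer iteration.
void addByRows(ref Matrix a, ref Matrix b, ref Matrix c)
{
    foreach (i; 0 .. a.rows)
    {
        auto v1 = a[i];
        auto v2 = b[i];
        auto v3 = c[i];
        foreach (j; 0 .. a.cols)
            v3[j] = v1[j] + v2[j];
    }
}

unittest
{
    auto a = Matrix(2, 3);
    auto b = Matrix(2, 3);
    foreach (k, ref x; a.data) x = cast(double) k;
    foreach (k, ref x; b.data) x = 10.0 * cast(double) k;

    auto c1 = Matrix(2, 3);
    auto c2 = Matrix(2, 3);
    addIndexed(a, b, c1);
    addByRows(a, b, c2);
    assert(c1.data == c2.data); // both loop shapes agree
}
```

The `unittest` block can be run by compiling with `-unittest -main`.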
`ux0`, etc., and only storing the results in the matrices at the end. This enabled some vectorization and brought the timing down to 55 ms. It still doesn't unroll the loop as extensively as clang does, and the vectorization isn't quite complete, but we're now within 2x.
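As a rough illustration of the "work in locals, store at the end" pattern, here is a sketch on plain D arrays. The kernel itself is hypothetical (the original code is not shown here); `fx`, `fy`, `steps` and both function names are assumptions, and only `ux0` is taken from the discussion. The point is that the second version touches memory once per element on the way in and once on the way out, which may make it easier for the optimizer to keep values in registers.

```d
// Hypothetical kernel that updates the arrays directly on every step.
void accumulateInPlace(double[] ux, double[] uy,
                       const double[] fx, const double[] fy, size_t steps)
{
    foreach (j; 0 .. ux.length)
        foreach (s; 0 .. steps)
        {
            ux[j] += fx[j];
            uy[j] += fy[j];
        }
}

// Same arithmetic, but accumulated in locals and stored once at the end.
void accumulateLocal(double[] ux, double[] uy,
                     const double[] fx, const double[] fy, size_t steps)
{
    foreach (j; 0 .. ux.length)
    {
        double ux0 = ux[j];
        double uy0 = uy[j];
        foreach (s; 0 .. steps)
        {
            ux0 += fx[j];
            uy0 += fy[j];
        }
        ux[j] = ux0; // single store per element
        uy[j] = uy0;
    }
}

unittest
{
    double[] fx = [0.5, 1.0, 1.5];
    double[] fy = [2.0, 4.0, 8.0];

    double[] ux1 = [0.0, 0.0, 0.0];
    double[] uy1 = [1.0, 1.0, 1.0];
    double[] ux2 = ux1.dup;
    double[] uy2 = uy1.dup;

    accumulateInPlace(ux1, uy1, fx, fy, 4);
    accumulateLocal(ux2, uy2, fx, fy, 4);

    // Same operations in the same order, so the results match exactly.
    assert(ux1 == ux2);
    assert(uy1 == uy2);
}
```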