Mathias L. Baumann
@Marenz
I was wondering if there is a way to get the shape of an ndslice type
or of an ndslice variable, but at compile time.
Basically, I want to construct a new ndslice that is a combination of the dimensions of two other ndslices,
but I am unable to access N or _lengths or anything that would help me.
Ilya Yaroshenko
@9il
Hey @Marenz:
  1. If you are using the new ndslice, the *._lengths member is public and accessible. Please file an issue if it is not. _lengths.length can be used instead of N. Note that *._lengths is mutable. http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.Slice._lengths
  2. *.shape, and *.shape.length, http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.Slice.shape
  3. isSlice!T[0] returns the same value as *.shape.length. http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.isSlice
Does it work for you?
Ilya Yaroshenko
@9il
Mir Algorithm v0.5.8: Interpolation, Timeseries and 17 new functions http://forum.dlang.org/post/pheyabivuumvqbessaok@forum.dlang.org
dextorious
@dextorious
Not sure if this is the right place to ask this, but as part of looking at mir.ndslice, I was going to port a simple lattice Boltzmann fluid dynamics simulation for learning purposes, starting with a collision kernel:
https://gist.github.com/dextorious/d987865a7da147645ae34cc17a87729d
which is currently a literal, non-idiomatic port of a C++ example:
https://gist.github.com/dextorious/9a65a20e353542d6fb3a8d45c515bc18
Ignoring the non-idiomatic loop syntax and similar details, the D version is over 40x slower (LDC v.1.2.0, release build with -O3 and no bounds checks, compared vs. clang v4.0.0 -O3 on a Haswell CPU), which means I'm doing something horribly wrong. Having gone through the docs (and part of the vision library) and checked that the results are correct, I'm somewhat at a loss.
Does anyone see a glaring error that would lead to this level of performance degradation?
Ilya Yaroshenko
@9il
Hello @dextorious
Yes, the C++ code uses single indexing on vectors, while the D code uses double indexing on matrices.
You may want to declare the row vectors at the beginning of the outer loop,
like auto uxv = ux[i];
and operate on these vectors in the inner loop.
D (and probably C/C++) cannot vectorise double indexing the way Fortran can.
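The suggestion above can be sketched roughly as follows. This is a minimal illustration that uses plain D arrays (`double[][]`) rather than ndslice, so it compiles without mir-algorithm; the function names `scaleDoubleIndexed` and `scaleRowHoisted` are hypothetical and do not come from the gists. With ndslice the same pattern would take a row via `ux[i]`, as in the message above.

```d
import std.stdio : writeln;

/// Scales every element of `m` by `factor`, indexing m[i][j] in the inner loop.
void scaleDoubleIndexed(double[][] m, double factor)
{
    foreach (i; 0 .. m.length)
        foreach (j; 0 .. m[i].length)
            m[i][j] *= factor; // two index operations per element
}

/// Same computation, but the row is fetched once per outer iteration.
void scaleRowHoisted(double[][] m, double factor)
{
    foreach (i; 0 .. m.length)
    {
        auto row = m[i]; // hoisted: one row lookup per outer iteration
        foreach (j; 0 .. row.length)
            row[j] *= factor; // single indexing in the hot loop
    }
}

void main()
{
    auto a = [[1.0, 2.0], [3.0, 4.0]];
    auto b = [[1.0, 2.0], [3.0, 4.0]];
    scaleDoubleIndexed(a, 2.0);
    scaleRowHoisted(b, 2.0);
    assert(a == b); // both variants compute the same result
    writeln(b);
}
```

Both functions produce identical results; the second only changes how the inner loop addresses memory, which is the part the optimizer has to see through in order to vectorise.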
Ilya Yaroshenko
@9il
In the end, the performance should be the same.
Keep us posted. I think it is a good example of porting, and you could write a short blog post afterwards (this would be very helpful for others).
Also, you may want to use https://github.com/libmir/mir-random . It implements the C++ RNG standard and more.
Johan Engelen
@JohanEngelen
@9il It would help a lot if you can extract a minimal example that shows that things are not vectorized/optimized well. There is so much going on in the current example that it's hard to analyze why things don't optimize well. Part of the problem could be that slices are used which don't optimize so well yet (it's a work-in-progress).
Johan Engelen
@JohanEngelen
yeah I saw the post. But I missed a compilable full example.
Ilya Yaroshenko
@9il
Woow, nothing is inlined Oo
ldmd2 -inline -O -enable-cross-module-inlining -release -boundscheck=off -I mir-algorithm/source/ -output-s matrix_copy.d
@JohanEngelen Both variants are very slow
This is surprising; it is probably a regression.
I cannot confirm it because ndslice is no longer compatible with older versions.
ldc2 --version
LDC - the LLVM D compiler (1.3.0git-a969bcf):
  based on DMD v2.073.2 and LLVM 4.0.0
  built with LDC - the LLVM D compiler (0.17.5git-64a274a)
  Default target: x86_64-apple-darwin16.5.0
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC
Johan Engelen
@JohanEngelen
Can you add it to LDC issue tracker? Thanks.
Ilya Yaroshenko
@9il
yep, but without code reduction
Ilya Yaroshenko
@9il
ldc-developers/ldc#2121
Ilya Yaroshenko
@9il
@dextorious see also new LDC issue ldc-developers/ldc#2121
Ilya Yaroshenko
@9il
Hehe, I found a reduced test case
dextorious
@dextorious
Morning. I posted that code as I went to sleep, now I finally had a chance to look at the assembly. As you correctly said, the opIndex calls don't get inlined, which is expensive by itself and also prohibits all further optimization via alias analysis and ultimately vectorization.
Moving over to a[i][j] style indexing improved the performance by about 2x, but it's still ~720ms vs 30ms.
Ilya Yaroshenko
@9il
Yep, working on a workaround
dextorious
@dextorious
So I guess at least at present, using ndslice is not recommended for loop-heavy computational kernels.
Ilya Yaroshenko
@9il
No, this is a regression.
I think it can be fixed within 48 hours.
dextorious
@dextorious
Ah, ok. Weird that my random attempt at porting some code was the first to notice it.
Anyhow, I'll be around and very happy to test anything or give any feedback I can.
Ilya Yaroshenko
@9il
Thanks, I will let you know when the new mir-algorithm release is ready
dextorious
@dextorious
Great. Very happy about the quick response here. :)
Ilya Yaroshenko
@9il
@dextorious Fixed in PR https://github.com/libmir/mir-algorithm/pull/41/files, tag v0.6.2
It will be in the DUB registry in less than an hour.
Do not forget to remove dub.selections.json and update the mir-algorithm version to ~>0.6.2 if you use dub.
Also, you may want to add "dflags-ldc": ["-mcpu=native"] to your dub configuration.
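For reference, a `dub.json` combining both suggestions might look roughly like the fragment below. The package name `lbm-example` is a placeholder and not from the conversation; only the dependency version and the `dflags-ldc` entry come from the messages above.

```json
{
    "name": "lbm-example",
    "dependencies": {
        "mir-algorithm": "~>0.6.2"
    },
    "dflags-ldc": ["-mcpu=native"]
}
```

The `-ldc` suffix limits the flag to builds with LDC, so other compilers ignore it.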