Prasun Anand
@wilzbach, D is a lot better than other languages out there. With great performance, there may be some downsides. I am hooked on D for the speed and the syntactic sugar (similar to Ruby) it offers :)
Ilya Yaroshenko
Prasun Anand
@9il: Is the --build=release-nobounds parameter necessary for improved performance of the mir-glas gemm routine?
I am multiplying two rectangular matrices of shape [1217, 8000] and [8000, 1217], and I benchmarked the product with OpenBLAS and mir-glas.
For mir-glas
Time taken for gemm =>1 sec, 267 ms, 309 μs, and 3 hnsecs
For OpenBLAS
Time taken for gemm =>522 ms, 456 μs, and 7 hnsecs
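For context, the two timings above can be converted to throughput. The sketch below is plain arithmetic, assuming the usual 2·m·k·n operation count for a dense matrix product; the helper names `gemmFlops` and `gflops` are made up for this illustration and are not part of mir-glas or OpenBLAS.

```d
import std.stdio : writefln;

/// Floating-point operations for C = A * B with A: m x k and B: k x n
/// (one multiply and one add per inner-product term).
double gemmFlops(size_t m, size_t k, size_t n) pure nothrow @nogc @safe
{
    return 2.0 * m * k * n;
}

/// Throughput in GFLOP/s for a run that took `seconds`.
double gflops(size_t m, size_t k, size_t n, double seconds) pure nothrow @nogc @safe
{
    return gemmFlops(m, k, n) / seconds / 1e9;
}

void main()
{
    // Shapes and timings quoted in the messages above.
    enum m = 1217, k = 8000, n = 1217;
    writefln("mir-glas: %.1f GFLOP/s", gflops(m, k, n, 1.2673093));
    writefln("OpenBLAS: %.1f GFLOP/s", gflops(m, k, n, 0.5224567));
}
```

With these inputs the figures come out at roughly 18.7 GFLOP/s for mir-glas and 45.4 GFLOP/s for OpenBLAS, a gap of about 2.4x.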
Prasun Anand
Currently, I can't compile with --build=release-nobounds because of a dub error.
Ilya Yaroshenko
@prasunanand Please open an issue for mir-glas :-)
Ilya Yaroshenko
@prasunanand GLAS is 2x slower with LLVM 4.0. You probably need to use an LDC build based on LLVM 3.9.
Ilya Yaroshenko
Mir random v0.2.x was released. Random ndslice generation was added.
import mir.ndslice: slicedField, slice;
import mir.random;
import mir.random.variable: NormalVariable;
import mir.random.algorithm: field;

auto var = NormalVariable!double(0, 1);
auto rng = Random(unpredictableSeed);
auto sample = rng      // passed by reference
    .field(var)        // construct random field from standard normal distribution
    .slicedField(5, 3) // construct random matrix 5 row x 3 col (lazy, without allocation)
    .slice;            // allocates data of random matrix

import std.stdio;
Prasun Anand
Thank you, @9il. I will switch to LLVM 3.9 :)
Mathias L. Baumann
I was wondering if there is a way to get the shape of an ndslice type
or of an ndslice variable, but at compile time.
Basically, I want to construct a new ndslice that is a combination of the dimensions of two other ndslices,
but I am unable to access N or _lengths or anything that would help me.
Ilya Yaroshenko
Hey @Marenz:
  1. If you are using the new ndslice, the *._lengths field is public and accessible. Please file an issue if it is not. _lengths.length can be used instead of N. Note that *._lengths is mutable. http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.Slice._lengths
  2. *.shape, and *.shape.length, http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.Slice.shape
  3. isSlice!T[0] returns the same value as *.shape.length. http://docs.algorithm.dlang.io/latest/mir_ndslice_slice.html#.isSlice
Does it work for you?
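The idea behind the three options above is that the number of dimensions is part of the type, so it can be read at compile time and used to build a new type. A minimal sketch of that pattern follows. It deliberately does not use mir: `Tensor` and `combine` are hypothetical stand-ins written for this illustration, with `lengths` playing the role of `_lengths`.

```d
/// Hypothetical stand-in for an N-dimensional slice; NOT the mir type.
struct Tensor(size_t rank)
{
    enum size_t N = rank;   // number of dimensions, known at compile time
    size_t[rank] lengths;   // run-time extent of each dimension
}

/// Builds a tensor whose dimensions are those of `a` followed by those of `b`.
auto combine(A, B)(A a, B b)
{
    // The length of a static array is part of its type, so the combined
    // rank is available at compile time without knowing any extents.
    enum n = typeof(a.lengths).length + typeof(b.lengths).length;
    Tensor!n result;
    result.lengths[0 .. A.N] = a.lengths[];
    result.lengths[A.N .. $] = b.lengths[];
    return result;
}

void main()
{
    import std.stdio : writeln;

    auto a = Tensor!2([5, 3]);
    auto b = Tensor!1([7]);
    auto c = combine(a, b);
    static assert(typeof(c).N == 3); // the rank is checked by the compiler
    writeln(c.lengths);              // the extents are only known at run time
}
```

Only the rank is a compile-time quantity here; the extents themselves stay run-time values, which is also what a mutable `_lengths` field implies.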
Ilya Yaroshenko
Mir Algorithm v0.5.8: Interpolation, Timeseries and 17 new functions http://forum.dlang.org/post/pheyabivuumvqbessaok@forum.dlang.org
Not sure if this is the right place to ask this, but as part of looking at mir.ndslice, I was going to port a simple lattice Boltzmann fluid dynamics simulation for learning purposes, starting with a collision kernel:
which is currently a literal, non-idiomatic port of a C++ example:
Ignoring the non-idiomatic loop syntax and similar details, the D version is over 40x slower (LDC v1.2.0, release build with -O3 and no bounds checks, compared with clang v4.0.0 -O3 on a Haswell CPU), which means I'm doing something horribly wrong. Having gone through the docs (and part of the vision library) and checked that the results are correct, I'm somewhat at a loss.
Does anyone see a glaring error that would lead to this level of performance degradation?
Ilya Yaroshenko
Hello @dextorious
Yes, the C++ code uses single indexing on vectors, while the D code uses double indexing on matrices.
You may want to declare row vectors at the beginning of the outer loop,
like auto uxv = ux[i];
and operate on these vectors in the inner loop.
D (and probably C/C++) cannot vectorise double indexing the way Fortran can.
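The suggestion above can be sketched with plain D arrays. This is an illustration of the loop shape only, assuming a `double[][]` in place of a 2D ndslice (where `ux[i]` would yield a one-dimensional slice instead); the function names are invented for the example, and it makes no claim about measured speed.

```d
/// Double indexing: every access goes through the outer array again.
double sumDoubleIndexed(const double[][] ux) pure nothrow @nogc @safe
{
    double total = 0;
    foreach (i; 0 .. ux.length)
        foreach (j; 0 .. ux[i].length)
            total += ux[i][j];
    return total;
}

/// Row hoisted out of the inner loop, as suggested above.
double sumRowHoisted(const double[][] ux) pure nothrow @nogc @safe
{
    double total = 0;
    foreach (i; 0 .. ux.length)
    {
        const uxv = ux[i];       // one-dimensional view of row i
        foreach (j; 0 .. uxv.length)
            total += uxv[j];     // single indexing in the hot loop
    }
    return total;
}

void main()
{
    import std.stdio : writeln;

    auto ux = [[1.0, 2.0], [3.0, 4.0]];
    // Both variants compute the same result; only the indexing differs.
    writeln(sumDoubleIndexed(ux) == sumRowHoisted(ux));
}
```

Both functions return the same value; the second simply gives the optimizer an inner loop over a single contiguous row.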
Ilya Yaroshenko
In the end, the performance should be the same.
Keep us posted. I think it is a good example of porting, and you could write a short blog post afterwards (this would be very helpful for others).
Also, you may want to use https://github.com/libmir/mir-random . It implements the C++ RNG standard and more.
Johan Engelen
@9il It would help a lot if you can extract a minimal example that shows that things are not vectorized/optimized well. There is so much going on in the current example that it's hard to analyze why things don't optimize well. Part of the problem could be that slices are used which don't optimize so well yet (it's a work-in-progress).
Johan Engelen
Yeah, I saw the post, but I could not find a full compilable example.
Ilya Yaroshenko
Woow, nothing is inlined Oo
ldmd2 -inline -O -enable-cross-module-inlining -release -boundscheck=off -I mir-algorithm/source/ -output-s matrix_copy.d
@JohanEngelen Both variants are very slow.
This is surprising; it is probably a regression.
I cannot confirm it because ndslice is no longer compatible with older versions.
ldc2 --version
LDC - the LLVM D compiler (1.3.0git-a969bcf):
  based on DMD v2.073.2 and LLVM 4.0.0
  built with LDC - the LLVM D compiler (0.17.5git-64a274a)
  Default target: x86_64-apple-darwin16.5.0
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC
Johan Engelen
Can you add it to the LDC issue tracker? Thanks.