Ilya Yaroshenko
@9il
foreach (i; 0..m)
{
   foreach (j; 0..n)
   {
       // use matrix1[i, j], matrix2[i, j], matrix3[i, j]
   }
}
to
foreach (i; 0..m)
{
   auto v1 = matrix1[i];
   auto v2 = matrix2[i];
   auto v3 = matrix3[i];
   foreach (j; 0..n)
   {
       // use v1[j], v2[j], v3[j]
   }
}
The same for tensors
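For instance, a rough sketch of the same hoisting for a 3-dimensional tensor (tensor1 and the loop bounds m, n, p are illustrative names, not the benchmark code):
foreach (i; 0..m)
{
    auto t1 = tensor1[i];
    foreach (j; 0..n)
    {
        auto r1 = t1[j];
        foreach (k; 0..p)
        {
            // use r1[k] instead of tensor1[i, j, k]
        }
    }
}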
dextorious
@dextorious
Okay, there was an aliasing issue similar to what I recently encountered in Julia, which I fixed by explicitly operating on temporary variables ux0, etc., and only storing the results in the matrices at the end. This enabled some vectorization and brought the timing down to 55 ms. It still doesn't unroll the loop as extensively as clang does and the vectorization isn't quite complete, but we're now within 2x.
If I understand this correctly, what I did was a more extreme version of what you just suggested with hoisting out the rows?
Anyway, I'll fix a few of the uglier details (C++-style for -> foreach, etc.) and post up my current version as an issue on the repository.
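A minimal sketch of that scalar-temporaries pattern, combined with the row hoisting suggested above (ux, uy, rho and the ux0-style temporaries are illustrative names, not the actual benchmark code):
foreach (i; 0..m)
{
    auto ux = matrix1[i];
    auto uy = matrix2[i];
    auto rho = matrix3[i];
    foreach (j; 0..n)
    {
        // load into scalar temporaries so the optimizer need not assume aliasing
        auto ux0 = ux[j];
        auto uy0 = uy[j];
        auto rho0 = rho[j];
        // ... compute with ux0, uy0, rho0 ...
        // store the results back only at the end of the iteration
        ux[j] = ux0;
        uy[j] = uy0;
        rho[j] = rho0;
    }
}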
Ilya Yaroshenko
@9il
Yes
Thanks!
dextorious
@dextorious
Posted: libmir/mir-algorithm#42
Curiously, I ran into the same aliasing issue when I wrote a Julia version of the same code, but there manually introducing the scalar temporaries was enough to persuade the compiler to fully vectorize the loops and get to within ~20% of the C++ benchmark.
So I suspect there may be room for improvement in terms of what information LDC exposes to the LLVM optimizer.
dextorious
@dextorious
@9il So I've been going through the code you wrote, experimenting and slowly adding to it, and have finally come to the question of multithreading. In C++, OpenMP pragmas are enough to get a 3x+ speedup on the original benchmark on a quad core CPU. In D, the closest analogue I've seen is taskPool.parallel, which requires a foreach loop and forever ties me to the GC. Is there a betterC-compliant alternative available somewhere? What's the idiomatic way to parallelize code that would preferably only depend on mir libraries?
Ilya Yaroshenko
@9il
@dextorious BetterC multithreading is not implemented in Mir yet. You can replace the outermost do-while loop with foreach(i; taskPool.parallel(n.length.iota)) { ni = … ui = … }
iota can be found in mir.ndslice.topology
Mir is compatible with Phobos
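A minimal sketch of that suggestion, assuming a GC-enabled build and an outer loop over m rows (m and the loop body are placeholders):
import std.parallelism : taskPool;
import std.range : iota;

void process(size_t m)
{
    // each iteration of the outer loop runs on a worker thread from taskPool
    foreach (i; taskPool.parallel(iota(m)))
    {
        // compute ni, ui, ... for row i exactly as in the serial version
    }
}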
dextorious
@dextorious
Ah, okay, I'll try that then. Thanks!
drug007
@drug007
Hello! @9il is there a way to handle timeseries where there are several identical timestamps with different data assigned?
Ilya Yaroshenko
@9il
@drug007 Can you provide data example and describe how data should be handled?
drug007
@drug007
Sure. I have several time series from different sources that generate different data, and I process them simultaneously. But there is a chance that timestamps in different time series will be equal, i.e. there will be several observations from different sources at the same time. As I understand it, time in mir.timeseries.Series must be unique, so it does not let me handle two different observations with the same timestamp?
I can use my own type to represent the timestamp: for example, its low bits can store an order number to distinguish different observations with the same time, and the remaining bits can represent the time itself.
Ilya Yaroshenko
@9il
@drug007 Yes, Series indexing algorithms assume that timestamps are unique. BTW, Series allows multiple columns
A special timestamp type looks like a good idea
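A minimal sketch of such a composite timestamp type (the 8-bit order field and the names are illustrative, not part of mir.timeseries):
struct StampedTime
{
    ulong payload; // high bits: the time, low 8 bits: an order number

    this(ulong time, ubyte order)
    {
        payload = (time << 8) | order;
    }

    ulong time() const @property { return payload >> 8; }
    ubyte order() const @property { return cast(ubyte) payload; }

    // ordering uses the packed value, so equal times sort by order number
    int opCmp(const StampedTime rhs) const
    {
        return payload < rhs.payload ? -1 : payload > rhs.payload ? 1 : 0;
    }
}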
drug007
@drug007
Multiple columns is a good idea too, but how efficient is it? The probability of several observations at the same time is rather low, so it would be a sparse data structure: more than 90% of the time there would be data in only a single column and the rest would be empty.
Ilya Yaroshenko
@9il
then your approach is the best here
drug007
@drug007
ok, I see. btw thank you for your good work!
dextorious
@dextorious
Okay, so since the GitHub discussion seemed to indicate that my experience with learning Mir by porting some simple C++ code might have some value, I'll finish up my code and start writing a blog post towards the end of this week. Lots of silly questions incoming, no doubt. :)
On that note, one thing I noticed when I looked at the Mir blog is that even the more recent posts might need some revising. For instance, http://blog.mir.dlang.io/ndslice/algorithm/optimization/2016/12/12/writing-efficient-numerical-code.html makes extensive reference to mir.ndslice.algorithm.ndEach, which doesn't seem to exist any more. I'm not sure if that entire pattern is out of date or if it's just a few names that have changed in the API, but that's not an isolated case.
Ilya Yaroshenko
@9il
The nd prefix was removed from all functions
hmm, the article has a lot of broken links
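For instance, a use of the old ndEach would now look roughly like this (a sketch assuming the current mir.ndslice module layout):
import mir.ndslice.allocation : slice;
import mir.ndslice.algorithm : each;

void main()
{
    auto matrix = slice!double(3, 4);
    // formerly mir.ndslice.algorithm.ndEach
    matrix.each!((ref e) { e = 0; });
}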
Ilya Yaroshenko
@9il
Multidimensional chunks are ready for review libmir/mir-algorithm#45
Nicholas Wilson
@thewilsonator
Slight typo, should be 'High Level ...'. English is silly.
Ilya Yaroshenko
@9il
@thewilsonator , do mean tweet or something else?
do you*
Ilya Yaroshenko
@9il
Ah, I see, it is the wrong header
Ilya Yaroshenko
@9il
libmir/mir-algorithm#55
fuentes
@aitzkora
Hi, the link for the benchmark https://github.com/libmir/mir/blob/master/benchmarks/glas/gemm_report.d seems to be dead? Where can I download it?
fuentes
@aitzkora
thanks
Jakob Bornecrantz
@Wallbraker
@thewilsonator Heyo, I saw your post on the NG about doing more DCompute work.
Did you get anywhere with the SPIR-V work on LLVM?
Guillaume Piolat
@p0nce
Hi, I'm a derelict-cl maintainer and the original derelict-cuda author. I'd like to help, but unfortunately I'm under considerable time pressure to release two products, one before July and maybe a Steam release after that. So work on derelict-cl won't happen on my end for some time.
of course I'll merge anything timely
I can also give you the ownership of derelict-cl and derelict-cuda heh
Nicholas Wilson
@thewilsonator
@Wallbraker Yeah, all the compiler stuff works. There's still some compiler development to be done to reach feature parity with OpenCL & CUDA, but it's mostly plain D development going forward, automating the (very horrible) OpenCL & CUDA APIs. I gave a talk at DConf and I'll be doing a post for the D blog which should clarify the current status and directions.
@p0nce Thanks, good to know. I'll send some PRs your way soon™. Good luck with your releases.
Andrew Benton
@andrewbenton
Is there a full code example available for dcompute?
Nicholas Wilson
@thewilsonator
The kernels are complete examples. The driver is still a WIP, although I hope to have the OpenCL 1.2 driver API done tomorrow, after which I will update the docs.
Andrew Benton
@andrewbenton
Are you thinking about having a demo dub project for people to play with?
Nicholas Wilson
@thewilsonator
That is a possibility