These are chat archives for thunder-project/thunder

30th
Jul 2016
Jason Wittenbach
@jwittenbach
Jul 30 2016 13:16
@andins the correlate function is for when you want to compute the correlation between every element of your Series object and a handful of other time series, which you then pass in as an argument.
on the other hand, the cov function computes the pair-wise correlations between all of the elements of your Series object
so the cov function is definitely the one you want…or maybe the gramian function, which computes straight dot products, skipping the centering and rescaling.
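A rough sketch of the three quantities being compared here, written in plain NumPy rather than against Thunder's API. The array names, shapes, and the `zscore` helper are made up for illustration, and the exact axis and normalization conventions of Thunder's own functions may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 records, each a time series with 100 samples.
data = rng.standard_normal((5, 100))
# A handful of other time series to compare every record against.
signals = rng.standard_normal((2, 100))


def zscore(x):
    # Center and rescale each row to zero mean and unit (population) variance.
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


# "correlate"-style: every record against each supplied signal -> shape (5, 2)
corr_with_signals = zscore(data) @ zscore(signals).T / data.shape[1]

# "cov"-style: all pairs of records against each other -> shape (5, 5)
centered = data - data.mean(axis=1, keepdims=True)
pairwise_cov = centered @ centered.T / (data.shape[1] - 1)

# "gramian"-style: raw dot products, no centering or rescaling -> shape (5, 5)
gram = data @ data.T
```

The first result grows with (records × signals), while the other two grow with the square of the number of records, which is where the memory cost comes from.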
The function is slightly deceptive though: while it does return a distributed matrix (if you’re in spark mode)
it actually does end up with that matrix being stored locally on your driver at some point
which means, in your case, you need to make sure that you have 3 TB of RAM available for that computation to go through
the algorithm we use parallelizes the computation, but it only parallelizes the storage at the very end.
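For a sense of where a figure like 3 TB can come from, here is a back-of-the-envelope estimate for a dense square matrix. The record count `n` and the 8-byte (float64) entry size are assumptions picked for illustration; neither is stated in this conversation.

```python
def dense_matrix_bytes(n, itemsize=8):
    """Bytes needed to hold a dense n-by-n matrix with itemsize-byte entries."""
    return n * n * itemsize


# Hypothetical record count, chosen only to land near the figure mentioned above.
n = 600_000
terabytes = dense_matrix_bytes(n) / 1e12
print(f"{n} x {n} float64 matrix: about {terabytes:.2f} TB")
```

Because the size grows with the square of the number of records, halving the number of records cuts the requirement to a quarter.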
Jason Wittenbach
@jwittenbach
Jul 30 2016 13:22
@yuruoxin we’re still working on getting that package ready for production in the new 1.0 version. If you would like to try it out ahead of time, you’ll have to download the repository from GitHub.