These are chat archives for thunder-project/thunder

19th
Feb 2015
Jeremy Freeman
@freeman-lab
Feb 19 2015 08:16

Hey @bendichter thanks for the interest! The short answer is that Thunder (and Spark w/ Python) could be super useful for doing almost all of these things. At the same time, few of these are built into Thunder directly so far.

When you construct one of Thunder's data objects (in this case, probably a TimeSeries), which represents a distributed collection of channel time series, you can easily apply in parallel any operation you've already written as a python function in basically one line, newdata = data.applyValues(lambda x: myfunc(x)), where myfunc computes your Hilbert transform, STRF, etc. So that'll probably be a quick way to speed up your workflows.

Hopefully, in addition, some of the methods already available on the data objects would be useful (e.g. some decoding functionality, Fourier transforms). And if there are particular operations in your workflow that are fairly general (might include the Hilbert transform, and the event-related handling), we'd welcome those as contributions!
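
For reference, the per-channel pattern described above can be sketched locally without a cluster. This is only an illustration under stated assumptions: `hilbert_envelope` and `apply_values` are hypothetical names, `apply_values` is a plain-Python stand-in for the `data.applyValues(...)` call mentioned in the message, and the envelope uses NumPy's FFT rather than any Thunder method.

```python
import numpy as np


def hilbert_envelope(x):
    """Amplitude envelope of a real 1-D signal via the FFT-based analytic signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist when n is even) once, double the positive
    # frequencies, and zero out the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def apply_values(records, func):
    """Local stand-in (assumption) for data.applyValues(func):
    transform each value and leave its key untouched."""
    return [(key, func(value)) for key, value in records]


# Toy data: 3 channels, each a 1-second, 5 Hz cosine sampled at 200 Hz,
# with amplitude 1, 2 and 3 respectively.
t = np.arange(200) / 200.0
records = [((ch,), (ch + 1) * np.cos(2 * np.pi * 5 * t)) for ch in range(3)]

# With a Thunder data object this would presumably be the one-liner from
# the message: newdata = data.applyValues(hilbert_envelope)
envelopes = apply_values(records, hilbert_envelope)
```

Each record keeps its key and gets its own envelope, so the work is independent per channel, which is what makes it straightforward to run in parallel.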

bendichter
@bendichter
Feb 19 2015 08:18
Cool, thanks! :-)
bendichter
@bendichter
Feb 19 2015 18:41
ah yes, I'm looking at timeseries.py and I see many of the methods are simply wrappers around that applyValues command. Cool, it looks pretty easy to do whatever I need. Thanks for the help!
bendichter
@bendichter
Feb 19 2015 18:58
OK, so I have another question. We generally compute the Hilbert on 40 different bandpasses of the original signal. I'd really like to parallelize over channel and band. Is there a straightforward way to do this?
Jason Wittenbach
@jwittenbach
Feb 19 2015 19:30
So you start with N time series and end up with 40N transformed time series, is that right?
Jason Wittenbach
@jwittenbach
Feb 19 2015 19:37
If so, then you could start by duplicating each of your series 40 times and tacking on which bandpass will be done on it. You'll have to use Spark's RDD.flatMap function to do that with something like:
duplicated = thunder.Series(data.rdd.flatMap(lambda (k, v): [ (k, [i, v]) for i in xrange(40) ]))
bendichter
@bendichter
Feb 19 2015 19:39
Thanks, this is the type of thing I was looking for. I'll try to work out an implementation
Jason Wittenbach
@jwittenbach
Feb 19 2015 19:40
Then you could pass a function to applyValues that takes the bandpass # as well as the time series and performs the transform
I wonder if the entire "duplicate + apply a function that takes a unique identifier as an argument" is general enough to consider adding as a method at some level. What do you think @freeman-lab ?
Jason Wittenbach
@jwittenbach
Feb 19 2015 19:45
Obviously it's not useful on big datasets where the duplication would be too much to fit in memory. But for medium-sized datasets, it's a nice way to milk some extra parallelization.
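
The "duplicate + apply a function that takes a unique identifier" idea from this exchange can be sketched in plain Python as follows. Everything here is an assumption made for illustration: `flat_map` and `apply_values` are local stand-ins for Spark's `RDD.flatMap` and Thunder's `applyValues`, and the sampling rate, band edges, and FFT-based brick-wall filter are hypothetical choices, not the filters actually used in the workflow being discussed.

```python
import numpy as np

FS = 400.0        # hypothetical sampling rate in Hz
N_BANDS = 40      # number of bandpasses mentioned in the thread
BAND_WIDTH = 5.0  # hypothetical: band i covers [5*i, 5*(i+1)) Hz


def flat_map(records, func):
    """Local stand-in (assumption) for RDD.flatMap: func returns a list per record."""
    return [out for record in records for out in func(record)]


def apply_values(records, func):
    """Local stand-in (assumption) for applyValues: transform values, keep keys."""
    return [(key, func(value)) for key, value in records]


def duplicate(record):
    """Turn one (key, series) record into N_BANDS records tagged with a band index."""
    key, series = record
    return [(key, (band, series)) for band in range(N_BANDS)]


def band_envelope(value):
    """Envelope of the series restricted to one band (brick-wall FFT filter)."""
    band, x = value
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(x.size, d=1.0 / FS)
    lo, hi = band * BAND_WIDTH, (band + 1) * BAND_WIDTH
    # Analytic signal of the band-limited series: keep only positive
    # frequencies inside the band and double them; count DC once.
    weights = np.where((freqs >= lo) & (freqs < hi), 2.0, 0.0)
    weights[0] = min(weights[0], 1.0)
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


# Toy data: 2 channels, 1 second each, pure cosines at 12 Hz and 47 Hz.
t = np.arange(400) / FS
records = [((0,), np.cos(2 * np.pi * 12 * t)),
           ((1,), np.cos(2 * np.pi * 47 * t))]

duplicated = flat_map(records, duplicate)            # N -> 40N records
results = apply_values(duplicated, band_envelope)    # one envelope per (channel, band)
```

Note that the duplicated records share the original key, as in the one-liner above, so the band is only recoverable from record order here; folding the band index into the key would be one way to keep the records uniquely addressable.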