These are chat archives for thunder-project/thunder

5th May 2015
sophie63
@sophie63
May 05 2015 02:18
Hi, I am trying to use thunder locally for testing and I have problems loading files in .npy format. I just did a pip install --upgrade so I should have the latest version.
Here is the error: TypeError: loadSeries() got an unexpected keyword argument 'inputformat'
sophie63
@sophie63
May 05 2015 02:24
In general, is it possible to directly convert a numpy array that is already loaded into a Series or Images type?
Jeremy Freeman
@freeman-lab
May 05 2015 02:29
@sophie63 I think the issue might just be that the argument is inputFormat instead of inputformat (note the capitalization), as given here
does that fix it?
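The fix really is just the capitalization. A minimal stand-alone illustration (using a dummy function with a hypothetical signature, not Thunder itself) of why Python raises this TypeError:

```python
# Keyword arguments in Python are case-sensitive: 'inputformat'
# does not match a parameter named 'inputFormat'.
def loadSeries(dataPath, inputFormat='binary'):
    # Stand-in with a made-up signature, just to show the error.
    return dataPath, inputFormat

loadSeries('data.npy', inputFormat='npy')        # correct capitalization

try:
    loadSeries('data.npy', inputformat='npy')    # lowercase: rejected
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'inputformat'
```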
sophie63
@sophie63
May 05 2015 02:32
yes that corrects the TypeError sorry... However I still cannot open my npy arrays with more than 2 dimensions, do you know if there are indirect ways to do that?
Jeremy Freeman
@freeman-lab
May 05 2015 02:39
can you explain the use case a bit more? this is for loading multiple 3D arrays (e.g. volumes) into an Images object?
top-level, i do think we should expose simpler methods for loading both Images and Series directly from numpy arrays
sophie63
@sophie63
May 05 2015 02:45
Ideally to load a 4D array: txyz. I still have steps in my pipeline that use external programs or that are easier (faster) to do with an array so I need to go back and forth with the formats.
Jeremy Freeman
@freeman-lab
May 05 2015 02:49

at least for Images, it's a little buried, but you can do the following:

from thunder.rdds.fileio.imagesloader import ImagesLoader
imgs = ImagesLoader(sc).fromArrays(list(arys))

where arys is a numpy array with shape txyz
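To see what list(arys) hands to fromArrays, here is a small numpy-only sketch (toy shapes, no Thunder or Spark needed): iterating over a 4D txyz array splits it along the time axis into t separate 3D volumes.

```python
import numpy as np

# Toy 4D array with shape (t, x, y, z): 4 time points of 5x6x7 volumes.
arys = np.zeros((4, 5, 6, 7))

# list() iterates over the first (time) axis, yielding one 3D volume
# per time point -- the per-image inputs that fromArrays expects.
volumes = list(arys)
print(len(volumes), volumes[0].shape)  # 4 (5, 6, 7)
```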

we can expose that at the top level on the ThunderContext, will create an issue now
and also do something similar for Series
would also be curious to hear what some of those steps are, we're working on some new ideas for speeding up operations that can (or should) be done as local array operations, whether because you're running locally, the data is small, etc.
sophie63
@sophie63
May 05 2015 03:22
Awesome! That's very helpful thanks!
I have just started to move some of my pipeline to Python so it might not be very optimized yet... Right now I use thunder to crop the images, but then collect into an array, average and threshold in time to choose the frames when my excitation light is on, and save in NIfTI format for inspection in ImageJ and movement correction in AFNI (3dvolreg). I then debleach with a parfor loop in MATLAB on a GPU cluster (I am trying to move that step to thunder), and inspect in ImageJ again. I then want to put the array back in thunder for SVD / ICA (I have used FSL or MATLAB so far but had to use only downsampled data).
In brief I mostly need arrays to go back and forth with other programs
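The average-and-threshold-in-time step described above can be sketched with plain numpy (toy data and a hypothetical choice of cutoff, just to show the shape of the computation):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy 4D movie with shape (t, x, y, z); the light is "on" in frames 2-7.
movie = rng.rand(10, 8, 8, 3) * 0.1
movie[2:8] += 1.0

# Mean intensity per frame, then threshold to keep the lit frames.
frame_means = movie.mean(axis=(1, 2, 3))
threshold = frame_means.mean()          # one possible choice of cutoff
lit_frames = np.where(frame_means > threshold)[0]
print(lit_frames)  # [2 3 4 5 6 7]
```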
Seetha Krishnan
@seethakris
May 05 2015 03:49
[screenshot attachment: Screen Shot 2015-05-05 at 11.39.53 am.png]
Hey, I am having trouble running NMF on my data. I routinely run PCA, ICA and kmeans in thunder. Tried NMF yesterday and the kernel seems to stall after a while. No errors but the job never finishes. I am running this on a local machine. The data is quite small, 512x256x3 and there is plenty of free memory left. There is no verbose output, which makes me think it never went into the iterations.
Jason Wittenbach
@jwittenbach
May 05 2015 03:54
@seethakris What version of Spark are you using? There is an issue where Thunder's NMF doesn't work with Spark 1.2; however, it should work with the newer Spark 1.3. This might be the problem you're running into.
Jeremy Freeman
@freeman-lab
May 05 2015 03:56
Just to add, fairly certain the issue was specific to Spark 1.2.1, more at thunder-project/thunder#129
Seetha Krishnan
@seethakris
May 05 2015 04:02
I was running spark 1.2.1. Tried now with 1.3. Still the same stalling. But no error messages.
Jeremy Freeman
@freeman-lab
May 05 2015 04:05
hm, i can try to reproduce locally, can you give the full dimensions of the Series object? most useful would be series.nrecords and len(series.index)
Seetha Krishnan
@seethakris
May 05 2015 05:58
Oh Thanks! It works on a smaller dataset - 'series.nrecords' : 240000 , 'len(series.index)' : 81, 'series.dims.max': (200, 300, 4). But stalls for this larger one - 'series.nrecords' : 568320, 'len(series.index)' : 320, 'series.dims.max': (512, 222, 5). The memory utilised is only 40% of 8GB for the latter and I am not getting any OutOfMemoryErrors.
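For scale, a rough back-of-envelope estimate of the stalling dataset's dense footprint (assuming float64 values; NMF's intermediate matrices would add to this):

```python
# nrecords matches the spatial dimensions: 512 * 222 * 5 = 568320 voxels,
# each with a 320-point time series of 8-byte floats.
nrecords, npoints = 568320, 320
bytes_total = nrecords * npoints * 8
print(round(bytes_total / 1e9, 2))  # 1.45 (GB)
```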
Seetha Krishnan
@seethakris
May 05 2015 06:34
I have edited spark-env.sh to utilise maximum memory so that isn't the issue either.