These are chat archives for thunder-project/thunder

29th
Jun 2017
Davis Bennett
@d-v-b
Jun 29 2017 00:01
as far as I know, only 1 process can access a .tif file at a time, so I don't really know what will happen if you try to parallelize reading data from a small number of files
2000 files isn't a lot, so unless you have a hard constraint here I would just redistribute your data to 1 timepoint per file
Kyle
@kr-hansen
Jun 29 2017 11:36
@d-v-b Thanks. Part of my concern with the above was that it wasn't built on the same Spark context I'd initiated, but it would work for my small, local test case. Ya, so for some of our datasets we have 55,000 files or so (2047 frames saved across 27 multi-page tif files). I guess I could always write some Thunder code that opens the multi-page tifs and then re-saves them as individual files. Thanks for your help!
Kyle
@kr-hansen
Jun 29 2017 18:25
@d-v-b FYI, at least in local mode, it takes much longer (almost 3x more) for Thunder to load a directory of single .tifs than it takes to load the same images saved as 2 multipage tifs, in my experience. Are there any implementations of parallel loading in local mode?
Davis Bennett
@d-v-b
Jun 29 2017 18:26
kyle in local mode you basically have a single core doing all the work, and I think opening & closing single files involves a lot of overhead that you skip if everything is in a single file
if you have multiple cores available on your machine you can use the multiprocessing library to parallelize over cores
Kyle
@kr-hansen
Jun 29 2017 18:27
@d-v-b Cool. I'll look into that. Thanks!
Kyle
@kr-hansen
Jun 29 2017 19:08
@d-v-b and for anyone who looks back on this thread, I'll just note that the multiprocessing library doesn't work to parallelize over reading in .tif files. After starting a parallel pool, and feeding in p.map(td.images.fromtif, [rawimagesdir]) it takes just as long to load as if I'd run td.images.fromtif(rawimagesdir), but also errors out. If anyone is aware of a way to parallelize local loading of individual tif files, I'd be interested to know of it.
Davis Bennett
@d-v-b
Jun 29 2017 19:10
what if you try doing this without thunder? just to simplify things a bit
Kyle
@kr-hansen
Jun 29 2017 19:11
you mean just using tifffile?
Davis Bennett
@d-v-b
Jun 29 2017 19:11
yeah
Kyle
@kr-hansen
Jun 29 2017 19:11
I'll try that really quick
Davis Bennett
@d-v-b
Jun 29 2017 19:11
if you have scikit-image, you can use skimage.io.imread, which wraps tifffile
or you can just use tifffile directly
and iterate over timepoints, not paths
Kyle
@kr-hansen
Jun 29 2017 19:13
what do you mean by iterating over timepoints? I'm assuming that would be files, correct?
it looks like skio.imread doesn't take a directory as input, so I'll have to iterate over the files to some degree
Davis Bennett
@d-v-b
Jun 29 2017 19:15
i'm assuming you still have 1-2 big multipage tifs, and you want to load pages from these files in parallel
so you have a lot of timepoints, but not so many files
Kyle
@kr-hansen
Jun 29 2017 19:15
Yes. I've also resaved those 1-2 big multipage tifs as single tifs where each one corresponds to a single timepoint
Davis Bennett
@d-v-b
Jun 29 2017 19:15
yeah so you want to parallelize over timepoints, not files, since you have a bunch of timepoints and a tiny amount of files
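A rough sketch of what "parallelize over timepoints, not files" could look like with tifffile and multiprocessing. The helper names (`page_tasks`, `count_pages`, `read_page`, `load_pages_parallel`) are made up for illustration and are not part of Thunder or tifffile. It assumes that several processes can read from the same .tif at once on your system, which this thread does not establish, so treat any speedup as something to measure rather than expect.

```python
from multiprocessing import Pool


def page_tasks(page_counts):
    """Expand {path: number_of_pages} into a flat list of (path, page_index)."""
    return [(path, i) for path, n in page_counts.items() for i in range(n)]


def count_pages(path):
    """Count the pages (timepoints) stored in one multipage tif."""
    import tifffile  # imported lazily; only needed when files are actually read

    with tifffile.TiffFile(path) as tif:
        return len(tif.pages)


def read_page(task):
    """Read a single page; each call reopens the file, which has some overhead."""
    import tifffile

    path, index = task
    return tifffile.imread(path, key=index)


def load_pages_parallel(paths, processes=None):
    """Read every page of every file, spreading the pages over worker processes."""
    tasks = page_tasks({p: count_pages(p) for p in paths})
    with Pool(processes) as pool:
        # chunksize is an arbitrary choice here; tune it for your data
        return pool.map(read_page, tasks, chunksize=64)
```

On platforms that start workers by spawning a fresh interpreter, call `load_pages_parallel` from under an `if __name__ == "__main__":` guard.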
Kyle
@kr-hansen
Jun 29 2017 19:16
So I now have another directory with 4000+ .tifs where each one is a single timepoint
Davis Bennett
@d-v-b
Jun 29 2017 19:16
ah ok
Kyle
@kr-hansen
Jun 29 2017 19:16
so should I iterate over the .tifs in this new folder? I still need to iterate over something it seems
Kyle
@kr-hansen
Jun 29 2017 19:24
@d-v-b So using skio.imread and iterating over each individual .tif in the new folder is faster in local mode, though only slightly faster than Thunder reading in the multipage tifs.
It took 2 min and 10 sec using the multiprocessing library to load the 4000 individual tifs, while it took 2 min 47 sec using Thunder to load the 2 multipage tifs consisting of 2000 pages each.
However, it was much faster than the 6 min and 16 seconds to load the 4000 individual tifs using Thunder alone in local mode.
I imagine as it gets larger and I'm looking at my full dataset (26,600 timepoints saved over 13 multipage tifs), the difference between using multiprocessing to load individual tifs and loading the multipage tifs might start to be noticeable.
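For anyone reading back, a minimal sketch of the multiprocessing approach described above, assuming one .tif per timepoint and scikit-image installed. The function names are hypothetical, not the code that was actually run. Note that mapping over a one-element list, as in the earlier `p.map(td.images.fromtif, [rawimagesdir])` attempt, hands the whole directory to a single worker, so the list being mapped over has to contain the individual file paths.

```python
from multiprocessing import Pool
from pathlib import Path


def sorted_tif_paths(directory):
    """Return the .tif/.tiff files in `directory` as strings, sorted by name."""
    return sorted(
        str(p)
        for p in Path(directory).iterdir()
        if p.suffix.lower() in (".tif", ".tiff")
    )


def load_one(path):
    """Read one single-timepoint tif into an array."""
    from skimage.io import imread  # imported lazily, inside the worker

    return imread(path)


def load_parallel(paths, processes=None):
    """Read many tifs using a pool of worker processes (default: one per core)."""
    with Pool(processes) as pool:
        # pool.map returns the results in the same order as `paths`
        return pool.map(load_one, paths)
```

Sorting is lexicographic, so file names need zero-padded frame numbers for the result to come back in time order. Call `load_parallel(sorted_tif_paths(...))` from under an `if __name__ == "__main__":` guard, since some platforms re-import the main module in each worker.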
Davis Bennett
@d-v-b
Jun 29 2017 19:29
if you're sticking to local mode I would use multiprocessing because thunder doesn't do any parallelization in local mode
but as soon as you're working with a cluster, thunder in spark mode will be faster
assuming you have 1 file per timepoint
Kyle
@kr-hansen
Jun 29 2017 19:31
Ya, that's what I've gathered. The one issue I've run into using thunder in spark mode is that I can't do any frame-by-frame functions because the data are distributed.
Davis Bennett
@d-v-b
Jun 29 2017 19:31
what do you mean by frame-by-frame functions?
Kyle
@kr-hansen
Jun 29 2017 19:32
For example, if I want to do a homomorphic filter, a step in that would be to take my video, and subtract a spatially gaussian-filtered version of each frame from itself to remove the background.
For the subtraction, I always get memory crashes: the data are distributed, it tries to combine them on a single machine, and the result is that it errors out.
Davis Bennett
@d-v-b
Jun 29 2017 19:34
hm, that should be a simple map operation
Kyle
@kr-hansen
Jun 29 2017 19:34
I know there are functions written into both thunder and bolt for this (.minus, etc...) but in practice they seem to crash with memory issues.
Davis Bennett
@d-v-b
Jun 29 2017 19:34
first I would ignore as best you can the functions built in to thunder
like minus and plus and whatnot
Kyle
@kr-hansen
Jun 29 2017 19:35
haha ok.
Davis Bennett
@d-v-b
Jun 29 2017 19:35
the way I would do the filter you describe is to write a function that takes an image as input and returns a filtered image as output
Kyle
@kr-hansen
Jun 29 2017 19:35
Well I've played around with using the standard map, which runs into issues.
Davis Bennett
@d-v-b
Jun 29 2017 19:35
like lambda v: v - v.mean() will remove the mean from an image
Kyle
@kr-hansen
Jun 29 2017 19:35
No, I'm able to create the filtered version just fine
Davis Bennett
@d-v-b
Jun 29 2017 19:35
ah ok
so where is the problem?
Kyle
@kr-hansen
Jun 29 2017 19:36
And I'm trying to do this prior to motion correction, so I'm interested in doing the frame by frame removal
So I can create a gaussian filtered array (4000, 1024, 1024) and I have my original array (4000, 1024, 1024)
But if I want to do original-gaussian, in a frame by frame manner, it doesn't work
I can do as you suggest, and take a single composite frame, to subtract from my original array, and that works ok.
Davis Bennett
@d-v-b
Jun 29 2017 19:37
is your gaussian filter 3D or 2D?
Kyle
@kr-hansen
Jun 29 2017 19:37
2D
2D spatial gaussian filter
Davis Bennett
@d-v-b
Jun 29 2017 19:37
ok so you don't ever need a big gaussian filtered array
you need to gaussian filter each image independently
by using something like images.map(lambda v: v - gaussian_filter(v))
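A small sketch of the per-frame function suggested here, assuming `gaussian_filter` is `scipy.ndimage.gaussian_filter`. That function needs a `sigma` argument; the value below is an arbitrary placeholder, not one from this thread. The name `remove_background` is made up for illustration, and this covers only the background-subtraction step, not a full homomorphic filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def remove_background(frame, sigma=10.0):
    """Subtract a 2D Gaussian-blurred copy of one frame from that frame."""
    # Cast to float so unsigned integer images do not wrap around on subtraction
    frame = np.asarray(frame, dtype=np.float64)
    return frame - gaussian_filter(frame, sigma=sigma)


# Local check on a small synthetic stack shaped (frames, y, x)
stack = np.random.default_rng(0).random((4, 64, 64))
filtered = np.stack([remove_background(f) for f in stack])
```

Because the function only ever sees one frame, it can be passed to the map call from the message above, e.g. `images.map(remove_background)`, and the full blurred array never has to be assembled in one place.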
Kyle
@kr-hansen
Jun 29 2017 19:39
Ok, I see what you're suggesting
Maybe I'll give that a try again.
I'm taking the processing pipeline we've been using in matlab in our lab and trying to convert it over to a Python mode, hence why I've been trying to use Thunder. The frame-by-frame portion was getting me stuck in Spark, so I shifted to try working only in local mode again, but this might still work in Spark mode.
Thanks. I'll try it out and report back for the benefit of others.
Davis Bennett
@d-v-b
Jun 29 2017 19:41
no problem, and let me know if you have issues with image registration, that requires a few tricks
Kyle
@kr-hansen
Jun 29 2017 20:53
@d-v-b I'll let you know if I have issues. When you talk about a few tricks, what do you mean? I was doing the registration about 6 months ago by taking a mean image as the reference to send to all the executors which seemed to work ok. Are you able to save the motion correction models as .jsons now? I was previously doing it with pickled files.
Davis Bennett
@d-v-b
Jun 29 2017 21:12
the tricks apply if you want to do registration outside of the registration stuff built in to thunder
Kyle
@kr-hansen
Jun 29 2017 21:19
so that's the second time you've mentioned using the functions outside of what is built in to thunder. Is thunder not quite being supported the same as it was 6 months-1 year ago? I've noticed the commits on the repo have basically stopped since a year or so ago.
Davis Bennett
@d-v-b
Jun 29 2017 21:22
i think development has slowed down, but even if it were under active development I find it's much easier and more flexible to use thunder for applying functions I wrote instead of using functions that are built in
in my case, I want to use a registration method that isn't built in to thunder so I don't have a choice