These are chat archives for thunder-project/thunder

5th
Aug 2015
Davis Bennett
@d-v-b
Aug 05 2015 15:23
@freeman-lab series.toImages() seems to perform very poorly in my hands. any ideas what I can do to speed it up?
Davis Bennett
@d-v-b
Aug 05 2015 15:33
e.g. running tsc.loadSeries(data).toImages().cache() takes at least 20m and kills a few workers
and when that's finished, calling first() on the 'cached' data takes a long time and requires shuffle read/write, which seems strange
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:37
re: the second point, are you calling count on the result of the execution? until then it’s not actually cached
Davis Bennett
@d-v-b
Aug 05 2015 15:37
but shouldn't first() be fast?
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:37
not necessarily
it is on anything that doesn’t involve a shuffle
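(The point about lazy caching can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's or Thunder's API: the `LazyPartitions` class and its methods are hypothetical stand-ins. It assumes that results are only stored for partitions an action actually touches, so `first()` fills one partition of the cache while `count()` fills all of them. It does not model shuffles, which force upstream work on every partition before `first()` can return.)

```python
class LazyPartitions:
    """Toy stand-in for a lazily evaluated, partitioned dataset (hypothetical, not Spark)."""

    def __init__(self, partitions, func):
        self.partitions = partitions
        self.func = func
        self.cached = {}       # partition index -> computed result
        self.evaluations = 0   # number of partitions actually computed

    def _compute(self, i):
        # Compute a partition once, then serve it from the cache.
        if i not in self.cached:
            self.evaluations += 1
            self.cached[i] = [self.func(x) for x in self.partitions[i]]
        return self.cached[i]

    def first(self):
        # Only touches partitions until one yields an element.
        for i in range(len(self.partitions)):
            part = self._compute(i)
            if part:
                return part[0]
        raise ValueError("empty dataset")

    def count(self):
        # Touches every partition, so the cache ends up fully populated.
        return sum(len(self._compute(i)) for i in range(len(self.partitions)))


data = LazyPartitions([[1, 2], [3, 4], [5, 6]], lambda x: x * 10)
first_value = data.first()
cached_after_first = len(data.cached)
total = data.count()
cached_after_count = len(data.cached)
```

(In this toy, calling `first()` alone leaves most partitions uncomputed, which is why a full pass such as `count()` is needed before later operations can read everything from the cache.)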
Davis Bennett
@d-v-b
Aug 05 2015 15:38
even when I ran first() and count(), trying to take a max projection afterward was causing shuffles
and first() and count() together took 54m on 20 nodes
*30 nodes
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:40
i’m not totally following the full sequence, but maybe we can take this particular case offline and go over the exact sequence you used
Davis Bennett
@d-v-b
Aug 05 2015 15:40
sure thing
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:40
in general, i’m definitely aware that the images -> series conversion will be a bottleneck on large jobs
at least as it’s implemented now
and it’s something we’re looking at as we rearchitect some of this
the fundamental issue is that once the data’s been broken up so as to work on each time series independently
putting the whole thing back together in the way you’re trying is expensive
and i suspect an alternative workflow that works entirely with blocks will be much better
Davis Bennett
@d-v-b
Aug 05 2015 15:42
the context for this is me trying to find a performant way to take dff and then max project
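(For readers unfamiliar with the operation, here is a rough sketch of "dff then max project" on key-value records, in plain Python rather than Thunder's API. The function names are hypothetical, the baseline is assumed to be the nonzero mean of each series, and records are assumed to be `((x, y, z), series)` pairs. The grouping step in `max_project` is the part that needs records from different keys to be brought together, which in a distributed setting is where a shuffle would come from.)

```python
from collections import defaultdict


def dff(series):
    """(f - baseline) / baseline, using the mean as baseline (an assumption; must be nonzero)."""
    baseline = sum(series) / len(series)
    return [(f - baseline) / baseline for f in series]


def max_project(records, axis):
    """Collapse one spatial axis of (key, series) records with an element-wise max."""
    groups = defaultdict(list)
    for key, series in records:
        # Drop the projected axis from the key; records sharing the rest are grouped.
        reduced_key = key[:axis] + key[axis + 1:]
        groups[reduced_key].append(series)
    return {k: [max(vals) for vals in zip(*group)] for k, group in groups.items()}


# Three voxels, keyed by (x, y, z), each with a three-point time series.
records = [
    ((0, 0, 0), [1.0, 2.0, 3.0]),
    ((0, 0, 1), [2.0, 2.0, 8.0]),
    ((1, 0, 0), [4.0, 4.0, 4.0]),
]

normalized = [(key, dff(series)) for key, series in records]
projected = max_project(normalized, axis=2)  # project along z
```

(Here `projected` maps each remaining `(x, y)` key to one time series holding the maximum dF/F over z at each time point.)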
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:43
yeah i guessed that
Davis Bennett
@d-v-b
Aug 05 2015 15:43
using the max projection built into series takes a very long time (1-2 hours) for all axes
and it fills my console with the cries of dead workers
Jeremy Freeman
@freeman-lab
Aug 05 2015 15:43
well we don’t want that =)
i basically know there’s a way to do this entirely in block land that will be more performant
the problem is more at the level of how to expose it
but we could prototype with your data
let’s look at it later today or tomorrow
Davis Bennett
@d-v-b
Aug 05 2015 15:45
sounds good