These are chat archives for thunder-project/thunder

7th
Jul 2015
tomsains
@tomsains
Jul 07 2015 12:54
@jwittenbach so to average the time series over multiple epochs, would you mark them with the same index number? E.g., if you had two epochs and you wanted a mean time series, would you indicate the epochs like this: [0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1]?
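For context: with a one-level index like that, averaging by index pools all samples that share a label into a single value per label. A minimal NumPy sketch of that semantics (made-up trace, not Thunder code; as the discussion below confirms, this yields one mean per label rather than a per-epoch time series):

import numpy as np

# Hypothetical single-voxel trace paired with the epoch labels from the question
trace = np.arange(22.0)
labels = np.array([0]*7 + [1]*4 + [0]*7 + [1]*4)

# One mean per distinct label: all label-0 samples pooled, all label-1 samples pooled
means = {int(lab): trace[labels == lab].mean() for lab in np.unique(labels)}
print(means)  # two values per voxel, not a stereotyped time series per epoch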
tomsains
@tomsains
Jul 07 2015 13:02

@freeman-lab I am having trouble aligning a 50 GB dataset with 800 partitions. I am using 20 nodes, each with 32 cores and 57 GB of RAM, which should easily be enough. However, when I try to compute a reference image it takes a really long time, and eventually all of my 'workers' are either exited or killed.
This is the error that is returned in the terminal shell:

15/07/07 11:37:45 ERROR ContextCleaner: Error cleaning broadcast 0
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:137)
    at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:227)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
    at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
    at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:199)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:159)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:150)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:150)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
    at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:144)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618)
    at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:143)
    at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)

any ideas what is going on here?
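For context: in Spark 1.3 the 30-second figure matches the default Akka ask timeout used by the block manager, and this cleaner error is often a downstream symptom of executors dying under memory pressure rather than the root cause. A hedged sketch of settings one might experiment with; the property names are from the Spark 1.3 era, but whether they resolve this particular failure is an assumption, and with Thunder's launch scripts they would normally go in spark-defaults.conf rather than a hand-built context:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        # Raise the Akka ask timeout (seconds); the default of 30 matches the trace above
        .set("spark.akka.askTimeout", "120")
        # Let jobs proceed while old broadcasts are cleaned up asynchronously
        .set("spark.cleaner.referenceTracking.blocking", "false"))
sc = SparkContext(conf=conf)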

Jeremy Freeman
@freeman-lab
Jul 07 2015 13:06
@tomsains that seems odd, is this a new error? did this same workflow/dataset previously run ok? what version of Spark/Thunder are you on?
tomsains
@tomsains
Jul 07 2015 13:12
Up until now I had been subsampling my datasets (to reduce computation time), which worked fine. I wanted to redo my analysis at higher resolution, so this is actually the first time I have tried it on this full-size dataset. I am using an up-to-date version of the Thunder master branch and Spark 1.3.1.
Jeremy Freeman
@freeman-lab
Jul 07 2015 13:17
hm, would you be able to point me to an example dataset/script? (feel free to send it to me off channel)
Jason Wittenbach
@jwittenbach
Jul 07 2015 14:24
@tomsains That’s exactly right. Also, in my earlier reply, I forgot that we have a helper function, series.seriesMeanByIndex, so you don’t even need to pass in the np.mean function.
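A minimal sketch of how that helper would be called with a one-level index (illustration only: `series` is assumed to be a loaded Thunder Series, and the default level argument is an assumption):

# Assumed: `series` is a Thunder Series with one record per voxel
series.index = [0]*7 + [1]*4 + [0]*7 + [1]*4   # epoch labels, as in the question above
epoch_means = series.seriesMeanByIndex()        # one mean per distinct label (default level assumed)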
tomsains
@tomsains
Jul 07 2015 14:36
@jwittenbach wouldn't that just return two values per voxel (the mean of epoch 0 and the mean of epoch 1) rather than the stereotyped time series for each epoch?
Jason Wittenbach
@jwittenbach
Jul 07 2015 14:40
@tomsains Oh, I see, yeah, you are right. Sorry, I think I misunderstood what you were asking. Let me make sure I understand it. So you might have multiple experimental conditions that are each repeated multiple times. And you want the average time series for each condition, is that right?
tomsains
@tomsains
Jul 07 2015 14:44
That is correct. Sorry, for some reason I was finding it difficult to express that verbally!
Jason Wittenbach
@jwittenbach
Jul 07 2015 14:45
Cool. So for that, you need a two-level index: one level representing the condition, and the other representing time within the condition.
Something like:
import numpy as np

index = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                  [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]])
series.index = index
series.seriesMeanByIndex(level=[0, 1])
tomsains
@tomsains
Jul 07 2015 14:47
ah that makes sense
Jason Wittenbach
@jwittenbach
Jul 07 2015 14:48
so then it will average together every unique timepoint within each condition
and the result will have an index of
[[0, 0, 0, 0, 1, 1, 1, 1],
 [0, 1, 2, 3, 0, 1, 2, 3]]
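To make the grouping concrete, here is a plain-NumPy sketch that computes the same eight means for one made-up trace (an illustration of the semantics, not Thunder's implementation):

import numpy as np

trace = np.arange(16.0)  # hypothetical single-voxel trace
cond = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
timept = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])

# One mean per unique (condition, timepoint) pair: eight values,
# each averaging the two repeats of that pair
means = [trace[(cond == c) & (timept == t)].mean()
         for c in np.unique(cond) for t in np.unique(timept)]
print(means)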
tomsains
@tomsains
Jul 07 2015 14:49
That is great. Thank you
Jason Wittenbach
@jwittenbach
Jul 07 2015 14:51
No problem
Oh shoot, actually, for that code to run, you need to transpose the index:
series.index = index.T
But that should do it :smile_cat:
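Putting the exchange together, the corrected end-to-end usage would read as follows (still a sketch; `series` is the loaded Series from the discussion above):

import numpy as np

# Rows are index levels: condition on top, timepoint-within-condition below
index = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                  [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]])

# Transpose so each sample carries one (condition, timepoint) pair
series.index = index.T
averaged = series.seriesMeanByIndex(level=[0, 1])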