These are chat archives for thunder-project/thunder

Mar 2015
Noah Young
Mar 10 2015 04:23
I just set spark.speculation to true in spark-defaults.conf and left the other speculation parameters at their defaults. Supposedly this makes Spark launch redundant copies of tasks that seem to be completing slowly. I would run a job with, say, 320 partitions, see some SSL timeouts, and then Spark would keep launching stages named "stage 1 (retry <n>)" with 26 partitions, then 2 partitions, then 12 partitions, etc.
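As a rough illustration of what "completing slowly" means, here is a simplified Python model of Spark's speculation check. It assumes the Spark 1.x defaults spark.speculation.quantile = 0.75 and spark.speculation.multiplier = 1.5, plus a small floor on the threshold (assumed to be 100 ms here). It is a sketch, not Spark's actual scheduler code, and the details vary between Spark versions.

```python
import statistics


def speculatable_tasks(num_tasks, finished_durations, running_elapsed,
                       quantile=0.75, multiplier=1.5, min_threshold_ms=100):
    """Return ids of running tasks that this simplified check would re-launch.

    finished_durations: runtimes (ms) of tasks in the stage that already succeeded.
    running_elapsed: dict of task id -> elapsed runtime (ms) for still-running tasks.
    The 100 ms floor is an assumption of this sketch.
    """
    # Speculation only starts once at least `quantile` of the stage's tasks have finished.
    if not finished_durations or len(finished_durations) < int(quantile * num_tasks):
        return []
    # A running task counts as slow if it has run longer than
    # `multiplier` times the median runtime of the finished tasks.
    threshold = max(multiplier * statistics.median(finished_durations), min_threshold_ms)
    return sorted(tid for tid, elapsed in running_elapsed.items() if elapsed > threshold)


# 8 tasks, 6 finished in about 1 s each; task 7 has been running for 4 s.
done = [1000, 1100, 900, 1050, 950, 1000]
print(speculatable_tasks(8, done, {6: 1200, 7: 4000}))
```

Only task 7 exceeds 1.5x the median finished runtime, so only it would get a redundant copy in this model.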
Noah Young
Mar 10 2015 04:33
I thought some people here might be interested in my script to upgrade Python to 2.7 on the Amazon AMIs that thunder-ec2 gives you. As you may have already found out, yum breaks if you upgrade Python, which you'll need to do to use the newer versions of IPython and matplotlib. Remember to run the script on the slaves as well!
Mar 10 2015 10:35

@npyoung thank you for that script! It will be nice to use seaborn on the cluster.

@freeman-lab I think adding the sampling rate or units could be useful. It would certainly make it easier to quickly understand the timescales over which neural activity may be oscillating.
I am more than happy to contribute to the documentation; I will take a look when I can to see if there is anything I can add.
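To illustrate why a sampling rate matters: the same array of samples maps to different physical frequencies depending on how fast it was recorded. This is a minimal NumPy sketch, not a Thunder API; the function name and the 10 Hz imaging rate are hypothetical.

```python
import numpy as np


def dominant_frequency_hz(signal, sampling_rate_hz):
    """Return the frequency (Hz) of the largest non-DC peak in the power spectrum."""
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    # rfftfreq converts FFT bin indices to Hz, but only once the sample spacing (1/fs) is known.
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sampling_rate_hz)
    # Skip bin 0 (the DC component) when looking for the peak.
    return freqs[np.argmax(power[1:]) + 1]


fs = 10.0                              # hypothetical imaging rate: 10 frames per second
t = np.arange(100) / fs                # 100 frames -> 10 s of data
trace = np.sin(2 * np.pi * 0.5 * t)    # a 0.5 Hz oscillation
print(dominant_frequency_hz(trace, fs))
```

Interpreting the same 100 samples as if they were recorded at 20 Hz would report a 1 Hz oscillation instead, which is exactly the ambiguity that storing the sampling rate removes.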

Jeremy Freeman
Mar 10 2015 16:24
@npyoung that looks super useful! I tried to do this cleanly a long while back and ran into trouble on the Spark side. Do you think it's worth building this into the standard thunder-ec2 launch process? Roughly how long does it take to complete? We could of course run it in parallel on all workers via pssh.
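pssh is its own command-line tool; the fan-out idea behind it can be sketched in plain Python with a thread pool that runs one ssh command per worker. This is a minimal sketch assuming passwordless ssh from the master to each worker; the function name, the host list, and the script name in the comments are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_hosts(hosts, command, runner=None, max_workers=16):
    """Run `command` on every host concurrently and return {host: exit code}.

    By default each host is reached with `ssh <host> <command>`; `runner` can be
    any callable taking (host, command), e.g. a stub for testing.
    """
    def ssh_runner(host, cmd):
        return subprocess.run(["ssh", host, cmd]).returncode

    run = runner or ssh_runner
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(run, host, command) for host in hosts}
        # Wait for every host and collect its exit code.
        return {host: fut.result() for host, fut in futures.items()}


# Hypothetical usage on the master, with worker hostnames one per line in a file:
# with open("slaves") as f:
#     workers = [line.strip() for line in f if line.strip()]
# print(run_on_hosts(workers, "bash upgrade-python.sh"))
```

Returning exit codes per host makes it easy to spot which workers need a rerun, which matters for a step that takes several minutes per machine.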
Ben Shababo
Mar 10 2015 22:27
@freeman-lab I finally started looking at how to implement the SIFTFlow algorithm for image registration. I've found it useful when there is non-rigid motion (which I've seen in some Drosophila data). The MATLAB code I use for this algorithm relies on a C++ implementation plus MEX for speed. Does Thunder have a precedent for including C++ extensions to Python?
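One lightweight way to call compiled code from Python without writing a full extension module is ctypes: build the C++ code as a shared library whose entry points are declared extern "C", then load it at runtime. The more conventional route inside a package is a setuptools Extension or a Cython wrapper, which needs a compiler at install time. The sketch below uses the C math library's cos() purely as a stand-in for a compiled SIFTFlow kernel; no such library exists in Thunder.

```python
import ctypes
import ctypes.util

# Stand-in for a compiled registration kernel: the C math library.
# A real C++ library would need extern "C" entry points so symbol names are not mangled.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature, double cos(double), so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))
```

Declaring argtypes and restype matters: without them ctypes assumes int arguments and return values, which silently corrupts doubles.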
Noah Young
Mar 10 2015 23:13
@freeman-lab The first line takes the most time and isn't necessary. The rest of the script completes in under 10 minutes; the rate-limiting step is compiling scipy. I think it's worth including an option to run it (like setup-notebook), since many of the Thunder examples import seaborn.