These are chat archives for thunder-project/thunder

10th May 2015
Matty G
@meawoppl
May 10 2015 16:04
Hey howdy anyone around?
Jeremy Freeman
@freeman-lab
May 10 2015 16:05
yup
Matty G
@meawoppl
May 10 2015 16:08
there's a familiar face. We just started playing with your tools this last week, great work :)
Jeremy Freeman
@freeman-lab
May 10 2015 16:08
awesome, thanks!
Matty G
@meawoppl
May 10 2015 16:10
any words of wisdom about how to distribute code/modules to spark clusters? It is still a pain point for us
Jeremy Freeman
@freeman-lab
May 10 2015 16:12
hm, if you can bundle custom code into one or more egg files, then you can ship it across the cluster by calling sc.addPyFile('path/to/file.egg')
and if launching with the thunder executable, you can achieve the same thing with the command-line argument --py-files, as in thunder --py-files path/to/file.egg
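e.g. a minimal sketch of that workflow, assuming the custom code has already been built into an egg (the app name, egg path, and package names below are hypothetical):
    # build the egg locally first, e.g. python setup.py bdist_egg
    from pyspark import SparkContext

    sc = SparkContext(appName='ship-custom-code')  # hypothetical app name
    # ship the egg so imports from it resolve inside tasks on the workers
    sc.addPyFile('dist/mypackage-0.1-py2.7.egg')  # hypothetical egg path

    # code running on the workers can now import from the shipped package
    print(sc.parallelize(range(4)).map(lambda x: x * 2).collect())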
Matty G
@meawoppl
May 10 2015 16:33
cool, that's not too bad. I was hoping to co-opt conda since we are already using it for local stuff
do you know of anyone running spark out of docker containers?
Jeremy Freeman
@freeman-lab
May 10 2015 17:09
ah, so you can use conda packages by either running your conda install ... on all the nodes through parallel ssh, or by running the anaconda installer on the driver and just rsyncing the installation to the workers
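a minimal sketch of both options, assuming passwordless ssh from the driver to the workers (the hostnames, package, and paths are hypothetical):
    # hypothetical sketch: push conda-managed dependencies to each worker
    import subprocess

    workers = ['node001', 'node002']  # hypothetical worker hostnames

    for host in workers:
        # option 1: run conda install on each node over ssh
        subprocess.check_call(['ssh', host, 'conda', 'install', '--yes', 'numpy'])
        # option 2: mirror the driver's anaconda tree to the worker instead
        subprocess.check_call(['rsync', '-az', '/opt/anaconda/', host + ':/opt/anaconda/'])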
docker would definitely be "the better way"; the first step is spark deployment via docker/packer, which @nchammas and others are working on, but AFAIK it's still in progress https://issues.apache.org/jira/browse/SPARK-3821
Matty G
@meawoppl
May 10 2015 18:22
yeah, I saw that deployment script. The problem I see is that it uses anaconda2 and the root conda env. We need Python 3 for our stack (though it probably works in 2...). This particular permutation seems a little more tricky, hence leading me to the container approach. It seems pretty easy to make a standalone docker container that accomplishes this, but ya know, just the general hurdles of IT tomfoolery