These are chat archives for thunder-project/thunder

1st
Jun 2016
Jason Wittenbach
@jwittenbach
Jun 01 2016 00:58
@nvladimus it will remove singleton dimensions, but under the hood, we're keeping track of which dimensions are "distributed" (i.e. stored in the keys as labels) and which are "local" (i.e. stored as an ndarray in each record)
the squeeze function will stop short of getting rid of all of the dimensions of either type.
to put it another way: it will always leave at least 1 local and 1 distributed dimension
just for bookkeeping purposes
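The bookkeeping rule can be sketched in plain Python on shapes alone. This is not Thunder's squeeze implementation. The function name squeeze_shape and the split argument (the number of leading axes stored in the keys) are assumptions made for illustration.

```python
def squeeze_shape(shape, split):
    """Drop singleton axes but keep at least one key axis and one value axis.

    shape: full shape of the data, key (distributed) axes first.
    split: how many leading axes are stored in the keys (hypothetical name).
    Returns (new_shape, new_split).
    """
    key_axes, value_axes = list(shape[:split]), list(shape[split:])

    def drop_singletons(axes):
        kept = [n for n in axes if n != 1]
        # If every axis of this type was singleton, keep one for bookkeeping.
        return kept if kept else axes[:1]

    new_keys = drop_singletons(key_axes)
    new_values = drop_singletons(value_axes)
    return tuple(new_keys + new_values), len(new_keys)
```

For example, squeeze_shape((1, 5, 1, 10, 1), 2) gives ((5, 10), 1), while squeeze_shape((1, 1, 1), 1) gives ((1, 1), 1): one axis of each kind survives even though all of them are singletons.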
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:24
@jwittenbach Hello everyone, I am considering upgrading to Thunder v1.0, but I see a lot of changes, especially related to starting up the Spark cluster. Will I lose thunder-submit and other functionality like it if I upgrade from v0.6 to v1.0?
My cluster is using Spark 1.4.1; should I upgrade it as well?
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:40
@jwittenbach @freeman-lab BTW, there's an error in the description of the frompng method: the default ext is set to TIF
Nikita Vladimirov
@nvladimus
Jun 01 2016 15:44
thanks, @jwittenbach
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:50
@AlexandreLaborde you will lose a lot of the thunder-related scripts
but this is not necessarily a bad thing
things like thunder-submit were simply wrapping spark-submit and creating the thunder-spark context (tsc)
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:51
@jwittenbach But is there a way to run Thunder non-interactively, similar to thunder-submit?
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:52
in 1.0, thunder no longer uses a context, as wrapping contexts inside of contexts was a little unsustainable
so now you can use the spark-submit that comes with spark
and just import thunder inside of your script
all that thunder-submit was doing was calling spark-submit and then setting up tsc
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:53
and then get the sc and pass it in as engine=sc, right?
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:53
the new way is a little less indirect
yep, exactly!
the one trick is that you do need to make sure that all of your workers have access to the Thunder source code
which you can do by either making sure it’s on their PYTHONPATH
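A minimal sketch of such a script, assuming Thunder 1.0 and PySpark are importable on the driver. The file name my_job.py, the app name and the single TIFF-folder argument are placeholders; the point is the plain SparkContext handed to a Thunder loader as engine=sc (here td.images.fromtif).

```python
import sys


def main(argv):
    """Run a small Thunder 1.0 job; argv holds the script arguments."""
    if len(argv) != 1:
        print("usage: spark-submit my_job.py <path-to-tif-folder>", file=sys.stderr)
        return 2

    # Imported after the usage check so a bad invocation fails fast.
    from pyspark import SparkContext
    import thunder as td

    # No thunder-spark context in 1.0: create a plain SparkContext...
    sc = SparkContext(appName="thunder-job")
    try:
        # ...and pass it to Thunder's loader as the engine.
        images = td.images.fromtif(argv[0], engine=sc)
        print(images.shape)
    finally:
        sc.stop()
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

It would be launched with something like spark-submit my_job.py /path/to/tifs, with Thunder reachable on every worker (for example through PYTHONPATH, or by shipping a zip of the source with spark-submit's --py-files option).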
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:54
I think I like this way better ;)
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:54
(that’s what we do, we handle it during cluster setup)
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:54
let's see if I can make this work
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:54
or, you can zip up the Thunder source code into a .zip file and pass that to sc.addPyFile inside of your script
yeah, we like it better too — keeps the boundaries between what Spark is responsible for and what Thunder is responsible for a little more clear
and allows us to have a local mode that runs with NumPy arrays in place of RDDs for small datasets and rapid prototyping
:)
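For the zip route, the archive has to contain the thunder/ package folder at its top level so that import thunder works on the workers. A minimal sketch using only the standard library; zip_package and ship_package are hypothetical helpers, and package_dir is assumed to be the thunder/ folder that contains __init__.py. The SparkContext method is addPyFile (singular), and it ships only the zip: Thunder's dependencies such as NumPy still have to be installed on the workers.

```python
import os
import shutil


def zip_package(package_dir, out_base):
    """Zip a package directory so that archive entries start with '<package>/'.

    Returns the path of the created .zip file.
    """
    package_dir = os.path.abspath(package_dir)  # also drops any trailing slash
    parent, name = os.path.split(package_dir)
    # root_dir is the parent folder and base_dir the package itself, so the
    # archive holds e.g. thunder/__init__.py and "import thunder" works on workers.
    return shutil.make_archive(out_base, "zip", root_dir=parent, base_dir=name)


def ship_package(sc, package_dir, out_base):
    """Zip package_dir and register the archive with SparkContext.addPyFile."""
    zip_path = zip_package(package_dir, out_base)
    sc.addPyFile(zip_path)
    return zip_path
```

Inside the script, after creating sc, this would be called as ship_package(sc, "/path/to/thunder", "/tmp/thunder_src").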
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:56
yeah, the old "local mode" was a bit weird; it was just a Spark cluster inside your machine
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:56
yeah, good for finding Spark-based bugs, but terrible for performance
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:57
agreed
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:57
and you can still get that mode by running pyspark locally without setting up a cluster, and then using that sc
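The two local options can be made explicit with a small switch. make_engine and its mode names are hypothetical; the sketch assumes, as described above, that Thunder 1.0 treats engine=None as the NumPy-backed local mode, and that a SparkContext with master "local[*]" reproduces the old single-machine Spark mode.

```python
def make_engine(mode="local"):
    """Pick the engine argument for Thunder's loaders (hypothetical helper).

    "local"       -> None, so Thunder keeps data in plain NumPy arrays.
    "spark-local" -> a SparkContext running on this machine only ("local[*]"),
                     like the old local mode: slow, but it exercises Spark code paths.
    """
    if mode == "local":
        return None
    if mode == "spark-local":
        from pyspark import SparkContext  # imported only when Spark is wanted
        return SparkContext(master="local[*]", appName="thunder-debug")
    raise ValueError("unknown mode: %r" % (mode,))
```

For example, td.images.fromtif(path, engine=make_engine("spark-local")) when hunting Spark-specific bugs, and engine=make_engine("local") for quick prototyping.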
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:57
should I upgrade Spark as well?
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:57
we’ve been using 1.5 for a long time and just recently upgraded to 1.6
everything has worked well for us so far!
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 15:59
the last time I upgraded spark I shot myself in the foot
because Thunder was not prepared for that version of Spark, or something along those lines
Jason Wittenbach
@jwittenbach
Jun 01 2016 15:59
ah yeah, sorry about that!
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:00
but if you have the new version and it is OK I will upgrade it as well
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:00
we’ve been doing all of the recent Thunder dev against Spark 1.6, so everything should be good to go at least up to that point
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:01
Can I work with 16-bit images now, or not yet?
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:01
I think so?
@freeman-lab @boazmohar or @d-v-b might have a better handle on this than I do though
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:04
A few months ago I spent some time here with you and Jeremy trying to fix some weird error, and it turned out that 16-bit images didn't work in the LoadImages method
some PIL dictionary error
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:04
so that might be ok then
we’re now using tifffile everywhere… I think
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:06
OK. When I manage to link Thunder and Spark again, I will try to work with 16-bit images and I'll let you know
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:07
awesome, that would be great
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:08
Can I use the example images and datasets inside Spark?
the last time I had to manually copy the files to some folder and then use them
the examples would only work in local mode
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:09
I think this is the same as before
the examples now get downloaded when asked for via thunder, rather than coming packaged with the code
but they will still be downloaded to the driver
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:10
I see...
Jeremy Freeman
@freeman-lab
Jun 01 2016 16:10
@AlexandreLaborde i actually think loading the sample data should work in distributed mode now too
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:10
if you have thunder installed on a network file system so that the workers can find the images at the same path, then you’re good to go
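One quick way to confirm that the workers really see the data at the same path is to run a few tiny Spark tasks that each check it. A minimal sketch; path_visible_on_workers is a hypothetical helper and sc is an existing SparkContext.

```python
import os
import socket


def path_visible_on_workers(sc, path, nprobes=16):
    """Report which hosts can see `path`, as sorted (hostname, exists) pairs.

    Runs nprobes one-element tasks; Spark does not guarantee that every
    executor gets one, so raise nprobes on larger clusters.
    """
    def probe(_):
        # Executed on a worker: check the path there, not on the driver.
        return (socket.gethostname(), os.path.exists(path))

    results = sc.parallelize(range(nprobes), nprobes).map(probe).collect()
    return sorted(set(results))
```

Any pair with False in the result points to a host where the network file system is not mounted at that path.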
Jeremy Freeman
@freeman-lab
Jun 01 2016 16:10
not actually 100% sure, we should test this
Jason Wittenbach
@jwittenbach
Jun 01 2016 16:11
@freeman-lab did we change it so that it will parallelize the images from the driver?
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:11
Well, I'll test it and post the results here.
Jeremy Freeman
@freeman-lab
Jun 01 2016 16:12
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:13
I guess that should work inside the cluster
Thank you very much for your help
Jeremy Freeman
@freeman-lab
Jun 01 2016 16:14
sure!
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 16:14
I will upgrade my system this week
fingers crossed :)
Jeremy Freeman
@freeman-lab
Jun 01 2016 16:14
awesome
good luck!
yeah, the one that might not work inside a cluster is series.fromexample, because it first downloads the data file locally: https://github.com/thunder-project/thunder/blob/master/thunder/series/readers.py#L394-L443
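Since series.fromexample fetches the file onto the driver first, one workaround is to load the data into memory on the driver and parallelize it from there with sc.parallelize. A minimal sketch; distribute_from_driver is a hypothetical helper, and the ((index,), array) record layout is an assumption chosen to resemble Thunder's key/value records, not a Thunder API. This only makes sense for small data such as the examples, since everything passes through the driver's memory.

```python
import numpy as np


def distribute_from_driver(sc, stack, npartitions=None):
    """Ship an in-memory stack from the driver to the workers as an RDD.

    stack: array of shape (n_records, ...) already loaded on the driver.
    Each record becomes ((index,), stack[index]).
    """
    stack = np.asarray(stack)
    records = [((i,), stack[i]) for i in range(stack.shape[0])]
    # parallelize serializes the list on the driver and splits it into partitions
    return sc.parallelize(records, npartitions)
```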
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 17:11
@freeman-lab @jwittenbach sorry to bother you again, are you sure pip install thunder-factorization is working? I can't find the repo
Jeremy Freeman
@freeman-lab
Jun 01 2016 17:11
ah, that one still needs to be published to PyPI
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 17:12
so, should I just download it from GitHub then?
Jeremy Freeman
@freeman-lab
Jun 01 2016 17:16
for now yes, @jwittenbach let's do that publishing today?
alexandrelaborde
@AlexandreLaborde
Jun 01 2016 17:24
Done. If you don't publish it, you should put a small note on the webpage, because there are a lot of references to it there
:)