These are chat archives for freeman-lab/zebra

8th Apr 2015
Davis Bennett
@d-v-b
Apr 08 2015 22:00
@freeman-lab lol looks like the new version of local correlation is the one I suggested up here: :point_up: April 6 2015 2:30 AM
I hope it works! current local corr basically doesn't work for big data
Jeremy Freeman
@freeman-lab
Apr 08 2015 22:21
yes it is, just acknowledged you in the PR =)
it's funny, when you suggested it here i thought it was inefficient because it requires a join
but then @sofroniewn realized it could be done with a union!
which i for some reason never thought of
Davis Bennett
@d-v-b
Apr 08 2015 22:24
it probably is inefficient for that reason, but it's gonna be better than what we have
Davis Bennett
@d-v-b
Apr 08 2015 22:34
why are joins inefficient? do they invoke a shuffle?
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:00
no no, we can do it with a union!
much cheaper than a join
joins combine two rdds record by record by matching up keys
that's what we would do if we combined raw and blurred in series town
a union is more like a concatenation
so we union the original images with the blurred ones, do a big images to series conversion on the whole thing, then compute the correlation between the "first half" and "second half" of the time series
the conversion to time series will be beefy
but the data replication is only 2x, and doesn't depend on the neighborhood size
make sense?
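The union-based plan described above can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `box_blur`, `pearson`, and `local_corr` are hypothetical names, the blur is assumed to be a simple box filter, and small nested lists stand in for distributed images. It is not the code from the PR. Note that the record count doubles regardless of `radius`, which is the "2x replication" point.

```python
import math

def box_blur(img, radius):
    """Mean over a square neighborhood of the given radius, clipped at the edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)

def local_corr(images, radius=1):
    t = len(images)
    # "union": concatenate raw and blurred records; blurred frames get keys t..2t-1
    records = [(k, img) for k, img in enumerate(images)]
    records += [(t + k, box_blur(img, radius)) for k, img in enumerate(images)]
    records.sort(key=lambda kv: kv[0])
    # images -> series: one series of length 2t per pixel
    h, w = len(images[0]), len(images[0][0])
    series = {(y, x): [img[y][x] for _, img in records]
              for y in range(h) for x in range(w)}
    # correlate the "first half" (raw) with the "second half" (blurred)
    return {px: pearson(s[:t], s[t:]) for px, s in series.items()}

# Toy data: every pixel follows the same time course, so raw and blurred agree.
frames = [[[v, v], [v, v]] for v in (1.0, 2.0, 4.0)]
corr = local_corr(frames, radius=1)
```

In a real cluster job the expensive step would still be the images-to-series conversion, as noted above; the sketch only shows the bookkeeping.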
Davis Bennett
@d-v-b
Apr 08 2015 23:15
yeah I think I see it
join takes two rdds each with (k,v) and produces a third rdd with (k, (v,w))
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:16
bingo
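A small pure-Python sketch of the join semantics just described, assuming lists of (key, value) pairs in place of RDDs. The helper `join` and the sample keys are hypothetical. On the earlier question about shuffles: in Spark a join generally does need to shuffle records so that matching keys end up together, unless both RDDs already share a partitioner.

```python
def join(left, right):
    """Inner join of two lists of (k, v) pairs, giving (k, (v, w)) for matching keys."""
    index = {}
    for k, w in right:
        index.setdefault(k, []).append(w)
    # every left record has to be matched against right records with the same key
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]

raw = [((0, 0), "raw00"), ((0, 1), "raw01")]
blurred = [((0, 1), "blur01"), ((0, 0), "blur00")]
joined = join(raw, blurred)
```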
Davis Bennett
@d-v-b
Apr 08 2015 23:16
union would make (k,v) where k is bigger
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:16
union takes [a,b,c,...] and [e,f,g,...] and makes [a,b,c,...,e,f,g,...]
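For contrast with the join, a union in the same list-based toy model is just concatenation: no keys are compared, and nothing is deduplicated. The helper name `union` is hypothetical; this is a sketch of the semantics, not of how Spark implements it.

```python
def union(first, second):
    """Concatenate two collections of records without looking at their contents."""
    return list(first) + list(second)

plain = union(["a", "b", "c"], ["e", "f", "g"])
# works the same for key, value pairs; a repeated key simply appears twice
pairs = union([(0, "raw")], [(0, "blurred")])
```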
Davis Bennett
@d-v-b
Apr 08 2015 23:17
gotcha yeah that should be much much simpler than stuffing new data into the keys
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:17
where a, b, etc. are either values or key, value pairs, doesn't matter
yup
once it's merged in you can be the first to tell us how well it scales =)
i know how you like to :rage4:
Davis Bennett
@d-v-b
Apr 08 2015 23:17
nikita has a dataset that couldn't be local corr'd even with 50 nodes
it's the father gascoigne of fish data
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:18
hahaha
we will crush it
with a blunderbuss and a saw
or the kirkhammer
need to compare damage output
Davis Bennett
@d-v-b
Apr 08 2015 23:18
wow I spelled gascoigne correctly
Jeremy Freeman
@freeman-lab
Apr 08 2015 23:18
hahaha