These are chat archives for thunder-project/thunder

20th
Dec 2016
Boaz Mohar
@boazmohar
Dec 20 2016 00:21
@d-v-b @jwittenbach In my experience you need to be careful with .fromrdd(), as you might no longer have the same keys due to the filter step. If you leave the comforts of Thunder you need to take care of the keys yourself.
Also @d-v-b, I am running into a problem you mentioned a while back. I am trying to do a data.mean() on a dataset where a single volume is 1024x1024x300 int16. I am getting: ValueError: can not serialize object larger than 2G. Did you solve that?
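The point about keys after a filter can be illustrated without Spark. The sketch below uses plain Python lists as a stand-in for a keyed RDD (an assumption made purely for illustration; the names `records`, `kept`, and `rekeyed` are hypothetical) and shows that filtering leaves gaps in the keys unless they are rebuilt afterwards.

```python
# Plain Python lists stand in for a keyed RDD here; no Spark is required.
records = [(i, "frame-%d" % i) for i in range(6)]

# Filtering drops records but leaves the surviving keys untouched,
# so the keys are no longer a contiguous 0..n-1 range.
kept = [(k, v) for k, v in records if k % 2 == 0]
old_keys = [k for k, _ in kept]

# Re-key the survivors so the keys are contiguous again.
rekeyed = [(new_k, v) for new_k, (_, v) in enumerate(kept)]
new_keys = [k for k, _ in rekeyed]

print(old_keys)
print(new_keys)
```

On a real RDD, something like `zipWithIndex()` could play the role of `enumerate` here, but exactly which keys `.fromrdd()` expects is not stated in this conversation.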
Jason Wittenbach
@jwittenbach
Dec 20 2016 16:08
@chenminyeh both of those were removed from the current version of Thunder to make it more modular
The fact that ThunderContext is gone is a definite upgrade. You can now just start Python however you want (IPython, Jupyter Notebook, PySpark, etc) and then simply import Thunder like you would any other package (e.g. import thunder as td)
Jason Wittenbach
@jwittenbach
Dec 20 2016 16:13
As for Colorize, we wanted to make Thunder more modular, so it was removed as it’s a little tangential to image/time-series manipulation. I don’t think the code was ever moved anywhere else, but you can still find it in the old release: https://github.com/thunder-project/thunder/blob/v0.6.0/thunder/viz/colorize.py
chenminyeh
@chenminyeh
Dec 20 2016 16:36
@jwittenbach Got you! I like how one can visualize the results from factorization. I am new to Thunder and Python but will see if there are alternatives for doing that! Thanks!!
Davis Bennett
@d-v-b
Dec 20 2016 18:11
@boazmohar I haven't had that issue with serializing big objects in a while, probably I found a way around it? I honestly don't remember
are you making an average image, or an average timeseries?
Boaz Mohar
@boazmohar
Dec 20 2016 21:01
@d-v-b I am averaging over time to get an average image. I can load the data, pull one image with first(), map(), and save back out, but aggregation seems to fail.
Davis Bennett
@d-v-b
Dec 20 2016 21:10
I don't remember how thunder does averaging across the distributed axis, but you should be careful doing that with int16 data, you might get overflow problems
that's assuming you solve the problems you're currently having :smile:
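A minimal sketch of the int16 overflow risk mentioned above, using only the standard library. It emulates a 16-bit accumulator with `ctypes.c_int16`; whether Thunder's mean actually accumulates in int16 is not confirmed in this conversation, so this only illustrates the general hazard. The helper `sum_int16` and the sample values are hypothetical.

```python
import ctypes

def sum_int16(values):
    """Sum values in an accumulator that wraps like a signed 16-bit integer."""
    total = 0
    for v in values:
        # c_int16 performs no overflow checking, so the result wraps around.
        total = ctypes.c_int16(total + v).value
    return total

# Four bright pixels, each of which fits comfortably in int16 on its own.
pixels = [30000, 30000, 30000, 30000]

wrapped_mean = sum_int16(pixels) / len(pixels)  # accumulator overflowed
safe_mean = sum(pixels) / len(pixels)           # wide accumulator, no wrap

print(wrapped_mean, safe_mean)
```

Converting to a wider type (for example a float type) before aggregating avoids the wraparound, at the cost of more memory per element.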
Boaz Mohar
@boazmohar
Dec 20 2016 21:13
It fails so quickly that I also don't think it is related. Also it works fine as long as my z is under 250 planes (i.e. 1024x1024x250 volumes). Do you average anything with an equivalent number of elements?
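One way to reason about the 250-plane threshold is to compare raw volume sizes against a 2 GiB limit. The sketch below assumes two things that are not confirmed in this conversation: that the limit applies to serialized bytes at 2^31 (as the "2G" in the error message suggests), and that the aggregation produces a float64 intermediate the size of one volume. `LIMIT_BYTES` and `volume_bytes` are hypothetical names.

```python
LIMIT_BYTES = 1 << 31  # assumed limit: 2 GiB of serialized data

def volume_bytes(shape, itemsize):
    """Raw size in bytes of an array with the given shape and element size."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize

for planes in (250, 300):
    shape = (1024, 1024, planes)
    as_int16 = volume_bytes(shape, 2)    # int16 is 2 bytes per element
    as_float64 = volume_bytes(shape, 8)  # float64 is 8 bytes per element
    print(planes, as_int16, as_float64, as_float64 > LIMIT_BYTES)
```

Under these assumptions the int16 volumes stay well below the limit in both cases, while a float64 copy fits at 250 planes and exceeds the limit at 300, which would be consistent with the behaviour described. Serialization overhead is ignored here.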
Davis Bennett
@d-v-b
Dec 20 2016 21:17
my data are usually ~[120, 2048,2048] maximum
i might have been getting the same errors back when my data were that big
Boaz Mohar
@boazmohar
Dec 20 2016 21:21
That is around the same number of elements. Sorry to keep bugging you, but was there a version of Spark with which you could get data of that size through?
Davis Bennett
@d-v-b
Dec 20 2016 21:24
I can't guarantee that I ever got around this issue, I haven't worked with data that big in a while
Boaz Mohar
@boazmohar
Dec 20 2016 21:25
Got it. I will be splitting it for now. Thanks!
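A sketch of what such a split might look like, assuming the volume is divided along z. How the split was actually done is not described here, and the helper `split_planes` and the chunk size of 128 are hypothetical. Because a mean over time is computed per pixel, the per-chunk mean images can simply be stacked back together along z afterwards.

```python
def split_planes(n_planes, max_planes):
    """Return (start, stop) ranges covering n_planes in chunks of at most max_planes."""
    return [(start, min(start + max_planes, n_planes))
            for start in range(0, n_planes, max_planes)]

# Split a 300-plane volume into z-ranges of at most 128 planes each.
chunks = split_planes(300, 128)
print(chunks)
```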
Davis Bennett
@d-v-b
Dec 20 2016 21:26
looking back, I didn't have your issue when taking a mean, it was when I was trying to use sc.images.fromarray with a big array, and I think I just worked around it
Boaz Mohar
@boazmohar
Dec 20 2016 21:27
From what I can see in the code, it is checking for a number of elements bigger than 2^32, so I don't get it.