These are chat archives for thunder-project/thunder

May 2016
May 20 2016 18:06

Quick question, is there expected to be a significant difference in processing time between series and images?

For example, if I wanted to find the maximum pixel value across a full series of images, would I expect it to be significantly quicker doing
np.max(images.max()) or
np.max(images.map(lambda x: np.max(x))) or
np.max(images.map_as_series(lambda x: np.max(x))) or
np.max(series.map(lambda x: np.max(x)))

I'm asking from a distributed-computing viewpoint on Spark. I am still learning how to think about writing my code in Spark with Images.
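A minimal numpy sketch of what the expressions above compute (the frames array here is a hypothetical stand-in for a thunder images object in local mode; shapes and values are made up for illustration). Reducing per pixel and then globally, versus per frame and then globally, should give the same number:

```python
import numpy as np

# Hypothetical stand-in for an images object: a stack of 2D frames,
# shape (n_frames, height, width).
frames = np.arange(24, dtype=float).reshape(4, 2, 3)

# images.max() reduces across frames, giving one 2x3 image of
# per-pixel maxima; np.max then collapses that image to a scalar.
per_pixel_max = frames.max(axis=0)
global_max_a = np.max(per_pixel_max)

# map(lambda x: np.max(x)) instead computes one scalar per frame;
# np.max over those scalars gives the same global maximum.
per_frame_max = np.array([np.max(f) for f in frames])
global_max_b = np.max(per_frame_max)

assert global_max_a == global_max_b == 23.0
```

Either way the result is identical; any performance difference would come from how the intermediate reduction is distributed, not from the math.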

May 20 2016 22:41

How does map work when using with_keys? I was assuming that if I do
adjimgs = imgs.map(lambda (k, v): v - otherimgs[k,:,:], with_keys=True)
when specifying with_keys=True, k is the image number and v is the actual image. otherimgs is another array with the same extent as imgs (filtered).

I am doing such a map and using k to index into another array to do a frame-by-frame subtraction (or division). I know I could use subtract() for subtraction, but I'm wondering how this works for other operations. After making such a call, I keep getting this error when I try to use the resulting image object.

Exception: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

Am I likely doing something wrong with my function call, or am I missing what is happening with with_keys?

Actually, correction: I can't use subtract(), as it only subtracts a single image from all frames. I'd like a frame-by-frame subtraction of a volume with the same extent as the original image volume.
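A minimal numpy sketch of the intended frame-by-frame subtraction (the arrays and shapes here are hypothetical, emulating local mode). The key point for the SPARK-5063 error above: otherimgs must be a plain local array on the driver before the lambda captures it; closing over another distributed images/RDD object inside a map is exactly what that exception forbids.

```python
import numpy as np

# Hypothetical local stacks of frames with identical shape
# (n_frames, height, width). In Spark mode, otherimgs would need to
# be pulled to the driver as a local array first -- referencing a
# distributed object inside the mapped function triggers SPARK-5063.
imgs = np.arange(12, dtype=float).reshape(3, 2, 2)
otherimgs = np.ones_like(imgs)

# with_keys-style frame-by-frame subtraction: the key k selects the
# matching frame of the reference stack.
adjimgs = np.stack([v - otherimgs[k] for k, v in enumerate(imgs)])

assert adjimgs.shape == imgs.shape
assert np.array_equal(adjimgs, imgs - otherimgs)
```

With both stacks local, the whole operation is just elementwise `imgs - otherimgs`; the per-key loop only matters when the frames live in a distributed collection and the reference stack is indexed by key inside the map.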