These are chat archives for thunder-project/thunder
Quick question: is there expected to be a significant difference in processing time between operating on images versus series?
For example, if I wanted to find the maximum pixel value across a full series of images, would I expect it to be significantly quicker doing
np.max(images.map(lambda x: np.max(x))) or
np.max(images.map_as_series(lambda x: np.max(x))) or
np.max(series.map(lambda x: np.max(x)))
I'm kind of wondering from a distributed-computing perspective on Spark. I am still learning how to think when writing my code in Spark with Images.
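All three calls compute the same number; the difference is mostly in data movement (converting to series implies a shuffle, while images.map reduces each image where it already sits). A numpy-only sketch of the shared logic (thunder not assumed installed; the arrays here are made up for illustration):

```python
import numpy as np

# Hypothetical stack of images: 5 frames of 4x4 pixels.
images = np.arange(5 * 4 * 4).reshape(5, 4, 4)

# Each thunder variant above boils down to a two-step reduction:
# a per-record max (the distributed map) ...
per_image_max = np.array([np.max(img) for img in images])

# ... followed by a max over those partial results on the driver.
global_max = np.max(per_image_max)

# The answer is identical to a single global reduction.
assert global_max == np.max(images)
print(global_max)  # -> 79
```

Since the per-image maxima are tiny compared to the images themselves, the version that avoids reshaping the data (images.map) would typically be the cheapest for a whole-volume reduction like this.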
How does images.map work when using with_keys? I was assuming that if I do
adjimgs = imgs.map(lambda kv: kv[1] - otherimgs[kv[0], :, :], with_keys=True)
when specifying with_keys=True, the key is the image # and the value is the actual image. (Note the function receives a single (key, value) tuple; the Python 2 style lambda (k, v): ... no longer works in Python 3.) otherimgs is another array with the same extent as imgs (filtered).
I am doing such a map, using the key to index into another array to do a frame-by-frame subtraction (or division). I know I could use
subtract() for subtraction, but I'm wondering how this works for other operations. After making such a call, I keep getting this error when I try to use the output:
Exception: It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
Am I likely doing something wrong with my function call, or am I missing what is happening with with_keys?
subtract, as it only subtracts a single image from all frames. I'd like to subtract frame by frame with a volume that has the same extent as the original image volume.