These are chat archives for thunder-project/thunder
```python
def func(kv):
    key, value = kv
    # do something with key and value here
    return value

data = data.map(func, with_keys=True)
```
`.tordd()` and use Spark methods like
Regarding the max question, if your data is an images object, converting to series or mapping as series won't be faster (I think). I know there is an algorithmic difference between the first two options, but there are also other factors that might dominate the optimization. The main one is the number of partitions
`images.max()` performs a reduce within all the records of a partition, and then your `np.max(...)` will work across the spatial dimensions. On the other hand, `.map(...)` first will work on each time point; then you would need to collect each time point, and your `np.max(...)` will work across the time domain. For both you need to add `.toarray()` before the `np.max()`.
tl;dr: time it with different numbers of partitions.
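The two reduction orders discussed above can be sketched in plain NumPy. This is a local stand-in for the distributed images object, with a hypothetical `(time, x, y)` stack; it only illustrates that both orders reach the same global max, not their relative speed:

```python
import numpy as np

# Hypothetical stack of images: axis 0 is time, axes 1-2 are spatial.
stack = np.arange(24).reshape(3, 2, 4)

# Option 1: reduce across records (time) first, like images.max(),
# then take np.max across the remaining spatial dimensions.
per_pixel_max = stack.max(axis=0)        # shape (2, 4)
global_max_1 = per_pixel_max.max()

# Option 2: reduce each time point spatially first (a map),
# then take np.max across the time domain after collecting.
per_frame_max = stack.max(axis=(1, 2))   # shape (3,)
global_max_2 = per_frame_max.max()

assert global_max_1 == global_max_2 == 23
```

Either way the answer is the same; which is faster depends on partitioning and shuffle costs, hence the suggestion to time both.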
`.tordd()` and using Spark methods. I'm anticipating I can do a `tordd().cogroup().reduceByKey(np.subtract)` or something like that to do image-by-image subtraction within a volume. If you're aware of any other options, I'd be happy to know about them.
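The keyed-subtraction idea above can be mocked up locally without Spark. This is a pure-Python sketch on hypothetical data; a real pipeline would go through `images.tordd()` and Spark's `reduceByKey`. Note one caveat worth knowing: `np.subtract` is not commutative, and Spark's `reduceByKey` does not guarantee reduction order, so this pattern is only safe when each key has exactly two records in a known order:

```python
from collections import defaultdict
from functools import reduce

import numpy as np

# Hypothetical keyed records: (volume_key, image_array) pairs.
records = [
    (0, np.full((2, 2), 5.0)),  # key 0, first image in the volume
    (0, np.full((2, 2), 2.0)),  # key 0, second image
    (1, np.full((2, 2), 7.0)),
    (1, np.full((2, 2), 3.0)),
]

# Group by key, then fold each group with np.subtract,
# mimicking rdd.reduceByKey(np.subtract).
grouped = defaultdict(list)
for key, img in records:
    grouped[key].append(img)

diffs = {k: reduce(np.subtract, imgs) for k, imgs in grouped.items()}

assert (diffs[0] == 3.0).all()  # 5 - 2
assert (diffs[1] == 4.0).all()  # 7 - 3
```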
`.save()` was removed from the `RegistrationModel` for Thunder 1.0.0? I've tried just saving it using `json`, but it says `RegistrationModel` is not serializable.
`reduceByKey`, or doing it as a `map`. It would be interesting to see which of those is faster.
`+` operator on the `BoltArraySpark` to make that work
`reduce` should be faster
`RegistrationModel` in Thunder 1.0.0? Is there a way to save the whole object, to later load and transform data?
`np.save("filename.npy", model.toarray())` to save out the raw values
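A minimal sketch of that workaround: persist the model's raw values with `np.save` and reload them later. The array below is hypothetical stand-in data for whatever `model.toarray()` returns (e.g. per-image shift parameters), not Thunder's actual output format:

```python
import os
import tempfile

import numpy as np

# Stand-in for model.toarray(): e.g. an (n_images, 2) array of shifts.
values = np.array([[1.0, -2.0], [0.5, 3.0]])

# Save to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model_values.npy")
    np.save(path, values)
    restored = np.load(path)

assert np.array_equal(restored, values)
```

This only round-trips the raw values; rebuilding a usable model object from them is a separate step.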
`RegistrationModel` is no longer JSON-serializable? I know that is how it was saved previously.
`RegistrationModel` and turn it into a standalone package exactly like you're saying, so that we can use it throughout Thunder
`RegistrationModel` contains a collection of