These are chat archives for numpy/numpy

Apr 2016
Matthew Rocklin
Apr 12 2016 23:58
I've noticed that NumPy arrays serialize significantly more slowly with cloudpickle than with pickle (about 2x in wall time here):
In [1]: import numpy as np

In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)

In [3]: import cloudpickle, pickle

In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161

In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161
It appears that cloudpickle on Python 3 uses pickle._Pickler, the pure-Python variant, rather than the C-accelerated pickler.
Does anyone have thoughts on how this could be sped up when pickling with the pure-Python pickler?
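One way to check whether the slowdown comes from the pure-Python pickler itself, rather than from anything else cloudpickle does, is to time the C-accelerated `pickle.Pickler` against the pure-Python `pickle._Pickler` on the same array. This is a minimal sketch, assuming only the standard library and NumPy. `dump_with` and `time_dump` are hypothetical helpers, and `pickle._Pickler` is a private name that may change between Python versions. No particular timings are claimed.

```python
import io
import pickle
import time

import numpy as np


def dump_with(pickler_cls, obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: serialize obj with the given Pickler class."""
    buf = io.BytesIO()
    pickler_cls(buf, protocol=protocol).dump(obj)
    return buf.getvalue()


def time_dump(pickler_cls, obj, repeat=3):
    """Hypothetical helper: return (best wall time in seconds, payload)."""
    best = float("inf")
    payload = b""
    for _ in range(repeat):
        start = time.perf_counter()
        payload = dump_with(pickler_cls, obj)
        best = min(best, time.perf_counter() - start)
    return best, payload


if __name__ == "__main__":
    # Smaller than the 100 MB array in the session above, to keep the run quick.
    data = np.random.randint(0, 255, dtype="u1", size=10_000_000)

    # pickle.Pickler is the C-accelerated class; pickle._Pickler is the
    # pure-Python implementation (private, so it may change).
    for cls in (pickle.Pickler, pickle._Pickler):
        seconds, payload = time_dump(cls, data)
        # Both picklers should round-trip the array exactly.
        assert np.array_equal(pickle.loads(payload), data)
        name = f"{cls.__module__}.{cls.__qualname__}"
        print(f"{name}: {seconds * 1000:.1f} ms, {len(payload)} bytes")
```

If the `pickle._Pickler` time alone accounts for most of the gap, the cost is in the pure-Python pickling path rather than in cloudpickle-specific logic.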