These are chat archives for numpy/numpy

Apr 2016
Remi Rampin
Apr 26 2016 16:04
Hi! I have an array of triples (shape = (1000, 3)) and I want to get, for each triple, its minimum distance to the triples in another array (shape = (200, 3))
thus end up with a (1000, 1) array where row N is the minimum distance between firstarray[N, :] and each of the secondarray[x, :] rows
Can't seem to vectorize that, if it's possible at all... using apply_along_axis() and Python code is obviously very slow
Remi Rampin
Apr 26 2016 16:09
import math
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6], [5, 4]])
references = np.array([[0, 0], [3, 3], [4, 4]])
distance = lambda x, y: math.sqrt(sum((X - Y) * (X - Y) for X, Y in zip(x, y)))
[min(distance(p, r) for r in references) for p in points]  # Not vectorized
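For reference, the same computation can be vectorized in plain NumPy with broadcasting (a sketch, not from the chat; note it builds an (n_points, n_references, 3) intermediate array, trading memory for speed):

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6], [5, 4]])
references = np.array([[0, 0], [3, 3], [4, 4]])

# Broadcast points against references: (4, 1, 2) - (1, 3, 2) -> (4, 3, 2)
diff = points[:, None, :] - references[None, :, :]

# Euclidean distance over the last axis gives a (4, 3) distance matrix;
# the row-wise minimum is then the distance to the closest reference.
dists = np.sqrt((diff ** 2).sum(axis=-1))
min_dists = dists.min(axis=1)
```

This gives the same values as the list-comprehension version above.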
Matthew Rocklin
Apr 26 2016 16:27
I recommend asking questions like these on StackOverflow. I suspect there are many more people watching the StackOverflow numpy tag than watching this Gitter channel
Remi Rampin
Apr 26 2016 16:32
Oh, ok. I'll do that then!
Richard Otis
Apr 26 2016 19:03
I have to numpy.repeat() everything to create big arrays that I then immediately reduce with numpy.ndarray.min(); it hurts my eyes a bit
but performance seems good enough?
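A minimal sketch of that repeat-then-reduce pattern (variable names are illustrative, not from the chat):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])          # query points
b = np.array([[0, 0], [3, 3], [4, 4]])  # reference points

# Repeat each row of `a` once per row of `b`, and tile `b` to match,
# so every (a-row, b-row) pair lines up in the big arrays.
aa = np.repeat(a, len(b), axis=0)     # shape (6, 2)
bb = np.tile(b, (len(a), 1))          # shape (6, 2)

# Pairwise distances, reshaped back to (len(a), len(b)), then reduced.
d = np.sqrt(((aa - bb) ** 2).sum(axis=1)).reshape(len(a), len(b))
min_d = d.min(axis=1)
```

Broadcasting (`a[:, None, :] - b[None, :, :]`) achieves the same result without materializing the repeated copies.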
Moritz E. Beber
Apr 26 2016 20:29
Go with scipy:
from scipy.spatial.distance import cdist
cdist(points, references).min(axis=1)
array([ 2.23606798,  1.        ,  2.23606798,  1.        ])
That's using your example in the second cell.
On your larger data it runs in microseconds instead of milliseconds.
Remi Rampin
Apr 26 2016 20:33
hmm ok
Remi Rampin
Apr 26 2016 21:25
You're right of course, it's 10 times faster
Moritz E. Beber
Apr 26 2016 22:50
I did get a caching warning, not sure where that could have come from. Glad it's still significantly faster.