I am using sklearn.neighbors.NearestNeighbors (scikit-learn==0.24.0) to find nearest neighbors:
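from sklearn.neighbors import NearestNeighbors

knnModel = NearestNeighbors(n_neighbors=5, algorithm='auto', metric='minkowski', p=2, n_jobs=-1)
knnModel.fit(train_df.values)  # assumed: the model is fitted on train_df (the fit call is not shown in the original snippet)
# Query the 5 nearest neighbors one test row at a time
test_df.apply(lambda u: knnModel.kneighbors(X=u.values.reshape(1, -1), n_neighbors=5, return_distance=False).ravel().tolist(), axis=1)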
test_df contains 5K rows and 77 columns (from one-hot encoding), and the execution time is around
(train_df's shape => (1754249, 77))
Is it normal to have such a high execution time?
Any tips to improve the performance and reduce this execution time?
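For reference, here is a minimal sketch of a batched version of the same query. Since kneighbors accepts a 2-D array of query points, the whole of test_df can be passed in one call instead of going through apply row by row. The helper name batched_neighbors and the small random frames are hypothetical stand-ins for illustration, not the real data, and whether this removes the bottleneck on the 1754249-row training set is untested here.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def batched_neighbors(train_df, test_df, k=5):
    """Return, for each test row, the positions of its k nearest training rows.

    Hypothetical helper: same estimator settings as in the question, but
    the model is queried once for all test rows instead of once per row.
    """
    model = NearestNeighbors(n_neighbors=k, algorithm='auto',
                             metric='minkowski', p=2, n_jobs=-1)
    model.fit(train_df.values)
    # One call for the whole test set; result has shape (n_test_rows, k)
    indices = model.kneighbors(test_df.values, return_distance=False)
    # Same output layout as the apply version: one list of indices per test row
    return pd.Series(indices.tolist(), index=test_df.index)


# Small synthetic stand-in data (assumption: the real frames are all numeric)
rng = np.random.default_rng(0)
train_df = pd.DataFrame(rng.random((1000, 8)))
test_df = pd.DataFrame(rng.random((20, 8)))

neighbors = batched_neighbors(train_df, test_df)
```

The returned Series has one entry per test row, so it can be assigned back to test_df as a column in the same way as the result of the apply call.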
handle_unknown='ignore' to ignore it.
tol in the last
n_iter_no_change iterations. The score can be the loss or an arbitrary scorer and it can be computed on the training set or on the validation set.
partial_fit is doing an update