from skmultiflow.data import FileStream
stream = FileStream("./src/skmultiflow/data/datasets/covtype.csv")
stream.prepare_for_use()
stream.n_classes # Output: 7
stream.target_values # Output: [1, 2, 3, 4, 5, 6, 7]
Thanks @jacobmontiel. I ran exactly the same code as you, but the error was still there. However, after I re-downloaded the data, the error went away, so maybe the data file was not saved properly the first time. Thanks for your reply.
Glad to hear that it is working. It is strange that it just went away. In any case, we will keep it in mind in case somebody else gets the same error.
Hi @jacobmontiel, I would like to know whether all the concept drift detection methods in skmultiflow.drift_detection only support 1-D data streams. For example, using a 2-D data stream (size=[2000,5]) in the following code raises an error.
import numpy as np
from skmultiflow.drift_detection import PageHinkley
ph = PageHinkley()
data_stream = np.random.randint(2, size=[2000,5])
for i in range(999, 2000):
    data_stream[i] = np.random.randint(4, high=8,size=5)
for i in range(2000):
    ph.add_element(data_stream[i])
    if ph.detected_change():
        print('Change has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))
import numpy as np
from skmultiflow.drift_detection import PageHinkley
# 1-D stream: 1000 values in {0, 1} followed by 1000 values in {0, 1, 2, 3}
data_stream = np.concatenate((np.random.randint(2, size=1000), np.random.randint(4, size=1000)))
ph = PageHinkley()
for i, val in enumerate(data_stream):
    # The detector consumes one scalar value at a time
    ph.add_element(val)
    if ph.detected_change():
        print('Change has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))
        ph.reset()
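Since each detector consumes one scalar at a time, one possible way to handle a 2-D stream is to keep one detector per feature column. The sketch below is library-free so it runs on its own: `SimplePageHinkley` is a hypothetical, simplified stand-in for the Page-Hinkley test and not skmultiflow's `PageHinkley`; the parameter values and the synthetic data are assumptions chosen only for illustration.

```python
class SimplePageHinkley:
    """Simplified Page-Hinkley test for upward shifts in the mean of a
    1-D stream. Illustrative only; not skmultiflow's implementation."""

    def __init__(self, min_instances=30, delta=0.005, threshold=50.0):
        self.min_instances = min_instances  # samples required before flagging
        self.delta = delta                  # tolerated magnitude of change
        self.threshold = threshold          # detection threshold (lambda)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum_sum = 0.0
        self.min_cum_sum = 0.0

    def add_element(self, x):
        """Consume one scalar and return True if a change is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n          # running mean
        self.cum_sum += x - self.mean - self.delta     # cumulative deviation
        self.min_cum_sum = min(self.min_cum_sum, self.cum_sum)
        return (self.n >= self.min_instances
                and self.cum_sum - self.min_cum_sum > self.threshold)


def detect_per_feature(rows, n_features):
    """Run one independent detector per feature column."""
    detectors = [SimplePageHinkley() for _ in range(n_features)]
    detections = [[] for _ in range(n_features)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if detectors[j].add_element(value):
                detections[j].append(i)
                detectors[j].reset()
    return detections


# Synthetic 2-D stream (2000 x 5): values in {0, 1} for the first 1000 rows,
# then values in {4, ..., 7}, so every column changes at index 1000.
rows = [[(i + j) % 2 if i < 1000 else 4 + (i + j) % 4 for j in range(5)]
        for i in range(2000)]
detections = detect_per_feature(rows, n_features=5)
for j, hits in enumerate(detections):
    print(f"feature {j}: change flagged at indices {hits}")
```

The same loop structure should carry over to a list of skmultiflow detectors, with `add_element` followed by `detected_change()` for each column. Whether per-feature monitoring is appropriate depends on the problem; monitoring a single derived signal such as the model error is a common alternative.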
skmultiflow.data.DataStream
sklearn.neighbors.KernelDensity alongside scikit-multiflow. As you mention, the sklearn implementation works on batches of data, so if you want to update the densities you have to define a data update strategy. This is very similar to how the KNNClassifier is implemented: there the data is stored in a sliding window. Regarding drift detection, ADWIN, like all other drift detectors, takes 1-dimensional data as input. You can check the KNNADWINClassifier, which uses ADWIN to monitor the classification performance of the basic KNN model. If ADWIN detects a change in classification performance, the model is reset.
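The pattern described above (a model that stores data in a sliding window, plus a detector fed with the 1-D stream of prediction errors, with a model reset on detection) can be sketched without any library. Everything here is an assumption made for illustration: `SlidingWindowKNN` and `ErrorRateMonitor` are hypothetical names, the monitor is a simple two-window error-rate comparison swapped in for ADWIN, and the data is synthetic. It is not the KNNADWINClassifier implementation.

```python
from collections import Counter, deque


class SlidingWindowKNN:
    """Toy k-NN classifier on 1-D features, storing samples in a sliding window."""

    def __init__(self, n_neighbors=1, max_window_size=100):
        self.n_neighbors = n_neighbors
        self.window = deque(maxlen=max_window_size)  # oldest samples drop out

    def partial_fit(self, x, y):
        self.window.append((x, y))

    def predict(self, x, default=0):
        if not self.window:
            return default
        nearest = sorted(self.window, key=lambda item: abs(item[0] - x))
        votes = Counter(label for _, label in nearest[: self.n_neighbors])
        return votes.most_common(1)[0][0]

    def reset(self):
        self.window.clear()


class ErrorRateMonitor:
    """Stand-in for ADWIN: flags a change when the error rate of the newer
    half of a fixed-size buffer exceeds that of the older half by `threshold`."""

    def __init__(self, half_size=20, threshold=0.5):
        self.half_size = half_size
        self.threshold = threshold
        self.errors = deque(maxlen=2 * half_size)

    def add_element(self, error):
        """Consume one 0/1 error value and return True if a change is flagged."""
        self.errors.append(error)
        if len(self.errors) < self.errors.maxlen:
            return False
        values = list(self.errors)
        older, newer = values[: self.half_size], values[self.half_size:]
        return (sum(newer) - sum(older)) / self.half_size > self.threshold

    def reset(self):
        self.errors.clear()


def run_stream(n_samples=600, drift_at=300):
    """Test-then-train loop; the labelling rule flips at `drift_at`."""
    model, monitor, drifts = SlidingWindowKNN(), ErrorRateMonitor(), []
    for i in range(n_samples):
        x = (i * 37) % 100                                  # synthetic feature
        y = int(x >= 50) if i < drift_at else int(x < 50)   # concept flips
        error = int(model.predict(x) != y)                  # test first
        if monitor.add_element(error):                      # 1-D error stream
            drifts.append(i)
            model.reset()                                   # forget old concept
            monitor.reset()
        model.partial_fit(x, y)                             # then train
    return drifts


drifts = run_stream()
print("Drift flagged at sample indices:", drifts)
```

The key design point is that the detector never sees the feature vectors, only one error value per sample, which is why a 1-D detector is sufficient even when the model itself works on multi-dimensional data.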
Hi @dossy, here is one:
from skmultiflow.data import DataStream
import numpy as np
n_features = 10
n_samples = 50
X = np.random.random(size=(n_samples, n_features))
y = np.random.randint(2, size=n_samples)
stream = DataStream(data=X, y=y)
# stream.prepare_for_use() # if using the stable version (0.4.1)
stream.n_remaining_samples()
The last line returns 50.
numpy.ndarray
np.ndarray as long as you define the index of the target column (last column by default). pandas.DataFrame objects are also supported, following the same indications.
scikit-multiflow is only a small part of the puzzle and there’s a lot of stuff you have to develop yourself around it?