Saulo Martiello Mastelini
Welcome to river! If you have any questions, you can use GitHub's discussions to ask them and get feedback
Saulo Martiello Mastelini
We understand that some ongoing projects rely on skmultiflow. For that reason, we will keep skmultiflow in its current state (stable release) and may apply occasional bug fixes
Nasrin Eshraghi Ivari
Hi all, can anyone help me, please? I have implemented data stream clustering and used scikit-multiflow to simulate the stream. Now I want a sliding time window model that captures my most recent data, but I do not know how to implement or use a sliding window. Does scikit-multiflow support windowing? I could not find anything related!
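A sliding time window can also be kept by hand, independently of whatever the library offers. The sketch below is a minimal, library-agnostic illustration: the `SlidingTimeWindow` class and its `horizon` parameter are hypothetical names introduced here, not part of scikit-multiflow, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingTimeWindow:
    """Hypothetical helper: keep only samples from the last `horizon` time units."""

    def __init__(self, horizon):
        self.horizon = horizon
        self._items = deque()  # (timestamp, sample) pairs, oldest first

    def add(self, timestamp, sample):
        # Assumes timestamps arrive in non-decreasing order.
        self._items.append((timestamp, sample))
        # Evict every sample whose timestamp is at or before timestamp - horizon.
        while self._items and self._items[0][0] <= timestamp - self.horizon:
            self._items.popleft()

    def samples(self):
        return [sample for _, sample in self._items]


window = SlidingTimeWindow(horizon=3)
for t, x in enumerate([10, 11, 12, 13, 14, 15]):
    window.add(t, x)

recent = window.samples()  # only samples with timestamp in (2, 5] remain
```

The clustering step can then be run on `window.samples()` whenever an up-to-date view of the recent data is needed.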
Michael Forde
Hey everyone, I have been wanting to do some multivariate forecasting using the HoeffdingTreeRegressor for streamed data. Scikit-multiflow seems to suit what I want to do very well, but I noticed there isn't built-in forecasting support. Is there any workaround I could try to achieve multi-step forecasting with this library?
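One common workaround, sketched here under assumptions rather than as a library feature, is the "direct" strategy: turn the series into lagged feature vectors and train one incremental regressor per forecast horizon. The helper `make_lagged_pairs` is a hypothetical name, and the example is univariate for brevity; a multivariate version would concatenate the lags of every variable.

```python
def make_lagged_pairs(series, n_lags, horizon):
    """Turn a series into (features, target) pairs.

    features: the last `n_lags` values up to and including time t
    target:   the value `horizon` steps after t
    """
    pairs = []
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]
        target = series[t + horizon]
        pairs.append((features, target))
    return pairs


series = [1, 2, 3, 4, 5, 6]
# Training pairs for a model that forecasts 2 steps ahead from 3 lags.
pairs_h2 = make_lagged_pairs(series, n_lags=3, horizon=2)
```

Each `(features, target)` pair can be fed to a separate regressor for its horizon as the stream advances, so a forecast of h steps ahead comes from the regressor trained with `horizon=h`.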
Wannabe Maker
Hi all, is there any way to install scikit-multiflow on the new M1 Macs? I use PyCharm as my IDE, and every time I try to install scikit-multiflow I get an error because it tries to install for the x64 architecture. Does anyone here have a similar problem?
Hi @jacobmontiel, how can I use concept drift detectors (for example ADWIN, DDM, KSWIN) in regression problems? Note: I don't want to use the error.
from skmultiflow.data import AnomalySineGenerator
from skmultiflow.anomaly_detection import HalfSpaceTrees
import numpy as np
import pandas as pd

stream = AnomalySineGenerator(random_state=42, n_samples=10000, n_anomalies=250)
hs_tree = HalfSpaceTrees(n_estimators=10, depth=8)
true_positive = 0
anomalies = 0
predictions = []
y_test = []
max_samples = 10000
n_samples = 0

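while n_samples < max_samples and stream.has_more_samples():
    X, y = stream.next_sample()
    y_pred = hs_tree.predict(X)
    # Record labels and predictions so classification_report can use them later
    y_test.append(y[0])
    predictions.append(y_pred[0])
    if y[0] == 1.0:
        true_positive += 1  # counts the actual anomalies in the stream
        if y_pred[0] == 1.0:
            anomalies += 1  # counts the actual anomalies that were also predicted

    hs_tree.partial_fit(X, y)
    n_samples += 1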

print('The data has {} anomalies'.format(true_positive))
print('Half Space Trees predicted {} anomalies'.format(anomalies))
Running this code produces some interesting output: it predicts 0s for some time and 1s later on.
Below is the classification report output, which is quite poor.
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))
Hassan Mehmood
@Marilia-Nayara please elaborate on your problem a little bit.
@jacobmontiel @all Could anyone please tell me how to use EvaluatePrequentialDelayed with the Extremely Fast Decision Tree? I want to use its incremental learning while doing delayed evaluation. Please help me!
Saulo Martiello Mastelini
Hi everyone, just as a reminder, there is no active development in skmultiflow anymore. Skmultiflow and Creme have merged to become River. Users can now also install River via pip
We encourage skmultiflow users to make the leap to River. Feel free to open a discussion with your question or to ask for assistance
Jacob Montiel and I are both maintainers of River too
Saulo Martiello Mastelini
I'll talk with Jacob about the possibility of creating a quick guide, maybe something like: "from skmultiflow to river"
How do I save the results after the test as a file, instead of saving the assessment as a file?
@smastelini @jacobmontiel @all
I am doing multi-label classification. I have a total of 14 labels. Where do I set that?
Hello @jacobmontiel @smastelini, I'm new to scikit-multiflow. I would like to train my classification model on one data stream (train data) and make predictions on another data stream (test stream). Could you please help me with a short piece of code that demonstrates this? Thank you.
Why doesn't my line graph show any values for the evaluation?
Hi, is there any way of getting a concept drift detector's accuracy, or do you have to use it with a classification method? Thank you.

Hi! We have a requirement to train the model on historical data as well as real-time data. I am trying to use the AdaptiveRandomForestRegressor model but am getting an error. First I train the model on data from a CSV file, and later I will train it on real-time data. I am using the code below, where X and y are my features and labels:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

regressor = AdaptiveRandomForestRegressor()
regressor.fit(X_train, y_train)

I am getting the error below. Can someone please help?
Traceback (most recent call last):
  File "C:/Users/Neha.Goyal/PycharmProjects/pythonProject/PredictionDataML_Scikit_AdaptiveRandomForest.py", line 43, in <module>
    regressor.partial_fit(X_train, y_train)
  File "C:\Users\Neha.Goyal\PycharmProjects\pythonProject\venv\lib\site-packages\skmultiflow\meta\adaptive_random_forest_regressor.py", line 296, in partial_fit
    X[i].reshape(1, -1), [y[i]], sample_weight=[k],
  File "C:\Users\Neha.Goyal\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\series.py", line 824, in __getitem__
    return self._get_value(key)
  File "C:\Users\Neha.Goyal\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\series.py", line 932, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\Neha.Goyal\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 0
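The traceback shows `y[i]` going through pandas' `Series.__getitem__`, which suggests (this is a likely cause, not a confirmed diagnosis) that `y_train` is still a pandas Series whose index was shuffled by `train_test_split`, so the label 0 may no longer exist. The small reproduction below uses made-up values and only pandas; it shows the label-based lookup failing and positional indexing working again after converting to a NumPy array.

```python
import pandas as pd

# A Series whose index no longer starts at 0, as can happen after a shuffled split.
y_train = pd.Series([1.5, 2.5, 3.5], index=[7, 2, 9])

try:
    y_train[0]  # label-based lookup: there is no label 0 in the index
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Converting to a NumPy array restores plain positional indexing.
y_array = y_train.to_numpy()
first_value = y_array[0]
```

Under that assumption, passing `y_train.to_numpy()` instead of `y_train` to `partial_fit` would avoid the `KeyError`.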

Hello everyone. I am trying to create a dataset using SEAGenerator with 100 features, but by default it creates 3 features. How can I create one with 100 features? What I am trying to get is something like 0.6987, 0.2568, 0.570, 0.949, 0.1970, … , 0.3285, 0.4474, 0.3355, 0.585, 0.5411, 0 where the last number is the class and the others are features.
Hi all, I want to develop an online LSTM anomaly detector. Is there any way to do it on scikit-multiflow? Thanks for any help
Hi, happy to be here.
I need some guidance. I am writing a project on data drift. I have trained an instance segmentation model on the Cityscapes dataset to detect pedestrians. The model was trained on daylight images, and I have exported it. Now I want to build a drift monitoring solution that picks up data drift when I test the model on nighttime images of pedestrians. I have tried using alibi-detect, but struggled to understand it. Is there any way I can use scikit-multiflow for my use case?
I will be extremely grateful for any help. Thank you.
Mariam Benllarch
Hi everyone, is there any solution for treating non-numerical data?
Dear Mr. @jacobmontiel, I tried to get the structure of the tree from HoeffdingTreeRegressor using get_rules_description(). The result was something like this:
Att (17) <= 24.000 and Att (80) <= 4.000 and Att (34) <= 0.000 | class: 2
The problem is that (class: 2) refers to the class index and not to the class value that will be given to the new instance.
How can I get the tree structure including the class value for each rule in that tree?
Best regards
Hello, Can anyone tell me how I can save the dynamic plot as animated gif or mp4 to share with a friend?
hi @jacobmontiel
I am really new to scikit-multiflow and I am using it for my thesis to detect concept drift in evolving streams. I am running into different hurdles, one of which is the error "'FileStream' object is not subscriptable". Kindly help.
I would also appreciate a pointer to a repository for learning about scikit-multiflow.
Thank you.
from skmultiflow.data.file_stream import FileStream
from skmultiflow.drift_detection.adwin import ADWIN
import numpy as np
adwin = ADWIN()

# Setup the stream
data_stream = FileStream("https://raw.githubusercontent.com/scikit-multiflow/"

# Retrieving one sample
for i in range(4100):
    adwin.add_element(data_stream[i])
    if adwin.detected_change():
        print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))

The error:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-1a895cc5f79c> in <module>
      1 for i in range(4100):
----> 2     adwin.add_element(data_stream[i])
      3     if adwin.detected_change():
      4         print('Change detected in data: ' + str(data_stream[i]) + ' - at index: ' + str(i))

TypeError: 'FileStream' object is not subscriptable

please @everyone, kindly help. thanks
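The error says the stream object does not support `data_stream[i]`. Streams in this thread are consumed by pulling samples with `has_more_samples()` and `next_sample()` (as in the anomaly detection snippet earlier), so the loop has to pull rather than index. The sketch below shows only that calling pattern with two hypothetical stand-ins: `ListStream` is not FileStream (a real stream returns an `(X, y)` pair from `next_sample()`), and `MeanShiftDetector` is a toy detector, not ADWIN.

```python
class ListStream:
    """Hypothetical stand-in for a stream: samples are pulled, not indexed."""

    def __init__(self, values):
        self._values = list(values)
        self._pos = 0

    def has_more_samples(self):
        return self._pos < len(self._values)

    def next_sample(self):
        value = self._values[self._pos]
        self._pos += 1
        return value


class MeanShiftDetector:
    """Hypothetical toy detector (NOT ADWIN): flags a change when the mean of
    the newest `size` values differs from the mean of the previous `size`
    values by more than `threshold`."""

    def __init__(self, size=5, threshold=0.5):
        self.size = size
        self.threshold = threshold
        self._buffer = []
        self._changed = False

    def add_element(self, value):
        self._buffer.append(value)
        self._buffer = self._buffer[-2 * self.size:]
        self._changed = False
        if len(self._buffer) == 2 * self.size:
            old = sum(self._buffer[:self.size]) / self.size
            new = sum(self._buffer[self.size:]) / self.size
            if abs(new - old) > self.threshold:
                self._changed = True
                # Restart from the newest values after a detection.
                self._buffer = self._buffer[self.size:]

    def detected_change(self):
        return self._changed


stream = ListStream([0.0] * 20 + [1.0] * 20)
detector = MeanShiftDetector()
change_points = []
i = 0
while stream.has_more_samples():
    value = stream.next_sample()  # pull the next sample instead of stream[i]
    detector.add_element(value)
    if detector.detected_change():
        change_points.append(i)
    i += 1
```

With the real classes, the same loop shape would pull `X, y = data_stream.next_sample()` and pass a single numeric value taken from the sample to `adwin.add_element(...)`.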
Özge Ergün
Hi everyone, I am trying to work on a dataset which contains categorical attributes but FileStream does not accept non-numeric values. Any suggestions?
Mariam Benllarch
@ozgeergun scikit-multiflow only accepts numerical data
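Given that only numerical data is accepted, one workaround is to encode the categorical columns as numbers before building the stream. The sketch below is a minimal illustration with a hypothetical `StreamingOrdinalEncoder` class and made-up rows; it assigns integer codes in order of first appearance, one sample at a time.

```python
class StreamingOrdinalEncoder:
    """Hypothetical helper: give each new category the next free integer."""

    def __init__(self):
        self._codes = {}

    def encode(self, category):
        # setdefault stores a new code only the first time a category is seen.
        return self._codes.setdefault(category, len(self._codes))


rows = [("red", 1.0), ("blue", 2.0), ("red", 3.0)]
encoder = StreamingOrdinalEncoder()
numeric_rows = [(encoder.encode(colour), value) for colour, value in rows]
```

Integer codes impose an artificial order on the categories; if that matters for the model, one-hot encoding each category into its own 0/1 column is the usual alternative.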
Özge Ergün
Hi everyone, I would like to know whether I can call the evaluate method every time new data arrives, so that on each evaluation it learns incrementally and adapts the existing model using both old and new knowledge. Or does it build a completely new model when it evaluates again?
Özge Ergün

Here is the code below:
import pandas as pd
from sklearn.model_selection import KFold
from skmultiflow.data import DataStream
from skmultiflow.evaluation import EvaluatePrequential
from skmultiflow.trees import HoeffdingTreeClassifier

ht = HoeffdingTreeClassifier()
df = pd.read_csv('/content/drive/My Drive/diabete/diabetes.csv', header=0)
X = df.iloc[:, :-1]  # features
y = df.iloc[:, -1:]  # labels

kf = KFold(n_splits=10)

for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    stream = DataStream(data=X_train, y=y_train, target_idx=None, n_targets=None)

    evaluator = EvaluatePrequential(show_plot=False,
                                    pretrain_size=0, batch_size=8, max_samples=len(X))

    # The same `ht` instance is passed in on every fold
    evaluator.evaluate(stream=stream, model=ht)

Armin Sadreddin
Hello everyone, I have two questions. 1- Regarding the perceptron mask, is it a deep learning model or only a single layer? Can we change the topology? 2- Can we use deep learning models as the base estimator of the scikit-multiflow learners, for example in the Learn++ NSE method? Thank you so much
Liyan Song
Hi there, found a typo in the doc of "skmultiflow.data.RandomTreeGenerator" as the following text "The tree structure is composed on Node objects, which can be either inner nodes or leaf nodes. The choice comes as a function fo the parameters passed to its initializer." I think the "fo" was a typo, right?
JP de Vooght
Hi! I have a riverml quantile question - I am on 0.10.1 and when I compute 5; 15; 1; 3 I get 5 instead of 4 (here's a related GitHub issue online-ml/river#118 which made me think this case was handled)
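For reference, the two numbers in the question match two different conventions for the median of an even-sized sample. The standard-library check below shows only the arithmetic; it does not establish which convention river 0.10.1 follows.

```python
import statistics

values = [5, 15, 1, 3]  # sorted: 1, 3, 5, 15

interpolated = statistics.median(values)  # mean of the two middle values
lower = statistics.median_low(values)     # smaller of the two middle values
upper = statistics.median_high(values)    # larger of the two middle values
```

The interpolated median of these four values is 4, while a convention that always returns an observed value gives 3 or 5.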
Shailaja Jadhav
I get the error "partial_fit() missing 1 required positional argument: 'self'". Please guide me; I am unable to see the graph of the prequential evaluation
carlos bahia
@jacobmontiel How are you doing? I am trying to use scikit-multiflow with streaming data to predict values. I could not find an example that I could adapt to my needs. Do you have one to share? Thank you so much
carlos bahia
@jdevoo How are you doing? I am trying to use scikit-multiflow with streaming data to predict values. I could not find an example that I could adapt to my needs. Do you have one to share? Thank you so much
saad iftikhar
Hello, sorry for the very basic question; I'm a university student. I'm trying to use incremental learning with a decision tree classifier. I have preprocessed data; could anyone guide me on how to apply scikit-multiflow to pandas arrays or Python lists?
Alex Cuof
Hi everyone, I'm new to this community, and I apologize in advance for any mistakes I make in asking this question. I have to implement a classification task using scikit-multiflow on a big dataset (84 features x 2.5 million examples), processed as a stream. After many attempts my code finally runs without warnings or errors, but there is a problem: I am using the EvaluatePrequential class and its methods for the classification, and with adequate metrics set to evaluate the quality of the classification, I obtain very high values for each metric. This is "strange" considering the dataset I am working on, which is why I want to generate the confusion matrix, to understand on which classes my classification algorithm works better and on which classes it misclassifies more. Generating a confusion matrix is very easy with scikit-learn, but that method takes the true labels and the predicted labels as input, and here is the problem: I cannot extract the predicted labels from EvaluatePrequential, in particular from its "evaluate" method, so I have no way to generate the confusion matrix because I have no predicted labels to compare with the true labels. Surely there is a trick to get around this problem, but all my attempts over the past two days have failed and I have no more ideas. Do you have any idea how to solve this problem? Thanks a lot.
Hi everyone, I have a problem: when I set show_plot in both holdout and prequential evaluation, nothing is plotted; only the frame of the plot is shown. Can anyone guide me, please?
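One way around this, sketched here under assumptions, is to write the test-then-train loop by hand and keep the predictions yourself. `MajorityClassifier` below is a hypothetical stand-in that only mimics the `predict()` / `partial_fit()` calling pattern used by the models in this thread; the stream is a made-up list of `(features, label)` pairs.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model exposing predict() and partial_fit()."""

    def __init__(self):
        self._counts = Counter()

    def predict(self, x):
        # Predict the most frequent label seen so far (None before any training).
        if not self._counts:
            return None
        return self._counts.most_common(1)[0][0]

    def partial_fit(self, x, y):
        self._counts[y] += 1


stream = [([0.1], "a"), ([0.2], "a"), ([0.9], "b"), ([0.3], "a")]
model = MajorityClassifier()
y_true, y_pred = [], []

for x, y in stream:
    y_pred.append(model.predict(x))  # test first ...
    y_true.append(y)
    model.partial_fit(x, y)          # ... then train on the same sample

# (true label, predicted label) -> count
confusion = Counter(zip(y_true, y_pred))
```

Once `y_true` and `y_pred` are collected this way, they can be passed to scikit-learn's confusion matrix function as described in the question.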
Naveen Kumar

Hi everyone! ,
I am new to this community and apologize if my question is not directed to the correct team/group. I am working on drift detection using ADWIN, and my question is specific to the detected_change() method. While reading the code, I noticed that the innermost for loop (over "k") iterates over range(cursor.bucket_size_row - 1). Since bucket_size_row indicates the number of elements present in the bucket, this iteration seems to skip the last non-zero elements when comparing the absolute value with the threshold epsilon. I might be wrong, but I would like to understand the reasoning and the related concept explained in the research paper. Please advise on the reasoning and logic of the detected_change() method.

Thanks in advance!

Muhammad Saqib
Hi everyone, I just joined the community. Could someone kindly guide me on how to use ADWIN with ExtremelyFastDecisionTreeClassifier?
Hello everyone, I'm working on a project where I use Kafka and PySpark to create real-time streams. I want to train a change detection model offline first and then use it to detect changes in real time. Can scikit-multiflow be used for this purpose? In the docs: https://scikit-multiflow.readthedocs.io/en/stable/api/generated/skmultiflow.drift_detection.ADWIN.html only the detection phase is mentioned, so I'm wondering if it's possible to train the model.
I would appreciate any info about how to achieve those steps using skmultiflow.