nuwangunasekara
@nuwangunasekara
Cool... thanks @jacobmontiel !
tlfields
@tlfields
Hello @jacobmontiel, can you please help me access Random Forest using scikit-multiflow? I am trying to compare the performance of Random Forest with and without ADWIN. I see that Adaptive Random Forest is already implemented, but I don't see how to bring in a Random Forest. Please and thank you.
Jacob
@jacobmontiel
RandomForest is the batch version based on Decision Trees. AdaptiveRandomForest is the stream version based on Hoeffding Trees. AdaptiveRandomForest can be used with or without the drift detection. If you want to use AdaptiveRandomForest without drift detection you must initialize it as AdaptiveRandomForest(drift_detection_method=None)
Emanuel Rodrigues
@emanueldosreis_twitter
# Imports
from skmultiflow.anomaly_detection import HalfSpaceTrees
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(30, 3), columns=['x', 'y', 'z'])
# Access raw numpy array inside the dataframe
X_array = df.values
# Setup Half-Space Trees estimator
half_space_trees = HalfSpaceTrees(random_state=1, n_estimators=5) #, n_features=2)
# Pre-train the model with one sample
# the sample is a 1D array and we must pass a 2D array, thus np.asarray([X_array[0]])
half_space_trees.partial_fit(np.asarray([X_array[0]]), [0])

anomaly_cnt = 0
# Train the estimator(s) with the samples provided by the data stream
for X in X_array[1:]:
    y_pred = half_space_trees.predict([X])
    if y_pred[0] == 1:
        anomaly_cnt += 1
    half_space_trees = half_space_trees.partial_fit(np.asarray([X]), [0])
# Display results
print('Half-Space Trees anomalies detected: {}'.format(anomaly_cnt))
Thank you so much @jacobmontiel
asad1907
@asad1907
Hi everyone, I have a problem with show_plot. I set show_plot=True but it doesn't work. What should I do? Do you have any idea?
Saulo Martiello Mastelini
@smastelini
Hi @asad1907, could you provide a MWE to help us figure out your problem?
asad1907
@asad1907
@smastelini
from skmultiflow.data.data_stream import DataStream
from skmultiflow.evaluation import EvaluatePrequential
from skmultiflow.trees import HoeffdingTree

stream  = DataStream(X_train, y = y_train)
stream.prepare_for_use()

ht = HoeffdingTree()
evaluator = EvaluatePrequential(show_plot=True,
                                pretrain_size=5000,
                                max_samples=20000,
                                metrics = ['accuracy', 'running_time','model_size'],
                                output_file='results.csv')

evaluator.evaluate(stream=stream, model=ht);
image.png
I just got this
Saulo Martiello Mastelini
@smastelini

How many instances does your dataset have?

Did you try decreasing pretrain_size to, let's say, pretrain_size=200?

asad1907
@asad1907

@smastelini

X_train shape : (20631, 16)
y_train shape : (20631,)

@smastelini yes, I did, but it doesn't change anything
Saulo Martiello Mastelini
@smastelini
Are you using jupyter notebooks? You might need to change your matplotlib backend
asad1907
@asad1907
@smastelini I am using JupyterLab. I tried %matplotlib widget and then I got the following problem
image.png
Saulo Martiello Mastelini
@smastelini
That's indeed strange. I am assuming that your code runs normally with show_plot=False (is that correct?). It seems that your problem is related to the matplotlib backend used in Jupyter. The solution is probably to set a proper backend for your interactive plot.
tlfields
@tlfields
@jacobmontiel thank you so much. I am very new to scikit-multiflow, would you direct me to tutorials that have been compiled to explain how to compare the performance of algorithms?
tlfields
@tlfields
@jacobmontiel so I am trying to see the results of a Random Forest with no drift and an Adaptive Random Forest
tlfields
@tlfields
@jacobmontiel .. I think I figured it out, by taking your advice to set up one of the Adaptive Random Forests as AdaptiveRandomForest(drift_detection_method=None). Thank you
barnettjv
@barnettjv
@jacobmontiel Hi Jacob, is there a way that I can get access to the actual values predicted per data segment during the evaluations? I have 1 million SEAGen data points and need to perform McNemar's Statistical Significance formula which requires me to know which labels classifier A got incorrect vs classifier B.. etc. etc. As such I need to record the actual values predicted by each classifier.
barnettjv
@barnettjv
@jacobmontiel I'm assuming that I'll need to use the predict(X) fn, but honestly was hoping for a quick solution.
tlfields
@tlfields
@jacobmontiel how do we add LSTM and MLP deep learning algorithms to scikit-multiflow?
asad1907
@asad1907
@smastelini thanks a lot sir for your help. I solved it :)
barnettjv
@barnettjv
@automater0 I'm guessing the Kappa T stands for temporal. Bifet refers to it as Kper. see pg. 91 Bifet, A., Gavaldá, R., Holmes, G., & Pfahringer, B. (2017). Machine learning for data streams: with practical examples in MOA (Adaptive computation and machine learning series). MIT Press.
Jacob
@jacobmontiel

@smastelini thanks a lot sir for your help. I solved it :)

@asad1907 Can you share your solution? Support for dynamic plots in Jupyter Lab has not improved much since its release.

@jacobmontiel .. I think I figured it out, by taking your advice to set up one of the Adaptive Random Forests as AdaptiveRandomForest(drift_detection_method=None). Thank you

Glad to help.

@jacobmontiel Hi Jacob, is there a way that I can get access to the actual values predicted per data segment during the evaluations? I have 1 million SEAGen data points and need to perform McNemar's Statistical Significance formula which requires me to know which labels classifier A got incorrect vs classifier B.. etc. etc. As such I need to record the actual values predicted by each classifier.

If you are using an evaluator, you can add true_vs_predicted to metrics to get the predicted values. In this case you also need to set n_wait=1. As a suggestion, deactivate the plot in this case, since n_wait=1 implies a high refresh rate in the plot, which is a lot of overhead.
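
Once the true and predicted labels of both classifiers are recorded, the McNemar comparison mentioned in the question can be computed from the two disagreement counts. The following is a minimal sketch in plain Python, not part of scikit-multiflow: the function name `mcnemar` and the toy label lists are assumptions for illustration, and it uses the continuity-corrected chi-square form of the test with 1 degree of freedom.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """Return (b, c, statistic, p_value) for McNemar's test.

    b counts samples that A got right and B got wrong,
    c counts samples that A got wrong and B got right.
    Uses the continuity-corrected chi-square statistic (1 degree of freedom).
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return b, c, 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return b, c, stat, p_value


# Toy labels, invented for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
pred_a = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
pred_b = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]

b, c, stat, p_value = mcnemar(y_true, pred_a, pred_b)
print(b, c, stat, p_value)
```

With very few disagreements (small b + c), an exact binomial version of the test is usually considered more appropriate than this chi-square approximation.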

@automater0 I'm guessing the Kappa T stands for temporal. Bifet refers to it as Kper. see pg. 91 Bifet, A., Gavaldá, R., Holmes, G., & Pfahringer, B. (2017). Machine learning for data streams: with practical examples in MOA (Adaptive computation and machine learning series). MIT Press.

That is correct.

Jacob
@jacobmontiel

@jacobmontiel how do we add LSTM and MLP deep learning algorithms to scikit-multiflow?

those are open questions still, since those methods are usually trained on batches

@tlfields scikit-multiflow does not include any implementation (yet). If batch-incremental learning instead of instance-incremental learning is fine for your use case, then you could do something similar to the BatchIncremental model. This is a simple class that shows how to do batch-incremental learning using batch methods from scikit-learn. But you are not restricted to models from that library.
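The batch-incremental idea can be sketched in plain Python as follows. This is not the scikit-multiflow BatchIncremental class: `BatchIncrementalSketch`, `MajorityClassModel`, and the parameters `window_size` and `n_models` are hypothetical names used only to illustrate buffering samples, training a fresh batch model on each full window, and predicting by majority vote over the most recent models.

```python
from collections import Counter


class MajorityClassModel:
    """Stand-in batch learner (hypothetical): predicts the most frequent label seen in fit()."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class BatchIncrementalSketch:
    """Buffers samples and trains a new batch model whenever the window is full."""

    def __init__(self, make_model, window_size=4, n_models=3):
        self.make_model = make_model    # callable that returns an untrained batch model
        self.window_size = window_size  # number of samples per training batch
        self.n_models = n_models        # number of most recent models to keep
        self.models = []
        self._X, self._y = [], []

    def partial_fit(self, X, y):
        for xi, yi in zip(X, y):
            self._X.append(xi)
            self._y.append(yi)
            if len(self._X) == self.window_size:
                # Window is full: train a fresh model on it and start a new buffer.
                self.models.append(self.make_model().fit(self._X, self._y))
                self.models = self.models[-self.n_models:]
                self._X, self._y = [], []
        return self

    def predict(self, X):
        if not self.models:
            # No model has been trained yet.
            return [None for _ in X]
        votes = [m.predict(X) for m in self.models]
        # Majority vote across models, one column of votes per sample.
        return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


learner = BatchIncrementalSketch(MajorityClassModel, window_size=4, n_models=3)
learner.partial_fit([[0], [1], [2]], [1, 1, 0])  # buffer not full yet, nothing trained
before = learner.predict([[9]])
learner.partial_fit([[3]], [1])                  # fourth sample completes the window
after = learner.predict([[9]])
print(before, after)
```

Any batch learner exposing fit and predict (for example a neural network wrapper) could be passed in place of the stand-in model, at the cost of only updating once per window.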
tlfields
@tlfields
@jacobmontiel thank you so much for your response
barnettjv
@barnettjv
Jacob, I added 'true_vs_predicted' and set the pretrain size to 50 on a data set of 200, along with n_wait=1, and I am not getting any predicted values.
I'm just getting the Accuracy, which is the only other metric I'm sending.
oh never mind. figured it out :D
asad1907
@asad1907
@barnettjv You can see the true and predicted values in results.csv by using true_vs_predicted in metrics together with output_file='results.csv'
asad1907
@asad1907
@jacobmontiel I have solved that in Jupyter Notebook using %matplotlib notebook. Now I am trying to make it work in JupyterLab. If I manage to, I will gladly share
Jacob
@jacobmontiel

@jacobmontiel I have solved that in Jupyter Notebook using %matplotlib notebook. Now I am trying to make it work in JupyterLab. If I manage to, I will gladly share

Thanks for letting us know

tlfields
@tlfields
@jacobmontiel thank you for the video from AnacondaCON. I have watched it several times and I am learning so much from you. I have run the notebook you provided and I have a question about how to use Page-Hinkley or the other drift detectors with "agr_a_20k.csv" instead of the stream dataset. Is this something you can help me with? I want to see which of the detectors pick up the drift specifically in agr_a_20k.csv
tlfields
@tlfields
@jacobmontiel .... I think I got it to work... I am so HAPPY!!
Jacob
@jacobmontiel

@jacobmontiel .... I think I got it to work... I am so HAPPY!!

Glad to hear that

Just as a comment: be careful when testing different drift detectors. For example, DDM and EDDM expect the input data (error) encoded in the opposite way to ADWIN
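A minimal sketch of switching between the two encodings, in plain Python. The helper names `to_error_stream` and `flip` and the toy labels are assumptions for illustration. Which convention a given detector expects is not stated here and may depend on the library version, so it is worth checking the docstring of the detector's add_element method before feeding it either stream.

```python
def to_error_stream(y_true, y_pred):
    """Encode each prediction as 1 when it is wrong and 0 when it is right."""
    return [int(t != p) for t, p in zip(y_true, y_pred)]


def flip(bits):
    """Swap the encoding: every 1 becomes 0 and every 0 becomes 1."""
    return [1 - b for b in bits]


# Toy labels, invented for illustration only.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]

errors = to_error_stream(y_true, y_pred)  # 1 marks a misclassification
hits = flip(errors)                       # 1 marks a correct classification
print(errors, hits)
```

Feeding the wrong encoding to a detector that tracks the error rate would make it monitor accuracy instead, which can hide or delay a detection.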
tlfields
@tlfields
@jacobmontiel when I ran PageHinkley on agr_a_20k.csv, it picked up only two drifts, one at index 5165 and the other at 15408. It did not pick up any drift in the 1000-1100 range. ADWIN picked up 5 drifts between indices 5535 and 5855, 9 drifts in the index range 10463-11007, and 8 in the range 15679-17407. So that tells me that, on this particular dataset, ADWIN was a bit more sensitive to the drift. For some reason I did not get the input data error that you mentioned.
tlfields
@tlfields
@jacobmontiel, in reference to agr_a_20k.csv, did you add the drifts at certain points or were they already there? I am trying to figure out how to find the actual points where drift was inserted. Thank you
Jacob
@jacobmontiel
Those are 3 synthetic abrupt drifts, every 7500 samples
tlfields
@tlfields
@jacobmontiel thank you Sir!
Santhosh Sahini
@santoshsahini19
Hi, when I run “from skmultiflow.data import AnomalySineGenerator”, an error pops up saying: cannot import name 'AnomalySineGenerator' from 'skmultiflow.data'. Is there any way I can resolve this issue? Thanks!
Jacob
@jacobmontiel
AnomalySineGenerator is only available in the development version. You must install it from GitHub
$ pip install -U git+https://github.com/scikit-multiflow/scikit-multiflow
Santhosh Sahini
@santoshsahini19
Thank you so much @jacobmontiel
Jacob
@jacobmontiel

Those are 3 synthetic abrupt drifts, every 7500 samples

@tlfields , I was reviewing this and drifts are actually placed every 5000 samples
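With the drift positions known, the detections reported earlier in the conversation can be turned into detection delays. The sketch below is plain Python and makes two assumptions: that "every 5000 samples" means drifts at indices 5000, 10000 and 15000, and that the start of each reported ADWIN range is its first detection for that drift. The function names are hypothetical.

```python
from bisect import bisect_right


def detection_delays(detections, drift_points):
    """Pair each detection with the nearest preceding drift point and the delay in samples."""
    out = []
    for d in sorted(detections):
        i = bisect_right(drift_points, d) - 1
        if i < 0:
            out.append((d, None, None))  # fired before any known drift
        else:
            out.append((d, drift_points[i], d - drift_points[i]))
    return out


def missed_drifts(delays, drift_points):
    """Drift points that no detection was matched to."""
    matched = {point for _, point, _ in delays}
    return [p for p in drift_points if p not in matched]


drift_points = [5000, 10000, 15000]  # assumption: one drift every 5000 samples
page_hinkley = [5165, 15408]         # indices reported above for PageHinkley
adwin_first = [5535, 10463, 15679]   # assumption: start of each reported ADWIN range

ph_delays = detection_delays(page_hinkley, drift_points)
adwin_delays = detection_delays(adwin_first, drift_points)
print(ph_delays, missed_drifts(ph_delays, drift_points))
print(adwin_delays, missed_drifts(adwin_delays, drift_points))
```

Under these assumptions the comparison has two sides: PageHinkley reacts sooner on the drifts it catches but misses the one at 10000, while ADWIN catches all three with a longer delay.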