asad1907
@asad1907
@smastelini
from skmultiflow.data.data_stream import DataStream
from skmultiflow.evaluation import EvaluatePrequential
from skmultiflow.trees import HoeffdingTree

# X_train, y_train are the training features and labels, loaded earlier
stream = DataStream(X_train, y=y_train)
stream.prepare_for_use()

ht = HoeffdingTree()
evaluator = EvaluatePrequential(show_plot=True,
                                pretrain_size=5000,
                                max_samples=20000,
                                metrics=['accuracy', 'running_time', 'model_size'],
                                output_file='results.csv')

evaluator.evaluate(stream=stream, model=ht);
[screenshot not preserved]
This is all I got.
Saulo Martiello Mastelini
@smastelini

How many instances does your dataset have?

Did you try decreasing pretrain_size to, let's say, pretrain_size=200?

asad1907
@asad1907

@smastelini

X_train shape : (20631, 16)
y_train shape : (20631,)

@smastelini Yes, I did, but it doesn't change anything.
Saulo Martiello Mastelini
@smastelini
Are you using Jupyter notebooks? You might need to change your matplotlib backend.
asad1907
@asad1907
@smastelini I am using JupyterLab. I tried %matplotlib widget and then got the following problem:
[screenshot not preserved]
Saulo Martiello Mastelini
@smastelini
That's indeed strange. I assume that with show_plot=False your code runs normally (is that correct?). It seems your problem is related to the matplotlib backend used in Jupyter, so the solution is probably to set a proper backend for your interactive plot.
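A quick way to see which backend is active before running the evaluator, sketched with standard matplotlib calls. The notebook magics in the comments are the usual options; which one works depends on the front end, so treat them as suggestions rather than a guaranteed fix:

```python
import matplotlib

# Report the backend matplotlib is currently using. With the "inline" backend,
# figures are rendered as static images, so a live-updating evaluation plot
# will not animate.
backend = matplotlib.get_backend()
print("Active matplotlib backend:", backend)

# In a notebook, an interactive backend is normally selected with a magic
# command in a cell *before* creating the evaluator, for example:
#   %matplotlib notebook    (classic Jupyter Notebook)
#   %matplotlib widget      (JupyterLab; requires the ipympl package)
```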
tlfields
@tlfields
@jacobmontiel Thank you so much. I am very new to scikit-multiflow; could you point me to tutorials that explain how to compare the performance of algorithms?
tlfields
@tlfields
@jacobmontiel So I am trying to compare the results of a Random Forest with no drift detection and an Adaptive Random Forest.
tlfields
@tlfields
@jacobmontiel I think I figured it out by taking your advice to set one of the Adaptive Random Forests to AdaptiveRandomForest(drift_detection_method=None). Thank you.
barnettjv
@barnettjv
@jacobmontiel Hi Jacob, is there a way to access the actual values predicted per data segment during the evaluations? I have 1 million SEAGen data points and need to perform McNemar's test for statistical significance, which requires knowing which labels classifier A got wrong versus classifier B. As such, I need to record the actual values predicted by each classifier.
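Once both classifiers' per-sample predictions are recorded, McNemar's test only needs the two disagreement counts. A minimal sketch in plain Python (mcnemar_test is a hypothetical helper, not part of scikit-multiflow), using the chi-squared approximation with Edwards' continuity correction:

```python
import math


def mcnemar_test(y_true, pred_a, pred_b, continuity=True):
    """McNemar's test on paired predictions of two classifiers.

    n01 counts samples that A got wrong and B got right; n10 counts samples
    that A got right and B got wrong. Returns (chi2, p_value) using the
    chi-squared approximation with 1 degree of freedom.
    """
    n01 = n10 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == t, b == t
        if b_ok and not a_ok:
            n01 += 1
        elif a_ok and not b_ok:
            n10 += 1
    if n01 + n10 == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    diff = abs(n01 - n10)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    chi2 = diff ** 2 / (n01 + n10)
    # Chi-squared (1 dof) survival function: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
pred_a = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]  # classifier A: all correct
pred_b = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0]  # classifier B: 4 errors
chi2, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # → chi2=2.25, p=0.1336
```

When n01 + n10 is small, the exact binomial version of the test is usually preferred over this approximation.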
barnettjv
@barnettjv
@jacobmontiel I'm assuming I'll need to use the predict(X) function, but I was honestly hoping for a quicker solution.
tlfields
@tlfields
@jacobmontiel how do we add LSTM and MLP deep learning algorithms to scikit-multiflow?
asad1907
@asad1907
@smastelini thanks a lot sir for your help. I solved it :)
barnettjv
@barnettjv
@automater0 I'm guessing the T in Kappa T stands for temporal. Bifet refers to it as Kper; see p. 91 of Bifet, A., Gavaldà, R., Holmes, G., & Pfahringer, B. (2017). Machine learning for data streams: with practical examples in MOA (Adaptive computation and machine learning series). MIT Press.
Jacob
@jacobmontiel

@smastelini thanks a lot sir for your help. I solved it :)

@asad1907 Can you share your solution? Support for dynamic plots in JupyterLab has not improved much since its release.

@jacobmontiel I think I figured it out by taking your advice to set one of the Adaptive Random Forests to AdaptiveRandomForest(drift_detection_method=None). Thank you.

Glad to help.

@jacobmontiel Hi Jacob, is there a way to access the actual values predicted per data segment during the evaluations? I have 1 million SEAGen data points and need to perform McNemar's test for statistical significance, which requires knowing which labels classifier A got wrong versus classifier B. As such, I need to record the actual values predicted by each classifier.

If you are using an evaluator, you can add 'true_vs_predicted' to metrics to get the predicted values. In this case you also need to set n_wait=1. As a suggestion, deactivate the plot in this case, since n_wait=1 implies a very high refresh rate for the plot, which adds a lot of overhead.
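If you prefer not to go through an evaluator, the same per-sample predictions can be collected with a manual test-then-train loop built on predict(X). A minimal sketch: prequential_predictions is a hypothetical helper that assumes the model exposes predict(X) and partial_fit(X, y) on small batches, and MajorityClass is a hypothetical stand-in learner used only to keep the example self-contained. With a scikit-multiflow stream, a generator such as (stream.next_sample() for _ in range(n)) could replace the hand-written list of batches.

```python
from collections import Counter


def prequential_predictions(batches, model, pretrain=1):
    """Test-then-train loop that records every prediction.

    batches: iterable of (X, y) pairs, each holding one or more samples.
    model: any object with predict(X) and partial_fit(X, y).
    The first `pretrain` batches are only used for training.
    Returns parallel lists of true labels and predicted labels.
    """
    y_true, y_pred = [], []
    for i, (X, y) in enumerate(batches):
        if i >= pretrain:
            y_pred.extend(model.predict(X))  # test first...
            y_true.extend(y)
        model.partial_fit(X, y)  # ...then train on the same samples
    return y_true, y_pred


class MajorityClass:
    """Hypothetical stand-in learner: predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y):
        self.counts.update(y)
        return self

    def predict(self, X):
        label = self.counts.most_common(1)[0][0] if self.counts else 0
        return [label] * len(X)


batches = [([[0.1]], [1]), ([[0.2]], [1]), ([[0.3]], [0]), ([[0.4]], [1])]
y_true, y_pred = prequential_predictions(batches, MajorityClass())
print(y_true, y_pred)  # → [1, 0, 1] [1, 1, 1]
```

Recording predictions for two models this way gives exactly the paired lists that McNemar's test needs.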

@automater0 I'm guessing the T in Kappa T stands for temporal. Bifet refers to it as Kper; see p. 91 of Bifet, A., Gavaldà, R., Holmes, G., & Pfahringer, B. (2017). Machine learning for data streams: with practical examples in MOA (Adaptive computation and machine learning series). MIT Press.

That is correct.
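For reference, Kappa-Temporal (Kper) compares the classifier's accuracy p0 with the accuracy p_per of a persistent classifier that always predicts the previous true label: Kper = (p0 - p_per) / (1 - p_per). A minimal sketch in plain Python (kappa_temporal is a hypothetical helper; scoring starts at the second sample because the persistent classifier has nothing to predict for the first one):

```python
def kappa_temporal(y_true, y_pred):
    """Kappa-Temporal: accuracy relative to the persistent (no-change) classifier."""
    n = len(y_true) - 1  # the persistent classifier has no prediction for sample 0
    p0 = sum(t == p for t, p in zip(y_true[1:], y_pred[1:])) / n
    p_per = sum(y_true[i] == y_true[i - 1] for i in range(1, len(y_true))) / n
    if p_per == 1.0:
        return float("nan")  # undefined: the labels never change
    return (p0 - p_per) / (1 - p_per)


y_true = [0, 0, 0, 1, 1, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]  # behaves exactly like the persistent classifier
print(kappa_temporal(y_true, y_pred))  # → 0.0
print(kappa_temporal(y_true, y_true))  # → 1.0
```

A classifier that merely repeats the last label scores 0, which is why Kper is informative on streams with strong temporal dependence.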

Jacob
@jacobmontiel

@jacobmontiel how do we add LSTM and MLP deep learning algorithms to scikit-multiflow?

Those are still open questions, since those methods are usually trained on batches.

@tlfields scikit-multiflow does not include any implementation (yet). If batch-incremental instead of instance-incremental learning is fine for your use case, then you could do something similar to the BatchIncremental model. This is a simple class that shows how to do batch-incremental learning using batch methods from scikit-learn, but you are not restricted to models from that library.
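A minimal sketch of the batch-incremental idea with a scikit-learn MLPClassifier as the batch learner: buffer incoming samples into fixed-size windows, fit a fresh copy of the estimator on each full window, and predict by majority vote over the most recent models. BatchIncrementalSketch and its window_size and n_estimators parameters are illustrative assumptions, not scikit-multiflow's BatchIncremental implementation, and the sketch assumes every window contains at least two classes.

```python
from collections import Counter

import numpy as np
from sklearn.base import clone
from sklearn.neural_network import MLPClassifier


class BatchIncrementalSketch:
    """Batch-incremental wrapper: one batch model per window, majority vote."""

    def __init__(self, base_estimator, window_size=100, n_estimators=5):
        self.base_estimator = base_estimator
        self.window_size = window_size
        self.n_estimators = n_estimators
        self.models = []
        self._X, self._y = [], []

    def partial_fit(self, X, y):
        for xi, yi in zip(X, y):
            self._X.append(xi)
            self._y.append(yi)
            if len(self._X) == self.window_size:
                # Train a fresh copy of the batch learner on the full window.
                model = clone(self.base_estimator)
                model.fit(np.array(self._X), np.array(self._y))
                self.models.append(model)
                if len(self.models) > self.n_estimators:
                    self.models.pop(0)  # forget the oldest model
                self._X, self._y = [], []
        return self

    def predict(self, X):
        if not self.models:
            return np.zeros(len(X), dtype=int)  # nothing trained yet
        votes = np.array([m.predict(X) for m in self.models])  # (n_models, n_samples)
        return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

learner = BatchIncrementalSketch(
    MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs", max_iter=1000, random_state=0),
    window_size=100,
    n_estimators=5,
)
for start in range(0, len(X), 10):  # the stream arrives in chunks of 10 samples
    learner.partial_fit(X[start:start + 10], y[start:start + 10])

X_test = rng.normal(size=(500, 2))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
accuracy = np.mean(learner.predict(X_test) == y_test)
print(f"{len(learner.models)} models, held-out accuracy {accuracy:.3f}")
```

Keeping only the most recent models lets the ensemble forget old concepts, at the cost of a delay of up to one window before new data influences predictions.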
tlfields
@tlfields
@jacobmontiel thank you so much for your response
barnettjv
@barnettjv
Jacob, I added 'true_vs_predicted' and set pretrain_size to 50 on a dataset of 200 samples, along with n_wait=1, but I'm not getting any predicted values.
I'm only getting the accuracy, which is the only other metric I'm passing.
Oh, never mind. Figured it out :D
asad1907
@asad1907
@barnettjv You can see the true and predicted values in results.csv if you add 'true_vs_predicted' to metrics and set output_file='results.csv'.
asad1907
@asad1907
@jacobmontiel I have solved it in Jupyter Notebook using %matplotlib notebook. Now I am trying to make it work in JupyterLab. If I can, I will gladly share it.
Jacob
@jacobmontiel

@jacobmontiel I have solved it in Jupyter Notebook using %matplotlib notebook. Now I am trying to make it work in JupyterLab. If I can, I will gladly share it.

Thanks for letting us know

tlfields
@tlfields
@jacobmontiel Thank you for the video from AnacondaCON. I have watched it several times and I am learning so much from you. I have run the notebook you provided, and I have a question about how to use Page-Hinkley or the other drift detectors on "agr_a_20k.csv" instead of the stream dataset. Is this something you can help me with? I want to see which of the detectors pick up the drift specifically in agr_a_20k.csv.
tlfields
@tlfields
@jacobmontiel .... I think I got it to work... I am so HAPPY!!
Jacob
@jacobmontiel

@jacobmontiel .... I think I got it to work... I am so HAPPY!!

Glad to hear that

Just a comment: be careful when testing different drift detectors. For example, DDM and EDDM expect the input data (the error) encoded in the opposite way to ADWIN.
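To make the encoding point concrete, here is a from-scratch sketch of the DDM rule (Gama et al., 2004) in plain Python. It is not scikit-multiflow's DDM class; DDMSketch and its defaults are illustrative. It tracks the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), so it expects 1 = misclassified and 0 = correct, and it only reacts when p + s rises well above its recorded minimum. Fed the opposite encoding (1 = correct), a rise in errors shows up as a falling rate, which this rule never flags.

```python
import math


class DDMSketch:
    """From-scratch sketch of DDM. Input: 1 = prediction was wrong, 0 = correct."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0  # running error rate
        self.s = 0.0  # its standard deviation
        self.p_min = self.s_min = float("inf")

    def add_element(self, error):
        """Feed one error bit; returns "drift", "warning" or None."""
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return None
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # best level seen so far
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return None


# Deterministic error stream: every 10th prediction wrong (10% error rate) for
# 1000 samples, then every other prediction wrong (50%).
errors = [int(i % 10 == 0) for i in range(1000)] + [int(i % 2 == 0) for i in range(1000, 2000)]

detector = DDMSketch()
drifts = [i for i, e in enumerate(errors) if detector.add_element(e) == "drift"]
print("drifts (1 = error):", drifts)

inverted = DDMSketch()  # same stream, but encoded as 1 = correct
print("drifts (1 = correct):", [i for i, e in enumerate(errors) if inverted.add_element(1 - e) == "drift"])
```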
tlfields
@tlfields
@jacobmontiel When I ran PageHinkley on agr_a_20k.csv, it picked up only two drifts, one at index 5165 and the other at 15408. It did not pick up any drift in the 1000-1100 range. ADWIN picked up 5 drifts between indices 5535 and 5855, 9 drifts in the range 10463-11007, and 8 in the range 15679-17407. So for this particular dataset, ADWIN was a bit more sensitive to the drift. For some reason I did not get the input data error that you mentioned.
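Part of the difference in sensitivity comes from how each detector decides. Page-Hinkley accumulates deviations from the running mean, m_T = Σ (x_t - mean_t - δ), and signals when m_T minus the smallest m_t seen so far exceeds a threshold λ, so a larger λ makes it flag fewer, later changes. A from-scratch sketch of the classic test in plain Python (PageHinkleySketch and its parameter values are illustrative, not scikit-multiflow's PageHinkley class), run on a synthetic error stream whose rate jumps from 10% to 50% at index 1000:

```python
class PageHinkleySketch:
    """Classic Page-Hinkley test for an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta            # tolerated magnitude of change
        self.threshold = threshold    # lambda: detection threshold
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.m = 0.0       # cumulative deviation m_T
        self.m_min = 0.0   # minimum of m_t seen so far

    def add_element(self, x):
        """Feed one value; returns True when a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.m += x - self.mean - self.delta
        self.m_min = min(self.m_min, self.m)
        if self.n >= self.min_instances and self.m - self.m_min > self.threshold:
            self.reset()
            return True
        return False


errors = [int(i % 10 == 0) for i in range(1000)] + [int(i % 2 == 0) for i in range(1000, 2000)]

for lam in (20.0, 50.0):
    detector = PageHinkleySketch(threshold=lam)
    hits = [i for i, x in enumerate(errors) if detector.add_element(x)]
    print(f"threshold={lam}: detections at {hits}")
```

The lower threshold reacts sooner after index 1000; on noisy real data it would typically also raise more false alarms, which is the trade-off to tune.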
tlfields
@tlfields
@jacobmontiel Regarding agr_a_20k.csv, did you add the drifts at certain points, or were they already there? I am trying to figure out how to find the actual points where drift was inserted. Thank you.
Jacob
@jacobmontiel
Those are 3 synthetic abrupt drifts, every 7500 samples
tlfields
@tlfields
@jacobmontiel thank you Sir!
Santhosh Sahini
@santoshsahini19
Hi, when I run “from skmultiflow.data import AnomalySineGenerator”, an error pops up saying cannot import name 'AnomalySineGenerator' from 'skmultiflow.data'. Is there any way to resolve this issue? Thanks!
Jacob
@jacobmontiel
AnomalySineGenerator is only available in the development version. You must install it from GitHub:
$ pip install -U git+https://github.com/scikit-multiflow/scikit-multiflow
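To confirm whether the installed package actually exposes a given class, a small check like the following can help. It is plain Python; has_attribute is a hypothetical helper, demonstrated here on the standard library:

```python
import importlib


def has_attribute(module_name, attr):
    """Return True if `module_name` imports cleanly and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the package or submodule is not installed
    return hasattr(module, attr)


# Sanity checks with the standard library:
print(has_attribute("json", "loads"))             # → True
print(has_attribute("json", "no_such_name"))      # → False
print(has_attribute("no_such_package_xyz", "x"))  # → False
```

Calling has_attribute("skmultiflow.data", "AnomalySineGenerator") should then return True only when the installed version includes that generator.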
Santhosh Sahini
@santoshsahini19
Thank you so much @jacobmontiel
Jacob
@jacobmontiel

Those are 3 synthetic abrupt drifts, every 7500 samples

@tlfields, I was reviewing this, and the drifts are actually placed every 5000 samples.
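With the true drift points known (every 5000 samples, so at 5000, 10000 and 15000), detectors can be compared by detection delay. A minimal sketch in plain Python: detection_delays is a hypothetical helper that takes the first detection between consecutive drift points as the hit and counts detections before the first drift as false alarms. The example uses the two Page-Hinkley detections reported above (indices 5165 and 15408).

```python
def detection_delays(true_drifts, detections):
    """Delay from each true drift to the first detection before the next drift.

    Returns (delays, false_alarms): delays[i] is None if drift i was missed;
    false_alarms lists detections that come before the first true drift.
    """
    bounds = list(true_drifts) + [float("inf")]
    delays = []
    for start, end in zip(bounds, bounds[1:]):
        hits = [d for d in detections if start <= d < end]
        delays.append(min(hits) - start if hits else None)
    false_alarms = [d for d in detections if d < true_drifts[0]]
    return delays, false_alarms


print(detection_delays([5000, 10000, 15000], [5165, 15408]))  # → ([165, None, 408], [])
```

This simplification ignores repeated detections within the same segment, such as ADWIN's several signals after each drift; counting those separately would be a natural extension.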

Sorry about that.
tlfields
@tlfields
@jacobmontiel Thank you for clarifying; when I looked at your demo, it did say 5K, 10K and 15K. Sir, could you please help me understand how to add drifts of different magnitudes to agr_a_20k? Additionally, how would I add them so that they are considered gradual rather than abrupt drift? Is there a guide for doing this?
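One common way to model gradual drift is MOA's sigmoid mixing, which, as far as I know, scikit-multiflow's ConceptDriftStream also follows through its position and width parameters: sample t is drawn from the new concept with probability 1 / (1 + exp(-4(t - position) / width)), so a width of 1 behaves like an abrupt drift and larger widths spread the change over roughly width samples. The sketch below reimplements only that mixing rule in plain Python; mixed_stream and the two toy concepts are hypothetical. The magnitude of a drift depends on how much the old and new concepts disagree; here the new concept flips every label, the largest possible change.

```python
import math
import random


def drift_probability(t, position, width):
    """Probability that sample t comes from the new concept (sigmoid centred at position)."""
    z = min(-4.0 * (t - position) / width, 700.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def old_concept(x):
    return int(x > 0.5)


def new_concept(x):
    return int(x <= 0.5)  # flips every label of the old concept


def mixed_stream(n, position, width, seed=0):
    """Yield (x, label) pairs that switch gradually from old_concept to new_concept."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        concept = new_concept if rng.random() < drift_probability(t, position, width) else old_concept
        yield x, concept(x)


data = list(mixed_stream(10000, position=5000, width=1000))
for lo, hi in [(0, 2000), (4900, 5100), (8000, 10000)]:
    share_new = sum(y != old_concept(x) for x, y in data[lo:hi]) / (hi - lo)
    print(f"samples {lo}-{hi}: {share_new:.2f} labelled by the new concept")
```

A smaller-magnitude drift could be modelled by a new concept that disagrees with the old one on only part of the input space, for example by shifting the 0.5 boundary instead of flipping it.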
Juan Cardona
@Juancard

Hi everyone, I have a problem related to show_plot. I set show_plot=True but it doesn't work. What should I do? Do you have any idea?

I have the same issue in Colab and could not solve it using %matplotlib inline or %matplotlib notebook. Has anyone tried this in Colab before?

Juan Cardona
@Juancard
# -*- coding: utf-8 -*-

!pip install scikit-multiflow

%matplotlib inline
from skmultiflow.data import WaveformGenerator
from skmultiflow.trees import HoeffdingTree
from skmultiflow.evaluation import EvaluatePrequential

# 1. Create a stream
stream = WaveformGenerator()
stream.prepare_for_use()

# 2. Instantiate the HoeffdingTree classifier
ht = HoeffdingTree()

# 3. Setup the evaluator
evaluator = EvaluatePrequential(show_plot=True,
                                pretrain_size=200,
                                max_samples=20000)

# 4. Run evaluation
evaluator.evaluate(stream=stream, model=ht)
Santhosh Sahini
@santoshsahini19
Hi everyone, does anyone know how the data (in a .csv file) should be structured when using the 'skmultiflow.data.file_stream' module to run HSTrees for anomaly detection? Thanks!