Jacob
@jacobmontiel

@jacobmontiel I actually have 8 other sources of data (Hyperplane, LED, Electricity, etc.). So here are the results of noise 0% vs. noise 20% with ARF...

:+1:

Those results are interesting
Notice also that noisy data results in longer training times and larger models (in memory), which is expected :-)
barnettjv
@barnettjv
image.png
image.png
@jacobmontiel sorry, not entirely used to Gitter
Jacob
@jacobmontiel
looks good in my opinion, although I would consider intermediate noise levels (5, 10, 20) to potentially show the impact on performance (decrement)
although that would depend on the amount of resources you have and the time to run all the experiments
barnettjv
@barnettjv
@jacobmontiel Thank you. I think your suggestion is a good one. I'll try to get them in and will post the results here.
Jacob
@jacobmontiel
Nice :+1:
barnettjv
@barnettjv
@jacobmontiel Hi Jacob, it appears that the accuracy is reduced by nearly the exact amount of the noise (i.e. 5% noise leads to ~95% accuracy, 10% equates to ~90% accuracy, and so on). Does this seem right? Also, can you help me understand better how exactly the evaluator uses the stream? I know the docs say that the samples serve two purposes (i.e. test and train), but how is it testing? And then how is it training?
Jacob
@jacobmontiel

it appears that the accuracy is reduced by nearly the exact amount of the noise (i.e. 5% noise leads to ~95% accuracy, 10% equates to ~90% accuracy and so on)

This might change depending on the estimator used; other than that it seems reasonable (I assume the noise is generated from a normal distribution)

but how is it testing? and then how is it training?

The data from the stream is first used to get a prediction (test(X)), and the predicted value is compared against the true value to track the performance of the estimator. Then the same sample is used to train the estimator (train(X, y))

we must perform the test before using the data for training :-)
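For reference, here is a minimal hand-written sketch of that test-then-train loop (assuming skmultiflow's standard SEAGenerator / HoeffdingTreeClassifier names; the noise level and sample count are just illustrative):

```python
from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTreeClassifier

stream = SEAGenerator(noise_percentage=0.10)   # 10% noise, illustrative
# stream.prepare_for_use()                     # only needed on older skmultiflow versions
model = HoeffdingTreeClassifier()

# prime the model with one sample so the very first predict() has something to work with
X, y = stream.next_sample()
model.partial_fit(X, y, classes=stream.target_values)

correct = total = 0
while stream.has_more_samples() and total < 10000:
    X, y = stream.next_sample()                             # next sample from the stream
    y_pred = model.predict(X)                               # 1) test: predict before the label is used
    correct += int(y_pred[0] == y[0])                       # 2) score the prediction
    model.partial_fit(X, y, classes=stream.target_values)   # 3) train: same sample, after testing
    total += 1

print(f"Prequential accuracy over {total} samples: {correct / total:.4f}")
```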
barnettjv
@barnettjv
@jacobmontiel The noise parameter was used with the SEAGenerator when building the stream. You mention "This might change depending on the estimator used". I didn't realize that we had a choice of picking estimators; I thought it was done internally by the EvaluatePrequential class? Also, wrt the Testing/Training time metrics in my post a couple of days earlier, the EvaluatePrequential class performed the test first, taking 1177.81 seconds, and then trained for 14063.29 seconds. Do I have this correct? After training, can I send it another stream to be processed using the "trained" classifier? So many questions, my apologies for having this many. It's just that my dissertation is due in a few weeks...
Jacob
@jacobmontiel
When you run evaluator.evaluate() you can pass the estimator (classifier) you want to use; it could be ARF, Naive Bayes, Hoeffding Tree, etc. As mentioned earlier, EvaluatePrequential is just in charge of managing the flow of data and the order of the test and train steps.
Yes, the times are correct; as expected, training takes more time since it is updating the model
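For instance, a sketch of passing your own estimator to the evaluator (names and parameter values are illustrative):

```python
from skmultiflow.data import SEAGenerator
from skmultiflow.meta import AdaptiveRandomForestClassifier
from skmultiflow.evaluation import EvaluatePrequential

stream = SEAGenerator(noise_percentage=0.20)   # the stream under test
arf = AdaptiveRandomForestClassifier()         # any estimator works here: ARF, Naive Bayes, Hoeffding Tree, ...

# EvaluatePrequential only manages the flow of data and the test-then-train order;
# the estimator passed to evaluate() is what actually learns.
evaluator = EvaluatePrequential(max_samples=100000,
                                metrics=['accuracy', 'running_time'])
evaluator.evaluate(stream=stream, model=arf, model_names=['ARF'])
```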

After training, can I send it another stream to be processed using the "trained" classifier?

Yes, there is a way to run another evaluation task with a previously trained model. You must first make sure to set the parameter restart_stream=False in EvaluatePrequential. This way the first evaluate call will train the model without restarting it at the end. If you call evaluate again with a new stream and the same model, the model will continue learning.

Jacob
@jacobmontiel
In this case you must use different streams. Using the same stream is incorrect since the model has already “seen” and learned from that data.
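A sketch of that two-run setup (continuing the imports from the previous sketch; stream_a and stream_b are two different, illustrative streams):

```python
model = AdaptiveRandomForestClassifier()

# First run: evaluate and train on stream_a; restart_stream=False keeps the
# learned state at the end instead of resetting.
EvaluatePrequential(max_samples=50000, restart_stream=False).evaluate(
    stream=stream_a, model=model, model_names=['ARF'])

# Second run: the SAME model instance continues learning (and being tested)
# on a DIFFERENT stream, stream_b.
EvaluatePrequential(max_samples=50000, restart_stream=False).evaluate(
    stream=stream_b, model=model, model_names=['ARF'])
```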
barnettjv
@barnettjv
@jacobmontiel Jacob, something looks odd with my results. I wasn't expecting to see training time for the unseen stream evaluation run... here is the code.
ARFnoise20.png
result.png
@barnettjv and here is the result...
@jacobmontiel My intention was not to have the classifier retrain on the second stream. Did I code this correctly?
barnettjv
@barnettjv
@jacobmontiel Jacob, if I don't want the model to continue learning, should I set restart_stream=True or False?
barnettjv
@barnettjv
@jacobmontiel I noticed that regardless of whether I set restart_stream=False or restart_stream=True, both evals show training time.
barnettjv
@barnettjv
@jacobmontiel it just dawned on me that perhaps all of your classifiers continuously learn by their very design? Is this the same with the classifiers in MOA?
laoheico3
@laoheico3
@jacobmontiel Hi Jacob, I had another problem. I wanted to predict all the target values in the 20 time steps after time t, and when I used EvaluatePrequential, the parameters in it didn't seem to do the job. Can I only do this by changing the source code? I hope you can give me some help with this problem. Thank you
Jacob
@jacobmontiel
@barnettjv Sorry for the delay. The evaluators will always perform training, regardless of whether the model is new or has been pre-trained. There is no mechanism to disable the training phase.

@jacobmontiel Jacob, if I don't want the model to continue learning, should I set restart_stream=True or False?

This parameter is not intended to be used like that. It indicates whether the model should be restarted (True) or not (False) at the end of the evaluation. If restart_stream=False, it means that after the evaluation the model instance will remain in the last state reached during the evaluation. You can either continue training it or use it only to get predictions. That is up to you to define (and code). However, as mentioned earlier, EvaluatePrequential always performs both testing and training.
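For completeness, a sketch of the prediction-only route outside the evaluators (the model keeps whatever it learned in the previous evaluate() run; unseen_stream is illustrative):

```python
X_new, _ = unseen_stream.next_sample(100)   # grab a batch of 100 samples
y_pred = model.predict(X_new)               # predictions only: no partial_fit(), so no further learning
```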

Jacob
@jacobmontiel

@jacobmontiel Hi Jacob, I had another problem. I wanted to predict all the target values in the 20 time steps after time t, and when I used EvaluatePrequential, the parameters in it didn't seem to do the job. Can I only do this by changing the source code? I hope you can give me some help with this problem. Thank you

This is not currently supported by EvaluatePrequential; we are working on a new feature for this case, but it might take some time until it is available. In the meantime the best option is, as you mention, to implement it manually. You can take a look at PR #222 for reference :-)
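In case it helps while that feature is pending, one possible manual sketch (illustrative only, not the planned API): buffer the next 20 samples, predict them all with the model as it stands at time t, and only then let the model train on them.

```python
horizon = 20
buffer_X, buffer_y = [], []

while stream.has_more_samples():
    X, y = stream.next_sample()
    buffer_X.append(X)
    buffer_y.append(y)
    if len(buffer_X) == horizon:
        # predict the whole 20-step horizon with the model frozen at time t ...
        for Xi, yi in zip(buffer_X, buffer_y):
            y_hat = model.predict(Xi)
            # ... record (y_hat, yi) with whatever metric you need
        # ... and only afterwards let the model learn from those samples
        for Xi, yi in zip(buffer_X, buffer_y):
            model.partial_fit(Xi, yi, classes=stream.target_values)
        buffer_X, buffer_y = [], []
```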

barnettjv
@barnettjv
@jacobmontiel Hi again Jacob, I've made a lot of progress since my last post. Quick question (hopefully) regarding the VFDT, ARF, DWM, and LevBag classifiers: the default training is immediate, right? Is there a way to put in a delay before the classifier gets access to the label during training?
Jacob
@jacobmontiel
Not in the release version; we are currently working on the development of such functionality
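Until then, a manual workaround sketch (illustrative, assuming the same stream/estimator API as in the earlier sketches): predict on each sample as soon as it arrives, but hold its label in a queue and only call partial_fit once the label is considered available, a fixed number of samples later.

```python
from collections import deque

delay = 1000                 # illustrative label delay, in number of samples
pending = deque()            # samples whose labels have not "arrived" yet

while stream.has_more_samples():
    X, y = stream.next_sample()
    y_pred = model.predict(X)    # test immediately, without the label
    pending.append((X, y))
    if len(pending) > delay:
        X_old, y_old = pending.popleft()
        model.partial_fit(X_old, y_old, classes=stream.target_values)   # train once the label arrives
```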
nuwangunasekara
@nuwangunasekara
Hi guys,
can someone please explain to me what is meant by 'model_size' in the 'metrics' parameter of skmultiflow.evaluation.EvaluatePrequential()?
Is it something similar to MOA's model cost (RAM-Hours)?
Jacob
@jacobmontiel
Yes, in the sense that it is a way to track the amount of memory used
In our case it refers to the size of the model in memory
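For example, requesting it alongside the other metrics (a sketch, with stream and model as defined elsewhere; 'model_size' and 'running_time' are metric names accepted by EvaluatePrequential):

```python
evaluator = EvaluatePrequential(max_samples=100000,
                                metrics=['accuracy', 'running_time', 'model_size'])
evaluator.evaluate(stream=stream, model=model, model_names=['ARF'])
# the final summary then reports the model size (in kB) next to accuracy and the timings
```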
nuwangunasekara
@nuwangunasekara
cool! Thanks @jacobmontiel !
barnettjv
@barnettjv
@jacobmontiel Jacob, I'm getting an array warning/error when using holdout... is this normal?
Separating 5000 holdout samples.
#################### [100%] [22165.92s]
Separating 5000 holdout samples.
Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a
single sample.
Processed samples: 1000000
Mean performance:
AdaptiveRandomForest-Holdout - Accuracy : 0.7894
AdaptiveRandomForest-Holdout - Training time (s) : 20975.01
AdaptiveRandomForest-Holdout - Testing time (s) : 278.86
AdaptiveRandomForest-Holdout - Total time (s) : 21253.87
AdaptiveRandomForest-Holdout - Size (kB) : 445215.2520
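For context, that warning is scikit-learn's standard message when an estimator receives a 1-D array (here an empty one, array=[]) where it expects a 2-D (n_samples, n_features) array; for a genuine single sample, the fix it suggests looks like this (sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # one sample with three features, as a 1-D array
X = x.reshape(1, -1)            # reshape to the 2-D (1, n_features) shape estimators expect
y_pred = model.predict(X)       # model is whatever trained estimator you are querying
```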
Jacob
@jacobmontiel
I am not familiar with that error message, perhaps it is something external? Can you provide a MWE?
barnettjv
@barnettjv
So I am reading the data from a CSV file that was generated with my testwrite.py script, and it is read in using my ARFClassifier.py script.
data file link above
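In case it is useful, a sketch of the usual skmultiflow way to read a saved CSV back in as a stream (the file name is a placeholder; testwrite.py and ARFClassifier.py are your own scripts):

```python
from skmultiflow.data import FileStream

# assumes the CSV stores the features first and the class label in the last column
stream = FileStream('sea_noise20.csv', target_idx=-1)
X, y = stream.next_sample()   # samples can then be drawn exactly as from a generator
```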
barnettjv
@barnettjv
It appears that the issue is either how SEAWrite is saving the stream to the CSV file or how ARFClassifier is reading from the file and assigning it to the stream for the classifier. What is interesting is that I wrote a MWE that took out the file I/O portion, which does not give me the warning, but I was shocked to see that both versions (file I/O vs. direct) give the same accuracy results. Also, the file I/O version seems to be faster. Would you be able to take a look at it and see what I'm doing wrong?
barnettjv
@barnettjv
The example above uses the VFDTClassifier; there is no difference between it and ARFClassifier aside from which classifier is being used.