Jacob
@jacobmontiel
Yes, the times are correct; as expected, training takes more time since it is updating the model.

After training, can I send it another stream to be processed using the "trained" classifier?

Yes, there is a way to run another evaluation task with a previously trained model. You must first make sure to set the parameter restart_stream=False in EvaluatePrequential. This way the first evaluate call will train the model without restarting it at the end. If you then call evaluate again with a new stream and the same model, the model will continue learning.
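For reference, a minimal sketch of that pattern (assuming a recent skmultiflow version where the Hoeffding tree is exposed as HoeffdingTreeClassifier; the generators and parameter values are only illustrative):

from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequential

model = HoeffdingTreeClassifier()
stream_a = SEAGenerator(random_state=1)
stream_b = SEAGenerator(random_state=2)

# restart_stream=False so nothing is reset at the end of each run
evaluator = EvaluatePrequential(max_samples=10000, restart_stream=False, metrics=['accuracy'])

# First run: trains the model on stream_a
evaluator.evaluate(stream=stream_a, model=model, model_names=['HT'])

# Second run: the same model instance keeps learning on stream_b
evaluator.evaluate(stream=stream_b, model=model, model_names=['HT'])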

Jacob
@jacobmontiel
In this case you must use different streams. Using the same stream is incorrect since the model has already “seen” and learned from that data.
barnettjv
@barnettjv
@jacobmontiel Jacob, something looks odd with my results. I wasn't expecting to see training time for the unseen-stream evaluation run... here is the code.
ARFnoise20.png
result.png
@barnettjv and here is the result...
@jacobmontiel My intention was not to have the classifier retrain on the second stream. Did I code this correctly?
barnettjv
@barnettjv
@jacobmontiel Jacob, if I don't want the model to continue learning, should I set restart_stream=True or False?
barnettjv
@barnettjv
@jacobmontiel I noticed that regardless of whether I set restart_stream=False or restart_stream=True, both evals show training time.
barnettjv
@barnettjv
@jacobmontiel It just dawned on me that perhaps all of your classifiers continuously learn by design? Is this the same for the classifiers in MOA?
laoheico3
@laoheico3
@jacobmontiel Hi Jacob, I had another problem. I wanted to predict all the target values in the 20 time steps after time t, and when I used EvaluatePrequential, the parameters in it didn't seem to do the job. Can I only do this by changing the source code? I hope you can give me some help with this problem. Thank you
Jacob
@jacobmontiel
@barnettjv Sorry for the delay. The evaluators will always perform training, regardless of whether the model is new or has been pre-trained. There is no mechanism to disable the training phase.

@jacobmontiel Jacob, if I don't want the model to continue learning, should I set restart_stream=True or False?

This parameter is not intended to be used like that. It indicates whether the model should be restarted (True) or not (False) at the end of the evaluation. If restart_stream=False, it means that after the evaluation the model instance will remain in the last state from the evaluation. You can either continue training or use it only to get predictions. That is up to you to define (and code). However, as mentioned earlier, EvaluatePrequential always performs both testing and training.
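As a concrete illustration of the "predictions only" option, a minimal sketch (here model is assumed to be the instance previously passed to EvaluatePrequential.evaluate(); the generator and sample count are only illustrative). Since you drive the loop yourself, you simply never call partial_fit():

from skmultiflow.data import SEAGenerator

new_stream = SEAGenerator(random_state=42)
correct, total = 0, 0
while total < 1000 and new_stream.has_more_samples():
    X, y = new_stream.next_sample()
    y_pred = model.predict(X)          # inference only, no further learning
    correct += int(y_pred[0] == y[0])
    total += 1
print('Accuracy on the new stream (no further training): {:.4f}'.format(correct / total))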

Jacob
@jacobmontiel

@jacobmontiel Hi Jacob, I had another problem. I wanted to predict all the target values in the 20 time steps after time t, and when I used EvaluatePrequential, the parameters in it didn't seem to do the job. Can I only do this by changing the source code? I hope you can give me some help with this problem. Thank you

This is not currently supported by EvaluatePrequential; we are working on a new feature for this case, but it might take some time until it is available. In the meantime, the best option is, as you mention, to implement it manually. You can take a look at PR #222 for reference :-)
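For anyone looking for a starting point, a rough sketch of a hand-rolled test-then-train loop with a fixed label delay (one possible reading of the question; the generator, regressor, and DELAY value of 20 are purely illustrative assumptions, not the feature under development):

from collections import deque
import numpy as np
from skmultiflow.data import RegressionGenerator
from skmultiflow.trees import HoeffdingTreeRegressor

DELAY = 20    # labels become available 20 samples after the features
stream = RegressionGenerator(n_samples=5000, n_features=10, random_state=1)
model = HoeffdingTreeRegressor()

pending = deque()    # samples whose labels are not yet "available"
errors = []
trained = False

while stream.has_more_samples():
    X, y = stream.next_sample()
    pending.append((X, y))
    if len(pending) > DELAY:
        X_old, y_old = pending.popleft()
        if trained:
            y_pred = model.predict(X_old)    # predict before the label is used
            errors.append(abs(y_pred[0] - y_old[0]))
        model.partial_fit(X_old, y_old)      # train only once the label "arrives"
        trained = True

print('MAE with a {}-step label delay: {:.4f}'.format(DELAY, np.mean(errors)))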

barnettjv
@barnettjv
@jacobmontiel Hi again Jacob, I've made a lot of progress since my last post. Quick question (hopefully) regarding the VFDT, ARF, DWM, and LevBag classifiers: the default training is immediate, right? Is there a way to put in a delay before the classifier gets access to the label during training?
Jacob
@jacobmontiel
Not in the release version; we are currently working on the development of such functionality.
nuwangunasekara
@nuwangunasekara
Hi guys,
can someone please explain to me what is meant by 'model_size' in the 'metrics' parameter of skmultiflow.evaluation.EvaluatePrequential()?
Is it something similar to MOA's model cost (RAM-Hours)?
Jacob
@jacobmontiel
Yes, in the sense that it is a way to track the amount of memory used.
In our case it refers to the size of the model in memory.
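For example, a minimal sketch of requesting it (the generator, model, and sample count are only illustrative):

from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequential

stream = SEAGenerator(random_state=1)
model = HoeffdingTreeClassifier()

# 'model_size' adds the current in-memory size of the model (reported in kB)
# to the evaluation output, next to the usual performance metrics
evaluator = EvaluatePrequential(max_samples=20000, metrics=['accuracy', 'model_size'])
evaluator.evaluate(stream=stream, model=model, model_names=['HT'])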
nuwangunasekara
@nuwangunasekara
cool! Thanks @jacobmontiel !
barnettjv
@barnettjv
@jacobmontiel Jacob, I'm getting an array warning/error when using holdout... is this normal?
Separating 5000 holdout samples.
#################### [100%] [22165.92s]
Separating 5000 holdout samples.
Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a
single sample.
Processed samples: 1000000
Mean performance:
AdaptiveRandomForest-Holdout - Accuracy : 0.7894
AdaptiveRandomForest-Holdout - Training time (s) : 20975.01
AdaptiveRandomForest-Holdout - Testing time (s) : 278.86
AdaptiveRandomForest-Holdout - Total time (s) : 21253.87
AdaptiveRandomForest-Holdout - Size (kB) : 445215.2520
Jacob
@jacobmontiel
I am not familiar with that error message, perhaps it is something external? Can you provide an MWE?
barnettjv
@barnettjv
So I am reading the data from a CSV file that was generated with my testwrite.py script, and reading it in using my ARFClassifier.py script.
data file link above
barnettjv
@barnettjv
It appears that the issue is either how SEAWrite is saving the stream to the CSV file or how ARFClassifier is reading from the file and assigning it to the stream for the classifier. What is interesting is that I wrote an MWE that took out the file I/O portion, which does not give me the warning, but I was shocked to see that both versions (file I/O vs. direct) give the same accuracy results. Also, the file I/O version seems to be faster. Would you be able to take a look at it and see what I'm doing wrong?
barnettjv
@barnettjv
The example above is using the VFDTClassifier; there is no difference between it and ARFClassifier aside from which classifier is being used.
barnettjv
@barnettjv
After much debugging, it appears that the extra newline at the end of the CSV file (required by POSIX) is throwing EvaluateHoldout off at the end.
It does not affect the result, though.
Jacob
@jacobmontiel
Hi @barnettjv
thanks for the examples and further analysis
I agree that results seem to be fine, the warning seems to be coming from a mishandled corner case
Sorry for the delay in my answer; while I was reviewing the code I actually found a bug that impacts (as far as I can tell) some variants of the ARF. I will create an issue with a clear explanation.
barnettjv
@barnettjv
Thank you Jacob. I'm happy to be of service.
barnettjv
@barnettjv
@jacobmontiel Hi Jacob, I was wondering (hoping, actually) whether there is a way for me to use my graphics cards to speed up the data processing of the scikit-multiflow functions (e.g. EvaluatePrequential). NumbaPro? CUDA?
Jacob
@jacobmontiel
Unfortunately, that is not possible. Most stream algorithms are sequential in nature, which makes them very challenging to parallelize. An alternative is to launch multiple jobs in parallel, depending on the amount of resources you have.
barnettjv
@barnettjv
I see. Well, my machine has 512 GB of RAM and 2 CPUs. To run in parallel I've been using & for each Python job and have been getting about 40% CPU load but 180°F temperatures. Is there a better way?
Jacob
@jacobmontiel
Not that I am aware of (any suggestion on this is welcome). What you describe is what we usually do on our servers.
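A minimal sketch of the same idea from within a single script, using Python's standard library instead of shell backgrounding (the run configuration, seeds, and output file names are only illustrative, not part of skmultiflow):

from concurrent.futures import ProcessPoolExecutor

from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.evaluation import EvaluatePrequential

def run_one(seed):
    # Each run is an independent, fully sequential evaluation
    stream = SEAGenerator(random_state=seed, noise_percentage=0.2)
    model = HoeffdingTreeClassifier()
    evaluator = EvaluatePrequential(max_samples=50000,
                                    metrics=['accuracy'],
                                    output_file='results_seed_{}.csv'.format(seed))
    evaluator.evaluate(stream=stream, model=model, model_names=['HT'])
    return seed

if __name__ == '__main__':
    # Dispatch independent runs to separate processes
    with ProcessPoolExecutor(max_workers=4) as pool:
        for seed in pool.map(run_one, [1, 2, 3, 4]):
            print('run with seed {} finished'.format(seed))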
barnettjv
@barnettjv
After I finish my dissertation report in May, I'm planning on exploring GPU parallelization possibilities with scikit-multiflow. I think I've just about maxed out the 72 cores on my server with the current architecture. My naive thought is that perhaps NVIDIA or Radeon have parallelization libraries that can easily be imported into my Python projects (based on scikit-multiflow).
Jacob
@jacobmontiel
That is a very interesting topic and it would be really nice to see how it can be applied to stream learning. I am going to drop here a reference to https://dask.org/ which seems promising, but we do not have enough resources to explore it at the moment.
Jiao Yin
@JoanYinCQ
New to skmultiflow. When I load 'covtype.csv' using FileStream and call stream.prepare_for_use(), it is interpreted as a regression problem, with y=1.0/2.0/3.0/4.0/5.0. But it is actually a multi-class classification problem, and y=1/2/3/4/5 in 'covtype.csv'. How can I make the FileStream interpret the data in the right way?
Jacob
@jacobmontiel
Hi @JoanYinCQ. I am not able to reproduce this error, can you share a MWE?
This is what I used:
from skmultiflow.data import FileStream
stream = FileStream("./src/skmultiflow/data/datasets/covtype.csv")
stream.prepare_for_use()
stream.n_classes    # Output: 7
stream.target_values    # Output: [1, 2, 3, 4, 5, 6, 7]