Sana Shahir
@sanashahir
ImportError: cannot import name 'BalanceCascade'
How do I solve this error?
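For context: BalanceCascade was deprecated and later removed from imbalanced-learn (around version 0.6, if memory serves), which is why the import fails on recent releases. A minimal sketch of one commonly suggested replacement, assuming a recent imbalanced-learn; BalancedBaggingClassifier is an illustrative choice, not the only option:

# BalanceCascade is gone from recent releases, so importing it fails there.
# A hedged alternative: an ensemble classifier that resamples each bootstrap internally.
from sklearn.datasets import make_classification
from imblearn.ensemble import BalancedBaggingClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
clf = BalancedBaggingClassifier(random_state=0)
clf.fit(X, y)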
roth-mh
@roth-mh
I thought ADASYN supported non-numeric data? I am not sure why I am getting a conversion error from string to float.
Am I wrong?
I have tried SMOTE-NC and it is too slow/memory-intensive for my dataset.
roth-mh
@roth-mh
Seems like this is an example of someone using it for non-numeric data (alongside SMOTE-NC): https://towardsdatascience.com/imbalanced-class-sizes-and-classification-models-a-cautionary-tale-3648b8586e03
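For reference, plain ADASYN and SMOTE expect purely numeric features, which would explain the string-to-float conversion error; SMOTENC is the variant meant for mixed data, and its categorical columns have to be declared explicitly. A minimal sketch (the column indices are hypothetical):

# SMOTENC needs to know which columns are categorical; all other columns must be numeric.
from imblearn.over_sampling import SMOTENC

smote_nc = SMOTENC(categorical_features=[0, 3], random_state=0)  # assumed categorical columns
# X_res, y_res = smote_nc.fit_resample(X, y)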
Anders
@swanderz
Henrique Voni
@henrique-voni
Hello, I'm testing RandomOverSampler with my dataset that has 1024 columns + 1 (label). When it fits my dataset and generates the oversampled data, it automatically removes columns. My classes are numeric (1 to 5) and the removed columns are, coincidentally, [1, 2, 3, 4, 5].
Any help would be great. Thanks in advance.
Christos Aridas
@chkoar
Open a new issue with a minimal reproducible example so that we can verify the behaviour.
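A hedged sketch of what such a minimal reproducible example could look like (random data; it does not reproduce the reported column-dropping by itself):

import numpy as np
from imblearn.over_sampling import RandomOverSampler

X = np.random.rand(100, 1024)          # 1024 feature columns
y = np.random.randint(1, 6, size=100)  # numeric classes 1..5
ros = RandomOverSampler(random_state=0)
X_res, y_res = ros.fit_resample(X, y)
print(X.shape, X_res.shape)            # the column count should be unchanged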
Ilkin Bayramli
@ibayramli2001
Hi all! I just wanted to ask whether the BalancedRandomForestClassifier object resamples the test examples as well when the predict method is called. My guess is that it doesn't, because BalancedRandomForestClassifier does not have a predict method per se but inherits it from RandomForestClassifier, which does not resample test examples (also, prediction metrics like precision and recall would be affected by it), but I want to clarify it nevertheless.
joeltok
@joeltok

Hey all, Appreciating the great work that has been put in to make this library so easy to use.

Just a question: For SMOTE, when a sample is generated through oversampling, is there a way to link back to the original sample that was used to generate it (for example through some kind of id, or some fields that are guaranteed to be immutable by the over-sampler)?

Anushiya Thevapalan
@anushiya-thevapalan
Hi, I am getting TypeError: __init__() got an unexpected keyword argument 'ratio' when I simply execute the code below: sm = SMOTE(random_state=42, ratio=0.6). Any suggestions for what's going wrong?
Guillaume Lemaitre
@glemaitre
You should not use ratio anymore because it has been deprecated
and removed since version 0.5, I think.
Instead, use sampling_strategy.
You still have compatibility with a float.
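A minimal sketch of the current API on a recent imbalanced-learn, with sampling_strategy replacing the removed ratio keyword; a float still works for binary targets:

from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42, sampling_strategy=0.6)  # minority resampled to 0.6 * majority size
# X_res, y_res = sm.fit_resample(X, y)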
@joeltok We don't have this feature implemented
joeltok
@joeltok
Thanks for the reply. It is not a problem anymore; I cloned the repository and made some code changes to shoehorn the "feature" in. Are you guys looking at introducing this as a feature?
Anushiya Thevapalan
@anushiya-thevapalan
Thanks @glemaitre . It works
Guillaume Lemaitre
@glemaitre
@joeltok We might include a parameter which would store the indices of the 2 samples as an attribute, with False as the default.
The only thing is that it should be consistent across all the SMOTE variants.
joeltok
@joeltok
@glemaitre Thank you.
player1024
@player1024
Hi everyone, does anyone know how to extract just the new synthetic rows in a dataframe after running SMOTE, smote_tomek, etc.?
player1024
@player1024

Hey all, Appreciating the great work that has been put in to make this library so easy to use.

Just a question: For SMOTE, when a sample is generated through oversampling, is there a way to link back to the original sample that was used to generate it (for example through some kind of id, or some fields that are guaranteed to be immutable by the over-sampler)?

Similarly to what I just asked above (I just saw this): again, how do we know which rows were actually generated by SMOTE, and how can we extract them?

Guillaume Lemaitre
@glemaitre
On the last question, you cannot for the moment.
For the first question,
the new rows are concatenated at the end of the original X,
so you could find these rows by knowing the original size of X.
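A minimal sketch based on that answer (assumes X and y are the original arrays passed to fit_resample):

from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
X_synthetic = X_res[len(X):]   # rows generated by SMOTE, appended after the originals
y_synthetic = y_res[len(y):]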
player1024
@player1024
@glemaitre thank you, works like a charm
albattawi
@albattawi

Hello everyone, please I need help with TypeError: __init__() got an unexpected keyword argument 'random_state' when I try to import imblearn. I installed it with pip and conda, and I am using Python 3.8.

>>> import imblearn
>>> print(imblearn.__version__)
0.7.0

Christian Hacker
@christianhacker

Greetings. I'm working with some keras autoencoder models, and would like to use the imblearn keras batch generator with them. But imblearn samplers only work with targets that are either class labels or single-output continuous (regression); you get an error if you pass targets that are multi-output continuous. My datasets have class labels, but with autoencoders the targets are supposed to be the same input data that you pass to the model. A keras model function call would look like:

model.fit(X=X, y=X, ...)

The imblearn.keras batch generator can't do this. It doesn't seem like it would be too difficult, though; you still would need the class labels for the sampling strategy to work, but instead of passing the labels to the model, you just pass the input features as the targets as well.

Anyone have ideas on how to get this to work? Thank you.
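One possible workaround, as a hedged sketch rather than an imblearn feature: wrap BalancedBatchGenerator so the class labels still drive the balancing, but each batch returns the resampled features as both inputs and targets (the wrapper class name is made up for illustration):

from tensorflow.keras.utils import Sequence
from imblearn.keras import BalancedBatchGenerator
from imblearn.under_sampling import RandomUnderSampler

class AutoencoderBalancedGenerator(Sequence):  # hypothetical wrapper, not part of imblearn
    def __init__(self, X, y, **kwargs):
        # the class labels y only drive the balancing; they are never given to the model
        self._gen = BalancedBatchGenerator(X, y, sampler=RandomUnderSampler(), **kwargs)

    def __len__(self):
        return len(self._gen)

    def __getitem__(self, idx):
        X_batch, _ = self._gen[idx]
        return X_batch, X_batch  # inputs double as reconstruction targets

# autoencoder.fit(AutoencoderBalancedGenerator(X, y, batch_size=32), epochs=10)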

thrylos2307
@thrylos2307
I am dealing with multiclass target values and I got ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict while using RandomOverSampler. It mentions using a dict; what kind of dictionary is it referring to, i.e. what kind of keys and values should the dict consist of?
Andrea Lorenzon
@andrealorenzon
the dict should have keys = classes and values = required # of samples, e.g.
{1: 100, 2: 100}
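A minimal sketch, assuming a three-class target labelled 0, 1, 2; for an over-sampler each requested count must be at least the class's original size:

from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler(sampling_strategy={0: 500, 1: 500, 2: 500}, random_state=0)
# X_res, y_res = ros.fit_resample(X, y)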
György Kovács
@gykovacs
Hi all, cost-sensitive learning is a large portion of all the imbalanced learning material. I'm into cost-sensitive learning with instance-dependent cost matrices, but I couldn't find a single paper about it. Everyone supposes that the instance-level cost matrices are given. Have you come across any papers or methods estimating instance-level cost matrices?
Guillaume Lemaitre
@glemaitre
usually the user is defining the costs
this is usually linked to the application and thus this is not a parameter to be optimized
You can imagine the following with credit-card fraud detection
in which false positive and false negative will not have the same cost
but the cost could be defined as a real cost in dollars
linked to the business side
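As an illustration of user-defined, per-instance costs (a hedged sketch, not imblearn functionality): many scikit-learn estimators accept per-sample weights at fit time, so a dollar amount per transaction can be plugged in directly; the data and costs below are synthetic:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)
costs = rng.uniform(1, 100, size=1000)  # hypothetical misclassification cost per sample

clf = LogisticRegression()
clf.fit(X, y, sample_weight=costs)      # costlier samples weigh more in the training loss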
György Kovács
@gykovacs
Yep, this makes total sense. On the flip side, instance-level "complicatedness"/"hard-to-learn-ness" usually appears in oversampling techniques, which might serve as the basis of local costs. I found this so obvious that I just wondered whether I am searching for the wrong terms to find anything like this.
Also, I could imagine some local-density-estimation-based costs: if the density is smaller, then the cost of misclassification should be higher.
That would balance for differences in the densities of the classes, not only for the difference in the number of samples,
as global, class-cardinality-proportional weights do.
PanZiwei
@PanZiwei
Hi, is it possible to get the element index so that I can know which data comes from the original dataset in the SMOTE-upsampled dataset?

For reference I am using the SMOTE method for oversampling:

smoter = SMOTE(random_state=42, n_jobs=-1, sampling_strategy = 'not majority')

X_train_smote, y_train_smote = smoter.fit_resample(X_train, y_train)

To be more specific, I am wondering whether it is possible to know the index for X_train in the X_train_smote dataset.

Guillaume Lemaitre
@glemaitre
Nope, this is currently not possible.
Jan Zyśko
@FrugoFruit90
Has anyone used imblearn SMOTE together with Pipeline and some explainability framework, e.g. shap, lime, or eli5?
My problem is that I am trying to explain my predictions, but for that I first need to transform the data to the state "just before fitting the last estimator" (because the transformers in the Pipeline, like custom vectorizers, create new columns) before running the explainability package together with the last step, the estimator. For that, I can in principle use pipeline[:-1].transform(np.array(X_train)). However, I then get the error "AttributeError: 'SMOTE' object has no attribute 'transform'". I don't know how to proceed.
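One possible workaround, as a hedged sketch: samplers such as SMOTE only act at fit time and have no transform method, so when preparing data for the explainer, apply only the steps that actually transform (assumes pipeline is an imblearn Pipeline whose last step is the estimator):

def transform_without_samplers(pipeline, X):
    Xt = X
    for name, step in pipeline.steps[:-1]:
        if hasattr(step, "transform"):  # vectorizers, scalers, ... keep their effect
            Xt = step.transform(Xt)
        # samplers (SMOTE, etc.) are skipped: they do not alter data at prediction time
    return Xt

# X_explain = transform_without_samplers(pipeline, X_train)
# then pass X_explain and pipeline.steps[-1][1] (the fitted estimator) to shap/lime/eli5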