Hey all, I appreciate the great work that has gone into making this library so easy to use.
Just a question: For SMOTE, when a sample is generated through oversampling, is there a way to link back to the original sample that was used to generate it (for example through some kind of id, or some fields that are guaranteed to be immutable by the over-sampler)?
Similar to what I just asked above (I just saw this): how do we know which rows were actually generated by SMOTE, and how can we extract them?
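One way to keep a link from each synthetic row back to its source is to record the indices during interpolation. The sketch below is a hand-rolled, NumPy-only SMOTE-style interpolation, not imbalanced-learn's implementation; the function name `smote_with_provenance` and its return values are made up for illustration.

```python
import numpy as np


def smote_with_provenance(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows from the minority samples X_min.

    Returns (X_new, parent_idx, neighbor_idx): X_new[i] lies on the segment
    between X_min[parent_idx[i]] and X_min[neighbor_idx[i]].
    Hypothetical helper, not part of imbalanced-learn.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)

    # Brute-force pairwise squared distances (fine for small data).
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours of each row

    # Pick a parent row, then one of its k neighbours, then a random gap.
    parent_idx = rng.integers(0, n, size=n_new)
    neighbor_idx = nn[parent_idx, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[parent_idx] + gap * (X_min[neighbor_idx] - X_min[parent_idx])
    return X_new, parent_idx, neighbor_idx


# Toy minority class: 20 samples with 3 features.
X_min = np.random.default_rng(42).normal(size=(20, 3))
X_new, parent_idx, neighbor_idx = smote_with_provenance(X_min, n_new=30)
```

Here `parent_idx[i]` and `neighbor_idx[i]` index into the minority array, so mapping them back to row ids of the full training set takes one extra lookup.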
Greetings. I'm working with some Keras autoencoder models, and would like to use the imblearn Keras batch generator with them. But imblearn samplers only work with targets that are either class labels or single-output continuous (regression); you get an error if you pass targets that are multi-output continuous. My datasets have class labels, but with autoencoders the targets are supposed to be the same input data that you pass to the model.
A Keras model function call would look like:
model.fit(x=X, y=X, ...)
The imblearn.keras batch generator can't do this. It doesn't seem like it would be too difficult, though: you would still need the class labels for the sampling strategy to work, but instead of passing the labels to the model, you would pass the input features as the targets as well.
Anyone have ideas on how to get this to work? Thank you.
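A possible workaround, sketched with plain NumPy rather than imblearn.keras: use the class labels only to choose which rows go into each batch, and yield the batch as both input and target. The generator name `balanced_autoencoder_batches` is hypothetical, and it balances classes by sampling rows with replacement, which is an assumption about what "balanced" should mean here and not the behavior of any particular imblearn sampler.

```python
import numpy as np


def balanced_autoencoder_batches(X, labels, batch_size=32, seed=0):
    """Yield (inputs, targets) batches forever, with targets equal to inputs.

    The labels only decide which rows are drawn (each class is equally
    likely per slot); they are never passed on to the model.
    Hypothetical helper, not part of imbalanced-learn.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    rows_by_class = [np.flatnonzero(labels == c) for c in classes]
    while True:
        # Pick a class uniformly for each slot, then a random row of that class.
        picks = rng.integers(0, len(classes), size=batch_size)
        idx = np.array([rng.choice(rows_by_class[c]) for c in picks])
        batch = X[idx]
        yield batch, batch


# Toy data: 100 samples, 8 features, 90/10 class imbalance.
X = np.random.default_rng(1).normal(size=(100, 8))
labels = np.array([0] * 90 + [1] * 10)
gen = balanced_autoencoder_batches(X, labels, batch_size=16)
xb, yb = next(gen)
```

A generator like this could then be handed to `model.fit(gen, steps_per_epoch=..., epochs=...)`, assuming a Keras version that accepts Python generators yielding `(inputs, targets)` tuples.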
For reference I am using the SMOTE method for oversampling:
smoter = SMOTE(random_state=42, n_jobs=-1, sampling_strategy='not majority')
X_train_smote, y_train_smote = smoter.fit_resample(X_train, y_train)
To be more specific, I am wondering whether it is possible to know which rows of X_train_smote correspond to the original rows of X_train.
pipeline[:-1].transform(np.array(X_train)). However, I then get the error "AttributeError: 'SMOTE' object has no attribute 'transform'". I don't know how to proceed.
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from imblearn.over_sampling import SMOTE
x = np.array([['aaa'] * 100, ['bbb'] * 100]).T
# The label values were lost from the original message; 0 and 1 are assumed here.
y = np.array([0] * 10 + [1] * 90)
ros = RandomOverSampler()
x_res, y_res = ros.fit_sample(x, y)
smote = SMOTE()
x_res, y_res = smote.fit_sample(x, y)
Hello team. I have a question about Borderline SMOTE:
Variant 2 is supposed to interpolate between the minority samples in danger and their minority-class neighbors, and also between the minority samples in danger and some majority-class neighbors.
At https://github.com/scikit-learn-contrib/imbalanced-learn/blob/4162d2d/imblearn/over_sampling/_smote.py#L352
we fit a KNN only on the minority class and then derive the neighbors nns from it, which we use for the interpolation.
Then we use that same nns to obtain the neighbors from the majority class in the second part of the borderline-2 code (https://github.com/scikit-learn-contrib/imbalanced-learn/blob/4162d2d/imblearn/over_sampling/_smote.py#L397). But wouldn't nns contain only neighbors from the minority class, since it is derived from a KNN fit only on the minority class?
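The indexing concern can be illustrated in isolation. The sketch below does not reproduce the linked imbalanced-learn code; it only shows the general rule that neighbor indices returned by a search refer to positions in the array the search was fit on. The helper `knn_indices` and the toy arrays are made up for illustration.

```python
import numpy as np


def knn_indices(X_fit, X_query, k):
    """Indices (into X_fit) of the k nearest rows for each query row."""
    d2 = ((X_query[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


rng = np.random.default_rng(0)
X_minority = rng.normal(loc=0.0, size=(10, 2))
X_majority = rng.normal(loc=3.0, size=(50, 2))
queries = X_minority[:3]  # stand-ins for the "in danger" samples

# Search fit on the minority class only; drop column 0 (the query itself).
nns = knn_indices(X_minority, queries, k=4)[:, 1:]

# Every index is a position in X_minority, never in X_majority.
assert nns.max() < len(X_minority)

# Real majority-class neighbours need a search fit on the majority rows.
nns_majority = knn_indices(X_majority, queries, k=3)

# Reusing nns to index X_majority runs without error, but the selected rows
# are unrelated to how close the majority samples are to the queries.
reused = X_majority[nns]


def total_distance(neighbors):
    """Sum of distances from each query to its assigned neighbour rows."""
    return np.linalg.norm(neighbors - queries[:, None, :], axis=-1).sum(axis=1)


true_dist = total_distance(X_majority[nns_majority])
reused_dist = total_distance(reused)
```

For every query, `true_dist` is no larger than `reused_dist`, which is the symptom one would expect if minority-derived indices were applied to a majority array.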