Python module to perform under-sampling and over-sampling with various techniques.
This question has to do with the SMOTEBoost implementation found here: https://github.com/gkapatai/MaatPy, but I believe the issue is related to the imblearn library.
I tried using the library to re-sample all classes in a multiclass problem and got this error: AttributeError: 'int' object has no attribute 'flatten'
How to reproduce (in a Colab notebook):
Clone the repo:
!git clone https://github.com/gkapatai/MaatPy.git
cd MaatPy/
Dummy data:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6, weights=[.1, .15, .75])
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=.2, random_state=123)
And then:
from maatpy.classifiers import SMOTEBoost
model = SMOTEBoost()
model.fit(xtrain, ytrain)
/usr/local/lib/python3.7/dist-packages/imblearn/over_sampling/_smote.py in _make_samples(self, X, y_dtype, y_type, nn_data, nn_num, n_samples, step_size)
106 random_state = check_random_state(self.random_state)
107 samples_indices = random_state.randint(
--> 108 low=0, high=len(nn_num.flatten()), size=n_samples)
109 steps = step_size * random_state.uniform(size=n_samples)
110 rows = np.floor_divide(samples_indices, nn_num.shape[1])
AttributeError: 'int' object has no attribute 'flatten'
You just need to do the proper reshaping. I once worked with time-series activity data in which I created chunks of N time steps. The shape of each input was (1, 100, 4), so the training set had shape (n_samples, 1, 100, 4). It was a five-class, multi-minority problem that I wanted to oversample using SMOTE.
The way I went about it was to flatten the input, like so:
# Reshape (flatten) Train_X for SMOTE resampling
nsamples, k, nx, ny = Train_X.shape
Train_X = Train_X.reshape((nsamples, k * nx * ny))
smote = SMOTE('not majority', random_state=42, k_neighbors=5)
X_resample, Y_resample = smote.fit_resample(Train_X, Train_Y)
And then reshape the resampled instances back to the original input shape, like so:
# Reshape the resampled data back to the CNN input shape
X_resample = X_resample.reshape(len(X_resample), k, nx, ny)
Note that SMOTE accepts a sampling_strategy parameter as well.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
("preprocessor", StandardScaler()),
("classifier", KNeighborsClassifier(n_neighbors=5)),
])