    G.Satish
    @aryansatish7
    Hello all, @solegalli thanks so much for creating this. Looking forward to contributing and learning a lot.
    ShyamGurunath
    @ShyamGurunath
    Hi everyone, Feature-engine is one of my favorites. @solegalli Really glad you've created this community.
    Soledad Galli
    @solegalli
    Hello everyone. I wanted to pick your brains regarding a new transformer that we are considering for Feature-engine. In short, the transformer will learn the variables in the train set (names and order) during fit(). During transform(), it will compare the variables of the incoming dataset against those it learned: variables that were not present in the train set will be dropped, and variables that were in the train set but are missing from the incoming dataset will be added, filled with NaN, zeros, or a user-defined value. Here is the issue: solegalli/feature_engine#132
    My question is: would you find that transformer useful? It would be really helpful if you could add your thoughts to that issue, or here if that is easier. Thank you!
    ShyamGurunath
    @ShyamGurunath
    @solegalli This transformer is good. Most of the time, some variables in the train set are not present in the test set, so this will be very useful for handling that.
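A minimal sketch of the behaviour described above, using only pandas. `MatchColumnsSketch` and the toy data are hypothetical names made up for illustration; this is not Feature-engine's implementation, just one way the fit/transform idea could look.

```python
import numpy as np
import pandas as pd


class MatchColumnsSketch:
    """Hypothetical helper for illustration, not the Feature-engine transformer."""

    def __init__(self, fill_value=np.nan):
        # Value used for columns seen during fit but missing at transform time.
        self.fill_value = fill_value

    def fit(self, X):
        # Remember the names and the order of the training columns.
        self.columns_ = list(X.columns)
        return self

    def transform(self, X):
        # reindex drops columns unseen during fit, adds the missing ones
        # filled with fill_value, and restores the training column order.
        return X.reindex(columns=self.columns_, fill_value=self.fill_value)


# Toy data, made up for this example.
train = pd.DataFrame({"age": [30, 41], "city": ["Rome", "Lima"], "income": [10.0, 20.0]})
test = pd.DataFrame({"income": [15.0], "extra": [1], "age": [25]})

matched = MatchColumnsSketch(fill_value=0).fit(train).transform(test)
print(matched)
```

Here `extra` is dropped, `city` is added and filled with the chosen value, and the columns come back in the train set order.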
    Joe
    @joeanton719
    Hi @solegalli . I recently encountered an error when working with RareLabel Encoder. I posted the question on Stackoverflow: https://stackoverflow.com/questions/68847937/feature-engine-rarelabelencoder-valueerror-could-not-convert-string-to-float
    Would really appreciate anyone's help regarding this, thanks!
    Joe
    @joeanton719

    UPDATE: Managed to solve the issue. I have added the answer to the same Stack Overflow thread.

    Soledad Galli
    @solegalli
    Glad you got it sorted @joeanton719. Feature-engine transformers have the option to select the variables directly, so there is no need to use the column transformer at all. I added that to your thread on Stack Overflow. Cheers
    jonathanhexner
    @jonathanhexner
    Hi,
    Very cool package. Question about DecisionTreeDiscretiser. It seems to return the features discretized according to the response values and not the feature values... Is this the intention?
    For example, for the house prices data set used in the example, applying it to GrLivArea returns values in the range up to 500000, which corresponds to the response and not the feature values.
    Soledad Galli
    @solegalli
    Yep, that is the intended result. Did you have something else in mind?
    jonathanhexner
    @jonathanhexner
    It seems a little counter intuitive to me, but it's possible it's just new to me.
    Is there any way to "decode" the feature value?
    I guess there is no completely intuitive way of encoding it. I was thinking of the range mean,
    similar to how the other discretizers do it.
    Soledad Galli
    @solegalli
    Not with feature-engine. But if you have a specific way of discretizing in mind, feel free to create an issue to add the functionality you are after. Make sure to include links and clear guidelines as to what the transformer should do, and what is the desired output. Also, check if that functionality is not requested already before creating the issue if poss :)
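A small sketch of why the output sits on the scale of the response, assuming the discretiser replaces each value with the prediction of a decision tree fitted on that single feature. It uses scikit-learn directly rather than Feature-engine, and the numbers are made up. The last step shows the "range mean" idea from the question as a do-it-yourself decoding, which is an assumption of this sketch and not a Feature-engine feature.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, made up for illustration: living area and sale price.
area = np.array([800, 950, 1100, 1500, 1700, 2100, 2600, 3000], dtype=float)
price = np.array([110e3, 120e3, 135e3, 180e3, 200e3, 260e3, 330e3, 400e3])

# Fit a shallow tree on the single feature.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(area.reshape(-1, 1), price)

# Each prediction is the mean price of the leaf the sample falls in,
# so the "discretised" feature is expressed in price units.
encoded = tree.predict(area.reshape(-1, 1))

# Optional decoding: mean of the original feature values within each leaf.
leaf_ids = tree.apply(area.reshape(-1, 1))
bin_mean_area = {int(leaf): float(area[leaf_ids == leaf].mean()) for leaf in np.unique(leaf_ids)}

print(np.unique(encoded))
print(bin_mean_area)
```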
    Daoud Chami
    @daoudchami:matrix.org
    Hi,
    Thank you very much for your amazing package! I have a quick question: when I use the DatetimeFeatures transformer in a ColumnTransformer object, it can't produce feature names.
    From my investigations, it is because sklearn is asking DatetimeFeatures to produce the feature names from a numpy array instead of a list,
    hence the following error: ValueError: input_features must be a list. Got ['creation_date'] instead.
    Has anyone had this error before? Thanks!
    Soledad Galli
    @solegalli
    Hey Daoud. Why would you like to use DatetimeFeatures within a column transformer? You can select the features to modify directly from DatetimeFeatures. Maybe it helps if you paste the code you are trying to execute?
    Daoud Chami
    @daoudchami:matrix.org
    Thank you for your quick answer. I'm using the DatetimeFeatures alongside other transformers to process numerical & categorical values. And since I'm missing a transformer for dates, I used the DatetimeFeatures
    Soledad Galli
    @solegalli
    Please copy and paste the code that you are trying to execute and the entire error thread, either here or better in stackoverflow or github issues in our repo, and link it here, so I can have a look
    Segun Adelowo
    @segunadelowo
    Hi everyone! I am new here. @solegalli thanks for the feature_engine tool and your Udemy courses, I learnt a lot. I am going through the "good first issue" list to see what I can pick up and contribute. :)
    Soledad Galli
    @solegalli
    Hi @segunadelowo thank you! that's very nice to hear :)
    meDerekD
    @meDerekD17

    Hi, I am having a problem installing feature-engine. I am following the instructions in the book Python Feature Engineering to chapter 2. As per chapter 1 I am running Jupyter Notebook under Anaconda. For install of feature-engine I did:
    $ pip install feature-engine
    as written in the book, page 46. I have restarted the operating system (Mac) a few times.
    The feature-engine website suggests this install command when using Anaconda:
    $ conda install -c conda-forge feature-engine

    This updated conda to 22.9.0 and newly installed feature-engine-1.5.0 and python_abi-3.9.

    I have searched the internet, and everything I find says that pip install feature-engine will solve the problem.

    Any suggestions? Thanks

    Soledad Galli
    @solegalli
    Hi, I am not sure I understand the problem that you are having. Does it not install? Could you paste an error message, or maybe explain the problem a bit more?
    Did you try restarting the kernel of the notebook?
    meDerekD
    @meDerekD17
    in Jupyter Notebook I run:

    import pandas as pd
    # to split the data sets
    from sklearn.model_selection import train_test_split
    # to impute missing data with sklearn
    from sklearn.impute import SimpleImputer
    # to impute missing data with feature-engine
    from feature_engine.missing_data_imputers import MeanMedianImputer
    print('didit')

    which gives:

    ModuleNotFoundError Traceback (most recent call last)
    Input In [1], in <cell line: 7>()
    5 from sklearn.impute import SimpleImputer
    6 # to impute missing data with feature-engine
    ----> 7 from feature_engine.missing_data_imputers import MeanMedianImputer
    8 print('didit')

    ModuleNotFoundError: No module named 'feature_engine.missing_data_imputers'

    Better text if I type it in:
    maybe not. I cannot type it in very well when a carriage return sends a new message, I think.
    meDerekD
    @meDerekD17
    It appears to have installed. In the Anaconda Environments tab I see feature-engine-1.5.0 and python_abi-3.9, and it appears to have updated conda to 22.9.0.
    I did a restart of the Mac OS. I am not sure what a restart of the notebook kernel is, but I presume an OS restart does that.
    meDerekD
    @meDerekD17

    ModuleNotFoundError Traceback (most recent call last)
    ----->from feature_engine.missing_data_imputers import MeanMedianImputer
    ModuleNotFoundError: No module named 'feature_engine.missing_data_imputers'
    meDerekD
    @meDerekD17
    I just did a restart kernel ( listening to a Jupyter Notebooks webcast)
    and ran it in JupyterLab, but both gave the same error: could not find module
    meDerekD
    @meDerekD17
    I have to go now, but back in a couple of hours, and will check tomorrow
    Soledad Galli
    @solegalli
    The book that you've got uses a very old version of Feature-engine, and the module names have changed since. The correct import now is: from feature_engine.imputation import MeanMedianImputer. Check the documentation for more information. Not sure if you can get hold of the second edition of the book? It is coming out next month. In any case, the principles described in the first edition still hold, but a big chunk of the code will not run due to the updates to the different libraries.
    meDerekD
    @meDerekD17
    Thanks. That worked. It may have been a feature, as now I know to look at the documentation if things don't work. The documentation has: feature_engine.imputation.MeanMedianImputer.
    I am relatively new to ML and Mac, so I am not sure what my two install attempts would have done. I first did, in a terminal window; pip install feature-engine. Later I did in a terminal window (as am running under Anaconda): conda install -c conda-forge feature_engine.
    meDerekD
    @meDerekD17
    From what I think I saw in Anaconda.Navigator, in the environments Tab, the first install, I think, put feature-engine into Anaconda, but the 2nd install updated conda-22.9.0 and installed python_abi-3.9. I assume both installed feature-engine to the same place, and the 2nd, may have allowed everything to work
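When it is unclear which install the notebook is actually using, a quick check from inside a notebook cell can help. This is a sketch using only the Python standard library (Python 3.8 or later); the helper names are made up for illustration.

```python
import importlib.util
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if the module can be located in the current environment."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is missing.
        return False


# The interpreter the notebook kernel runs on; pip and conda must install into this one.
print(sys.executable)
print(installed_version("feature-engine"))

# Old module path from the first edition of the book versus the current one.
print(module_available("feature_engine.missing_data_imputers"))
print(module_available("feature_engine.imputation"))
```

If the version prints but the old module path is reported as unavailable, the package is installed and only the import statement needs updating.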
    Segun Adelowo
    @segunadelowo
    @solegalli Please, I need help. I am setting up my development workbench for feature_engine following the setup steps in the contribute code doc. I ran "pip install -e ." and got: ERROR: Could not find a version that satisfies the requirement scikit-learn>=1.0.0 ERROR: No matching distribution found for scikit-learn>=1.0.0
    I am using Python 3.6.8
    Segun Adelowo
    @segunadelowo
    I see that scikit-learn 1.0.0 requires Python (>= 3.7). I hope this is fine and won't break the code in any way?
    Segun Adelowo
    @segunadelowo
    Just saw in setup.py that the required Python version for feature_engine is ">=3.8.0". I will upgrade now; I guess I will be fine.
    Soledad Galli
    @solegalli
    Hi @segunadelowo . Feature-engine depends on Scikit-learn, so we support the versions that Scikit-learn supports. And if I remember correctly, in their last release, it is from Python 3.8 onwards. I also had to update my Python installation. Hope you managed to get it up and running.
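A quick way to confirm the interpreter meets the minimum before running "pip install -e .", as a sketch using only the standard library. The 3.8 minimum is the value quoted from setup.py above and may differ in other releases; `python_is_supported` is a made-up helper name.

```python
import sys

# Minimum version quoted from setup.py in the message above; check the repo for the current value.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version_info[:2]) >= minimum


print(sys.version_info[:3], python_is_supported())
```

With Python 3.6.8, as in the message above, the check returns False, which matches pip finding no compatible scikit-learn distribution.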