    geoHeil
    @geoHeil
    @kmax12 unfortunately I fear this will not be possible due to NDA ...
    Fabio Votta
    @favstats

    Hi everyone! I love featuretools and the idea of automatically engineering features. Unfortunately, I can't seem to add interesting variables, and I would be happy if someone could help out :)

    I suspect that it has something to do with my data because I can reproduce the example in the docs just fine..

    https://stackoverflow.com/questions/52673694/specifying-interesting-variables-with-featuretools-does-not-work

    Maybe this is an easy question for Pythonistas.. I am an ardent R user so maybe there is something I am just not seeing.
    Max Kanter
    @kmax12
    @favstats thanks for posting. we're taking a look and will put up an answer shortly.
    Fabio Votta
    @favstats
    Thanks a lot, really! :) Saw your comment on the initial post and I accidentally deleted the whole post when I wanted to edit it, sorry for that.
    It's not high priority though, just something that I couldn't figure out. Enjoy your sunday everyone :)
    Max Kanter
    @kmax12
    Happy to help! Will ping you here once I have an answer.
    Fabio Votta
    @favstats
    @kmax12 works well for me :)
    Max Kanter
    @kmax12

    @favstats This looks like incorrect behavior, thanks for sharing it with us. I just made a fix for it on a branch. Can you try installing that branch of featuretools and running your code again? You can install the branch using pip with this command:

    pip install -e git+https://github.com/featuretools/featuretools.git@interesting-values-direct-features#egg=featuretools

    Let us know if it helps!

    there's also a github pull request here if you'd like to comment there: Featuretools/featuretools#279
    Fabio Votta
    @favstats
    Oh wow, I was certain that the issue would be on my part. I'll try this out immediately. Thanks!
    Fabio Votta
    @favstats
    This worked perfectly! Thank you so much!
    Fabio Votta
    @favstats

    @kmax12 one thing I just noticed.. I mistyped the value name at first and it gave me back only NaN values, which makes sense since it can't match the arguments. However, I wonder if this is intended behaviour or if it should say something like "value not found" and throw an error. Just thinking out loud :)

    Anyway, thank you again for answering this so fast! :)

    Max Kanter
    @kmax12
    ya, that is correct behavior. it's still a valid feature even if the value is nan for your particular data
    Fabio Votta
    @favstats
    alright :) great
    Max Kanter
    @kmax12
    happy to help! let us know if you have any other questions
    Fabio Votta
    @favstats
    Will do! :) So far I'm good
    Fabio Votta
    @favstats

    Hello everyone :) I have a question and hope someone can help out.

    Say I want to calculate features for specific time frames before the cut-off time, for example:

    My cut-off time is 1 January 2005. I want to count the number of products a customer bought in the last month, the last three months, and the last year before that date, and have them all in the same feature matrix.

    I know I can do something like this (from reading this):

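    import featuretools as ft
    import pandas as pd

    # `es` is an existing EntitySet that contains a "customers" entity
    feature_matrix, features = ft.dfs(
        target_entity="customers",
        agg_primitives=["count"],
        cutoff_time=pd.Timestamp("January 1, 2005"),
        training_window=ft.Timedelta("30 days"),
        entityset=es,
        verbose=True,
    )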

    But this would of course only give me features for the 30 days before January 1, 2005. I would like them for other time ranges as well (e.g. three months, a year, or really any other range that interests me).

    So I am not sure how to get to my goal right now. Would I need to create a new primitive for this task or can this be done with already existing functions?

    Max Kanter
    @kmax12
    @favstats take a look at this answer on SO. let me know if it helps. https://stackoverflow.com/a/52593818/8964531
    Fabio Votta
    @favstats
    @kmax12 This does the trick! Thanks! :)
    Max Kanter
    @kmax12
    :thumbsup:
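
    A minimal sketch of the idea behind the question above, written with the standard library only. It is not featuretools code and is not taken from the linked answer: the purchase log, the `count_in_window` helper, the feature names, and the choice of window boundaries (start excluded, cut-off included) are all assumptions made for illustration. It counts each customer's purchases in several windows before one cut-off and collects the counts in a single feature matrix:

```python
from datetime import datetime, timedelta

# Hypothetical purchase log: (customer_id, purchase_time)
purchases = [
    (1, datetime(2004, 12, 20)),
    (1, datetime(2004, 11, 5)),
    (1, datetime(2004, 3, 14)),
    (2, datetime(2004, 12, 30)),
    (2, datetime(2003, 6, 1)),
]


def count_in_window(rows, customer_id, cutoff, window):
    """Count purchases with cutoff - window < time <= cutoff (assumed boundaries)."""
    start = cutoff - window
    return sum(1 for cid, t in rows if cid == customer_id and start < t <= cutoff)


cutoff = datetime(2005, 1, 1)
windows = {
    "count_30d": timedelta(days=30),
    "count_90d": timedelta(days=90),
    "count_365d": timedelta(days=365),
}

# One row per customer, one column per window
feature_matrix = {
    cid: {name: count_in_window(purchases, cid, cutoff, w) for name, w in windows.items()}
    for cid in sorted({cid for cid, _ in purchases})
}

for cid, row in feature_matrix.items():
    print(cid, row)
```

    With featuretools itself, one option would presumably be to call ft.dfs once per training_window and join the resulting feature matrices on the customer index; the linked answer may take a different route.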
    Fabio Votta
    @favstats

    Hello everyone!

    I encountered a problem when I tried to create relationships between entitysets (using my own data). There is no error, but it just doesn't create features for one of my entities (the "prods" entity), although everything should be connected just fine. In some ways, this is similar to the first issue I encountered, except that it occurs only with this specific entity setup. Unfortunately, I can't share my data this time, but attached you will find a minimal example with some mock data where this problem also occurs.

    Hope somebody can help and thank you for your awesome support!

    Best, Fabio

    Max Kanter
    @kmax12
    @favstats can you post the first part of your question about getting product features on stackoverflow?
    rather than include a notebook, you can just put your code / comments in the question
    Fabio Votta
    @favstats
    @kmax12 Will do!
    Fabio Votta
    @favstats

    Done:

    https://stackoverflow.com/questions/53067099/features-are-not-being-generated-for-my-entityset-set-up-in-featuretools

    It's just a lot of code so I thought a python notebook would be a bit more compact :)

    Max Kanter
    @kmax12
    thanks. will answer shortly
    Fabio Votta
    @favstats
    Thank you! :)
    Max Kanter
    @kmax12
    answer posted. let us know if you have any other questions
    Fabio Votta
    @favstats
    Ooooooh, I see. Well, this is definitely something I should have known. Thank you for the quick help! Everything works as expected.
    Max Kanter
    @kmax12
    happy to help!
    Dan Houghton
    @dah33
    target = read_feather("../input/gstore-2-prep/target.feather")
    
    # Bug: featuretools doesn't like datetime64[ns, UTC]
    target.cut_off_time = target.cut_off_time.astype("datetime64[ns]")
    Note, this is pyarrow.read_feather
    It assumes a UTC timezone
    Max Kanter
    @kmax12
    @dah33 what is the error you end up getting?
    Dan Houghton
    @dah33
    @kmax12 Cannot convert column last_sessions_time to <class 'featuretools.variable_types.variable.DatetimeTimeIndex'>
    The workaround is the astype conversion above.
    Max Kanter
    @kmax12
    got it. would you mind posting this on github as an issue?
    Dan Houghton
    @dah33
    @kmax12 Done!
    Max Kanter
    @kmax12
    thanks! we'll keep the issue up to date as we fix
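
    The workaround above strips the timezone from a pandas column. The same aware-versus-naive distinction exists for plain Python datetimes, so a standard-library sketch can show why mixing the two fails and what dropping the timezone does. This is not pandas or featuretools code, and the variable names are made up for illustration:

```python
from datetime import datetime, timezone

# A timezone-aware timestamp, like the values in a datetime64[ns, UTC] column
aware = datetime(2018, 10, 1, 12, 30, tzinfo=timezone.utc)

# A naive timestamp, like a cut-off time given without a timezone
naive_cutoff = datetime(2018, 10, 1, 13, 0)

# Ordering comparisons between aware and naive datetimes raise TypeError
try:
    aware < naive_cutoff
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Normalise to UTC first, then drop the tzinfo to get a naive UTC timestamp
as_naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)
comparable = as_naive_utc < naive_cutoff

print(mixed_ok, as_naive_utc, comparable)
```

    On a pandas Series, the explicit equivalents are Series.dt.tz_convert(None), which converts to UTC and drops the timezone, and Series.dt.tz_localize(None), which drops it while keeping the local wall time.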
    Dan Houghton
    @dah33
    In EntitySet.normalize_entity I've been using the time_index_reduce parameter. In my example, I can request the last instance of a user's details as I normalise the sessions table. However, this appears not to be time-aware: the last instance of a user's details can appear AFTER the cut_off_time.
    Am I correct? It's unlikely to make any difference to my business question (the GStore competition on Kaggle), but it seems inconsistent with the way time is handled elsewhere. I think the alternative is to leave all the instances of the user details in the sessions table, and let the DFS (which is time-aware) extract the correct feature.
    Max Kanter
    @kmax12
    @dah33 you're right. we haven't actually found a good use case for time_index_reduce being anything other than first and will likely remove it from Featuretools soon
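
    A small standard-library sketch of the concern raised above. It is not how featuretools implements normalize_entity or DFS; the sessions table, the `last_value` helper, and the column names are hypothetical. It contrasts taking the last instance over all rows with taking the last instance among rows at or before the cut-off:

```python
from datetime import datetime

# Hypothetical sessions table: (user_id, session_time, user_country)
sessions = [
    ("u1", datetime(2018, 1, 5), "DE"),
    ("u1", datetime(2018, 2, 10), "DE"),
    ("u1", datetime(2018, 3, 20), "FR"),  # recorded after the cut-off below
]


def last_value(rows, user_id, cutoff=None):
    """Return the most recent user_country, optionally ignoring rows after cutoff."""
    candidates = [
        (t, country)
        for uid, t, country in rows
        if uid == user_id and (cutoff is None or t <= cutoff)
    ]
    return max(candidates)[1] if candidates else None


cutoff = datetime(2018, 3, 1)

# Reducing over all rows picks up a value from after the cut-off
leaky = last_value(sessions, "u1")

# Filtering by the cut-off first only uses information available at that time
time_aware = last_value(sessions, "u1", cutoff)

print(leaky, time_aware)
```

    Here the reduction over all rows returns the value from the session after the cut-off, while the filtered version returns the value that was current at the cut-off. That is the difference between reducing at normalisation time and leaving every instance in the sessions table for a time-aware aggregation.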
    Junghyun Kim
    @Dpnia
    Hello, I want to ask about an algorithm in the paper (http://www.jmaxkanter.com/static/papers/DSAA_DSM_2015.pdf). In Algorithm 1, is line 7 correct? <Fj = Fj ∪ RFEAT(Ei, Ej)> I think it should be <Fi = Fi ∪ RFEAT(Ei, Ej)>, with Fi rather than Fj. Please tell me if I have misunderstood something.
    Max Kanter
    @kmax12
    @Dpnia you are correct. It should be Fi, as you point out
    Salman-Jawad91
    @Salman-Jawad91
    Hi
    I am unable to install featuretools in an Azure Jupyter notebook or even using Azure ML Studio, and I am getting the errors below:
    Installing collected packages: dask, pandas, future, msgpack, psutil, distributed, jmespath, urllib3, botocore, s3transfer, boto3, s3fs, tqdm, featuretools
    Found existing installation: dask 0.15.3
    Uninstalling dask-0.15.3:
    Successfully uninstalled dask-0.15.3
    Found existing installation: pandas 0.20.3
    Uninstalling pandas-0.20.3:
    Successfully uninstalled pandas-0.20.3
    Found existing installation: future 0.15.2
    DEPRECATION: Uninstalling a distutils installed project (future) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling future-0.15.2:
    Successfully uninstalled future-0.15.2
    Running setup.py install for future ... done
    Found existing installation: psutil 2.1.1
    DEPRECATION: Uninstalling a distutils installed project (psutil) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling psutil-2.1.1:
    Successfully uninstalled psutil-2.1.1
    Running setup.py install for psutil ... error
    Complete output from command /home/nbuser/anaconda2_20/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-T67_lN/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-gEaonv-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psosx.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_exceptions.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pswindows.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_common.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_compat.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psbsd.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pslinux.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/__init__.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pssunos.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psposix.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psaix.py -> build/lib.linux-x86_64-2.7/psutil
    creating build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/__main__.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_misc.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_sunos.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_unicode.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_memory_leaks.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_posix.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_aix.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_linux.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_windows.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_osx.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/__init__.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_connections.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_bsd.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_process.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_contracts.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_system.py -> build/lib.linux-x86_64-2.7/psutil/tests
    running build_ext
    building 'psutil._psutil_linux' extension