    Dan Houghton
    @dah33
    from pyarrow.feather import read_feather
    
    target = read_feather("../input/gstore-2-prep/target.feather")
    
    # Bug: featuretools doesn't like datetime64[ns, UTC]
    target.cut_off_time = target.cut_off_time.astype("datetime64[ns]")
    Note, this is pyarrow.feather.read_feather.
    It assumes a UTC timezone.
    Max Kanter
    @kmax12
    @dah33 what is the error you end up getting?
    Dan Houghton
    @dah33
    @kmax12 Cannot convert column last_sessions_time to <class 'featuretools.variable_types.variable.DatetimeTimeIndex'>
    The workaround is the astype conversion above.
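A note for anyone hitting the same error on a newer stack: as far as I know, pandas 2.x no longer lets astype("datetime64[ns]") silently drop the timezone from a tz-aware column and raises a TypeError instead. A minimal sketch of an equivalent conversion uses Series.dt.tz_convert(None), which converts to UTC and then removes the timezone. The two-row frame below is a made-up stand-in for target.feather; only pandas is assumed.

```python
import pandas as pd

# Hypothetical stand-in for the target frame loaded from target.feather:
# a tz-aware (UTC) cut_off_time column like the one pyarrow returns.
target = pd.DataFrame({
    "cut_off_time": pd.to_datetime(["2018-01-01 05:00:00+05:00",
                                    "2018-06-30 12:00:00+00:00"], utc=True),
})

# Convert to UTC (already UTC here) and drop the timezone, leaving naive timestamps.
target["cut_off_time"] = target["cut_off_time"].dt.tz_convert(None)
print(target["cut_off_time"].dtype)
```

tz_localize(None) would instead keep each timestamp's local wall-clock time; tz_convert(None) matches what the astype workaround did, i.e. naive timestamps expressed in UTC.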
    Max Kanter
    @kmax12
    got it. would you mind posting this on GitHub as an issue?
    Dan Houghton
    @dah33
    @kmax12 Done!
    Max Kanter
    @kmax12
    thanks! we'll keep the issue up to date as we fix
    Dan Houghton
    @dah33
    In EntitySet.normalize_entity I've been using the time_index_reduce parameter. In my example, I can request the last instance of a user's details as I normalise the sessions table. However, this does not appear to be time-aware: the last instance of a user's details can appear AFTER the cut_off_time.
    Am I correct? It's unlikely to make any difference to my business question (the GStore competition on Kaggle), but it seems inconsistent with the way time is handled elsewhere. I think the alternative is to leave all the instances of the user details in the sessions table, and let the DFS (which is time-aware) extract the correct feature.
    Max Kanter
    @kmax12
    @dah33 you're right. we haven't actually found a good use case for time_index_reduce being anything other than first and will likely remove it from Featuretools soon
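To illustrate the time-aware behaviour Dan describes, here is a plain-pandas sketch (not featuretools code) that picks each user's last details at or before that user's cutoff time. The column names user_id, session_time, cutoff_time and country are hypothetical.

```python
import pandas as pd


def last_details_before_cutoff(sessions, cutoff_times):
    """Return each user's most recent session row at or before that user's cutoff time."""
    merged = sessions.merge(cutoff_times, on="user_id")
    # Drop sessions after the cutoff, so later details cannot leak in.
    eligible = merged[merged["session_time"] <= merged["cutoff_time"]]
    latest = (eligible.sort_values("session_time")
                      .groupby("user_id")
                      .tail(1))  # keep the whole last row, not per-column last values
    return latest.set_index("user_id").drop(columns="cutoff_time")


sessions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "session_time": pd.to_datetime(["2018-01-01", "2018-01-05", "2018-01-02"]),
    "country": ["UK", "US", "FR"],
})
cutoff_times = pd.DataFrame({
    "user_id": [1, 2],
    "cutoff_time": pd.to_datetime(["2018-01-03", "2018-01-03"]),
})

print(last_details_before_cutoff(sessions, cutoff_times))
```

With this toy data, user 1 gets UK (from the session before the cutoff) rather than US from the later session, which is what a non-time-aware "last" would return. Users with no sessions before their cutoff drop out of the result. groupby(...).tail(1) is used instead of groupby(...).last() because last() takes the last non-null value per column, which can mix values from different rows.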
    Junghyun Kim
    @Dpnia
    Hello, I have a question about an algorithm in the paper (http://www.jmaxkanter.com/static/papers/DSAA_DSM_2015.pdf). In Algorithm 1, is line 7 correct? <Fj = Fj∪RFEAT(Ei, Ej)> I think it should be <Fi = Fi∪RFEAT(Ei, Ej)>, i.e. Fi rather than Fj. Please tell me if I've got something wrong.
    Max Kanter
    @kmax12
    @Dpnia that is correct. It should be Fi as you point out
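For illustration, a simplified recursive sketch of the feature-synthesis loop with the correction applied. Only the RFEAT line is taken from the discussion above; the toy schema, the placeholder rfeat/dfeat functions, the visited-set check on forward relationships and the omission of the entity-level (EFEAT) step are my assumptions, not a transcription of the paper's Algorithm 1.

```python
# Toy schema (hypothetical): each entity's own columns and the entity it references.
SCHEMA = {
    "customers": {"columns": ["age"], "parent": None},
    "sessions": {"columns": ["duration"], "parent": "customers"},
    "transactions": {"columns": ["amount"], "parent": "sessions"},
}


def backward(ei):
    """Entities that reference ei (one-to-many, i.e. its children)."""
    return [e for e, spec in SCHEMA.items() if spec["parent"] == ei]


def forward(ei):
    """Entities that ei references (many-to-one, i.e. its parent)."""
    parent = SCHEMA[ei]["parent"]
    return [parent] if parent else []


def rfeat(ei, ej, fj):
    """Placeholder relational features: aggregate ej's features up to ei."""
    return {f"SUM({ej}.{f})" for f in fj} | {f"COUNT({ej})"}


def dfeat(ei, ej, fj):
    """Placeholder direct features: copy ej's features down to ei."""
    return {f"{ej}.{f}" for f in fj}


def make_features(ei, visited, features):
    visited.add(ei)
    features.setdefault(ei, set(SCHEMA[ei]["columns"]))
    for ej in backward(ei):
        make_features(ej, visited, features)
        # The corrected line 7: relational features are added to F_i, not F_j.
        features[ei] |= rfeat(ei, ej, features[ej])
    for ej in forward(ei):
        if ej in visited:
            continue
        make_features(ej, visited, features)
        features[ei] |= dfeat(ei, ej, features[ej])
    return features


features = make_features("customers", set(), {})
print(sorted(features["customers"]))
```

Running it from customers puts COUNT(sessions) and the stacked SUM(sessions.SUM(transactions.amount)) into the customers feature set, while the sessions set only receives aggregates of its own children. With Fj on line 7, the aggregates would land on the child entity instead.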
    Salman-Jawad91
    @Salman-Jawad91
    Hi
    I am unable to install featuretools on an Azure Jupyter notebook, or even using Azure ML Studio, and I'm getting the errors below:
    Installing collected packages: dask, pandas, future, msgpack, psutil, distributed, jmespath, urllib3, botocore, s3transfer, boto3, s3fs, tqdm, featuretools
    Found existing installation: dask 0.15.3
    Uninstalling dask-0.15.3:
    Successfully uninstalled dask-0.15.3
    Found existing installation: pandas 0.20.3
    Uninstalling pandas-0.20.3:
    Successfully uninstalled pandas-0.20.3
    Found existing installation: future 0.15.2
    DEPRECATION: Uninstalling a distutils installed project (future) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling future-0.15.2:
    Successfully uninstalled future-0.15.2
    Running setup.py install for future ... done
    Found existing installation: psutil 2.1.1
    DEPRECATION: Uninstalling a distutils installed project (psutil) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling psutil-2.1.1:
    Successfully uninstalled psutil-2.1.1
    Running setup.py install for psutil ... error
    Complete output from command /home/nbuser/anaconda2_20/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-T67_lN/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-gEaonv-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psosx.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_exceptions.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pswindows.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_common.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_compat.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psbsd.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pslinux.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/__init__.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_pssunos.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psposix.py -> build/lib.linux-x86_64-2.7/psutil
    copying psutil/_psaix.py -> build/lib.linux-x86_64-2.7/psutil
    creating build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/__main__.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_misc.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_sunos.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_unicode.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_memory_leaks.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_posix.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_aix.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_linux.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_windows.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_osx.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/__init__.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_connections.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_bsd.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_process.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_contracts.py -> build/lib.linux-x86_64-2.7/psutil/tests
    copying psutil/tests/test_system.py -> build/lib.linux-x86_64-2.7/psutil/tests
    running build_ext
    building 'psutil._psutil_linux' extension
    creating build/temp.linux-x86_64-2.7/psutil
    gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_VERSION=548 -DPSUTIL_LINUX=1 -I/home/nbuser/anaconda2_20/include/python2.7 -c psutil/_psutil_common.c -o build/temp.linux-x86_64-2.7/psutil/_psutil_common.o
    In file included from /home/nbuser/anaconda2_20/include/math.h:71:0,
    from /home/nbuser/anaconda2_20/include/python2.7/pyport.h:325,
    from /home/nbuser/anaconda2_20/include/python2.7/Python.h:58,
    from psutil/_psutil_common.c:9:
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:63:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (cos,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:65:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (sin,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:22: error: unknown type name ‘sincos’
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:29: error: expected declaration specifiers or ‘...’ before ‘,’ token
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:82:3: error: expected declaration specifiers or ‘...’ before ‘(’ token
    (Mdouble x, Mdouble *sinx, Mdouble *cosx));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:100:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (exp,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:109:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (log,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (pow,, (Mdouble x, Mdouble y));
    ^
    In file included from /home/nbuser/anaconda2_20/include/math.h:94:0,
    from /home/nbuser/anaconda2_20/include/python2.7/pyport.h:325,
    from /home/nbuser/anaconda2_20/include/python2.7/Python.h:58,
    from psutil/_psutil_common.c:9:
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:63:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (cos,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:65:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (sin,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:22: error: unknown type name ‘sincos’
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:29: error: expected declaration specifiers or ‘...’ before ‘,’ token
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:82:3: error: expected declaration specifiers or ‘...’ before ‘(’ token
    (Mdouble __x, Mdouble *__sinx, Mdouble *__cosx));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:100:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (exp,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:109:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (log,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (pow,, (Mdouble x, Mdouble __y));
    ^
    In file included from /home/nbuser/anaconda2_20/include/math.h:141:0,
    from /home/nbuser/anaconda2_20/include/python2.7/pyport.h:325,
    from /home/nbuser/anaconda2_20/include/python2.7/Python.h:58,
    from psutil/_psutil_common.c:9:
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:63:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (cos,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:65:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (sin,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:22: error: unknown type name ‘sincos’
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:29: error: expected declaration specifiers or ‘...’ before ‘,’ token
    MATHDECL_VEC (void,sincos,,
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:82:3: error: expected declaration specifiers or ‘...’ before ‘(’ token
    (Mdouble x, Mdouble *sinx, Mdouble *cosx));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:100:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (exp,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:109:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (log,, (Mdouble x));
    ^
    /usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:21: error: expected ‘)’ before ‘,’ token
    MATHCALL_VEC (pow,, (Mdouble x, Mdouble y));
    ^
    error: command 'gcc' failed with exit status 1

    ----------------------------------------

    Rolling back uninstall of psutil
    Command "/home/nbuser/anaconda2_20/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-T67_lN/psutil/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-gEaonv-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-T67_lN/psutil/
    You are using pip version 9.0.3, however version 18.1 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command

    However, I was able to install it on-premise using the command
    pip install featuretools, but on the Azure Jupyter notebook I can't.
    I even tried !pip install --ignore-installed featuretools, but it's not working.

    Any help?

    Markus
    @mloning
    Hi there, I came across featuretools today. I usually use tsfresh for this type of task; are you aware of it? I couldn't find any reference to it in the paper or the GitHub repo. Also, do you know of any other notable packages that do automatic feature engineering? Thanks for your help and great work on relational feature engineering!
    Max Kanter
    @kmax12
    @mloning we are aware of tsfresh and like the library. if you scroll up in the chatroom, you'll see @MaxBenChrist (developer of tsfresh) was here recently
    i'm not sure of any other notable packages for automated feature engineering
    Markus
    @mloning
    Alright, thanks for the quick reply!
    Darío López Padial
    @bukosabino

    Hi guys, I have some problems with CircleCI (Python 2.7) validation. I am working on this pull request: Featuretools/featuretools#323

    I tried to reproduce the errors locally, but I can't. I do something like this:

    virtualenv -p python2.7 env
    source env/bin/activate
    pip install -r test-requirements.txt
    make installdeps lint

    But I get no errors. What can I do? Do you have any developer documentation? This is my first time using CircleCI...

    Max Kanter
    @kmax12
    @bukosabino have you run make test? this will run the tests
    Darío López Padial
    @bukosabino
    thanks. solved :=)
    Albert Carter
    @RogerTangos

    Hi there - I usually post on SO for FT questions, but thought that this discussion might need some more interaction.

    Today, I was looking at advanced custom primitives and came across a stackoverflow question: https://stackoverflow.com/questions/53579465/how-to-use-featuretools-to-create-features-from-multiple-columns-in-single-dataf

    The user is trying to create a primitive which sums columns conditionally, based on whether the row is within a timedelta. So, sum only cells where the timestamp is within the last 3 days.

    I think that this is possible if the user creates a transform primitive which just outputs the value if the cell is within the time range, and 0 otherwise. Then they can use the sum aggregation primitive.

    However, I'm curious to know if this is possible in a single aggregation primitive, or whether there is another mechanism for achieving this. It seems very wasteful to store a column of mostly zeros just to take its sum later on.

    Max Kanter
    @kmax12
    @RogerTangos You can create primitives that take in more than one column. here's an example in the docs: https://docs.featuretools.com/automated_feature_engineering/primitives.html#multiple-input-types
    the case in that specific question is a little tricky, but it should be possible, working on posting an answer with an example primitive soon
    Max Kanter
    @kmax12
    @RogerTangos just put the answer up!
    Albert Carter
    @RogerTangos
    Thanks @kmax12 , that's very interesting to see, and I'm glad that it's possible. I really appreciate you taking the time to answer these. It's very kind of you.
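A sketch of the single-aggregation idea from Albert's question: a function of two inputs (the values and their timestamps) that sums only the values falling in the last few days before a cutoff, so no column of zeros is materialised. It is plain pandas; the name sum_within_last_days and the days parameter are hypothetical, and how the cutoff time reaches a multi-input aggregation primitive depends on the featuretools version, so the registration step is not shown.

```python
import pandas as pd


def sum_within_last_days(values, timestamps, cutoff_time, days=3):
    """Sum the values whose timestamp lies in (cutoff_time - days, cutoff_time]."""
    values = pd.Series(values).reset_index(drop=True)
    timestamps = pd.to_datetime(pd.Series(timestamps)).reset_index(drop=True)
    window_start = cutoff_time - pd.Timedelta(days=days)
    # Boolean mask over the same positions as values; no intermediate zeros column.
    in_window = (timestamps > window_start) & (timestamps <= cutoff_time)
    return values[in_window].sum()


total = sum_within_last_days(
    values=[10, 20, 30],
    timestamps=["2018-12-01", "2018-12-04", "2018-12-05"],
    cutoff_time=pd.Timestamp("2018-12-05"),
)
print(total)
```

Here the window is (2018-12-02, 2018-12-05], so only the last two values (20 + 30) are summed.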
    pabloazurduy
    @pabloazurduy
    Hi, I was reading the documentation but I couldn't find an automatic way to make "row_window features". I understand that it is possible to use training_window in ft.dfs, but (as I understand it) that only gives you a lower bound. What I mean is, for example, creating the following features:
    COUNT(orders) in 0to1 day
    COUNT(orders) in 1to2 day
    COUNT(orders) in 2to3 day
    etc...
    Is there an easy way to create that kind of feature?
    Max Kanter
    @kmax12
    @pabloazurduy just put up a quick answer on how to approach it. let me know if that helps or if a specific code example is needed
    we would consider supporting this functionality more natively in the future. would you mind making an issue on our GitHub to document your use case / request?
    thanks for trying out featuretools!
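One way to compute these per-day window counts outside featuretools is sketched below in plain pandas. This is not necessarily the approach Max posted; the function name, the list of order timestamps and the half-open [k, k+1) day windows are assumptions.

```python
import pandas as pd


def count_by_day_window(event_times, cutoff_time, n_windows=3):
    """Count events in consecutive 1-day windows going back from cutoff_time."""
    times = pd.to_datetime(pd.Series(event_times))
    # Age of each event in (fractional) days before the cutoff.
    ages = (cutoff_time - times) / pd.Timedelta(days=1)
    counts = {}
    for k in range(n_windows):
        # Window k covers ages in [k, k + 1) days; events after the cutoff are ignored.
        in_window = (ages >= k) & (ages < k + 1)
        counts[f"COUNT(orders) in {k}to{k + 1} day"] = int(in_window.sum())
    return counts


orders = ["2018-12-09 12:00", "2018-12-09 00:00", "2018-12-08 06:00",
          "2018-12-07 01:00", "2018-12-01 00:00"]
print(count_by_day_window(orders, pd.Timestamp("2018-12-10")))
```

Events after the cutoff get a negative age and fall outside every window, so the counts respect the cutoff time in the same way DFS does.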
    Gray
    @grayskripko
    hi guys. Is there a simple way to use "mean" as an aggregation primitive, skipping missing values? Or is the only way to write a custom primitive?
    Max Kanter
    @kmax12
    @grayskripko for the time being you'd have to do a custom primitive. in a future update we'll allow you to configure how missing values are handled
    Gray
    @grayskripko
    thanks!
    Gray
    @grayskripko
    the shortest way was to use numpy.nanmean()
    Max Kanter
    @kmax12
    :thumbsup:
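Gray's numpy.nanmean approach, sketched as the function a custom aggregation primitive could wrap. The all-missing guard is my addition, to return NaN without numpy's "Mean of empty slice" RuntimeWarning; wiring it into a primitive is version-dependent and not shown.

```python
import numpy as np


def mean_skip_missing(values):
    """Mean of the non-missing values; NaN if every value is missing (or there are none)."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(arr).all():
        # np.nanmean would also return NaN here, but with a RuntimeWarning.
        return np.nan
    return np.nanmean(arr)


print(np.mean([1.0, np.nan, 3.0]))            # nan: plain mean propagates the missing value
print(mean_skip_missing([1.0, np.nan, 3.0]))  # 2.0
```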
    Albert Carter
    @RogerTangos
    Hey FT Team - I was running through some of the docs recently, and tried the WordCount example: https://docs.featuretools.com/automated_feature_engineering/primitives.html. Unfortunately, I haven't been able to get the same results as in the docs - that is, I can't get the WordCount feature to show up at all. I've copy-pasted the code and tested FT versions 0.5.0, 0.4.0, 0.3.0, 0.2.1, and 0.2.0, and I never get this feature to show up. Is this an error in the docs, or is there some behavior I don't understand here?
    from featuretools.primitives import IdentityFeature
    import featuretools as ft
    
    es = ft.demo.load_mock_customer(return_entityset=True)
    from featuretools.primitives import make_agg_primitive, make_trans_primitive
    from featuretools.variable_types import Text, Numeric   
    
    def word_count(column):
        '''
        Counts the number of words in each row of the column. Returns a list
        of the counts for each row.
        '''
        word_counts = []
        for value in column:
            words = value.split(None)
            word_counts.append(len(words))
        return word_counts
    
    # Next, we need to create a custom primitive from the word_count function.
    
    WordCount = make_trans_primitive(function=word_count,
                                     input_types=[Text],
                                     return_type=Numeric)
    
    # Since WordCount is a transform primitive, we need to add it to the list of transform primitives DFS can use when generating features.
    
    feature_matrix, features = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=["sum", "mean", "std"],
                                      trans_primitives=[WordCount])
    Max Kanter
    @kmax12
    @RogerTangos looks like we have a hidden cell in the documentation. can you replace line 4 with
    from featuretools.tests.testing_utils import make_ecommerce_entityset
    es = make_ecommerce_entityset()
    Albert Carter
    @RogerTangos
    yep. that uses the WordCount, though the output is still different from what the docs show
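Separately from the entityset fix, the word_count function from the docs example can be written with pandas string methods, assuming the primitive receives a pandas Series (or anything pd.Series accepts). The name word_count_vectorized is hypothetical.

```python
import pandas as pd


def word_count_vectorized(column):
    """Count whitespace-separated words in each row; missing values stay NaN."""
    return pd.Series(column).str.split().str.len()


print(word_count_vectorized(["hello world", "a b c", ""]).tolist())  # [2, 3, 0]
```

Unlike the loop version, a missing value yields NaN rather than an AttributeError from calling split on a float.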
    sonnehansen
    @sonnehansen

    Hi, thanks for an excellent toolbox :)
    I have a question regarding subsequent predictions when we have extended time-series windows.

    I work with sensor data from wind turbines and a situation somewhat similar to your RUL demo. Assume that a window of 14 days works well. Then:

    1. We detect an error from online data
    2. The problem is fixed within the next couple of days
    3. Now the turbine is in "quarantine" for ~12 days with respect to this model, because the signal from the (now corrected) error will still pollute the input signal

    My question is then:

    • Do you have any previous experience with the above problem? And/or any ideas how to mitigate / resolve it?

    Of course we could shorten the time window, but this would not really solve the issue. Also, I imagine that statistics such as "trend" would start to suffer from

    Max Kanter
    @kmax12

    @/all Hi Everyone - We are officially moving all Featuretools discussions (help, feature suggestions, release notes, etc.) to Slack. Please join us at the following link to continue the conversation!

    https://join.slack.com/t/featuretools/shared_invite/enQtNTEwODEzOTEwMjg4LTZiZDdkYjZhZTVkMmVmZDIxNWZiNTVjNDQxYmZkMzI5NGRlOTg5YjcwYmJiNWE2YjIzZmFkMjc1NDZkNjBhZTQ