    Darío López Padial
    @bukosabino
    thanks. solved :=)
    Albert Carter
    @RogerTangos

    Hi there - I usually post on SO for FT questions, but thought that this discussion might need some more interaction.

    Today, I was looking at advanced custom primitives and came across a stackoverflow question: https://stackoverflow.com/questions/53579465/how-to-use-featuretools-to-create-features-from-multiple-columns-in-single-dataf

    The user is trying to create a primitive which sums columns conditionally, based on whether the row is within a timedelta. So, sum only cells where the timestamp is within the last 3 days.

    I think this is possible if the user creates a transform primitive that outputs the value when the cell is within the time range and 0 otherwise. They can then apply the sum aggregation primitive.

    However, I'm curious to know if this is possible in a single aggregation primitive, or whether there is another mechanism for achieving this. It seems very wasteful to store a column of mostly zeros just to take its sum later on.
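
A minimal sketch of the two approaches described above, written in plain Python rather than as Featuretools primitives. The function names, the sample data, and the choice of a half-open window `(cutoff - window, cutoff]` are assumptions for illustration only; this is not the answer posted on Stack Overflow.

```python
from datetime import datetime, timedelta


def masked_values(timestamps, values, cutoff, window):
    """Transform step: keep a value if its timestamp is in
    (cutoff - window, cutoff], otherwise replace it with 0."""
    return [
        v if cutoff - window < t <= cutoff else 0
        for t, v in zip(timestamps, values)
    ]


def windowed_sum(timestamps, values, cutoff, window):
    """Single pass: sum only the in-window values, without
    materializing the mostly-zero intermediate column."""
    return sum(
        v for t, v in zip(timestamps, values)
        if cutoff - window < t <= cutoff
    )


# Hypothetical sample data: three rows, two of them in the last 3 days.
cutoff = datetime(2018, 12, 10)
window = timedelta(days=3)
timestamps = [
    datetime(2018, 12, 2),
    datetime(2018, 12, 8),
    datetime(2018, 12, 9, 12),
]
values = [10, 20, 5]

two_step_total = sum(masked_values(timestamps, values, cutoff, window))
single_pass_total = windowed_sum(timestamps, values, cutoff, window)
```

Both totals agree; the single-pass version simply skips the intermediate column, which is the saving the question is asking about.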

    Max Kanter
    @kmax12
    @RogerTangos You can create primitives that take in more than one column. Here's an example in the docs: https://docs.featuretools.com/automated_feature_engineering/primitives.html#multiple-input-types
    The case in that specific question is a little tricky, but it should be possible. I'm working on posting an answer with an example primitive soon.
    Max Kanter
    @kmax12
    @RogerTangos just put the answer up!
    Albert Carter
    @RogerTangos
    Thanks @kmax12 , that's very interesting to see, and I'm glad that it's possible. I really appreciate you taking the time to answer these. It's very kind of you.
    pabloazurduy
    @pabloazurduy
    Hi, I was reading the documentation but I couldn't find an automatic way to make "row_window features". I understand that it is possible to use the training_window in ft.dfs, but that only gives you a lower bound (as I understand it). What I mean is, for example, creating the following features:
    COUNT(orders) in 0to1 day
    COUNT(orders) in 1to2 day
    COUNT(orders) in 2to3 day
    etc...
    Is there an easy way to create that kind of feature?
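
To make the request concrete, here is a minimal plain-Python sketch of the features being asked for: counts of orders in consecutive one-day windows looking back from a cutoff time. The function name, the sample timestamps, and the window boundaries are assumptions for illustration; this does not use Featuretools and is not the approach from the posted answer.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_by_day_window(order_times, cutoff, n_windows):
    """Count orders per one-day window before `cutoff`.

    Index k of the result covers orders whose age (cutoff - time)
    is in [k days, k + 1 days). Orders after the cutoff or older
    than n_windows days are ignored.
    """
    one_day = timedelta(days=1)
    counts = Counter()
    for t in order_times:
        age = cutoff - t
        if timedelta(0) <= age < n_windows * one_day:
            counts[age // one_day] += 1  # timedelta // timedelta gives an int
    return [counts[k] for k in range(n_windows)]


# Hypothetical order timestamps for a single customer.
cutoff = datetime(2019, 1, 10)
order_times = [
    datetime(2019, 1, 9, 18),  # 6 hours old   -> window 0
    datetime(2019, 1, 9, 6),   # 18 hours old  -> window 0
    datetime(2019, 1, 8, 12),  # 36 hours old  -> window 1
    datetime(2019, 1, 5),      # 5 days old    -> outside 3 windows
    datetime(2019, 1, 11),     # after cutoff  -> ignored
]

window_counts = count_by_day_window(order_times, cutoff, n_windows=3)
```

Each entry of `window_counts` corresponds to one of the "COUNT(orders) in k to k+1 day" features listed above.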
    Max Kanter
    @kmax12
    @pabloazurduy just put up a quick answer on how to approach it. let me know if that helps or if a specific code example is needed
    we would consider supporting this functionality more natively in the future. would you mind making an issue on our github to document your use case / request?
    thanks for trying out featuretools!
    Gray
    @grayskripko
    hi guys. Is there a simple way to use "mean" as an aggregation primitive, skipping missing values? Or the only way is to write a custom primitive?
    Max Kanter
    @kmax12
    @grayskripko for the time being you'd have to write a custom primitive. In a future update we'll allow you to configure how missing values are handled.
    Gray
    @grayskripko
    thanks!
    Gray
    @grayskripko
    the shortest way was to use numpy.nanmean()
    Max Kanter
    @kmax12
    :thumbsup:
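
For readers without NumPy at hand, a minimal pure-Python sketch of a mean that skips missing values, which is the behavior `numpy.nanmean()` provides for NaN entries. The function name `nan_mean` and the handling of `None` are assumptions for illustration; how to wrap such a function as a custom aggregation primitive depends on the Featuretools version and is not shown here.

```python
import math


def nan_mean(values):
    """Mean of the values that are present, ignoring None and NaN.

    Returns NaN when every value is missing.
    """
    present = [
        v for v in values
        if v is not None and not math.isnan(v)
    ]
    if not present:
        return math.nan
    return sum(present) / len(present)


# Hypothetical column with one missing entry.
result = nan_mean([1.0, math.nan, 3.0])
```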
    Albert Carter
    @RogerTangos
    Hey FT Team - I was running through some of the docs recently, and tried the WordCount example: https://docs.featuretools.com/automated_feature_engineering/primitives.html. Unfortunately, I haven't been able to get the same results as in the docs - that is, I can't get the WordCount feature to show up, at all. I've copy pasted code, and tested FT versions 0.5.0, 0.4.0, 0.3.0, 0.2.1, and 0.2.0, and am never able to get this feature to show. Is this an error in the docs, or is there some behavior I don't understand here?
    from featuretools.primitives import IdentityFeature
    import featuretools as ft
    
    es = ft.demo.load_mock_customer(return_entityset=True)
    from featuretools.primitives import make_agg_primitive, make_trans_primitive
    from featuretools.variable_types import Text, Numeric   
    
    def word_count(column):
        '''
        Counts the number of words in each row of the column. Returns a list
        of the counts for each row.
        '''
        word_counts = []
        for value in column:
            words = value.split(None)
            word_counts.append(len(words))
        return word_counts
    
    # Next, we need to create a custom primitive from the word_count function.
    
    WordCount = make_trans_primitive(function=word_count,
                                     input_types=[Text],
                                     return_type=Numeric)
    
    # Since WordCount is a transform primitive, we need to add it to the list of transform primitives DFS can use when generating features.
    
    feature_matrix, features = ft.dfs(entityset=es,
                                      target_entity="customers",
                                      agg_primitives=["sum", "mean", "std"],
                                      trans_primitives=[WordCount])
    Max Kanter
    @kmax12
    @RogerTangos looks like we have a hidden cell in the documentation. can you replace line 4 with
    from featuretools.tests.testing_utils import make_ecommerce_entityset
    es = make_ecommerce_entityset()
    Albert Carter
    @RogerTangos
    yep. that uses the WordCount, though the output is still different from what the docs show
    sonnehansen
    @sonnehansen

    Hi, Thanks for an excellent toolbox :)
    I have a question regarding the subsequent prediction when we have extended time series windows.

    I work with sensor data from wind turbines and a situation somewhat similar to your RUL demo. Assume that a window of 14 days works well. Then:

    1. We detect an error from online data
    2. The problem is fixed within the next couple of days
    3. Now the turbine is in "quarantine" for ~12 days with respect to this model because the signal from the corrected error will pollute the signal

    My question is then:

    • Do you have any previous experience with the above problem? And/or any ideas how to mitigate / resolve it?

    Of course we could shorten the time window but this would not really solve the issue. Also, I imagine that statistics such as "trend" would start to suffer from

    Max Kanter
    @kmax12

    @/all Hi Everyone - We are officially moving all Featuretools discussions (help, feature suggestions, release notes, etc.) to Slack. Please join us at the following link to continue the conversation!

    https://join.slack.com/t/featuretools/shared_invite/enQtNTEwODEzOTEwMjg4LTZiZDdkYjZhZTVkMmVmZDIxNWZiNTVjNDQxYmZkMzI5NGRlOTg5YjcwYmJiNWE2YjIzZmFkMjc1NDZkNjBhZTQ

    Dr. Hanan Shteingart
    @chanansh
    @bgoel2003 did you manage to make progress in extracting the DAG of features so that, let's say, dbt can execute it externally from FeatureTools?