    Solymi
    @Solymi90
    Hi everyone! Is it possible to get a clear list of features that Featuretools can generate and mark which one of these can be properly used for time series data?
    Max Kanter
    @kmax12
    @Solymi90 can you give an example of a feature that featuretools generates that cannot be used for time series?
    to see a list of features that featuretools can generate without calculating the actual values, try running DFS with the features_only parameter. you can see an example on this page: https://docs.featuretools.com/automated_feature_engineering/primitives.html
    bgoel2003
    @bgoel2003
    Hi, is it possible in featuretools to specify that certain primitives can't be applied before certain other primitives?
    Max Kanter
    @kmax12
    @bgoel2003 primitives have a few internal properties that control this behavior. are you trying to do this for existing primitives or a custom primitive?
    bgoel2003
    @bgoel2003
    @kmax12 i was trying to do this for existing primitives.
    Another question: is it possible to add new primitive definitions to featuretools at runtime, from a function call, without adding these functions to the transform_primitive and aggregation_primitives code files, and then use the newly added primitives in feature generation?
    Max Kanter
    @kmax12
    @bgoel2003 yep, you can write custom primitives using the instructions here: https://docs.featuretools.com/automated_feature_engineering/primitives.html#defining-custom-primitives
    we're very interested in improving the experience for creating custom primitives, so curious to hear more about what you're trying to do and how we can help
    Jan Koch
    @datajanko
    Hey, my intention is to have something like a back-testing framework for time-series data with splits similar to TimeSeriesSplit of sklearn. Right now, we have the training window size, cutoff_time, and make_temporal_cutoffs. However, this does not yet help with label creation in an observation period. Assume for example that you have to select customers 1 month before they receive a print mailing (this is still common for some companies) and you want to attribute all purchases made in the 4 weeks after the customer got the mailing to that mailing (here we have the problem that every customer could receive the mailing at a different time, but assuming uniform times is a sensible approximation). So for the labels we need some kind of look-ahead time and an evaluation period. What are your thoughts on that? Would this be interesting? Should this be contained in the package?
    bgoel2003
    @bgoel2003
    @kmax12 i tried to do something like this

    def pd_is_in(array, list_of_outputs=None):
        return ""

    def isin_generate_name(self):
        return u"%s.isin(%s)" % (self.base_features[0].get_name(), str(self.kwargs['list_of_outputs']))

    make_trans_primitive(function=pd_is_in, input_types=[Numeric], return_type=Boolean, name="is_in",
                         description="For each value of the base feature, checks whether it is in a list that provided.",
                         cls_attributes={"generate_name": isin_generate_name})

    feature_defs, all_feature, feature_tree = ft.dfs(
        entityset=es,
        target_entity="accounts",
        agg_primitives=["avg", "count", "first", "last", "min", "max", "stddev", "sum", "variance", "nuniq", "valuecount", "indicatorcount"],
        trans_primitives=["day", "year", "month", "weekday", "numwords", "numcharacters", "plusonelog", "is_in"],
        max_depth=3,
        ignore_variables=ignore_variables)
    feature_tree.table_feature_operations_mapping_dag

    but it throws an exception like this
    ValueError: ('Unknown transform primitive is_in. ', 'Call ft.primitives.list_primitives() to get', ' a list of available primitives')
    basically what i am trying to do is to get primitives definition from some outside system and feed that primitive definition to featuretools so that this new primitive can be used in creating new features suggested by featuretools
    Max Kanter
    @kmax12
    @datajanko have you seen the option to use a dataframe as the input for cutoff_time? you can pass in a different cutoff time for each instance id. you'd have to write the code to determine what the cutoff time per customer is, but once you do you can pass it to featuretools. does that apply to your situation?
    @bgoel2003 you need to pass the primitive object returned by make_trans_primitive rather than its name as a string
    custom_primitive = make_trans_primitive(...)
    ft.dfs(trans_primitives=[custom_primitive],...)
    Jan Koch
    @datajanko
    @kmax12 no, I didn’t realize that. Thanks for the hint. I think I can leverage cutoff_time and the training window to also construct my validation window. Then I’d only have to do this multiple times for all the “folds” I am interested in. I hope I’ll find some time to work on this.
    Max Kanter
    @kmax12
    @datajanko exactly. let us know how it goes. if you have any other questions, feel free to message here, or if you think it'd be helpful for others in the future, post on Stack Overflow with the featuretools tag: https://stackoverflow.com/questions/tagged/featuretools
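The back-testing idea discussed in this exchange can be sketched with plain date arithmetic. This is only an illustration and not part of Featuretools: the function names `backtest_folds` and `has_purchase_in_window`, the dictionary keys, and the example dates are all hypothetical. Each fold gets a cutoff time for feature calculation (mailing date minus the look-ahead time) and an evaluation window after the mailing for the labels.

```python
from datetime import datetime, timedelta


def backtest_folds(first_mailing, n_folds, step, look_ahead, evaluation_period):
    """Return one dict per fold with a feature cutoff and a label window.

    All names here are hypothetical; this is not a Featuretools API.
    """
    folds = []
    for i in range(n_folds):
        mailing = first_mailing + i * step
        folds.append({
            # Features may only use data up to this time.
            "cutoff_time": mailing - look_ahead,
            # Purchases in [label_start, label_end) are attributed to the mailing.
            "label_start": mailing,
            "label_end": mailing + evaluation_period,
        })
    return folds


def has_purchase_in_window(purchase_times, fold):
    """Label a customer True if any purchase falls inside the fold's label window."""
    return any(fold["label_start"] <= t < fold["label_end"] for t in purchase_times)


folds = backtest_folds(
    first_mailing=datetime(2018, 1, 1),
    n_folds=3,
    step=timedelta(weeks=4),
    look_ahead=timedelta(days=30),  # customers are selected about 1 month ahead
    evaluation_period=timedelta(weeks=4),
)
```

For each fold, a cutoff_time dataframe with one row per customer and that fold's `cutoff_time` could then be passed to Featuretools. If mailing dates differ per customer, the same arithmetic can be applied row by row.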
    tomasgreif
    @tomasgreif
    Is there a way to use multiple training windows? I am trying to generate features for last 3/6/9/12... months. https://stackoverflow.com/questions/51865267/get-features-by-different-time-windows
    Jan Koch
    @datajanko
    maybe, this can be achieved by using interesting values, not very sure though
    Max Kanter
    @kmax12
    @tomasgreif the recommended way to do that now is to make multiple calls to calculate_feature_matrix with the same list of feature definitions but different training_windows and then combine the result
    we'll follow up and answer on stack overflow as well. thanks for posting
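A minimal sketch of that approach, under stated assumptions: the helper name `feature_matrix_per_window` and the column suffix scheme are hypothetical, and the helper only relies on the returned feature matrices behaving like pandas DataFrames (`add_suffix` and `join`). The Featuretools call itself is shown as a commented usage example because it assumes `feature_defs`, `es`, and `cutoff_times` already exist; the window strings are also assumptions, so check which formats your version accepts.

```python
from functools import reduce


def feature_matrix_per_window(compute_matrix, training_windows):
    """Compute one feature matrix per training window and join them column-wise.

    compute_matrix: callable taking a training window and returning a
        DataFrame-like feature matrix indexed by instance id.
    """
    if not training_windows:
        raise ValueError("at least one training window is required")
    matrices = []
    for window in training_windows:
        matrix = compute_matrix(window)
        # Suffix the columns so the same feature from different windows stays distinct.
        suffix = "__" + str(window).replace(" ", "_")
        matrices.append(matrix.add_suffix(suffix))
    # All matrices share the same instance index, so an index join lines them up.
    return reduce(lambda left, right: left.join(right), matrices)


# Hypothetical usage, assuming feature_defs, es and cutoff_times already exist:
# combined = feature_matrix_per_window(
#     lambda window: ft.calculate_feature_matrix(
#         feature_defs, entityset=es,
#         cutoff_time=cutoff_times, training_window=window),
#     ["90 days", "180 days", "360 days"],
# )
```

Passing the computation in as a callable keeps the list of feature definitions fixed across windows, which is the point of the recommendation: only the training window changes between calls.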
    tomasgreif
    @tomasgreif
    I see, thank you. Would that be something you would consider adding? In the area I am working in (financial services, credit scoring), having different time windows is one of the most typical feature engineering tasks.
    Max Kanter
    @kmax12
    @tomasgreif it is something we'd consider adding!
    Maximilian Christ
    @MaxBenChrist
    hi featuretools team. your featuretools library looks great!
    unfortunately, I have been working solely on time series for the last years so I did not have a dataset to try it out
    now, I am the maintainer of tsfresh (https://github.com/blue-yonder/tsfresh), we also perform feature extraction, but on time series instead of relational data
    but maybe we can embed tsfresh into featuretools in some way? I worked on Data Science problems where I had to process time series and relational data at the same time. A fully automated feature extraction framework would have helped a lot
    Max Kanter
    @kmax12
    Hey @MaxBenChrist ! Thanks for the kind words. tsfresh is an interesting library as well. We actually have experimented with embedding tsfresh into featuretools by using custom primitives. You can see an example of that in this notebook: https://github.com/Featuretools/predict-remaining-useful-life/blob/master/Advanced%20Featuretools%20RUL.ipynb
    what are your thoughts on the best way to integrate the two libraries?
    Maximilian Christ
    @MaxBenChrist
    @kmax12 yes, I saw that notebook. I still have to use the featuretools library on a few datasets over the weekend to get more experience with it.
    in any case, we could have a brainstorming session over skype sometime next week and discuss possible starting points for a collaboration?
    Max Kanter
    @kmax12
    yep, let's do that!
    dugland123
    @dugland123
    Hello - under what circumstances would an entity variable from EntitySet es and defined as index show up as id when listing es.entities? I'm trying to resolve the following warning:
    both an index level and a column label.
    Defaulting to column, but this will raise an ambiguity error in a future version
    end_entity_id=child_eid)
    Max Kanter
    @kmax12
    you can read my full response there, but the warning you are seeing is unrelated to the featuretools variable type and will go away in the next release of Featuretools.
    Max Kanter
    @kmax12
    let us know if you have any questions
    dugland123
    @dugland123
    Thank you.
    Silvio Normey Gómez
    @silviogn
    Hi everyone.
    I'm Silvio.. from Uruguay
    I have a question.
    Is Featuretools capable of building useful attributes from semi-structured data such as XML, JSON or RDF?
    Or is it necessary to convert the dataset to a tabular form?
    Max Kanter
    @kmax12
    @silviogn correct, you'd have to convert to tabular form to use featuretools.
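As an illustration of what converting to tabular form can look like, here is a small sketch that splits nested JSON into a parent table and a child table. The sample data, the field names, and the `to_tables` helper are all made up for this example; real data will need its own mapping.

```python
import json

# Hypothetical nested input: customers with an embedded profile and order list.
raw = """
[
  {"id": 1, "profile": {"country": "UY"},
   "orders": [{"order_id": 10, "amount": 5.0}, {"order_id": 11, "amount": 7.5}]},
  {"id": 2, "profile": {"country": "US"}, "orders": []}
]
"""


def to_tables(customers):
    """Split nested customer records into a parent table and a child table."""
    customer_rows, order_rows = [], []
    for customer in customers:
        customer_rows.append({
            "customer_id": customer["id"],
            "country": customer["profile"]["country"],
        })
        for order in customer["orders"]:
            order_rows.append({
                "order_id": order["order_id"],
                "customer_id": customer["id"],  # foreign key back to the parent
                "amount": order["amount"],
            })
    return customer_rows, order_rows


customer_rows, order_rows = to_tables(json.loads(raw))
```

Each list of rows can then be turned into a pandas DataFrame (for example with `pandas.DataFrame(customer_rows)`) and loaded as an entity, with `customer_id` linking the two tables.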
    Waco Holve
    @WacoHolve_gitlab

    Hi all,

    I've been using this tool the past few days and it has been great so far. I work heavily with financial data and noticed that when I'm creating my EntitySet if I have column names as dtype int I get a failure message.

    I was wondering if this is desired behavior that the column names need to be dtype str for the entity set to work.

    Thank you for the awesome product.
    Waco Holve

    Max Kanter
    @kmax12
    @WacoHolve_gitlab can you share the stack trace and some code to reproduce it?
    we'd like to support that since they are valid pandas column names