Hi there - I usually post on SO for FT questions, but thought that this discussion might need some more interaction.
Today, I was looking at advanced custom primitives and came across a stackoverflow question: https://stackoverflow.com/questions/53579465/how-to-use-featuretools-to-create-features-from-multiple-columns-in-single-dataf
The user is trying to create a primitive which sums columns conditionally, based on whether the row is within a timedelta. So, sum only cells where the timestamp is within the last 3 days.
I think this is possible if the user creates a transform primitive which outputs the value if the cell is within the time range, and 0 otherwise. Then, they can use the sum aggregation primitive.
However, I'm curious to know if this is possible in a single aggregation primitive, or whether there is another mechanism for achieving this. It seems very wasteful to store a column of mostly zeros just to take its sum later on.
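To make the transform-then-sum idea concrete, here's a rough sketch (not tested against any particular featuretools version): the core masking function is plain pandas, and the commented-out wrapper shows how it might be registered as a primitive, assuming the legacy `make_trans_primitive` API with `uses_calc_time=True` (which passes the cutoff time as a `time` kwarg). The names `value_if_recent` and `window` are made up for illustration.

```python
import pandas as pd

def value_if_recent(values, timestamps, time=None, window=pd.Timedelta(days=3)):
    """Keep each value only if its timestamp falls within `window` of the
    cutoff `time`; replace everything older with 0."""
    return values.where((time - timestamps) <= window, 0)

# Hypothetical wrapper (legacy featuretools API, names are assumptions):
#
# from featuretools.primitives import make_trans_primitive
# from featuretools.variable_types import Datetime, Numeric
#
# ValueIfRecent = make_trans_primitive(
#     function=value_if_recent,
#     input_types=[Numeric, Datetime],
#     return_type=Numeric,
#     uses_calc_time=True,  # featuretools supplies the cutoff time as `time`
# )
#
# Then SUM(value_if_recent(amount, transaction_time)) would give the
# conditional sum, at the cost of materializing the mostly-zero column.
```

This still stores the intermediate column of zeros, which is the waste the question is about, so a single aggregation primitive (if one exists for this) would be the cleaner route.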
ft.dfs, but that only gives you a lower bound (as I understand it). What I mean is, for example, to create the following features:
```python
import featuretools as ft
from featuretools.primitives import make_agg_primitive, make_trans_primitive
from featuretools.variable_types import Text, Numeric

es = ft.demo.load_mock_customer(return_entityset=True)

def word_count(column):
    '''Counts the number of words in each row of the column.
    Returns a list of the counts for each row.'''
    word_counts = []
    for value in column:
        words = value.split(None)
        word_counts.append(len(words))
    return word_counts

# Next, we need to create a custom primitive from the word_count function.
WordCount = make_trans_primitive(function=word_count,
                                 input_types=[Text],
                                 return_type=Numeric)

# Since WordCount is a transform primitive, we need to add it to the list of
# transform primitives DFS can use when generating features.
feature_matrix, features = ft.dfs(entityset=es,
                                  target_entity="customers",
                                  agg_primitives=["sum", "mean", "std"],
                                  trans_primitives=[WordCount])
```
Hi, Thanks for an excellent toolbox :)
I have question regarding the subsequent prediction when we have extended time series windows.
I work with sensor data from wind turbines and a situation somewhat similar to your RUL demo. Assume that a window of 14 days works well. Then:
My question is then:
Of course, we could shorten the time window, but this would not really solve the issue. Also, I imagine that statistics such as "trend" would start to suffer from
@/all Hi Everyone - We are officially moving all Featuretools discussions (help, feature suggestions, release notes, etc.) to Slack. Please join us at the following link to continue the conversation!