I've been using this tool for the past few days and it has been great so far. I work heavily with financial data and noticed that when I create my EntitySet, I get a failure message if any column names are dtype int.
Is it intended behavior that column names need to be dtype str for the EntitySet to work?
Thank you for the awesome product.
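A possible workaround, assuming the failure really does come from the integer column labels: cast the labels to str in pandas before building the EntitySet. The DataFrame below is made up for illustration, and this only uses pandas, not any Featuretools call.

```python
import pandas as pd

# Hypothetical financial table whose column labels are integers (e.g. years).
df = pd.DataFrame({2019: [1.0, 2.0], 2020: [3.0, 4.0]})

# Cast every column label to str before handing the DataFrame to an EntitySet.
df.columns = df.columns.astype(str)
```

The data is unchanged; only the labels differ, so downstream code has to refer to "2019" rather than 2019.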
locations.PERCENT_TRUE(appointments.sms_received) gives the percent of rows for which sms_received is True, given a single location. I'd expect that column to be the same for all rows of a single location, because that's what it was conditioned on, but I'm not finding that to be the case. Any ideas why?
fm.loc[fm.neighborhood == 'HORTO', 'locations.PERCENT_TRUE(appointments.sms_received)'].describe() I get:
cutoff_time you'd like to create features at and a training_window which specifies how much historical data to use. So, you can create the different time period features you want by making multiple calls to ft.calculate_feature_matrix, one for each window. You can read more about handling time here: https://docs.featuretools.com/automated_feature_engineering/handling_time.html
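Here is a rough sketch of that "one call per window" pattern. The helper name feature_matrices_by_window and its calculate parameter are made up for illustration (the parameter only exists so the loop can be tried without Featuretools installed). The entityset, cutoff_time and training_window keywords are the ones ft.calculate_feature_matrix accepts; features is assumed to be a list of feature definitions you already have, for example from ft.dfs(..., features_only=True).

```python
def feature_matrices_by_window(features, entityset, cutoff_time, windows, calculate=None):
    """Return {window: feature matrix}, making one call per training window."""
    if calculate is None:
        # Imported lazily so the helper can be exercised with a stand-in function.
        import featuretools as ft
        calculate = ft.calculate_feature_matrix

    matrices = {}
    for window in windows:
        # Same features and cutoff time each time; only the amount of history changes.
        matrices[window] = calculate(
            features,
            entityset=entityset,
            cutoff_time=cutoff_time,
            training_window=window,
        )
    return matrices
```

You would call it with something like windows=["7 days", "30 days"] and then join the resulting matrices on their index, renaming columns so it is clear which window each one came from.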
The error essentially comes down to the categories being different between the categorical variables you are trying to relate. See this code example:
    import pandas as pd
    from pandas.api.types import is_dtype_equal

    s = pd.Series(["a", "b", "a"], dtype="category")
    s2 = pd.Series(["b", "b", "a"], dtype="category")
    s3 = pd.Series(["a", "b", "c"], dtype="category")

    is_dtype_equal(s.dtype, s2.dtype)  # this is True
    is_dtype_equal(s.dtype, s3.dtype)  # this is False
You need to update your dataframe before loading it into Featuretools so that the pandas Categoricals you are relating have the same set of categories. Here's how you do that:
    # If s is missing categories that s3 has, cast s to s3's dtype.
    new_s = s.astype(s3.dtype)
    is_dtype_equal(new_s.dtype, s3.dtype)  # this is True

    # If both are missing categories from each other, build a shared dtype.
    from pandas.api.types import CategoricalDtype
    s4 = pd.Series(["b", "c"], dtype="category")
    # Index.union merges the labels; "+" would concatenate them element-wise.
    categories = s.dtype.categories.union(s4.dtype.categories)
    shared_dtype = CategoricalDtype(categories)
    new_s = s.astype(shared_dtype)
    new_s4 = s4.astype(shared_dtype)
    is_dtype_equal(new_s.dtype, new_s4.dtype)  # this is True
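For two real tables, the same idea can be wrapped in a small helper. This is only a sketch: align_categories, customers, orders and the region column are made-up names, and it assumes the two columns you want to relate should end up as unordered categoricals.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def align_categories(left, right):
    """Cast two Series to one categorical dtype covering the values of both."""
    left = left.astype("category")
    right = right.astype("category")
    # Union of both category sets, so no existing value is turned into NaN.
    shared = CategoricalDtype(left.cat.categories.union(right.cat.categories))
    return left.astype(shared), right.astype(shared)


# Hypothetical parent/child tables whose key columns were built separately.
customers = pd.DataFrame({"region": ["a", "b", "a"]})
orders = pd.DataFrame({"region": ["b", "c"]})

customers["region"], orders["region"] = align_categories(
    customers["region"], orders["region"]
)
same_dtype = customers["region"].dtype == orders["region"].dtype
```

Run this on both dataframes before loading them, so that the columns on either side of the relationship share one dtype.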
Please also post this on Stack Overflow, where I can give a more detailed answer for everyone else.