The error essentially comes down to the categories being different between the categorical variables you are trying to relate. See this code example:
import pandas as pd
from pandas.api.types import is_dtype_equal
s = pd.Series(["a","b","a"], dtype="category")
s2 = pd.Series(["b","b","a"], dtype="category")
s3 = pd.Series(["a","b","c"], dtype="category")
is_dtype_equal(s.dtype, s2.dtype) # this is True
is_dtype_equal(s.dtype, s3.dtype) # this is False
You need to update your dataframe before loading it into Featuretools to make sure the pandas Categoricals have the same category values. Here's how you do that:
# if s is missing categories from s3
new_s = s.astype(s3.dtype)
is_dtype_equal(new_s.dtype, s3.dtype) # this is True
# if both are missing categories from each other
s4 = pd.Series(["b","c"], dtype="category")
# take the union of the two category indexes (`+` would concatenate the labels element-wise)
categories = s.dtype.categories.union(s4.dtype.categories)
shared_dtype = pd.CategoricalDtype(categories=categories)
new_s = s.astype(shared_dtype)
new_s4 = s4.astype(shared_dtype)
is_dtype_equal(new_s.dtype, new_s4.dtype) # this is True
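If it helps, the same idea can be wrapped in a small helper and applied to the two columns you relate. This is only a sketch: `align_categories` and the `customers`/`orders` tables with their `customer_id` column are made-up names for illustration, not part of Featuretools or pandas, so swap in your own dataframes.

```python
import pandas as pd
from pandas.api.types import is_dtype_equal


def align_categories(left: pd.Series, right: pd.Series):
    """Return copies of both Series cast to one shared categorical dtype.

    Hypothetical helper for illustration; not part of pandas or Featuretools.
    """
    # Collect the labels seen on either side. Casting to object first means the
    # inputs may be plain object columns or already categorical.
    left_labels = pd.Index(left.dropna().astype(object).unique())
    right_labels = pd.Index(right.dropna().astype(object).unique())
    shared = pd.CategoricalDtype(categories=left_labels.union(right_labels))
    return left.astype(shared), right.astype(shared)


# Hypothetical parent/child tables linked on "customer_id".
customers = pd.DataFrame({"customer_id": ["a", "b", "c"]})
orders = pd.DataFrame({"customer_id": ["b", "a", "b"], "amount": [10, 20, 30]})

customers["customer_id"], orders["customer_id"] = align_categories(
    customers["customer_id"], orders["customer_id"]
)

# Both columns now share one dtype, so they can be related to each other.
same = is_dtype_equal(customers["customer_id"].dtype, orders["customer_id"].dtype)
```

Run this on each pair of related columns before building the entity set, so both sides already carry the shared dtype when they are loaded.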
Please also post this on SO, where I can give a more detailed answer for everyone else.
Hi everyone! I love featuretools and the idea of automatically engineering features. Unfortunately I can't seem to add interesting values, and I would be happy if someone could help out :)
I suspect that it has something to do with my data, because I can reproduce the example in the docs just fine.
@favstats This looks like incorrect behavior, thanks for sharing it with us. I just made a fix for it on a branch. Can you try installing that branch of featuretools and running your code again? You can install the branch using pip with this command:
pip install -e git://github.com/featuretools/featuretools.git@interesting-values-direct-features#egg=featuretools
Let us know if it helps!
@kmax12 one thing I just noticed: I mistyped the value name at first and it gave me back only NaN values, which makes sense since it can't match the arguments. However, I wonder if this is intended behaviour or if it should say something like "value not found" and throw an error. Just thinking out loud :)
Anyway, thank you again for answering this so fast! :)