Hello everyone :) I have a question and hope someone can help out.
Say I want to calculate features for several time frames leading up to the cutoff time. For example:
My cutoff time is 1 January 2005. I want to count the number of products a customer bought in the last month, the last three months, and the last year before that date, and have all of these counts in the same feature matrix.
I know I can do something like this (from reading this):
feature_matrix, features = ft.dfs(
    target_entity="customers",
    agg_primitives=["count"],
    cutoff_time=pd.Timestamp('January 1, 2005'),
    training_window=ft.Timedelta("30 days"),
    entityset=es,
    verbose=True
)
But this would of course only give me features for the 30 days before January 1, 2005. I would like them for other time ranges as well (e.g. three months, a year, or really any other time range that interests me).
So I am not sure how to get to my goal right now. Would I need to create a new primitive for this task or can this be done with already existing functions?
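One approach that might work without a new primitive is to run the same calculation once per time window and put the results side by side under window-specific column names. The sketch below uses plain Python and mock data so that it runs on its own. The names `count_in_window` and `features_for_windows`, the column labels, and the choice to include rows with `cutoff - window < time <= cutoff` are all assumptions made for illustration, not Featuretools behaviour.

```python
from datetime import datetime, timedelta

# Mock purchase log: (customer_id, purchase_time). Purely illustrative data.
PURCHASES = [
    (1, datetime(2004, 12, 20)),
    (1, datetime(2004, 11, 5)),
    (1, datetime(2004, 3, 1)),
    (2, datetime(2004, 12, 31)),
    (2, datetime(2005, 1, 15)),  # after the cutoff, must never be counted
]


def count_in_window(purchases, cutoff, window):
    """Count purchases per customer with cutoff - window < time <= cutoff."""
    counts = {}
    for customer_id, when in purchases:
        if cutoff - window < when <= cutoff:
            counts[customer_id] = counts.get(customer_id, 0) + 1
    return counts


def features_for_windows(purchases, cutoff, windows):
    """Build one row per customer with one count column per named window."""
    customers = sorted({customer_id for customer_id, _ in purchases})
    matrix = {customer_id: {} for customer_id in customers}
    for name, window in windows.items():
        counts = count_in_window(purchases, cutoff, window)
        for customer_id in customers:
            column = "COUNT(purchases, {})".format(name)
            matrix[customer_id][column] = counts.get(customer_id, 0)
    return matrix


cutoff = datetime(2005, 1, 1)
windows = {
    "30 days": timedelta(days=30),
    "90 days": timedelta(days=90),
    "365 days": timedelta(days=365),
}
feature_matrix = features_for_windows(PURCHASES, cutoff, windows)
```

With Featuretools itself, the equivalent loop would presumably call `ft.dfs` once per window with a different `training_window`, rename the resulting columns to include the window, and join the matrices on the customer index. That wiring is an assumption and is not shown here.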
Hello everyone!
I encountered a problem when I tried to create relationships between entities in my entity set (using my own data). There is no error, but it just doesn't create features for one of my entities (the "prods" entity), although everything should be connected just fine. In some ways this is similar to the first issue I encountered, except that it occurs only with this specific entity setup. Unfortunately, I can't share my data this time, but attached you will find a minimal example with some mock data where this problem also occurs.
Hope somebody can help and thank you for your awesome support!
Best, Fabio
Done:
It's just a lot of code so I thought a python notebook would be a bit more compact :)
astype conversion above.
With EntitySet.normalize_entity, I've been using the time_index_reduce parameter. In my example, I can request the last instance of a user's details as I normalise the sessions table. However, this appears not to be time-aware: the last instance of a user's details can appear AFTER the cut_off_time.
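To make the concern concrete, here is a small plain-Python sketch of the difference between a naive "last" and a time-aware "last" that ignores rows after a cutoff. The data, the `last_value` helper, and its `cutoff` argument are hypothetical and only illustrate the desired behaviour; this is not how Featuretools implements normalisation.

```python
from datetime import datetime

# Mock session rows: (user_id, session_time, user_country). Illustrative only.
SESSIONS = [
    ("u1", datetime(2018, 1, 1), "DE"),
    ("u1", datetime(2018, 3, 1), "FR"),
    ("u1", datetime(2018, 6, 1), "US"),
]


def last_value(rows, user_id, cutoff=None):
    """Return the most recent value for user_id.

    If cutoff is given, rows with a timestamp after it are ignored.
    """
    candidates = [
        (when, value)
        for uid, when, value in rows
        if uid == user_id and (cutoff is None or when <= cutoff)
    ]
    if not candidates:
        return None
    # Pick the row with the latest timestamp and return its value.
    return max(candidates, key=lambda item: item[0])[1]


naive = last_value(SESSIONS, "u1")
time_aware = last_value(SESSIONS, "u1", cutoff=datetime(2018, 4, 1))
```

Here `naive` picks up the June row even though it lies after the April cutoff, which is the leak described above, while `time_aware` only sees the rows up to the cutoff.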
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:65:21: error: expected ‘)’ before ‘,’ token
MATHCALL_VEC (sin,, (Mdouble x));
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:22: error: unknown type name ‘sincos’
MATHDECL_VEC (void,sincos,,
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:81:29: error: expected declaration specifiers or ‘...’ before ‘,’ token
MATHDECL_VEC (void,sincos,,
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:82:3: error: expected declaration specifiers or ‘...’ before ‘(’ token
(Mdouble x, Mdouble *sinx, Mdouble *cosx));
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:100:21: error: expected ‘)’ before ‘,’ token
MATHCALL_VEC (exp,, (Mdouble x));
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:109:21: error: expected ‘)’ before ‘,’ token
MATHCALL_VEC (log,, (Mdouble x));
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:153:21: error: expected ‘)’ before ‘,’ token
MATHCALL_VEC (pow,, (Mdouble x, Mdouble y));
^
error: command 'gcc' failed with exit status 1
----------------------------------------
Rolling back uninstall of psutil
Command "/home/nbuser/anaconda2_20/bin/python -u -c "import setuptools, tokenize;file='/tmp/pip-build-T67_lN/psutil/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-gEaonv-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-T67_lN/psutil/
You are using pip version 9.0.3, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command
However, I was able to install it on-premises using the command
pip install featuretools, but in an Azure Jupyter notebook I am unable to.
I even tried !pip install --ignore-installed featuretools, but it is not working.
Any help?
Hi guys, I have some problems with CircleCI (python2.7) validation. I am working on this pull request: Featuretools/featuretools#323
I tried to reproduce the errors locally, but I cannot. I do something like this:
virtualenv -p python2.7 env
source env/bin/activate
pip install -r test-requirements.txt
make installdeps lint
But I get no errors. What can I do? Do you have any developer documentation? This is my first time using CircleCI...
Hi there - I usually post on SO for FT questions, but thought that this discussion might need some more interaction.
Today, I was looking at advanced custom primitives and came across a stackoverflow question: https://stackoverflow.com/questions/53579465/how-to-use-featuretools-to-create-features-from-multiple-columns-in-single-dataf
The user is trying to create a primitive which sums columns conditionally, based on whether the row is within a timedelta. That is, sum only the cells whose timestamp is within the last 3 days.
I think that this is possible if the user creates a transform primitive that outputs the value if the cell is within the time range and 0 otherwise. Then they can use the sum aggregation primitive.
However, I'm curious to know if this is possible in a single aggregation primitive, or whether there is another mechanism for achieving this. It seems very wasteful to store a column of mostly zeros just to take its sum later on.
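For reference, the single-pass version of the computation is small when written as a plain function: filter by timestamp and sum in one go, without materialising a column of zeros. The sketch below is plain Python; `sum_within`, its arguments, and the boundary choice `cutoff - window < time <= cutoff` are assumptions for illustration.

```python
from datetime import datetime, timedelta


def sum_within(values, times, cutoff, window=timedelta(days=3)):
    """Sum the values whose timestamp satisfies cutoff - window < time <= cutoff."""
    start = cutoff - window
    return sum(v for v, t in zip(values, times) if start < t <= cutoff)


values = [10.0, 5.0, 2.5, 1.0]
times = [
    datetime(2018, 12, 1),   # older than 3 days, ignored
    datetime(2018, 12, 8),
    datetime(2018, 12, 9),
    datetime(2018, 12, 11),  # after the cutoff, ignored
]
total = sum_within(values, times, cutoff=datetime(2018, 12, 10))
```

Whether a function like this can be registered as a custom aggregation primitive that takes both the value column and the time column, and receives the calculation time as `cutoff`, depends on the Featuretools version and should be checked against its custom primitive documentation.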