Gilles Vandewiele
@GillesVandewiele
Hi
Oops, sorry about that. @Alsen57 are you on Windows? Have you tried setting n_jobs=1?
Alsen57
@Alsen57
@GillesVandewiele I already tested with n_jobs=0, and yes, I am on Windows. I got a MemoryError when I let it run overnight.
Alsen57
@Alsen57
frameslist = []
for j, entry in enumerate(ALLDATA, start=1):
    Messung = entry[0].dropna()
    del Messung["time"]

    Messung = Messung.drop(Messung.index[0])
    id_vector = Messung["ist_Zyklus"]
    del Messung["ist_Zyklus"]
    # map the cycle labels 0.5..10 to the ids 1..20
    id_vector = id_vector * 2

    Messung["id_vector"] = id_vector
    Messung = Messung.reset_index(drop=True)

    frameslist.append(Messung)
    print(j)
#################
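import tsfresh as tf  # alias assumed from the tf.extract_features call

featureslist = []
for i, Messung in enumerate(frameslist):
    # use the i-th frame and restart at cycle 1 for every frame
    for cycle1 in range(1, 21):
        part = Messung.loc[Messung["id_vector"] == cycle1]
        features = tf.extract_features(part, column_id="id_vector", n_jobs=4)
        featureslist.append(features)

    print(i)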
So this is my whole code. The first part turns the cycles from 0.5-10 into 1-20 and appends the frames to a list. Then I calculate the features for each cycle of each dataframe.
Alsen57
@Alsen57

When I run this code for generating a specific number of samples from my dataframes, everything is fine.

import pandas as pd

frameslist = []
bandwith = 250
# for training
cycle1 = 0.5
cycle2 = 1.0

# for validation
# cycle1 = 3.5
# cycle2 = 4.0
######

for j, entry in enumerate(ALLDATA):
    frame = entry[0]
    part1 = frame.loc[frame["ist_Zyklus"] == cycle1]
    part2 = frame.loc[frame["ist_Zyklus"] == cycle2]
    # part3 = frame.loc[frame["ist_Zyklus"] == cycle3]
    Messung = pd.concat([part1, part2], axis=0)

    del Messung["time"]
    del Messung["ist_Zyklus"]
    x = len(Messung) // bandwith
    y = x * bandwith  # round the dataframe length down to a multiple of bandwith
    Messung = Messung[:y].copy()
    # ids 0..bandwith-1, each repeated x times, so every id covers x consecutive rows
    Messung["id_vector"] = [i for i in range(bandwith) for _ in range(x)]
    frameslist.append(Messung)
    print(j)

Of course, the calculation part then differs a little bit: there I don't have the loop over the cycle1 iterator.

Gilles Vandewiele
@GillesVandewiele
Try it with n_jobs=1 (not 0); I've heard of issues with multiprocessing on Windows before. Further, check whether it works on a smaller subset of your data. If it does, one option would be to extract features from each time series individually.
Alsen57
@Alsen57
Thank you very much @GillesVandewiele for the fast response. I am going to run it overnight.
Alsen57
@Alsen57
I get a memory error. So it seems like the id vector for the first id, or the ids in general, are too long.
It has the shape (24818, 2); I wouldn't have thought that this is too much for tsfresh.
rhagan09
@rhagan09
Hi. Would anyone know if it is possible to predict 4 values into the future?
I have a time series with values recorded every 15 minutes and would like to be able to predict the values an hour in advance. I have used the tsfresh forecasting to predict the next value (and get good results), but would like to know if it is possible to predict 4 values in advance. Thank you.
Michael Mann
@mmann1123
Does anyone know if it is possible to run feature extraction on a moving window? For instance, if I want to capture the mean of each 3-month period, assuming that I have 24 months of data.
Michael Mann
@mmann1123
@mmann1123 I'm thinking that aggregate_on_chunks is what I am looking for, but I am not sure how to apply it when extracting my features.
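For illustration, the two readings of "each 3-month period" can be sketched in plain pandas (this is not tsfresh's own chunk aggregation, and the 24 monthly values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 24 monthly values, 1.0 .. 24.0.
months = pd.date_range("2019-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(1, 25, dtype=float), index=months)

# Moving window: mean over the current month and the two months before it.
rolling_mean = series.rolling(window=3).mean()

# Non-overlapping chunks: one mean per block of 3 consecutive months.
chunk_mean = series.groupby(np.arange(len(series)) // 3).mean()

print(rolling_mean.tail(3))
print(chunk_mean)
```

The rolling version yields one value per month (the first two are NaN because the window is not yet full), while the chunked version yields 8 values for 24 months.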
Iceorfrog
@Iceorfrog
Hello everyone! I'm new here. How can I use tsfresh with windows? Can I use tsfresh in Python 3.7?
Román Rey
@rooom13
Hello everyone, how do I extract features for each date (sorting value)? I read about rolling time series in the documentation but couldn't figure out how. I already have some prepared data.
metaswirl
@metaswirl
Hello everyone, I have a question on the forecasting entry and would be glad if somebody could help me there. In the given example it looks like the old 'id' is discarded while generating the rolling dataset, correct? So that means that the reference to the original object (e.g. the robot in the docs) is lost. Doesn't this render the method useless?
flamby
@flamby
Hi everyone, when launching extract_relevant_features in a celery task, I get this error "daemonic processes are not allowed to have children", which is a well known error described here https://stackoverflow.com/questions/54727821/running-threaded-module-with-celery-daemonic-processes-are-not-allowed-to-have
Has anybody successfully monkeypatched the multiprocessing module used in tsfresh, like here: https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic#
florianhumblot
@florianhumblot
Can anyone tell me what the difference is between tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_data_points(x) and tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_values(x)? They look exactly the same to me.
thanhd10
@thanhd10
Hello everyone, I once used tsfresh inside a sklearn pipeline to classify sensor data collected by Android smartphones over time. My goal now is to do classification directly and locally inside an Android app. While I was doing research on how to get a model running inside an app, I came across TensorFlow Lite. My idea was to create a TensorFlow model that does the same classification as my sklearn pipeline and save it as a TensorFlow Lite model. So my question is: is it possible to do the feature extraction of tsfresh inside a TensorFlow model, so that I'm able to execute it inside an Android app? Thanks in advance.
florianhumblot
@florianhumblot
I'd figure you need to find the relevant features and implement the calculations in your app. That would probably also be faster than trying to run Python code within your app.
Will Flowers
@flowersw

Does the tsfresh.utilities.dataframe_functions.roll_time_series() function expect each "time" element to be different? In the example provided at https://tsfresh.readthedocs.io/en/latest/text/forecasting.html, this is true. And it would also seem true, since the new IDs that are created are based off of the timestamp.

How would you use tsfresh.utilities.dataframe_functions.roll_time_series() to create a set of features at each time stamp, using the robot dataset in quick start?

metaswirl
@metaswirl

Hi there, another question into the void ;). I get a lot of warnings with select_features. Some features should be binary, but aren't, such as norm_cost__change_quantiles__f_agg_, norm_cost__quantile__q_0.4, etc. Is anyone else getting this? Is this a bug or can it be ignored?

The full text is WARNING:tsfresh.feature_selection.significance_tests:[target_binary_feature_binary_test] A binary feature should have only values 1 and 0 (incl. True and False). Instead found {0.0, -1.2765957446808511e-05} in feature ''norm_cost__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.0''.

Raubsau
@raubsau_gitlab
@thanhd10 The approach of @florianhumblot sounds reasonable.
It will be more efficient to only extract the relevant features directly on the phone without the Python intermediate and have the TF Lite program classify on that vector.
(Alternatively, you could also try to train directly on the sensor data and use raw data without feature extraction for your application.)
Wahid El Chaar
@wahidelchaar
Hi, I set up a tsfresh Docker image which I am currently using on Amazon SageMaker for training. I used the extract_relevant_features() convenience function (with the EfficientFCParameters) to extract the relevant features and wrote the resulting feature set to S3, then I trained an XGBoost classifier in SageMaker's native XGBoost container. Now suppose that my model is fully trained and I want to make predictions on incoming field data. How do I immediately extract the same features that were found to be relevant during training? Would the only proper solution be to create a sklearn transformer, fit it to the training data, then save it as a pickled file and call a transform on it on the incoming field data?
demontamer
@demontamer
Hi, you can save the features to a dataframe and extract just those with kind_to_fc_parameters. Saves a huge amount of time.
feat = pd.DataFrame(columns=feat.index.tolist())
parameters = from_columns(feat)
# then pass kind_to_fc_parameters=parameters to the extraction call
see: https://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html#for-the-ambitious-how-do-i-set-the-parameters-for-different-type-of-time-series
Wahid El Chaar
@wahidelchaar
@demontamer thanks for your response. I ended up trying this, but got a weird error:
image.png
roy Ian
@royian11_gitlab

Hi, I was using tsfresh 0.12.2 and I got 794 features from the extract_features() method. But when I updated tsfresh to 0.15.1, I get an error saying "794 fields are required while 756 values are provided". It seems like the new version returns only 756 features for the same dataset. Can someone assist me with this?

Further, please suggest a way to filter the best features from the features returned by tsfresh.

🅼🆄🅰🅰🅳🅾 ™
@Muaado0_twitter
Hi. I am trying to do a rolling window where the Y column of all previous quarters is rolled,
but when I do this, it creates a data leak.
My dataset has 6 years of monthly sales data.
I want to compute a rolling mean that, for every row, considers all quarters of the previous years.
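One common way to avoid this kind of leak, sketched in plain pandas rather than tsfresh, is to shift the target by one row before aggregating, so that a row never sees its own value. The column names and numbers are hypothetical, and the sketch averages over all earlier rows rather than over specific quarters:

```python
import pandas as pd

# Hypothetical monthly sales, column names chosen only for this illustration.
sales = pd.DataFrame({
    "month": pd.date_range("2015-01-01", periods=6, freq="MS"),
    "y": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# shift(1) moves every value one row down, so the window for a row
# ends at the previous row and never contains the row's own target.
sales["y_mean_past"] = sales["y"].shift(1).expanding().mean()
print(sales)
```

The first row has no history and therefore stays NaN.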
OJL96
@OJL96
Hello, just wanted to say thank you to everyone who contributed to the tsfresh package. It's been extremely helpful for my final year project: classifying LFP signals acquired from neural probes. I was not looking forward to doing manual feature extraction, so this has saved me a lot of time :)
OJL96
@OJL96
Also, I do have a question: how does one manually select which features to calculate?
liujie0903
@liujie0903
Hi, I am trying to do a regression prediction and want to use tsfresh to extract the relevant features. The robot failures example is a classification problem: df, y = load_robot_execution_failures() returns a dataframe df with 1320 rows and a series y with 88 rows, which means it has 88 ids with 15 time stamps each. But my regression problem also has 88 ids and 15 time stamps, and each time stamp has a target y value, so there are also 1320 values in the target y. How can I get a series y with only 88 values, so that I can use the extract_features module to extract features? Thanks!
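One possible way to get one target value per id is to aggregate the per-row targets with a pandas groupby. The sketch below takes the value at the last time stamp of each id, which is an assumption; a mean or another aggregate may fit the problem better. All column names are hypothetical:

```python
import pandas as pd

# Hypothetical long-format frame: 3 ids with 2 time stamps each,
# and one target value per row.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "time": [0, 1, 0, 1, 0, 1],
    "F_x": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "target": [1.0, 3.0, 2.0, 6.0, 5.0, 7.0],
})

# Assumption: the value at the last time stamp represents the whole series.
y = df.sort_values(["id", "time"]).groupby("id")["target"].last()

# The time series frame used for feature extraction should not contain the target.
timeseries = df.drop(columns="target")
print(y)
```

The resulting series is indexed by id, so it lines up with a feature matrix that has one row per id.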
Arman
@techtide
Hi, I don't know how active this Gitter is, so I posted my question as an issue. blue-yonder/tsfresh#682 I'd greatly appreciate any help. Thanks for the library
Surya Krishnamurthy
@SuryaThiru
Hi. Is the gitter active?
Can someone help me with using the make_forecasting_frame function? I have a dataset with columns "date", "shop id" and "quantity sold". How can I generate a forecast frame with tsfresh features?
Navaneeth Sen
@Navaneethsen

Hi, I would like to get some pointers in using tsfresh with databricks. I tried to create the grouped dataframe and was trying to run the below command

features = spark_feature_extraction_on_chunk(df_grouped, column_id="id", column_kind="kind", column_sort="date", column_value="value", default_fc_parameters=ComprehensiveFCParameters())
features.show()

This process is not finishing even after 2 hours. I have 2,800,000 rows in my original dataframe with 13 starting features.

Can someone point me to the right docs to get this working on databricks?

jsnleong
@jsnleong
Hi everyone, happy to be part of the community. I have some issues with time-series forecasting using tsfresh.
I have data spanning over 3 years, but the data points are not equally spaced. I would like to create windows of 2 weeks each. However, for the notebook examples on your site, I realised that it is creating windows of different sizes from my min_timestamp to max_timestamp. E.g. if my min=5, max=20, it creates windows of 1 - 20 Jan, 2 - 20 Jan, 3 - 20 Jan, ... etc. up to 20 - 20 Jan.
Why does it create such windows? I was hoping it creates a single window 1 - 20 Jan, and subsequent windows would then be 2 - 21 Jan, etc.
Hope someone can advise on this, thank you so much.
Ghayth AlMahadin
@ghayth82
The equation used to calculate cid_ce in the documentation is different from the original paper: the upper limit of the sum is (n-1) in the paper, while the documentation uses (n-2lag).
Any idea why?
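For reference, a series of n values has n-1 differences between consecutive values, which matches the upper limit quoted from the paper. A minimal sketch of that sum in plain Python (the function name is hypothetical and this is not tsfresh's own implementation):

```python
import math


def complexity_estimate(x):
    """Square root of the summed squared differences of consecutive values."""
    # zip(x, x[1:]) yields exactly len(x) - 1 pairs of neighbours
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


print(complexity_estimate([1.0, 2.0, 4.0]))  # uses two differences for three values
```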
binxiaoxiaobin
@binxiaoxiaobin
Hello guys, an exception like "malformed node or string" is thrown when I run my sklearn pipeline. It is said that the reason may be that the feature names use quotes, like "z_earth_useragg_linear_trendfagg"var"_chunk_len_50__attr"slope"". Now I want to know how to fix this. Thanks a lot.
Christian Hacker
@christianhacker

Greetings,

I'm trying to get a custom feature, nolds.lyap_e, working with tsfresh, but I don't think it qualifies as either a simple or a combiner feature extractor. It calculates n Lyapunov exponents for a 1D time series, where n is variable. So the result should be multi-output (n values) like a combiner, but it only uses one set of parameters like a simple feature extractor. I can't really think of an easy way to get this working.

Any ideas?

ff12353
@ff12353
Hello everyone, I have the following problem with feature extraction: 'MultiprocessingDistributor' object has no attribute 'pool'. Maybe someone can tell me why this error message comes up.