Alsen57
@Alsen57

When I run this code to generate a specific number of samples from my dataframes, everything is fine.

frameslist=[]
j=0
bandwith=250
# for training
cycle1=0.5
cycle2=1.0

# for validation
#cycle1=3.5
#cycle2=4.0
######

while j<len(ALLDATA): 

    part1=ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle1]
    part2=ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle2]
#    part3=ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle3]
    Messung=pd.concat([part1,part2],axis=0)

    del Messung["time"]
    del Messung["ist_Zyklus"]
    x=len(Messung)//bandwith
    y=x*bandwith  # round the dataframe length down
    Messung=Messung[:y].copy()
    i=0
    idlist=[]
    while i<bandwith:
        idlist.append([i]*x)
        i=i+1
    flattened_list = [y for x in idlist for y in x]
    Messung["id_vector"]=flattened_list
    frameslist.append(Messung)
    print(j)
    j=j+1

Of course, the calculation part differs a little bit. In that case I don't have the while loop over the cycle1 iterator.
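The id assignment in the loop above can also be written without the inner while loop. A minimal sketch, assuming the goal is the same as in the posted code (`bandwith` ids, each covering `x` consecutive rows); the toy frame and the helper name `add_id_vector` are made up for illustration:

```python
import numpy as np
import pandas as pd


def add_id_vector(frame: pd.DataFrame, n_ids: int) -> pd.DataFrame:
    """Trim `frame` to a multiple of n_ids rows and label consecutive blocks."""
    rows_per_id = len(frame) // n_ids
    trimmed = frame.iloc[: rows_per_id * n_ids].copy()
    # np.repeat yields 0,0,...,1,1,...; the same result as flattening [[i] * x, ...]
    trimmed["id_vector"] = np.repeat(np.arange(n_ids), rows_per_id)
    return trimmed


# Toy data: 11 rows, 3 ids -> 3 rows per id, the last 2 rows are dropped.
toy = pd.DataFrame({"signal": np.arange(11.0)})
labelled = add_id_vector(toy, n_ids=3)
print(labelled["id_vector"].tolist())
```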

Gilles Vandewiele
@GillesVandewiele
Try it with n_jobs=1 (not 0); I've heard of issues with multiprocessing on Windows before. Further, check whether it works on a smaller subset of your data. If it does, one option would be to extract features from each time series individually.
Alsen57
@Alsen57
thank you very much @GillesVandewiele for the fast response. I am gonna run it over night.
Alsen57
@Alsen57
I get a memory error. So it seems like the id vector for the first id, or the ids in general, are too long.
It has the shape (24818, 2). I wouldn't have thought that this is too much for tsfresh.
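Processing only a few ids at a time is one way to keep memory use bounded. A minimal sketch: `extract_in_batches` and `toy_extract` are hypothetical helpers, and `toy_extract` merely stands in for a real extraction call (for example tsfresh's extract_features) so that the example runs with pandas alone.

```python
import pandas as pd


def extract_in_batches(df, column_id, batch_size, extract):
    """Run `extract` on a few ids at a time and stack the results."""
    ids = df[column_id].unique()
    parts = []
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        batch = df[df[column_id].isin(batch_ids)]
        parts.append(extract(batch))  # only this batch is processed at once
    return pd.concat(parts)


def toy_extract(batch):
    # Placeholder for the real feature extraction (returns one row per id).
    return batch.groupby("id_vector")[["signal"]].mean()


data = pd.DataFrame({
    "id_vector": [0, 0, 1, 1, 2, 2],
    "signal": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
})
features = extract_in_batches(data, "id_vector", batch_size=2, extract=toy_extract)
print(features)
```

Whether this helps depends on where the memory goes; if a single id is already too large, smaller windows per id would be needed instead.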
rhagan09
@rhagan09
Hi. Would anyone know if it is possible to predict 4 values into the future?
I have a time series with values recorded every 15 minutes and would like to be able to predict the values an hour in advance.
I have used the tsfresh forecasting to predict the next value (and get good results) but would like to know if it is possible to predict 4 values in advance. Thank you
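One common way to go from one-step to four-step forecasting is to build one target column per horizon step and train a model per column (or a multi-output model) on the same features. The sketch below only shows the target construction with pandas; it is not a tsfresh feature, and the series values are placeholders.

```python
import pandas as pd

# Hypothetical 15-minute series; values are placeholders.
index = pd.date_range("2020-01-01", periods=8, freq="15min")
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=index)

horizon = 4  # 4 steps of 15 minutes = 1 hour ahead
targets = pd.DataFrame(
    {f"y_plus_{h}": series.shift(-h) for h in range(1, horizon + 1)}
)
# The last `horizon` rows have no complete future yet, so drop them before training.
targets = targets.dropna()
print(targets)
```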
Michael Mann
@mmann1123
Does anyone know if it is possible to run feature extraction on a moving window? For instance, if I want to capture the mean of each 3-month period, assuming that I have 24 months of data.
Michael Mann
@mmann1123
@mmann1123 I'm thinking that aggregate_on_chunks is what I am looking for, but I am not sure how to apply it when extracting my features.
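For the 3-month mean specifically, plain pandas already covers both readings of "window". A minimal sketch with made-up monthly data, shown as an illustration of the idea rather than of any tsfresh function:

```python
import numpy as np
import pandas as pd

# Hypothetical 24 months of data; values are placeholders.
months = pd.date_range("2019-01-01", periods=24, freq="MS")
monthly = pd.Series(np.arange(1.0, 25.0), index=months)

# Non-overlapping 3-month chunks: 8 means for 24 months.
chunk_ids = np.arange(len(monthly)) // 3
chunk_means = monthly.groupby(chunk_ids).mean()

# Overlapping (moving) 3-month window: one mean per month from the third month on.
moving_means = monthly.rolling(window=3).mean().dropna()
```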
Iceorfrog
@Iceorfrog
Hello everyone! I'm new here. How can I use tsfresh with windows? Can I use tsfresh in Python 3.7?
Román Rey
@rooom13
Hello everyone, how do I extract features for each date (the sorting value)? I read about rolling time series in the documentation but couldn't figure out how. I already have some prepared data.
metaswirl
@metaswirl
Hello everyone, I have a question on the forecasting entry and would be glad if somebody could help me there. In the given example it looks like the old 'id' is discarded while generating the rolling dataset, correct? So that means that the reference to the original object (e.g. the robot in the doc) is lost. Doesn't this render the method useless?
flamby
@flamby
Hi everyone, when launching extract_relevant_features in a celery task, I get this error "daemonic processes are not allowed to have children", which is a well known error described here https://stackoverflow.com/questions/54727821/running-threaded-module-with-celery-daemonic-processes-are-not-allowed-to-have
Has anybody successfully monkeypatched the multiprocessing module used in tsfresh, like here: https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic#
florianhumblot
@florianhumblot
Can anyone tell me what the difference is between tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_data_points(x) and tsfresh.feature_extraction.feature_calculators.sum_of_reoccurring_values(x)? They look exactly the same to me.
thanhd10
@thanhd10
Hello everyone, I once used tsfresh inside a sklearn pipeline to classify sensor data collected by Android smartphones over time. My goal now is to do the classification directly and locally inside an Android app. While I was researching how to get a model running inside an app, I came across TensorFlow Lite. My idea was to create a TensorFlow model that does the same classification as my sklearn pipeline and save it as a TensorFlow Lite model. So my question is: is it possible to do the feature extraction of tsfresh inside a TensorFlow model so I'm able to execute it inside an Android app? Thanks in advance.
florianhumblot
@florianhumblot
I'd figure you need to find the relevant features and implement the calculations in your app, would probably also be faster than trying to run python code within your app
Will Flowers
@flowersw

Does the tsfresh.utilities.dataframe_functions.roll_time_series() function expect each "time" element to be different? In the example provided at https://tsfresh.readthedocs.io/en/latest/text/forecasting.html, this is true. And it also would seem true, since the new IDs that are created are based off of the timestamp.

How would you use tsfresh.utilities.dataframe_functions.roll_time_series() to create a set of features at each time stamp, using the robot dataset in quick start?

metaswirl
@metaswirl

Hi there, another question into the void ;). I get a lot of warnings with select_features. Some features should be binary, but aren't, such as norm_cost__change_quantiles__f_agg_, norm_cost__quantile__q_0.4, etc. Is anyone else getting this? Is this a bug or can it be ignored?

The full text is WARNING:tsfresh.feature_selection.significance_tests:[target_binary_feature_binary_test] A binary feature should have only values 1 and 0 (incl. True and False). Instead found {0.0, -1.2765957446808511e-05} in feature ''norm_cost__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.0''.
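The two values in the warning (0.0 and one tiny negative number) hint that the feature was treated as binary simply because it takes exactly two distinct values. That is an assumption about how the test is chosen, not something confirmed in this thread. Under that assumption, a quick pandas check can list the affected columns before calling select_features; the frame and the helper name below are made up for illustration.

```python
import pandas as pd

# Hypothetical feature matrix; column names and values are placeholders.
features = pd.DataFrame({
    "a__flag": [0.0, 1.0, 1.0, 0.0],
    "b__almost_constant": [0.0, -1.2765957446808511e-05, 0.0, 0.0],
    "c__real": [0.1, 0.4, 0.2, 0.9],
})


def two_valued_non_binary(frame):
    """Columns with exactly two distinct values that are not just 0 and 1."""
    suspicious = []
    for name in frame.columns:
        values = set(frame[name].dropna().unique())
        if len(values) == 2 and not values <= {0.0, 1.0}:
            suspicious.append(name)
    return suspicious


print(two_valued_non_binary(features))
```

Such columns are nearly constant on the data at hand, so dropping them or inspecting them by hand are both reasonable options.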

Raubsau
@raubsau_gitlab
@thanhd10 The approach of @florianhumblot sounds reasonable.
It will be more efficient to only extract the relevant features directly on the phone without the Python intermediate and have the TF Lite program classify on that vector.
(Alternatively, you could also try to train directly on the sensor data and use raw data without feature extraction for your application.)
Wahid El Chaar
@wahidelchaar
Hi, I set up a tsfresh Docker image which I am currently using on Amazon SageMaker for training. I used the extract_relevant_features() convenience function (with the EfficientFCParameters) to extract the relevant features and wrote the resulting feature set to S3, then I trained an XGBoost classifier in SageMaker's native XGBoost container. Now suppose that my model is fully trained and I want to make predictions on incoming field data? How do I immediately extract the same features that were found to be relevant during training? Would the only proper solution be to create a sklearn transformer, fit it to the training data, then save it as a pickled file and call a transform on it on the incoming field data?
demontamer
@demontamer
Hi you can save the features to a dataframe and extract just those with kind_to_fc_parameters. Saves yuuuge amt of time.
feat = pd.DataFrame(columns=feat.index.tolist())
parameters = from_columns(feat)
kind_to_fc_parameters=parameters
see: https://tsfresh.readthedocs.io/en/latest/text/feature_extraction_settings.html#for-the-ambitious-how-do-i-set-the-parameters-for-different-type-of-time-series
Wahid El Chaar
@wahidelchaar
@demontamer thanks for your response. I ended up trying this, but got a weird error:
image.png
roy Ian
@royian11_gitlab

Hi, I was using tsfresh 0.12.2 and I got 794 features from the extract_features() method. But when I updated tsfresh to 0.15.1, I'm getting an error saying "794 fields are required while 756 values are provided". It seems like the new version is returning only 756 features for the same dataset. Can someone assist me with this?

And further, please suggest a way to filter the best features from the features returned by tsfresh.

🅼🆄🅰🅰🅳🅾 ™
@Muaado0_twitter
Hi. I am trying to do a rolling window where the Y column of all previous quarters is rolled,
but when I do this, it creates a data leak.
My dataset has 6 years of monthly sales data.
I want a rolling mean that, for every row, considers all quarters of the previous years.
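A frequent cause of such leaks is that the window includes the current row's own Y value. One way around it, sketched here with pandas under the assumption of one row per month and made-up column names, is to shift the target by one row before aggregating:

```python
import pandas as pd

# Hypothetical monthly sales; values are placeholders.
sales = pd.DataFrame({
    "month": pd.date_range("2015-01-01", periods=6, freq="MS"),
    "y": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# shift(1) moves every value one row down, so row t only sees rows before t.
past = sales["y"].shift(1)
sales["y_prev_3m_mean"] = past.rolling(window=3).mean()   # previous quarter only
sales["y_all_prev_mean"] = past.expanding().mean()        # all previous months
print(sales)
```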
OJL96
@OJL96
Hello, just wanted to say thank you to everyone who contributed to the tsfresh package. It's been extremely helpful for my final year project: classifying LFP signals acquired from neural probes. I was not looking forward to doing manual feature extraction, so this has saved me a lot of time :)
OJL96
@OJL96
Also, I do have a question: how does one manually select which features to calculate?
liujie0903
@liujie0903
Hi, I am trying to do a regression prediction and want to use tsfresh to extract the relevant features. The robot failures example is a classification problem: df, y = load_robot_execution_failures() returns a dataframe df with 1320 rows and a series y with 88 rows, which means it has 88 ids with 15 timestamps each. But my regression problem also has 88 ids and 15 timestamps, and each timestamp has a target y value, so there are also 1320 values in the target y. How can I get a series y with only 88 values, so that I can use the extract_features module to extract features? Thanks!
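If one target per id is acceptable, the per-timestamp targets have to be reduced to a single value per id. Which reduction makes sense (last value, mean, ...) depends on the problem, so the sketch below is only an illustration with pandas; the column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format data with a target at every timestamp.
long_df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "F_x": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "target": [10.0, 11.0, 12.0, 20.0, 22.0, 27.0],
})

ordered = long_df.sort_values(["id", "time"])
y_last = ordered.groupby("id")["target"].last()   # target at the final timestamp
y_mean = ordered.groupby("id")["target"].mean()   # average target per id

# The target column must not be passed to the feature extraction itself.
X_long = long_df.drop(columns="target")
```

Both series are indexed by id, which is the shape the robot failures example uses for y.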
Arman
@techtide
Hi, I don't know how active this Gitter is, so I posted my question as an issue. blue-yonder/tsfresh#682 I'd greatly appreciate any help. Thanks for the library
Surya Krishnamurthy
@SuryaThiru
Hi. Is the gitter active?
Can someone help me with using the make_forecasting_frame function? I have a dataset with columns "date", "shop id" and "quantity sold". How can I generate a forecast frame with tsfresh features?
Navaneeth Sen
@Navaneethsen

Hi, I would like to get some pointers in using tsfresh with databricks. I tried to create the grouped dataframe and was trying to run the below command

features = spark_feature_extraction_on_chunk(df_grouped, column_id="id", column_kind="kind", column_sort="date", column_value="value", default_fc_parameters=ComprehensiveFCParameters())
features.show()

This process is not finishing even after 2 hours.. I have 2,800,000 rows in my original dataframe with 13 starting features.

Can someone point me to the right docs to get this working on databricks?

jsnleong
@jsnleong
Hi everyone, happy to be part of the community. I have some issues with time-series forecasting using tsfresh.
I have data spanning over 3 years but the data points are not equally spaced. I would like to create windows of 2 weeks each. However, for the notebook examples on your site, I realised that it is creating windows of different sizes from my min_timestamp to max_timestamp. Eg. if my min=5, max=20, it creates a window of 1 - 20 Jan, 2 - 20 Jan, 3 - 20 Jan, ... etc up to 20 - 20 Jan.
Why does it create such a window? I was hoping it creates a single window 1 - 20 Jan, then subsequent windows would be 2 - 21 Jan, etc
Hope someone can advise on this, thank you so much
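For unequally spaced timestamps, a time-based window in plain pandas gives fixed 2-week windows that slide forward with each observation. This is a sketch of the windowing idea only, with placeholder data, and not a description of what the tsfresh notebooks do:

```python
import pandas as pd

# Hypothetical irregularly spaced observations.
stamps = pd.to_datetime(
    ["2020-01-01", "2020-01-05", "2020-01-06", "2020-01-16", "2020-01-20"]
)
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=stamps)

# "14D" means: for each timestamp t, use the observations in (t - 14 days, t].
window_mean = values.rolling("14D").mean()
window_count = values.rolling("14D").count()
print(pd.DataFrame({"mean": window_mean, "count": window_count}))
```

The number of points per window varies with the spacing, which `window_count` makes visible.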
Ghayth AlMahadin
@ghayth82
The equation used to calculate cid_ce differs from the original paper: the upper limit of the sum is (n-1) in the paper, while the documentation has (n-2lag).
Any idea why?
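For comparison, a plain NumPy reading of the complexity estimate with a sum over all n-1 consecutive differences looks like this. It is a sketch of that formula only, not a copy of the tsfresh source, and the function name is made up:

```python
import numpy as np


def complexity_estimate(x):
    """Square root of the summed squared differences of neighbouring points."""
    diffs = np.diff(np.asarray(x, dtype=float))  # n - 1 differences for n points
    return float(np.sqrt(np.dot(diffs, diffs)))


print(complexity_estimate([1.0, 2.0, 4.0]))
```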
binxiaoxiaobin
@binxiaoxiaobin
Hello guys, an exception like "malformed node or string" is thrown when I use the sklearn pipeline to do the work. It is said that the reason may be that the feature names use quotes, like z_earth_user__agg_linear_trend__f_agg_"var"__chunk_len_50__attr_"slope". Now I want to know how to fix this. Thanks a lot.
Christian Hacker
@christianhacker

Greetings,

I'm trying to get a custom feature nolds.lyap_e working with tsfresh, but I don't think it qualifies as either a simple or combiner feature extractor. What it does is calculate n Lyapunov exponents for a 1D time series, where n is variable. So the result should be multi-output ($n$) like a combiner, but it only uses 1 set of params like a simple feature extractor. I can't really think of an easy way to get this working.

Any ideas?

ff12353
@ff12353
Hello everyone, I have the following problem with feature extraction: 'MultiprocessingDistributor' object has no attribute 'pool'. Maybe someone can tell me why this error message comes up.
jillyu3
@jillyu3
Hello everyone, can someone explain the p-value that is calculated in the background of the feature selection function?
Marble90
@Marble90

Hi everyone,

I have a problem with reshaping my data frame into a form that tsfresh can understand. There are 2 different links that confuse me:
the explanation in "Features dataframe with tsfresh" is totally different from the thing described in blue-yonder/tsfresh#213.
In the first link, one feature from timestep 0 to N is considered as a separate id. BUT in the second link, feature 1 to N in a single timestep is considered as a separate id. Which one is correct?

My case is classifying a time series data frame in which I have only 2 labels (1 and 0) as my target. If I follow the first link, after feature_extraction(), I won't get the equal number of ids (rows) and labels(target) at the end and I won't be able to use the feature_selection() afterward. However, I am not sure about the description in the second link.

Would you plz help me out?

Rmk17
@Rmk17

Hey Everyone,

I am doing a 24-hour electricity demand forecast and then classifying it to predict the next day's peak hours.

For that I compiled a dataset with several years of hourly electricity demand data for my city together with weather features (all that OpenWeather provides), plus street light on-off schedule, and also 24h hourly forecast of generation and demand for the state we are part of, which are correlated to my target.

The data has 24h and 168h seasonality and autocorrelates at lags of 24h and 168h, respectively.

Did some manual datetime feature engineering like:
<code>
dataset['Hour_cos'] = np.cos(2 * np.pi * (dt.hour + 1.0) / 24.0)
dataset['Hour_sin'] = np.sin(2 * np.pi * (dt.hour + 1.0) / 24.0)

dataset['DayOfWeek_cos'] = np.cos(2 * np.pi * (dt.dayofweek + 1.0) / 7.0)
dataset['DayOfWeek_sin'] = np.sin(2 * np.pi * (dt.dayofweek + 1.0) / 7.0)
....
</code>

And some rolling aggregates like:
<code>
rolling_features = [
'load', # -> target
'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7',]
rolling_functions = ['min', 'max', 'std', numba_calc_slope]
rolling_windows = [24, 24*7]

for w in rolling_windows:
    for feat in rolling_features:

        # apply a bunch of aggregations at once
        # (l is not defined in this snippet)
        df[l] = df[feat].rolling(window=w).agg(rolling_functions)

        # weighted moving average (WMA)
        w_coff = weights[str(w)]  # get weighted coeffs from dictionary
        df[f'{feat}_wma_{w}'] = df[feat].rolling(window=w).apply(lambda rows: np.dot(rows, w_coff)/w_coff.sum(), raw=True)

        # N-day exponential weighted moving average (EWMA)
        df[f'{feat}_ewma_{w}'] = df[feat].ewm(span=w, adjust=False, min_periods=0).mean()
...

</code>

I have one dependent target (load) and dozens of independent features with 65k hourly observation of each.
Since tsfresh requires column_id for the time series id, and I have one time series, I do something like
<code>
df.loc[:, 'id'] = 0
</code>, right?

Please advise what the best way is to do 24h and 168h rolling-window feature calculations with tsfresh.
Thanks

avivples
@avivples

Hey everyone

I was given a train set with labels (it's a timeseries classification problem) and a test set with no labels. How can I use the feature extraction on my test set after I run it on my train set if I have no labels in the test set?

Thanks in advance

小鱼小鱼
@LiYiShen
Hi everyone! After extracting features from the training set to train a model, how do I extract the same features from the test set for validation?