When I run this code to generate a specific number of samples from my dataframes, everything is fine.
import pandas as pd

frameslist = []
j = 0
bandwidth = 250
# for training
cycle1 = 0.5
cycle2 = 1.0
# for validation
# cycle1 = 3.5
# cycle2 = 4.0
######
while j < len(ALLDATA):
    part1 = ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle1]
    part2 = ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle2]
    # part3 = ALLDATA[j][0].loc[ALLDATA[j][0]['ist_Zyklus'] == cycle3]
    Messung = pd.concat([part1, part2], axis=0)
    del Messung["time"]
    del Messung["ist_Zyklus"]
    x = len(Messung) // bandwidth
    y = x * bandwidth  # truncate the dataframe!
    Messung = Messung[:y].copy()
    i = 0
    idlist = []
    while i < bandwidth:
        idlist.append([i] * x)  # id i repeated for each block of x rows
        i = i + 1
    flattened_list = [item for sub in idlist for item in sub]
    Messung["id_vector"] = flattened_list
    frameslist.append(Messung)
    print(j)
    j = j + 1
Of course, the calculation part differs a little; there I don't have the while loop over the cycle1 iterator.
Does the tsfresh.utilities.dataframe_functions.roll_time_series() function expect each "time" element to be different? In the example provided at https://tsfresh.readthedocs.io/en/latest/text/forecasting.html, this is true. And it also would seem true, since the new IDs that are created are based on the timestamp.
How would you use tsfresh.utilities.dataframe_functions.roll_time_series() to create a set of features at each time stamp, using the robot dataset in quick start?
Hi there, another question into the void ;). I get a lot of warnings with select_features. Some features should be binary but aren't, such as norm_cost__change_quantiles__f_agg_, norm_cost__quantile__q_0.4, etc. Is anyone else getting this? Is this a bug, or can it be ignored?
The full text is: WARNING:tsfresh.feature_selection.significance_tests:[target_binary_feature_binary_test] A binary feature should have only values 1 and 0 (incl. True and False). Instead found {0.0, -1.2765957446808511e-05} in feature 'norm_cost__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.0'.
from tsfresh.feature_extraction.settings import from_columns

# rebuild the extraction settings from the selected feature columns
feat = pd.DataFrame(columns=feat.index.tolist())
parameters = from_columns(feat)
kind_to_fc_parameters = parameters
Hi, I was using tsfresh 0.12.2 and I got 794 features from the extract_features() method. But when I updated tsfresh to 0.15.1, I'm getting an error saying "794 fields are required while 756 values are provided". It seems like the new version returns only 756 features for the same dataset. Can someone assist me with this?
Further, please suggest a way to filter the best features from the features returned by tsfresh.
How do I use the make_forecasting_frame function? I have a dataset with columns "date", "shop id", and "quantity sold". How can I generate a forecasting frame with tsfresh features?
Hi, I would like to get some pointers on using tsfresh with Databricks. I created the grouped dataframe and was trying to run the command below:
features = spark_feature_extraction_on_chunk(df_grouped, column_id="id", column_kind="kind", column_sort="date", column_value="value", default_fc_parameters=ComprehensiveFCParameters())
features.show()
This process does not finish even after 2 hours. I have 2,800,000 rows in my original dataframe with 13 starting features.
Can someone point me to the right docs to get this working on databricks?
Greetings,
I'm trying to get a custom feature, nolds.lyap_e, working with tsfresh, but I don't think it qualifies as either a simple or a combiner feature extractor. What it does is calculate n Lyapunov exponents for a 1D time series, where n is variable. So the result should be multi-output (n values) like a combiner, but it only uses one set of parameters like a simple feature extractor. I can't really think of an easy way to get this working.
Any ideas?
Hi everyone,
I have a problem with reordering my data frame into a form that tsfresh can understand. There are two different links that confuse me:
first, the explanation in "Features dataframe with tsfresh", which is totally different from the thing described in:
blue-yonder/tsfresh#213
In the first link, one feature from timestep 0 to N is considered a separate id. BUT in the second link, features 1 to N in a single timestep are considered a separate id. Which one is correct?
My case is classifying a time-series data frame in which I have only two labels (1 and 0) as my target. If I follow the first link, after feature_extraction() I won't get an equal number of ids (rows) and labels (targets) at the end, and I won't be able to use feature_selection() afterwards. However, I am not sure about the description in the second link.
Would you please help me out?
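For what it's worth, the long ("stacked") format tsfresh expects for classification can be sketched like this: one id per labeled sample, each id holding a full time series, so extract_features returns exactly one row per id and lines up with the label vector (column names are illustrative):

```python
import pandas as pd

# two labeled samples, each a time series of 4 steps, in long format
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 1, 1, 1, 1],
    "time":  [0, 1, 2, 3, 0, 1, 2, 3],
    "value": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
})
labels = pd.Series([1, 0], index=[0, 1])  # one label per id

# extract_features(df, column_id="id", column_sort="time") would return
# one feature row per id, so rows and labels match 1:1
print(df["id"].nunique(), len(labels))
```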
Hey Everyone,
I am doing a 24-hour electricity demand forecast and further classifying it to predict next-day peak hours.
For that I compiled a dataset with several years of hourly electricity demand data for my city, together with weather features (everything OpenWeather provides), plus the street-light on/off schedule, and also the 24h hourly forecast of generation and demand for the state we are part of, which are correlated with my target.
The data has 24h and 168h seasonality and autocorrelates at 24h and 168h lags, respectively.
I did some manual datetime feature engineering like:
<code>
dataset['Hour_cos'] = np.cos(2 * np.pi * (dt.hour + 1.0) / 24.0)
dataset['Hour_sin'] = np.sin(2 * np.pi * (dt.hour + 1.0) / 24.0)
dataset['DayOfWeek_cos'] = np.cos(2 * np.pi * (dt.dayofweek + 1.0) / 7.0)
dataset['DayOfWeek_sin'] = np.sin(2 * np.pi * (dt.dayofweek + 1.0) / 7.0)
....
</code>
And some rolling aggregates like:
<code>
rolling_features = [
    'load',  # -> target
    'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7']
rolling_functions = ['min', 'max', 'std', numba_calc_slope]
rolling_windows = [24, 24 * 7]
for w in rolling_windows:
    for feat in rolling_features:
        # apply a bunch of aggregations at once
        aggs = df[feat].rolling(window=w).agg(rolling_functions)
        aggs.columns = [f'{feat}_{c}_{w}' for c in aggs.columns]
        df = df.join(aggs)
        # weighted moving average (WMA)
        w_coff = weights[str(w)]  # weighting coefficients from a dictionary
        df[f'{feat}_wma_{w}'] = df[feat].rolling(window=w).apply(
            lambda rows: np.dot(rows, w_coff) / w_coff.sum(), raw=True)
        # N-day exponentially weighted moving average (EWMA)
        df[f'{feat}_ewma_{w}'] = df[feat].ewm(span=w, adjust=False, min_periods=0).mean()
...
</code>
I have one dependent target (load) and dozens of independent features, with 65k hourly observations of each.
Since tsfresh requires a column_id for the time series id, and I have only one time series, I do something like
<code>
df.loc[:, 'id'] = 0
</code>, right?
Please advise: what is the best way to do 24- and 168-hour rolling-window feature calculations with tsfresh?
Thanks