Thanks for the quick response, @CamDavidsonPilon. I get your point: by ignoring the censored users who haven't been there 12 months, we are ignoring users who, since the churn rate is low, would predominantly be non-churners, so this would add a bias. However, in my analysis dataset, if I only consider users who could have completed 12 months (so there are no censored users with tenure < 12), I still see a systematic difference.
If we consider this in the context of survival, how would you measure survival after 12 months just from the data? I think this problem would have exactly the same issues.
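In survival terms, one standard way to get "survival past 12 months" straight from the data, without throwing away censored users, is the Kaplan-Meier estimator (lifelines exposes it as `KaplanMeierFitter`). Below is a minimal hand-rolled sketch, on made-up toy data, that compares it with the naive "drop users censored before month 12, then take a plain fraction" approach. The function names and data are illustrative only.

```python
def km_survival_at(durations, events, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.

    durations: observed tenure per user (e.g. in months)
    events:    1 if the user churned at that tenure, 0 if still active (censored)
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    i = 0
    while i < len(pairs) and pairs[i][0] <= t:
        time = pairs[i][0]
        churned = removed = 0
        # Group every user observed at this exact tenure.
        while i < len(pairs) and pairs[i][0] == time:
            churned += pairs[i][1]
            removed += 1
            i += 1
        if churned:
            survival *= 1.0 - churned / at_risk
        # Churned and censored users both leave the risk set after this time.
        at_risk -= removed
    return survival


def naive_survival_at(durations, events, t):
    """Drop users censored before t, then take the plain fraction not churned by t."""
    kept = [(d, e) for d, e in zip(durations, events) if not (e == 0 and d < t)]
    survived = sum(1 for d, e in kept if not (e == 1 and d <= t))
    return survived / len(kept)


durations = [2, 3, 3, 5, 8, 12, 14, 15]
events = [1, 0, 1, 1, 0, 0, 1, 0]
print(round(km_survival_at(durations, events, 12), 3))  # 0.6
print(naive_survival_at(durations, events, 12))         # 0.5
```

On this toy data the naive fraction comes out lower because the two dropped users were still active when last seen; Kaplan-Meier keeps them in the risk set for as long as they were actually observed.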
Thanks for your input, and also for your awesome package!
`ValueError: setting an array element with a sequence.` I've read in their documentation that "Assignment is hard to support...", but at this point I can't imagine how it should be implemented correctly.
x_variables (which may cause problems with autograd), I instead chose a list of small matrices.
`lik` variable is now incrementing as we go.
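A minimal sketch of the two workarounds described above, assuming plain NumPy (autograd's drop-in wrapper is `autograd.numpy`): keep a list of small matrices instead of writing into a pre-allocated array, and grow a `lik` scalar by rebinding rather than by in-place assignment. The function names, `blocks`, and the toy values are hypothetical.

```python
import numpy as np  # with autograd, this line would be: import autograd.numpy as np


def total_loglik(blocks):
    """Sum log-likelihood contributions by rebinding a scalar, not writing into an array."""
    lik = 0.0
    for block in blocks:
        # `lik = lik + ...` creates a new value each step, which autograd can trace;
        # `out[i] = ...` on a pre-allocated array is the in-place assignment it cannot.
        lik = lik + np.sum(np.log(block))
    return lik


def stack_blocks(blocks):
    """Keep the small matrices in a Python list and combine them once at the end."""
    return np.stack(blocks)


blocks = [np.ones((2, 2)), np.full((2, 2), np.e)]
print(total_loglik(blocks))        # 4.0
print(stack_blocks(blocks).shape)  # (2, 2, 2)

# The original error: a 2x2 matrix does not fit in one scalar slot.
out = np.zeros(len(blocks))
try:
    out[0] = blocks[0]
except ValueError as err:
    print("ValueError:", err)
```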
@CamDavidsonPilon, for what it's worth, here is a short snippet of a slightly misleading error involving `pandas.DataFrame.apply` that took me a day to debug.
Task: use a Cox model to predict the event probability for censored items at the time of their current duration.
```python
import lifelines as ll
import numpy as np
import pandas as pd
from IPython.display import display

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 2)), columns=['regressor', 'duration'])
df['event'] = np.random.choice([True, False], 10)
display(df)

# uncomment to lose the bool and fix the TypeError
# df['event'] = df['event'].astype(int)

cf = ll.CoxPHFitter()
cf.fit(df, duration_col='duration', event_col='event')

# select only censored items
df = df[df['event'] == 0]
func = lambda row: cf.predict_survival_function(row[['regressor']], times=row['duration'])
df.apply(func, axis=1)
```
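'Misleading' because it says the regressor column is non-numerical, while the actual cause is the boolean `event` column: mixing it with the integer columns makes each row passed to `apply` an object-dtype Series.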
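The mechanism behind the misleading message can be reproduced with pandas alone. This sketch uses a made-up three-row frame with the same column names and just reports the dtype of each row that `apply(axis=1)` hands to the function, before and after casting the bool column.

```python
import pandas as pd

df = pd.DataFrame({"regressor": [3, 7, 1], "duration": [10, 4, 8], "event": [True, False, False]})

# With a bool column present, each row handed to apply(axis=1) is upcast to object dtype,
# so row[['regressor']] looks non-numeric even though the column itself is int64.
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
print(set(row_dtypes))  # {'object'}

# Casting the bool to int keeps every row numeric.
df["event"] = df["event"].astype(int)
row_dtypes_fixed = df.apply(lambda row: str(row.dtype), axis=1)
print(set(row_dtypes_fixed))  # {'int64'}
```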
```python
import numpy as np
from lifelines import WeibullAFTFitter

df['start_time'] = df['start_time'].map(map_to_seconds)
df['sin_start_time'] = np.sin(2*np.pi*df['start_time']/seconds_in_day)
df['cos_start_time'] = np.cos(2*np.pi*df['start_time']/seconds_in_day)
df = df.drop('start_time', axis=1)

wf = WeibullAFTFitter().fit(df, "duration")
wf.predict_survival_function(df)
wf.predict_median(df)
```
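The sin/cos pair above encodes time of day on a circle, so that 23:59 and 00:01 end up close together instead of a full day apart, as raw seconds would suggest. A standard-library sketch of that encoding; `hms_to_seconds` is a hypothetical stand-in for the snippet's undefined `map_to_seconds`, assuming `'HH:MM:SS'` strings.

```python
import math

SECONDS_IN_DAY = 24 * 60 * 60


def hms_to_seconds(hms):
    """Hypothetical stand-in for map_to_seconds: 'HH:MM:SS' -> seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def encode_time_of_day(seconds):
    """Map seconds since midnight onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * seconds / SECONDS_IN_DAY
    return math.sin(angle), math.cos(angle)


late = encode_time_of_day(hms_to_seconds("23:59:00"))
early = encode_time_of_day(hms_to_seconds("00:01:00"))
noon = encode_time_of_day(hms_to_seconds("12:00:00"))
print(round(math.dist(late, early), 4))  # about 0.0087: two minutes apart on the circle
print(round(math.dist(late, noon), 4))   # about 2.0: opposite side of the day
```

Both features are needed: sine alone maps, for example, 06:00 and 18:00 to opposite values but 03:00 and 09:00 to the same one.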
`conditional_after` kwarg in the `predict_*` methods as well
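As I read the lifelines docstring, `conditional_after` represents how long each subject has already lived, i.e. predictions for T | T > s with the timeline restarted at 0. The underlying identity is S(t | T > s) = S(s + t) / S(s). A standard-library sketch with a Weibull survival curve; `lam`, `rho`, and the numbers are illustrative, not fitted values.

```python
import math


def weibull_survival(t, lam, rho):
    """Weibull survival S(t) = exp(-(t / lam) ** rho)."""
    return math.exp(-((t / lam) ** rho))


def conditional_survival(t, already_survived, lam, rho):
    """S(t | T > s) = S(s + t) / S(s): survival for t more units, given survival to s."""
    s = already_survived
    return weibull_survival(s + t, lam, rho) / weibull_survival(s, lam, rho)


# Increasing hazard (rho > 1): having survived 10 units lowers the odds of 5 more.
print(weibull_survival(5, lam=12, rho=2))
print(conditional_survival(5, already_survived=10, lam=12, rho=2))
```

With `rho=1` (exponential) the two values coincide, which is the memoryless case; for censored items this is the quantity that answers "what is the remaining survival given how long they have already lasted".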
`wf = WeibullAFTFitter().fit(df, "duration")` throws an exception
`id` col in your model
```python
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi_dataset = load_rossi()

aft = WeibullAFTFitter()
aft.fit(rossi_dataset, duration_col='week', event_col='arrest')

X = rossi_dataset.loc[:10]
aft.predict_survival_function(X)
```
@julianspaeth, it depends on the model. Recall that the c-index depends only on the ranking of values. For the Cox model, summing the cumulative hazard won't change the ranking, so it won't matter which you use. For an AFT model, it may change the ranking.
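Why the Cox case is safe can be checked numerically: H(t | x) = H0(t) exp(x'β), so summing over any set of times multiplies every subject's exp(x'β) by the same positive constant. The sketch below hand-rolls a simplified Harrell's C (tied times skipped) rather than calling lifelines, and uses made-up durations, events, partial hazards, and baseline values.

```python
import math


def concordance(durations, scores, events):
    """Simplified Harrell's C: higher score = predicted longer survival; tied times skipped."""
    concordant, comparable = 0.0, 0
    for i in range(len(durations)):
        if not events[i]:
            continue  # a pair is only comparable if the shorter time is an observed event
        for j in range(len(durations)):
            if durations[j] > durations[i]:
                comparable += 1
                if scores[j] > scores[i]:
                    concordant += 1
                elif scores[j] == scores[i]:
                    concordant += 0.5
    return concordant / comparable


durations = [5, 3, 8, 6, 2]
events = [1, 1, 0, 1, 1]
log_partial_hazard = [0.4, 1.1, -0.7, 0.2, 0.9]  # x'beta per subject (made up)
baseline_chf = [0.1, 0.25, 0.45, 0.7]            # H0 at a few chosen times (made up)

# Cox: summing H0(t) * exp(x'beta) over times rescales exp(x'beta) by one constant.
summed_chf = [sum(baseline_chf) * math.exp(lp) for lp in log_partial_hazard]
chf_at_one_time = [baseline_chf[2] * math.exp(lp) for lp in log_partial_hazard]

# Higher cumulative hazard means shorter survival, so negate to get "higher = longer".
c_partial = concordance(durations, [-lp for lp in log_partial_hazard], events)
c_summed = concordance(durations, [-h for h in summed_chf], events)
c_single = concordance(durations, [-h for h in chf_at_one_time], events)
print(c_partial, c_summed, c_single)  # 0.9 0.9 0.9
```

For a model whose survival curves can cross, the summed and single-time scores are no longer monotone transforms of one another, which is where the ranking, and hence the c-index, can differ.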
Alternatively, you can choose a point in time, and use the CHF at that