@gabrown, I am able to replicate what you are seeing locally. If I understand correctly, your definition of churn is "fraction of uncensored users who died before 12 months". I think this is going to bias your churn rate upward, as you are not taking censoring into account. In an extreme case, where all but one subject is censored, your definition of churn will give 0% or 100%. But that feels a bit strange, no? If that one subject died early on, and the other subjects were censored later, we should feel that the churn isn't 100%.
Please correct me if I am mistaken, or I am not making sense. Happy to discuss more!
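To make the extreme case concrete, here is a minimal sketch in plain Python. The data is made up (one user churns at month 2, nine others are censored at month 14), and the helper names `naive_churn` and `kaplan_meier_survival` are hypothetical, not lifelines functions. It compares the "fraction of uncensored users who died" definition with a hand-rolled Kaplan-Meier estimate, which keeps censored users in the at-risk set.

```python
def naive_churn(durations, observed, t):
    """Fraction of uncensored subjects whose event happened by time t."""
    event_durations = [d for d, o in zip(durations, observed) if o]
    return sum(1 for d in event_durations if d <= t) / len(event_durations)


def kaplan_meier_survival(durations, observed, t):
    """Kaplan-Meier estimate of S(t) = P(T > t)."""
    # Distinct times at which an event was observed, up to and including t.
    event_times = sorted({d for d, o in zip(durations, observed) if o and d <= t})
    survival = 1.0
    for time in event_times:
        # Censored subjects still count as at risk until their censoring time.
        at_risk = sum(1 for d in durations if d >= time)
        deaths = sum(1 for d, o in zip(durations, observed) if o and d == time)
        survival *= 1.0 - deaths / at_risk
    return survival


# Made-up data: one churner at month 2, nine users censored at month 14.
durations = [2] + [14] * 9
observed = [True] + [False] * 9

naive = naive_churn(durations, observed, 12)
km_churn = 1.0 - kaplan_meier_survival(durations, observed, 12)
print(f"naive churn: {naive:.0%}, Kaplan-Meier churn: {km_churn:.0%}")
```

On this toy data the naive definition reports 100% churn, while the Kaplan-Meier estimate reports 10%, which matches the intuition that nine of ten users were seen alive past month 12.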
value_and_grad(negative_log_likelihood) in the minimization function, in fitters, helps? Why not simply minimize the
class ParametericAFTRegressionFitter(ParametricRegressionFitter) contains an extra 'e' :D
autograd and while looking at their documentation I've noticed the note saying that they won't develop it further. Have you thought about migrating to JAX?
Thanks for the quick response, @CamDavidsonPilon. I get your point: by ignoring the censored users who haven't been there 12 months, we drop users who, as the churn rate is low, would be predominantly non-churners, so this adds a bias. However, in my analysis dataset, if I only consider users who could have completed 12 months (so there are no censored users with tenure < 12), I still see a systematic difference.
If we consider this in the context of survival, how would you measure the survival after 12 months just from the data? I think this problem would have exactly the same issues.
Thanks for your input, and also for your awesome package!
ValueError: setting an array element with a sequence. I've read in their documentation that "Assignment is hard to support...", but at this point I can't imagine how it should be implemented correctly.
x_variables (which may cause problems with autograd), I instead chose a list of small matrices.
lik variable is now incrementing as we go.
@CamDavidsonPilon for what it's worth, a short snippet of a slightly misleading error involving pandas.DataFrame.apply that took me a day to debug
task: use Cox to predict event probability for censored items at the time of their current duration
import lifelines as ll
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(10, 2)), columns=['regressor', 'duration'])
df['event'] = np.random.choice([True, False], 10)
display(df)

# uncomment to lose the bool and fix the TypeError
#df['event'] = df['event'].astype(int)

cf = ll.CoxPHFitter()
cf.fit(df, duration_col='duration', event_col='event')

# select only censored items
df = df[df['event'] == 0]
func = lambda row: cf.predict_survival_function(row[['regressor']], times=row['duration'])
df.apply(func, axis=1)
'misleading' cause it will say the regressor column is non-numerical...
import numpy as np
from lifelines import WeibullAFTFitter

# map_to_seconds and seconds_in_day are not defined in this snippet
df['start_time'] = df['start_time'].map(map_to_seconds)
df['sin_start_time'] = np.sin(2*np.pi*df['start_time']/seconds_in_day)
df['cos_start_time'] = np.cos(2*np.pi*df['start_time']/seconds_in_day)
df = df.drop('start_time', axis=1)

wf = WeibullAFTFitter().fit(df, "duration")
wf.predict_survival_function(df)
wf.predict_median(df)
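For anyone wondering why the start time is split into a sin and a cos column, here is a small standard-library sketch. It assumes seconds_in_day is 86400, and the helper names `encode_time_of_day` and `euclidean` are hypothetical. The idea is that times just before and just after midnight end up close together in the encoded space, which a raw seconds-since-midnight column would not give you.

```python
import math

SECONDS_IN_DAY = 24 * 60 * 60  # assumed value of seconds_in_day


def encode_time_of_day(seconds):
    """Map seconds since midnight to a point on the unit circle."""
    angle = 2 * math.pi * seconds / SECONDS_IN_DAY
    return math.sin(angle), math.cos(angle)


def euclidean(a, b):
    """Straight-line distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


just_before_midnight = encode_time_of_day(23 * 3600 + 59 * 60)  # 23:59
just_after_midnight = encode_time_of_day(60)                     # 00:01
noon = encode_time_of_day(12 * 3600)                             # 12:00

near = euclidean(just_before_midnight, just_after_midnight)
far = euclidean(just_after_midnight, noon)
print(f"23:59 vs 00:01: {near:.4f}, 00:01 vs 12:00: {far:.4f}")
```

Both columns are needed: with only the sin column, 06:00 and 18:00 would be distinguishable, but 00:00 and 12:00 would both map to 0.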
conditional_after kwarg in the
predict_* methods as well
wf = WeibullAFTFitter().fit(df, "duration") exception throw
id col in your model
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi_dataset = load_rossi()
aft = WeibullAFTFitter()
aft.fit(rossi_dataset, duration_col='week', event_col='arrest')

X = rossi_dataset.loc[:10]
aft.predict_survival_function(X)
@julianspaeth depends on the model. Recall that the c-index only depends on the ranking of values. For the Cox model, summing the cumulative hazard won't change the ranking, so it won't matter what you use. For an AFT model, it may change the ranking.
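A rough illustration of the ranking point for the Cox case, in plain Python. The pairwise `concordance_index` below is a simplified hand-written version, not the lifelines implementation, and the durations, log partial hazards, and baseline cumulative hazard values are all made up. Under proportional hazards the cumulative hazard is H0(t) * exp(x'beta), so any single time point, or a sum over time points, orders subjects the same way as the log partial hazard.

```python
import math


def concordance_index(durations, risk_scores, observed):
    """Simplified c-index: higher risk score should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before j's time.
            if observed[i] and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Made-up data and made-up model output.
durations = [5, 8, 12, 3, 9]
observed = [True, True, False, True, True]
log_partial_hazard = [0.2, -0.1, -0.8, 0.9, 0.1]

# Hypothetical baseline cumulative hazard H0(t) at a few time points.
baseline_chf = {3: 0.1, 6: 0.3, 12: 0.9}

chf_at_6 = [baseline_chf[6] * math.exp(lph) for lph in log_partial_hazard]
chf_summed = [sum(baseline_chf.values()) * math.exp(lph) for lph in log_partial_hazard]

c_lph = concordance_index(durations, log_partial_hazard, observed)
c_at_6 = concordance_index(durations, chf_at_6, observed)
c_summed = concordance_index(durations, chf_summed, observed)
print(c_lph, c_at_6, c_summed)
```

All three scores give the same c-index here because each is a monotone transform of the others. In an AFT model the survival curves of two subjects are not tied together this way, so different summaries can rank subjects differently.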
Alternatively, you can choose a point in time, and use the CHF at that