@CamDavidsonPilon Let me know if you have any suggestions for the question below:
Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.
Hi all. I'm somewhat new to using lifelines, and in using the CoxPHFitter, when I run check_assumptions, I end up with an error that reads as follows: /RuntimeWarning: overflow encountered in exp scores = weights * np.exp(np.dot(X, self.params_))
Any suggestions on dealing with this issue? I'm starting down the road of normalization, but I'm not sure if that's 100% correct.
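On the normalization idea: a minimal NumPy sketch of why it can help, not a confirmed diagnosis of this particular case. The covariate values and the trial coefficient are made up for illustration. It only shows that exp() of the linear predictor overflows when a covariate sits on a large scale, and stays finite once the column is standardized. Standardizing rescales the fitted coefficient (it becomes a log hazard ratio per standard deviation) but does not change the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariate on a large scale (e.g. an amount in dollars).
X = rng.normal(loc=50_000, scale=15_000, size=(1000, 1))
# Hypothetical trial coefficient, as an optimizer might propose mid-fit.
beta = np.array([0.05])

# exp(50_000 * 0.05) = exp(2500) exceeds the float64 range and becomes inf.
with np.errstate(over="ignore"):
    raw_scores = np.exp(X @ beta)

# Standardize the column: zero mean, unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_scores = np.exp(X_std @ beta)

print("any overflow on raw scale:", np.isinf(raw_scores).any())
print("all finite after standardizing:", np.isfinite(std_scores).all())
```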
i just raised your NotImplementedError for conditional_after for CoxPH. it felt like running into a wall after reading about the new argument in the docs. i even bit the bullet and switched from the conda to the pip package ;)
your commit message does not sound too hopeful for that one, are you still working on it?
ps: still an awesome library
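For context on conditional_after: the quantity in question is the survival curve conditioned on a subject already having survived to time s, which follows from the identity S(t | T > s) = S(s + t) / S(s). Below is a rough sketch of computing it by hand from any predicted survival curve. The helper name conditional_survival and the toy exponential curve are assumptions for illustration, not part of lifelines.

```python
import numpy as np

def conditional_survival(times, surv, s):
    """Given a survival curve S on an increasing time grid, return
    (elapsed, S(s + elapsed) / S(s)) for the grid points at or after s.
    Hypothetical helper; assumes S(s) > 0."""
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    # Treat S as a right-continuous step function: S(s) is the value at
    # the last grid point <= s (or 1.0 if s precedes the grid).
    idx = np.searchsorted(times, s, side="right") - 1
    s_at = surv[idx] if idx >= 0 else 1.0
    mask = times >= s
    return times[mask] - s, surv[mask] / s_at

# Toy curve: exponential survival with rate 0.5 on an integer grid.
times = np.arange(5.0)
surv = np.exp(-0.5 * times)

elapsed, cond = conditional_survival(times, surv, s=2.0)
print(elapsed)
print(cond)
```

For an exponential curve the conditional curve matches the start of the original one (memorylessness), which makes a handy sanity check.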
Hi @CamDavidsonPilon, I am new to survival analysis and am using it for trying to predict customer churn. I created a model using CoxPHFitter and I wanted to evaluate how well the model performed by comparing the survival after 12 months (using the correct row from predict_survival_function output), to the observed churn rate (1-survival rate). I noticed that it is consistently getting a higher survival rate compared to actual (~10%).
I pared back the model so that it was only based on the baseline hazard (passed no extra variables) and I still get a difference in survival rates.
I have tried this on open data, and can reproduce the result:
import lifelines
import numpy as np
import pandas as pd
churn_data = pd.read_csv('https://raw.githubusercontent.com/'
'treselle-systems/customer_churn_analysis/'
'master/WA_Fn-UseC_-Telco-Customer-Churn.csv')
event_col = 'Churn'
duration_col = 'tenure'
churn_data[event_col] = churn_data[event_col].map({'No':0, 'Yes':1})
# take an explicit copy so the later column assignments do not act on a view
churn_data_example = churn_data[[event_col, duration_col]].copy()
cph = lifelines.CoxPHFitter()
cph.fit(churn_data[[event_col, duration_col]], duration_col=duration_col, event_col=event_col)
# cph.print_summary()
# get predicted churn:
unconditioned_sf = cph.predict_survival_function(churn_data_example)
# survival probability at tenure = 12 for the first row (column 0)
predicted_survival = unconditioned_sf.loc[12.0, 0]
predicted_churn = 1 - predicted_survival
# Create churn at tenure = 12. The logic is:
# if tenure > 12, then they didn't churn by month 12 => churn_12 = 0;
# if tenure < 12 and churn = 1, then churn_12 = 1;
# if tenure < 12 and churn = 0, we don't know if they churn => churn_12 = np.nan
churn_data_example['churn_12'] = churn_data_example['Churn']
churn_data_example.loc[(churn_data_example.tenure < 12) & (churn_data_example.churn_12 == 0), 'churn_12'] = np.nan
churn_data_example.loc[(churn_data_example.tenure > 12) , 'churn_12'] = 0
actual_churn = churn_data_example['churn_12'].mean()
print(f'actual churn: {round(actual_churn,2)}')
print(f'predicted churn: {round(predicted_churn,2)}')
print(f'ratio: {round(predicted_churn/actual_churn,2)}')
The results are:
actual churn: 0.17
predicted churn: 0.15
ratio: 0.89
And it deviates further as tenure increases.
Have you got any idea why I am seeing this behaviour? I feel it is either to do with me not understanding what predict_survival_function returns, or I am miscalculating the ‘actual churn’.
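One possible explanation, offered as an assumption rather than a confirmed diagnosis: the 'actual churn' rule drops customers who were censored before month 12, which shrinks the denominator and pushes the observed rate up. A Kaplan-Meier estimate keeps those customers in the risk set until they are censored. The sketch below uses synthetic data and hypothetical helper names (kaplan_meier_survival, naive_churn) to compare the two; lifelines' KaplanMeierFitter could serve the same purpose on the real data.

```python
import numpy as np

def kaplan_meier_survival(durations, events, t):
    """Kaplan-Meier estimate of P(T > t). Hypothetical helper."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    # Multiply (1 - d_i / n_i) over each distinct event time up to t.
    for u in np.unique(durations[events & (durations <= t)]):
        at_risk = np.sum(durations >= u)            # n_i
        churned = np.sum((durations == u) & events)  # d_i
        surv *= 1.0 - churned / at_risk
    return surv

def naive_churn(durations, events, t):
    """Mirror of the churn_12 rule: drop subjects censored before t."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    unknown = (durations < t) & ~events   # censored before t => NaN
    label = events & (durations <= t)     # churned by t
    return label[~unknown].mean()

# Synthetic data (assumption): exponential churn times, uniform censoring.
rng = np.random.default_rng(0)
n = 20_000
true_time = rng.exponential(scale=60.0, size=n)
censor_time = rng.uniform(0.0, 72.0, size=n)
durations = np.minimum(true_time, censor_time)
events = true_time <= censor_time

true_churn = 1.0 - np.exp(-12.0 / 60.0)  # known by construction
km_churn = 1.0 - kaplan_meier_survival(durations, events, 12.0)
naive = naive_churn(durations, events, 12.0)

print(f"true churn by 12:  {true_churn:.3f}")
print(f"Kaplan-Meier:      {km_churn:.3f}")
print(f"naive (drop NaN):  {naive:.3f}")
```

In this setup the naive rate comes out above the Kaplan-Meier rate, the same direction as the gap reported above, and the gap should widen at later horizons because more subjects have been censored by then.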