Niranjan Ravichandra
@nravic
Also going off the survival regression chapter in the wiki, each of my observations is obtained daily. Does it matter that the duration in my data is always just 1?
Cameron Davidson-Pilon
@CamDavidsonPilon
@nravic I think you can use lifelines, but you're in the realm of recurrent events, which lifelines has only a little support for (there may be another package out there?). Since you have daily snapshots, you probably want to use time-varying regression: https://lifelines.readthedocs.io/en/latest/Time%20varying%20survival%20regression.html
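A minimal sketch of the long (start, stop] layout that time-varying regression works with, built here from daily snapshots. The subject ids, the `usage` covariate, and the helper name `snapshots_to_long` are hypothetical, and the sketch assumes at most one event per subject, so it does not address recurrent events. The commented-out lines show how rows in this shape could be passed to `CoxTimeVaryingFitter.fit`.

```python
def snapshots_to_long(snapshots, event_day):
    """Turn daily snapshots into (start, stop] rows, one row per subject-day.

    snapshots: dict mapping subject id -> list of daily covariate values,
               where index d holds the value observed during day d.
    event_day: dict mapping subject id -> day on which the event occurred,
               or None if the subject was censored.
    """
    rows = []
    for subject, values in snapshots.items():
        for day, value in enumerate(values):
            stop = day + 1
            rows.append({
                "id": subject,
                "start": day,
                "stop": stop,
                "usage": value,
                # The event flag is set only on the interval that ends at the event.
                "event": event_day.get(subject) == stop,
            })
    return rows


snapshots = {"a": [0.2, 0.5, 0.9], "b": [0.1, 0.1]}
event_day = {"a": 3, "b": None}  # "a" has the event on day 3, "b" is censored
long_rows = snapshots_to_long(snapshots, event_day)
for row in long_rows:
    print(row)

# With lifelines and pandas installed, rows in this shape could be fit like this:
# import pandas as pd
# from lifelines import CoxTimeVaryingFitter
# ctv = CoxTimeVaryingFitter()
# ctv.fit(pd.DataFrame(long_rows), id_col="id", event_col="event",
#         start_col="start", stop_col="stop")
```

Each daily snapshot becomes one interval of length 1, which is why a duration of 1 per row is expected in this layout rather than a problem.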
Niranjan Ravichandra
@nravic
Great, thanks @CamDavidsonPilon ! I'll look into this and also see if there's anything around for recurrent events.
Niranjan Ravichandra
@nravic
@CamDavidsonPilon However, I also have data that is granular down to the second. Would lifelines make better use of that, or am I better off looking elsewhere?
d-seki
@d-seki
Thank you very much for creating this wonderful tool for statistics. Let me ask you a subtle question. What null hypothesis are you assuming for CoxTimeVaryingFitter? I guess it is for beta to be zero. Best,
Julian Späth
@julianspaeth
Hey, I have a question concerning the concordance_index. I want to use my predicted cumulative hazard functions (CHFs) as the predicted_scores when computing the concordance_index. Is the right approach to sum up the CHF of each sample and pass the negative of that sum as the score?
Youyang
@zxclcsq
Hello. I'm trying to replicate the Weibull AFT model prediction section in the lifelines docs, but predict_survival_function returns all NaNs. Any thoughts on this? The code I used is:
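from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi_dataset = load_rossi()  # the Rossi recidivism dataset used in the docs

aft = WeibullAFTFitter()
aft.fit(rossi_dataset, duration_col='week', event_col='arrest')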

X = rossi_dataset.loc[:10]

aft.predict_survival_function(X)
Cameron Davidson-Pilon
@CamDavidsonPilon
Ahh sorry about the delay folks! I don't check this daily, and gitter didn't send me emails
@d-seki yes that's right, NH is that beta == 0

@julianspaeth depends on the model. Recall that the c-index only depends on the ranking of values. For the Cox model, summing the cumulative hazard won't change the ranking, so it won't matter what you use. For an AFT model, it may change the ranking.

Alternatively, you can choose a point in time, and use the CHF at that time.
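To illustrate the ranking point, here is a small pure-Python c-index, written as a sketch rather than as the lifelines implementation. It assumes the convention that a higher score predicts longer survival, so CHF values are negated before scoring. The data and the `chf_at_t` values are made up.

```python
import math
from itertools import combinations


def concordance_index(durations, scores, events):
    """Harrell's c-index: a higher score should mean longer survival.

    A pair is comparable when the subject with the shorter duration
    had an observed event. Ties in score count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # skip tied durations to keep the sketch simple
        first, second = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[first]:
            continue  # the earlier subject was censored: pair not comparable
        comparable += 1
        if scores[first] < scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5
    return concordant / comparable


durations = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 1]
chf_at_t = [1.8, 0.9, 1.1, 0.4, 0.2]  # hypothetical CHF values at one fixed time

# Higher CHF means higher risk, so negate it to get a "longer survival" score.
c_raw = concordance_index(durations, [-h for h in chf_at_t], events)
# A strictly increasing transform of the CHF leaves the ranking unchanged.
c_log = concordance_index(durations, [-math.log(h) for h in chf_at_t], events)
print(c_raw, c_log)
```

Both calls give the same value because only the ordering of the scores enters the computation. Under proportional hazards the CHF curves of any two subjects never cross, which is why the choice of time point (or summing over time) does not matter for the Cox model.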

@zxclcsq not good! Looks like I broke something...
I'll investigate asap
Cameron Davidson-Pilon
@CamDavidsonPilon
@zxclcsq for now, you must specify the times argument in predict_survival_function
Cameron Davidson-Pilon
@CamDavidsonPilon
The fix is in master: CamDavidsonPilon/lifelines@085258e
hpham04
@hpham04
Hello everyone, I just started playing with lifelines. I looked into some examples but still do not understand one thing. As far as I understand, after training there should be a way to save the model, so that we can use it immediately without re-training. Can you please advise?
hpham04
@hpham04
Cameron Davidson-Pilon
@CamDavidsonPilon
@hpham04 yup that is - let me know if you have other questions or that doesn't work.
Cameron Davidson-Pilon
@CamDavidsonPilon
Is anyone experiencing problems installing / upgrading lifelines? Let me know!
d-seki
@d-seki
@CamDavidsonPilon Thanks very much!
Cameron Davidson-Pilon
@CamDavidsonPilon
I got conda forge working again, so we should start to see simultaneous conda & pypi releases again
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: Also, new minor release with some useful bug fixes: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.9
Bojan Kostic
@bkos
@CamDavidsonPilon I see there are estimators for cumulative hazard function, and it is as well in your mathematical links between entities diagram (nice one, BTW). What's the point (/advantage?) of introducing/estimating CHF in our survival analysis? It seems that all we need is hazard and survival functions, which have a direct transform. I can't explain the meaning of CHF, it doesn't bring anything, seems redundant... I'm reading about deep survival models (there's lots of papers and code lately) and they hardly mention it...
Cameron Davidson-Pilon
@CamDavidsonPilon
@bkos good question. A few points / advantages: i) The CHF is easier to estimate (less variance) than the hazard ii) The CHF, and the HF, are present in the likelihood equation for survival models, see equation (2.5) in https://cran.r-project.org/web/packages/flexsurv/vignettes/flexsurv.pdf iii) because of the "ease of differentiation" vs "hardness of integration", specifying the CHF and working out the HF is easier than the other way around, iv) it's 1-1 with the SF, that is, SF = exp(-CHF).
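Points iii) and iv) can be checked numerically. The sketch below assumes a Weibull model with made-up parameters `LAMBDA` and `RHO`: the CHF is specified first, the hazard is its derivative, and the survival function follows from SF = exp(-CHF).

```python
import math

LAMBDA, RHO = 10.0, 1.5  # hypothetical Weibull scale and shape


def cumulative_hazard(t):
    """Weibull CHF: H(t) = (t / lambda) ** rho."""
    return (t / LAMBDA) ** RHO


def hazard(t):
    """Hazard obtained by differentiating H(t) by hand."""
    return (RHO / LAMBDA) * (t / LAMBDA) ** (RHO - 1)


def survival(t):
    """Survival function from the CHF: S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t))


t, eps = 4.0, 1e-6
# A central finite difference of the CHF should agree with the closed-form hazard.
numeric_hazard = (cumulative_hazard(t + eps) - cumulative_hazard(t - eps)) / (2 * eps)
print(hazard(t), numeric_hazard, survival(t))
```

Going the other way, from a hazard to a CHF, requires an integral, which may not have a closed form for more flexible models.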
Bojan Kostic
@bkos
Thanks a lot, @CamDavidsonPilon! Is the equation you mentioned used in lifelines for some models? With it we don't lose any information, but it's different from the Cox partial likelihood, which includes only uncensored observations and softmax terms...
i completely missed that one, thx a lot!
mitchgallerstein-toast
@mitchgallerstein-toast
Has anyone had the issue where you get a "ZeroDivisionError: float division by zero" when using the CoxTimeVaryingFitter?
We originally thought it had to do with having multiple events with the same duration, but that doesn't seem to be the case.
mitchgallerstein-toast
@mitchgallerstein-toast
This seems to be the problem! Does anyone know how we would get around this until it is fixed?
Cameron Davidson-Pilon
@CamDavidsonPilon
@bkos yup, that equation is the basis of parametric models (you're right that it's not used in the Cox model)
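For reference, the usual right-censored log-likelihood for a parametric model can be written as below. This is the standard textbook form, not a verbatim copy of the vignette's equation; $\delta_i$ (event indicator), $h$ (hazard), $H$ (cumulative hazard) and $\theta$ (model parameters) are notation chosen here.

```latex
\ell(\theta) = \sum_{i=1}^{n} \left[ \delta_i \log h(t_i; \theta) - H(t_i; \theta) \right],
\qquad S(t; \theta) = \exp\left(-H(t; \theta)\right)
```

Every observation contributes through $H$, whether censored or not, while only observed events contribute through $h$.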
@mitchgallerstein-toast hm, this sounds similar to the issue here: CamDavidsonPilon/lifelines#768
:wave: also minor release with some bug fixes: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.10
kdkaiser
@kdkaiser
Hello! I'm somewhat new to survival analysis, and I haven't found any resources explaining why convergence would be poor when I have a variable that correlates strongly with being censored or not. It isn't correlated with the time to event for the uncensored data. I have a very small data set, and when I bootstrap sample it many times, I end up with combinations of the data where some of my boolean variables correlate with the censoring variable. The link that lifelines provides is about logistic regression, where a variable correlates strongly with the class label that you are trying to predict/model, which seems different from what is happening in survival analysis... thanks for any pointers!!
I'm also curious what type of model CPHFitter uses for the baseline, but didn't see that in the documentation.
Cameron Davidson-Pilon
@CamDavidsonPilon
@kdkaiser for your second question, it's the Breslow method, see https://stats.stackexchange.com/questions/46532/cox-baseline-hazard
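A rough sketch of the Breslow estimate of the cumulative baseline hazard for a single covariate, written in plain Python and not taken from the lifelines source. The data, the covariate `x`, and the `beta` values are made up.

```python
import math


def breslow_cumulative_baseline(durations, events, x, beta):
    """Breslow estimate of the cumulative baseline hazard H0(t).

    At each distinct event time t, the increment is
    (number of events at t) / (sum of exp(x_j * beta) over subjects still at risk).
    Returns a list of (time, H0(time)) pairs.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        risk = sum(math.exp(xj * beta) for d, xj in zip(durations, x) if d >= t)
        cumulative += deaths / risk
        curve.append((t, cumulative))
    return curve


durations = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
x = [0.5, -1.0, 0.0, 1.2, 0.3]  # a single hypothetical covariate

# With beta = 0 every subject has weight 1, so this reduces to Nelson-Aalen.
print(breslow_cumulative_baseline(durations, events, x, beta=0.0))
print(breslow_cumulative_baseline(durations, events, x, beta=0.7))
```

The baseline is therefore a non-parametric step function that jumps at observed event times, with the fitted coefficients entering only through the risk-set weights.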
Cameron Davidson-Pilon
@CamDavidsonPilon
Your first question is really good, and I thought about it for a while
Take a look at the Cox log-likelihood, where $\theta_j = \exp(X_j \beta)$:
$ll(\beta) = \sum_{i:C_i = 1} \left( X_i \beta - \log{\sum_{j: Y_j \ge Y_i} \theta_j} \right)$
Suppose, in an extreme case, that X_i = C_i, that is, we have a single column that is equal to the event indicator (the E vector). Then the first sum is equal to:
$\sum_{i:C_i=1} X_i \beta = \sum_{i:C_i=1} C_i \beta = \sum_{i:C_i=1} \beta$
so to maximize the $ll$, we can just make $\beta$ as large as possible!
Cameron Davidson-Pilon
@CamDavidsonPilon
And this is what an optimization algorithm will do if you have a column that has too high of a correlation with E
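The extreme case can be reproduced numerically. The sketch below evaluates the Cox partial log-likelihood for one covariate on toy data with no tied event times, with the covariate set equal to the event indicator. The log term also grows with $\beta$, but on this data the total still increases for every larger $\beta$ and approaches a ceiling it never reaches, so there is no finite maximizer.

```python
import math


def cox_partial_log_likelihood(durations, events, x, beta):
    """Cox partial log-likelihood for one covariate (no tied event times)."""
    total = 0.0
    for t_i, e_i, x_i in zip(durations, events, x):
        if not e_i:
            continue  # only observed events contribute a term
        # Risk set: everyone whose duration is at least t_i.
        risk = sum(math.exp(x_j * beta) for t_j, x_j in zip(durations, x) if t_j >= t_i)
        total += x_i * beta - math.log(risk)
    return total


durations = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 1, 0]
x = list(events)  # the extreme case: the covariate equals the event indicator

betas = [0.0, 1.0, 2.0, 5.0, 10.0]
values = [cox_partial_log_likelihood(durations, events, x, b) for b in betas]
for b, v in zip(betas, values):
    print(f"beta={b:5.1f}  ll={v:.4f}")
```

An optimizer following this surface keeps pushing the coefficient upward, which shows up in practice as convergence warnings or very large coefficients and standard errors.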
kdkaiser
@kdkaiser
@CamDavidsonPilon Thank you!! I'm familiar with a slightly different notation so I'll work through it on my end too, but what you wrote makes sense. I appreciate your help!
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: Good morning, a new lifelines release is out. Some small API changes, but lots of QOL improvements: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.23.0
Bojan Kostic
@bkos
I see the Brier score is used by some people to measure the goodness-of-fit of survival models. As lifelines contains many useful functions, is there any specific reason why the Brier score is not included? It's present in scikit-learn, but not used in any examples in lifelines...
Cameron Davidson-Pilon
@CamDavidsonPilon
@bkos mostly because I haven't gotten around to implementing it. I think it's a good measure, and should be included.
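Until something like this is built in, a hand-rolled version is short. The sketch below computes the Brier score at a single horizon `t` under the simplifying assumption that no subject is censored before `t`. With censored data the usual approach adds inverse-probability-of-censoring weights, which this sketch leaves out. The function name and the predicted probabilities are hypothetical.

```python
def brier_score(durations, predicted_survival, t):
    """Brier score at horizon t, assuming no subject is censored before t.

    predicted_survival[i] is the model's S_i(t), the predicted probability
    that subject i survives beyond t.
    """
    squared_errors = [
        ((1.0 if d > t else 0.0) - s) ** 2
        for d, s in zip(durations, predicted_survival)
    ]
    return sum(squared_errors) / len(squared_errors)


durations = [2, 5, 7, 10]                  # all events observed (toy data)
predicted_survival = [0.2, 0.4, 0.7, 0.9]  # hypothetical S_i(t) at t = 6
score = brier_score(durations, predicted_survival, t=6)
print(score)
```

Lower is better: 0 means every predicted survival probability matched the observed status at `t` exactly, and a constant prediction of 0.5 scores 0.25.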
A N
@aleva85
Hi Cam, some time ago we discussed the undocumented use of _log_likelihood. I see you have now added a log_likelihood attribute (thanks!). It works for some models but not for the ExponentialFitter, which throws an AttributeError for log_likelihood and a deprecation warning for _log_likelihood. Just a heads up, hope I didn't do something wrong on my side
Cameron Davidson-Pilon
@CamDavidsonPilon
whoops! Thanks for the heads up! I'll fix that shortly
@aleva85 actually, can you try log_likelihood_?