##### Activity
Cameron Davidson-Pilon
@CamDavidsonPilon
yea, I can recreate your error locally. Do .values.astype(float)
So, did you use the function datetimes_to_durations from lifelines.utils to transform your data?
ningeo
@ningeo
no, I have a start datetime and an end datetime, so it's actually just a df['end'] - df['start']
Cameron Davidson-Pilon
@CamDavidsonPilon
I'll add some error handling however, so this error is easier to fix in the future
ningeo
@ningeo
they're proper datetimes, so all of the above?
Cameron Davidson-Pilon
@CamDavidsonPilon
(oops I deleted my message by mistake)
ningeo
@ningeo
it's working now, much appreciated!
Cameron Davidson-Pilon
@CamDavidsonPilon
(it was: try using datetimes_to_durations - part of the problem with df['end'] - df['start'] is that it's ambiguous what the measurement scale is: days, hours, minutes, etc.?)
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: just released a minor version, 0.20.2 - support for left-censoring and qqplots. See change log here: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.20.2
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: just released a minor version, 0.20.3, that adds some quality-of-life improvements for Kaplan-Meier users. https://github.com/CamDavidsonPilon/lifelines/releases
githubhsss
@githubhsss
@CamDavidsonPilon Thanks for your great work! It's amazing!
Will there be a Weibull proportional hazard model in the future? Or any advice about building WPHM in python?
Thanks again!!
Cameron Davidson-Pilon
@CamDavidsonPilon
Correct me if I am wrong, but Weibull AFT == Weibull PH, no?
Maybe I should add this to the docs, though
Cameron Davidson-Pilon
@CamDavidsonPilon
cc @githubhsss ^
Paul Zivich
@pzivich
They are equivalent, but there is a formula to convert between the estimates from Weibull AFT and Weibull HR: \beta_{PH} = - \beta_{AFT} * \sigma, where \sigma is the scale (this depends on how you have the Weibull parameterized; I think lifelines might be slightly different)
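A quick numeric check of the AFT/PH equivalence, as a sketch under a stated assumption: the cumulative hazard is H(t | x) = (t / lambda(x))^rho with lambda(x) = exp(b0 + b1 * x), which I believe is the form lifelines documents for its Weibull AFT model. Under that assumption the log hazard ratio is -rho * b1, with rho the shape. How this maps onto a \sigma depends on the convention, as the message above notes. All parameter values are hypothetical.

```python
import math

def weibull_aft_hazard(t, x, b0, b1, rho):
    """Hazard of a Weibull AFT model with cumulative hazard (t / lam)**rho,
    where lam = exp(b0 + b1 * x). This parameterization is an assumption."""
    lam = math.exp(b0 + b1 * x)
    return (rho / lam) * (t / lam) ** (rho - 1)

# Hypothetical parameter values, chosen only for illustration.
b0, b1, rho = 2.0, 0.5, 1.7

# Under proportional hazards, the hazard ratio for a one-unit increase
# in x is the same at every time t.
ratios = [
    weibull_aft_hazard(t, 1.0, b0, b1, rho) / weibull_aft_hazard(t, 0.0, b0, b1, rho)
    for t in (0.5, 1.0, 5.0, 20.0)
]

# Implied PH coefficient (log hazard ratio) under this parameterization.
beta_ph = -rho * b1

print(ratios, math.exp(beta_ph))
```

The ratio is constant in t, which is the sense in which the Weibull AFT model is also a proportional hazards model.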
fuyb1992
@fuyb1992
I want to get the confidence interval of the predicted median for a Weibull model. I wrote some code to get it, but I'm not sure if it is correct. Here is my code:
class MyWeibullFitter(WeibullFitter):
    @property
    def median_confidence_interval_(self):
        '''get the confidence interval of the median, must call after fit and plot'''
        if self.median_ != np.inf:
            self.timeline = np.linspace(self.median_, self.median_, 1)
            return self.confidence_interval_survival_function_
        else:
            return None
githubhsss
@githubhsss
@CamDavidsonPilon May I ask which book is the screenshot of Figure 4.1 from? Newbie at survival analysis and want to learn more~
Cameron Davidson-Pilon
@CamDavidsonPilon
@githubhsss it's from a thesis, which is a pretty nice intro to a lot of common models: https://harvest.usask.ca/bitstream/handle/10388/etd-03302009-140638/JiezhiQiThesis.pdf
githubhsss
@githubhsss
@CamDavidsonPilon Thanks a lot!
Cameron Davidson-Pilon
@CamDavidsonPilon
@fuyb1992 you can do something like this:
from lifelines.utils import median_survival_times

median_survival_times(self.confidence_interval_survival_function_)
Cameron Davidson-Pilon
@CamDavidsonPilon
This actually isn't the most efficient way to compute the confidence intervals, but I think I'll expose a better way in the future
(though it is pretty efficient, just not the most efficient)
Cameron Davidson-Pilon
@CamDavidsonPilon
efficiency in the statistical sense, not performance
fuyb1992
@fuyb1992
@CamDavidsonPilon Thank you for your answer!!! I tried it. It only works for data where S(t) <= 0.5, where it returns an interval in days, but for data where S(t) > 0.5 it returns None.
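A sketch of why a median lookup can come back empty when the survival curve stays above 0.5. This is not the lifelines implementation of `median_survival_times`; `first_crossing` is a hypothetical helper that scans a time grid, and the Weibull parameters are made up. A grid-based lookup has nothing to return when the timeline ends before S(t) reaches 0.5, even though the Weibull median itself has a closed form.

```python
import math

def weibull_survival(t, lam, rho):
    """Weibull survival function S(t) = exp(-(t / lam)**rho)."""
    return math.exp(-((t / lam) ** rho))

def first_crossing(timeline, survival, q=0.5):
    """First time on the grid where survival <= q, else infinity (hypothetical helper)."""
    for t, s in zip(timeline, survival):
        if s <= q:
            return t
    return math.inf

lam, rho = 100.0, 1.5  # hypothetical fitted parameters

# A timeline that stops at t=50 (S is still above 0.5) versus one reaching t=200.
timeline_short = [float(t) for t in range(0, 51)]
timeline_long = [float(t) for t in range(0, 201)]

median_short = first_crossing(timeline_short, [weibull_survival(t, lam, rho) for t in timeline_short])
median_long = first_crossing(timeline_long, [weibull_survival(t, lam, rho) for t in timeline_long])

# Closed-form Weibull median: solve exp(-(t / lam)**rho) = 0.5 for t.
closed_form_median = lam * math.log(2) ** (1 / rho)

print(median_short, median_long, closed_form_median)
```

On the short timeline the lookup returns infinity, which is the case the `np.inf` check in the subclass above guards against.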
fuyb1992
@fuyb1992
@CamDavidsonPilon I'm new to survival analysis, so please excuse me if I'm wrong. I'm confused after reading the wiki and papers about the confidence interval of the survival function for parametric models. It would be a great help if you could give some references or documents about that!! Thanks a lot!
githubhsss
@githubhsss
@CamDavidsonPilon
Thanks for sharing the thesis again~
I'm dealing with some repeated events data (machine failure time data). Since a machine may have several failures, and different machines have different numbers of failures, I think it's necessary to consider repeated events and heterogeneity. Will frailty models help? Or any other advice? (^_^)/
Cameron Davidson-Pilon
@CamDavidsonPilon
Yuck, Gitter is being messy and posting my edited messages much later than originally posted. sorry sorry
@fuyb1992 ah yes, you may want to keep your if self.median_ != np.inf check
@githubhsss frailty is one solution, though it's not in lifelines (but is in R's survival). Another option is to use cluster_col in CoxPHFitter: https://lifelines.readthedocs.io/en/latest/Examples.html#correlations-between-subjects-in-a-cox-model. Another solution is to stratify per machine in the CoxPHFitter.
fuyb1992
@fuyb1992
Thanks a lot! I'm trying to understand the confidence interval of the survival function for parametric models. The Taylor expansion method is mentioned a lot, and the Jacobian-vector product is used in the lifelines code. I'm confused about the relationship between them. It would be a great help if you could give some references or documents about the implementation method. Thank you for your time!!
Cameron Davidson-Pilon
@CamDavidsonPilon
I'd be happy to, as it is something I'm really excited about. Let me type something up tomorrow
fuyb1992
@fuyb1992
Thank you so much, I'm looking forward to it!!
Cameron Davidson-Pilon
@CamDavidsonPilon
let me know if you have questions about it
:wave: A minor release, 0.20.4, is available. Bug fixes, improvements to large datasets in AFT, and left-truncation in AFT models.
https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.20.4
Cameron Davidson-Pilon
@CamDavidsonPilon
Let me know if you are having install problems, please - w.r.t. the 0.20.4 release
Also, I'm working on a new survival regression model. The original motivation was the predictable behaviour of SaaS companies' customer churn, but it's generally a very flexible model. Have a look here if interested; I'm looking for feedback on it: https://nbviewer.jupyter.org/gist/CamDavidsonPilon/ce93dc24947c45b034402edc657aa6eb
fuyb1992
@fuyb1992
Thank you very much for your answer, which explains the delta method for parametric models clearly!!
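A small sketch of the delta method for a Weibull survival function, to connect the two ideas raised earlier: the first-order Taylor expansion S(theta_hat) ≈ S(theta) + J (theta_hat - theta) gives Var(S) ≈ J Sigma J^T, so the only model-specific ingredient is the Jacobian J of S with respect to the parameters. Here J comes from central finite differences in plain Python. This is an illustration, not the lifelines implementation, and the parameter estimates and covariance matrix are hypothetical.

```python
import math

def survival(t, lam, rho):
    """Weibull survival function S(t) = exp(-(t / lam)**rho)."""
    return math.exp(-((t / lam) ** rho))

def gradient(t, lam, rho, eps=1e-6):
    """Central finite-difference gradient of S with respect to (lam, rho)."""
    d_lam = (survival(t, lam + eps, rho) - survival(t, lam - eps, rho)) / (2 * eps)
    d_rho = (survival(t, lam, rho + eps) - survival(t, lam, rho - eps)) / (2 * eps)
    return d_lam, d_rho

def delta_method_ci(t, lam, rho, cov, z=1.96):
    """Approximate CI for S(t): variance is g^T cov g, g being the gradient."""
    g = gradient(t, lam, rho)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(2) for j in range(2))
    se = math.sqrt(var)
    s = survival(t, lam, rho)
    return s - z * se, s, s + z * se

# Hypothetical estimates and parameter covariance, for illustration only.
lam_hat, rho_hat = 100.0, 1.5
cov_hat = [[25.0, 0.1], [0.1, 0.01]]

lower, estimate, upper = delta_method_ci(50.0, lam_hat, rho_hat, cov_hat)
print(lower, estimate, upper)
```

An automatic differentiation library would replace the finite-difference `gradient` with an exact one. Note that this plain interval is not constrained to [0, 1]; applying the same method on a transformed scale is a common way around that.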
githubhsss
@githubhsss
@CamDavidsonPilon Thanks~
githubhsss
@githubhsss
@CamDavidsonPilon
I have been reading your recommended thesis. It helps a lot. Though still lots of questions...
I tried Cox and WeibullAFT, but the concordance was only 0.53. Does this mean that the models fit unacceptably? What is the reference of the range of 0.55-0.7? In addition to concordance, can I directly compare log likelihood? Have no idea about goodness of fit and model selection...
Cameron Davidson-Pilon
@CamDavidsonPilon

Disappointingly, 0.53 is a bit on the low end. Have you tried a LogNormalAFT? It can fit some datasets better.

> What is the reference of the range of 0.55-0.7?

I think I saw it in Frank H.'s work, maybe his blog?

You can't compare CoxPH and WeibullAFT log likelihood values, no. Mostly because the CoxPH likelihood is a partial likelihood.

I recently added some resources here to help with model selection between CoxPH and parametric models: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#parametric-vs-semi-parametric-models
It's also very possible you are missing interactions or non-linear effects in your models.
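For context on the concordance values discussed above, here is a simplified sketch of Harrell's concordance index in plain Python. It is an illustration only: ties in event times are handled crudely, and the data is made up. lifelines ships its own `concordance_index` in `lifelines.utils`, which is what should be used in practice. A score that carries no information about ordering lands at 0.5, which is why 0.53 is considered low.

```python
def concordance_index(durations, scores, events):
    """Simplified Harrell's C; a higher score should mean longer survival."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's (event or censoring) time.
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0   # predicted order matches observed order
                elif scores[i] == scores[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable

# Hypothetical data: four subjects, the third one censored.
durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(durations, [2, 4, 6, 8], events)
reversed_order = concordance_index(durations, [8, 6, 4, 2], events)
uninformative = concordance_index(durations, [1, 1, 1, 1], events)

print(perfect, reversed_order, uninformative)
```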