    Silvernine0S
    @Silvernine0S
    Thanks!
    Silvernine0S
    @Silvernine0S
    @CamDavidsonPilon I'm not quite sure I understand how you estimate the hazard rate through the cumulative hazard function. From what I understand, the cumulative hazard function can be estimated by taking the negative natural log of the Kaplan-Meier estimate, or through the sum of d_i/n_i (the Nelson-Aalen estimator). So I was wondering how you are estimating the hazard function from it without using a parametric distribution model? I can't find a clearer explanation of that in the documentation.
    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Good question. The cumulative hazard is the integral of the hazard rate, so to recover the latter, one could use finite differences on the cumulative hazard, e.g. h(t) ≈ (H(t+d) - H(t))/d

    However, this is quite a noisy estimate and only non-zero when H(t) changes over the interval [t, t+d]. So after differencing, one can apply a kernel smoother. https://en.wikipedia.org/wiki/Kernel_smoother

    In general though, recovering the hazard rate without parametric models is difficult, hence why most people focus on the cumulative hazard.
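
The differencing-then-smoothing idea described above can be sketched in plain NumPy. This is a minimal illustration, not the lifelines implementation: the helper names `nelson_aalen` and `smoothed_hazard`, the Gaussian kernel, and the bandwidth of 0.5 are all assumptions chosen for the example. (lifelines' `NelsonAalenFitter` offers a `smoothed_hazard_(bandwidth)` method that serves a similar purpose.)

```python
import numpy as np

def nelson_aalen(durations, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t) at each distinct event time."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(durations[events])
    # d_i: number of events at t_i; n_i: number of subjects still at risk at t_i
    d = np.array([np.sum((durations == t) & events) for t in event_times])
    n = np.array([np.sum(durations >= t) for t in event_times])
    return event_times, np.cumsum(d / n)

def smoothed_hazard(event_times, cum_hazard, grid, bandwidth):
    """Difference H(t) into its jumps, then smooth the jumps with a Gaussian kernel."""
    jumps = np.diff(cum_hazard, prepend=0.0)  # finite differences of the step function
    z = (np.asarray(grid, dtype=float)[:, None] - event_times[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (kernel * jumps[None, :]).sum(axis=1) / bandwidth

# Simulated, fully observed exponential lifetimes with true hazard 1 / 2.0 = 0.5
rng = np.random.default_rng(0)
durations = rng.exponential(scale=2.0, size=2000)
events = np.ones_like(durations, dtype=bool)

times, H = nelson_aalen(durations, events)
hazard = smoothed_hazard(times, H, grid=[1.0, 2.0], bandwidth=0.5)
```

The raw jumps are non-zero only at event times, which is the noisiness mentioned above; the kernel spreads each jump over its neighbourhood, and the bandwidth trades variance against bias.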

    Tim Holme
    @ironmantimholme_twitter
    @CamDavidsonPilon, thank you very much for your library. It is easy to use and very handy. I was the user who made this comment on Stack Overflow, in response to which you opened CamDavidsonPilon/lifelines#438. What do you think about implementing it?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @ironmantimholme_twitter hi there. I'm guessing you are looking for a non-parametric estimate too? I think I've seen this somewhere, and agree it would be nice to have. An open question is what the API should look like.
    Silvernine0S
    @Silvernine0S
    Awesome! Thanks for answering my question!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: a minor version 0.20.1 was just released, offers some nice performance improvements to models. https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.20.1
    ningeo
    @ningeo
    Hi there, I'm just getting started with lifelines and have hit a mental block. I'm trying to pass a duration and censor series to a KaplanMeierFitter; at first the durations had NaN values (which I thought would be fine because of the censor vector, but apparently not). Filtering those out now leaves me with: AttributeError: 'bool' object has no attribute 'any'. It's raised while checking for NaN and np.inf on the durations. I've confirmed there are no inf or nan values in the data I'm passing...
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hi @ningeo, let me try to help. Your duration array should be all numbers (no nans), and your event_observed variable should be an array of the same size as your duration array. Can you confirm that your event_observed is the same size?
    I suspect you are passing in a bool instead, ex: kmf.fit(durations, True)
    ningeo
    @ningeo
    Hi Cameron, thanks! My durations are: Name: time_to_prod, Length: 12594, dtype: timedelta64[ns] and my event_observed is: Name: censored, Length: 12594, dtype: int32
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    okay, the event_observed looks okay then (I assume it's full of 0s and 1s). Hm, maybe the problem is the durations dtype. Can you try casting that to a float, ex: astype(float)
    ningeo
    @ningeo
    hmm... TypeError: cannot astype a timedelta from [timedelta64[ns]] to [float64]
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    yea, I can recreate your error locally. Do .values.astype(float)
    So, did you use the function datetimes_to_durations from lifelines.utils to transform your data?
    ningeo
    @ningeo
    no, I have a start datetime and an end datetime, so it's actually just a df['end'] - df['start']
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    I'll add some error handling however, so this error is easier to fix in the future
    ningeo
    @ningeo
    they're proper datetimes, so all of the above?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    (oops I deleted my message by mistake)
    ningeo
    @ningeo
    it's working now, much appreciated!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    (it was: try using datetimes_to_durations. Part of the problem with df['end'] - df['start'] is that it is ambiguous what the measurement scale is: days, hours, minutes, etc.)
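
To make the unit explicit when subtracting datetimes by hand, one option is to divide the timedelta by a fixed unit, which also yields the plain floats the fitter expects. This is a small pandas-only sketch: the toy data, the `study_end` cutoff, and the choice to treat a missing end date as censored are all assumptions for illustration. `lifelines.utils.datetimes_to_durations`, recommended above, is meant to cover the same ground.

```python
import pandas as pd

# Toy data: the third subject has no end date yet (hypothetical example)
df = pd.DataFrame({
    "start": pd.to_datetime(["2019-01-01 00:00", "2019-01-10 00:00", "2019-02-01 00:00"]),
    "end": pd.to_datetime(["2019-01-08 00:00", "2019-01-10 12:00", None]),
})

# 1 if the end was observed, 0 if the subject is still ongoing (censored)
observed = df["end"].notna().astype(int)

# Assumption: censored subjects are followed until a fixed study end date
study_end = pd.Timestamp("2019-03-01")
end_filled = df["end"].fillna(study_end)

# Dividing by a Timedelta fixes the measurement scale (days) and returns float64
durations_days = (end_filled - df["start"]) / pd.Timedelta(days=1)
```

Here `durations_days` and `observed` have the same length and numeric dtypes, which is the shape of input discussed earlier in the thread.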
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: just released a minor version, 0.20.2 - support for left-censoring and qqplots. See change log here: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.20.2
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: just released a minor version, 0.20.3, that adds some quality-of-life improvements for Kaplan-Meier users. https://github.com/CamDavidsonPilon/lifelines/releases
    githubhsss
    @githubhsss
    @CamDavidsonPilon Thanks for your great work! It's amazing!
    Will there be a Weibull proportional hazards model in the future? Or any advice about building a WPHM in Python?
    Thanks again!!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Correct me if I am wrong, but Weibull AFT == Weibull PH, no?
    Screen Shot 2019-03-25 at 11.00.52 AM.png
    Maybe I should add this to the docs, though
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    cc @githubhsss ^
    Paul Zivich
    @pzivich
    They are equivalent, but there is a formula to convert between the Weibull AFT estimate and the Weibull hazard-ratio (PH) estimate: \beta_{PH} = - \beta_{AFT} * \sigma, where \sigma is the scale (this depends on how the Weibull is parameterized; I think lifelines might be slightly different).
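
For reference, the conversion can be worked out under one specific parameterization. This is a sketch that assumes the survival function S(t | x) = exp(-(t / \lambda(x))^\rho) with \lambda(x) = exp(\beta_{AFT}^T x), where \rho is the Weibull shape parameter; other conventions move the constant around.

```latex
\begin{aligned}
S(t \mid x) &= \exp\!\left(-\left(\frac{t}{\lambda(x)}\right)^{\rho}\right),
  \qquad \lambda(x) = \exp\!\left(\beta_{AFT}^{\top} x\right) \\
H(t \mid x) &= -\log S(t \mid x) = t^{\rho} \exp\!\left(-\rho\, \beta_{AFT}^{\top} x\right) \\
h(t \mid x) &= \frac{d}{dt} H(t \mid x)
  = \underbrace{\rho\, t^{\rho - 1}}_{h_0(t)} \exp\!\left((-\rho\, \beta_{AFT})^{\top} x\right) \\
\beta_{PH} &= -\rho\, \beta_{AFT}
\end{aligned}
```

Under this assumed form the multiplier is the shape \rho. In the log-linear form log T = \beta_{AFT}^T x + \sigma W, the error scale is \sigma = 1/\rho, so the same relation reads \beta_{PH} = -\beta_{AFT} / \sigma. Which expression applies depends on the convention in use.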
    fuyb1992
    @fuyb1992
    I want to get the confidence interval of the predicted median for a Weibull model. I wrote some code to get it, but I'm not sure if it is correct. Here is my code:
    class MyWeibullFitter(WeibullFitter):
        @property
        def median_confidence_interval_(self):
            '''get the confidence interval of the median, must call after fit and plot'''
            if self.median_ != np.inf:
                self.timeline = np.linspace(self.median_, self.median_, 1)
                return self.confidence_interval_survival_function_
            else:
                return None
    Thank you for your time!
    githubhsss
    @githubhsss
    @pzivich Thanks for your answer~
    @CamDavidsonPilon May I ask which book is the screenshot of Figure 4.1 from? Newbie at survival analysis and want to learn more~
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @githubhsss it's from a thesis, which is a pretty nice intro to a lot of common models: https://harvest.usask.ca/bitstream/handle/10388/etd-03302009-140638/JiezhiQiThesis.pdf
    githubhsss
    @githubhsss
    @CamDavidsonPilon Thanks a lot!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @fuyb1992 you can do something like this:
    from lifelines.utils import median_survival_times 
    
    median_survival_times(self.confidence_interval_survival_function_)
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    This actually isn't the most efficient way to compute the confidence intervals (though it is pretty efficient, just not the most efficient), but I think I'll expose a better way in the future
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    efficiency in the statistical sense, not performance
    fuyb1992
    @fuyb1992
    @CamDavidsonPilon Thank you for your answer!!! I tried it: it only works for data with S(t) <= 0.5, where it returns an interval in days; for data with S(t) > 0.5 it returns None.
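
The behaviour reported here follows from how a median is read off a survival curve: it is the first time at which S(t) drops to 0.5 or below, so a curve that stays above 0.5 has no finite median. The sketch below shows only that definition. `median_from_survival` is a hypothetical helper written for this illustration, not the lifelines function, and the two curves are made-up numbers.

```python
import numpy as np

def median_from_survival(timeline, survival):
    """First time at which S(t) <= 0.5, or inf if the curve never gets that low."""
    timeline = np.asarray(timeline, dtype=float)
    survival = np.asarray(survival, dtype=float)
    below = np.nonzero(survival <= 0.5)[0]
    return timeline[below[0]] if below.size else np.inf

timeline = np.array([0, 10, 20, 30, 40])
crosses = np.array([1.0, 0.8, 0.6, 0.45, 0.3])   # drops below 0.5 at t = 30
plateau = np.array([1.0, 0.9, 0.8, 0.75, 0.7])   # never reaches 0.5

median_crossing = median_from_survival(timeline, crosses)
median_plateau = median_from_survival(timeline, plateau)
```

Applied to the lower and upper confidence bands of a survival curve, the same rule gives the two ends of an interval for the median, and either end can be infinite when its band never reaches 0.5.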
    fuyb1992
    @fuyb1992
    @CamDavidsonPilon I'm new to survival analysis, so please excuse me if I'm wrong. I'm confused after reading the wiki and papers about the confidence interval of the survival function for parametric models. It would be a great help if you could give some references or documents about that!! Thanks a lot!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    efficient as in "statistical efficiency", not performance
    githubhsss
    @githubhsss
    @CamDavidsonPilon
    Thanks for sharing the thesis again~
    I'm dealing with some repeated-events data (machine failure time data). Since a machine may have several failures, and different machines have different numbers of failures, I think it's necessary to account for repeated events and heterogeneity. Will frailty models help? Or any other advice? (^_^)/
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Yuck, Gitter is being messy and posting my edited messages much later than originally posted. sorry sorry
    @fuyb1992 ah yes, you may want to keep your if self.median_ != np.inf check
    @githubhsss frailty is one solution, though it's not in lifelines (but is in R's survival). Another option is to use cluster_col in CoxPHFitter: https://lifelines.readthedocs.io/en/latest/Examples.html#correlations-between-subjects-in-a-cox-model. Another solution is to stratify per machine in the CoxPHFitter.
    fuyb1992
    @fuyb1992
    image.png
    fuyb1992
    @fuyb1992
    Thanks a lot! I'm trying to understand the confidence interval of the survival function for parametric models. The Taylor expansion method is mentioned a lot, and the Jacobian-vector product is used in the lifelines code. I'm confused about the relationship between them. It would be a great help if you could give some references or documents about the implementation method. Thank you for your time!!
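
One way to see the link between the two: a first-order Taylor expansion of S(t; θ) around the fitted parameters gives the delta-method approximation Var(S) ≈ g' Σ g, where g is the gradient of S with respect to θ and Σ is the parameter covariance matrix. The gradient (or a Jacobian-vector product) is simply the way that derivative gets computed. The sketch below is not the lifelines implementation: it swaps in central finite differences for automatic differentiation, assumes the Weibull form S(t) = exp(-(t/λ)^ρ), and uses made-up parameter and covariance values.

```python
import numpy as np

def weibull_survival(params, t):
    """S(t) = exp(-(t / lambda)^rho) for params = (lambda, rho)."""
    lam, rho = params
    return np.exp(-(t / lam) ** rho)

def numerical_gradient(f, params, eps=1e-6):
    """Central-difference gradient of a scalar function f at params."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

def delta_method_interval(params, cov, t, z=1.96):
    """Approximate interval for S(t) from the first-order Taylor (delta) method."""
    grad = numerical_gradient(lambda p: weibull_survival(p, t), params)
    variance = grad @ cov @ grad          # g' Sigma g
    s = weibull_survival(params, t)
    half_width = z * np.sqrt(variance)
    return s - half_width, s, s + half_width

# Hypothetical fitted values: (lambda, rho) and their covariance matrix
params = np.array([10.0, 1.5])
cov = np.array([[0.25, 0.01],
                [0.01, 0.004]])

lower, estimate, upper = delta_method_interval(params, cov, t=8.0)
```

A plain symmetric interval like this can fall outside [0, 1] near the ends of the curve; applying the same expansion to a transformed quantity such as log(-log S) is a common remedy.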
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    I'd be happy to, as it is something I'm really excited about. Let me type something up tomorrow