    Cameron Davidson-Pilon
    @CamDavidsonPilon
    efficient as in "statistical efficiency", not performance
    githubhsss
    @githubhsss
    @CamDavidsonPilon
    Thanks for sharing the thesis again~
    I'm dealing with some repeated-events data (machine failure time data). Since a machine may have several failures and different machines have different numbers of failures, I think it's necessary to consider repeated events and heterogeneity. Will frailty models help? Or any other advice? (^_^)/
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Yuck, Gitter is being messy and posting my edited messages much later than originally posted. sorry sorry
    @fuyb1992 ah yes, you may want to keep your if self.median_ != np.inf check
    @githubhsss frailty is one solution, though it's not in lifelines (but is in R's survival). Another option is to use cluster_col in CoxPHFitter: https://lifelines.readthedocs.io/en/latest/Examples.html#correlations-between-subjects-in-a-cox-model. Another solution is to stratify per machine in the CoxPHFitter.
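    (A minimal sketch of the cluster_col option above, assuming a DataFrame df with duration column 'T', event column 'E', and a hypothetical 'machine_id' column:)

    from lifelines import CoxPHFitter

    cph = CoxPHFitter()
    # cluster_col corrects the standard errors for correlated observations
    # within the same machine (a robust sandwich estimator)
    cph.fit(df, duration_col='T', event_col='E', cluster_col='machine_id')
    # alternatively, stratify per machine:
    # cph.fit(df, duration_col='T', event_col='E', strata=['machine_id'])
    cph.print_summary()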
    fuyb1992
    @fuyb1992
    [image attachment]
    fuyb1992
    @fuyb1992
    Thanks a lot! I'm trying to understand the confidence interval of the survival function for parametric models. The Taylor-expansion method is mentioned a lot, and the Jacobian-vector product is used in the lifelines code. I'm confused about the relationship between them; it would be a great help if you could give some references or documents about the implementation method. Thank you for your time!!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    I'd be happy to, as it is something I'm really excited about. Let me type something up tomorrow
    fuyb1992
    @fuyb1992
    Thank you so much, I'm looking forward to it!!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    let me know if you have questions about it
    :wave: A minor release, 0.20.4, is available: bug fixes, improvements for large datasets in AFT models, and left-truncation support in AFT models.
    https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.20.4
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Let me know if you are having install problems, please - w.r.t. the 0.20.4 release
    Also, I'm working on a new survival regression model. The original motivation was the predictable behaviour of SaaS companies' customer churn, but it's generally a very flexible model. Have a look here if interested; I'm looking for feedback on it: https://nbviewer.jupyter.org/gist/CamDavidsonPilon/ce93dc24947c45b034402edc657aa6eb
    fuyb1992
    @fuyb1992
    Thank you very much for your answer, which explains the delta method on parametric models clearly!!
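    (For reference, the delta method discussed here is a first-order Taylor expansion: Var[f(theta_hat)] ≈ grad_f' Sigma grad_f, where Sigma is the covariance matrix of the fitted parameters and the gradient grad_f is what autograd's Jacobian-vector products supply in the lifelines code. A minimal sketch, not lifelines' exact implementation, using a Weibull survival function and hypothetical fitted values:)

    import autograd.numpy as np
    from autograd import grad

    def survival(params, t):
        lam, rho = params
        return np.exp(-(t / lam) ** rho)  # Weibull survival S(t)

    theta_hat = np.array([10.0, 1.5])        # hypothetical fitted (lambda, rho)
    sigma = np.array([[0.040, 0.001],        # hypothetical covariance matrix of the
                      [0.001, 0.020]])       # estimates (inverse Hessian of the neg. log-likelihood)

    grad_s = grad(survival)(theta_hat, 5.0)  # gradient of S(5) w.r.t. (lambda, rho)
    var_s = grad_s @ sigma @ grad_s          # delta-method variance of S(5)
    se_s = np.sqrt(var_s)                    # standard error, e.g. for a 95% CI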
    githubhsss
    @githubhsss
    @CamDavidsonPilon Thanks~
    githubhsss
    @githubhsss
    @CamDavidsonPilon
    I have been reading your recommended thesis. It helps a lot. Though still lots of questions...
    I tried Cox and WeibullAFT, but the concordance was only 0.53. Does this mean the models fit unacceptably? What is the reference for the 0.55-0.7 range? In addition to concordance, can I directly compare log-likelihoods? I have no idea about goodness of fit and model selection...
    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Disappointingly, 0.53 is a bit on the low end. Have you tried a LogNormalAFT? It can fit some datasets better.

    > What is the reference for the 0.55-0.7 range?

    I think I saw it in Frank H.'s work, maybe his blog?

    You can't compare CoxPH and WeibullAFT log-likelihood values, no - mostly because CoxPH uses a partial likelihood.

    I recently added some resources here to help with model selection between CoxPH and parametric models: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#parametric-vs-semi-parametric-models
    It's also very possible you are missing interactions or non-linear effects in your models.
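    (A minimal sketch of comparing candidate parametric regression models, assuming a DataFrame df with duration column 'T' and event column 'E'; the AIC_ and concordance_index_ attributes may depend on your lifelines version:)

    from lifelines import WeibullAFTFitter, LogNormalAFTFitter

    for model in [WeibullAFTFitter(), LogNormalAFTFitter()]:
        model.fit(df, duration_col='T', event_col='E')
        # lower AIC and higher concordance suggest a better fit; AIC is
        # comparable here because both models use full likelihoods
        print(type(model).__name__, model.AIC_, model.concordance_index_)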
    githubhsss
    @githubhsss
    @CamDavidsonPilon Thanks for your answer~ I've got to keep working on it...
    Manon Wientjes
    @manonww
    Hi @CamDavidsonPilon How do you ensure that lambda and rho are greater than 0 when you fit a Weibull distribution using the WeibullFitter? You do not set the bounds as you do in the LogNormalFitter?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww good observation. The bounds, when not specified, are set to be always positive
    hgfabc
    @hgfabc
    hi, I'm quite new to using lifelines and I stumbled upon errors while executing. I was wondering, when using the cph.fit() method, does it omit missing values/NaNs? Or do I have to reform the dataframe? thanks
    Manon Wientjes
    @manonww
    @CamDavidsonPilon Thanks! I was also wondering why the Weibull distribution is not defined at 0. According to Wikipedia it is defined: https://en.m.wikipedia.org/wiki/Weibull_distribution
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww it is defined at 0, but the probability of an event there is nil (hence why we reject any 0 durations - probably it's malformed data). Is there a place in lifelines where the Weibull is not defined at 0? (Maybe in my docs?)
    @hgfabc welcome! It does not omit or drop NaNs; that's up to you to handle first
    hgfabc
    @hgfabc
    @CamDavidsonPilon so if my original dataframe contains NaNs, it doesn't raise errors and just proceeds with them?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    It will raise an error, which you must fix @hgfabc
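    (A minimal sketch of handling them yourself before fitting; the 'T' and 'E' column names are hypothetical:)

    df = df.dropna()  # or impute the missing values instead of dropping rows
    cph.fit(df, duration_col='T', event_col='E')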
    hgfabc
    @hgfabc
    My mistake, I didn’t read through. Thank you!@CamDavidsonPilon
    Typo sorry @CamDavidsonPilon
    Manon Wientjes
    @manonww
    @CamDavidsonPilon No, sorry, I didn't read the error message properly. I have another question :). To determine rho and lambda of a Weibull distribution, do you use scipy's optimize.minimize with the L-BFGS-B method?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww yup that is correct!
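    (A minimal sketch of that approach, not lifelines' exact code: maximum likelihood for a Weibull model via scipy's L-BFGS-B, with bounds keeping lambda and rho positive; the toy data is hypothetical:)

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(params, t, e):
        lam, rho = params
        # Weibull log-hazard and cumulative hazard under S(t) = exp(-(t/lam)**rho)
        log_hazard = np.log(rho / lam) + (rho - 1) * np.log(t / lam)
        cum_hazard = (t / lam) ** rho
        # observed events contribute log h(t); all subjects contribute -H(t)
        return -(e * log_hazard - cum_hazard).sum()

    t = np.array([5.0, 8.0, 12.0, 20.0, 25.0])  # toy durations
    e = np.array([1, 1, 0, 1, 0])               # toy event indicators (0 = censored)

    result = minimize(
        neg_log_likelihood, x0=np.array([1.0, 1.0]), args=(t, e),
        method="L-BFGS-B", bounds=[(1e-9, None), (1e-9, None)],  # lambda, rho > 0
    )
    lam_hat, rho_hat = result.x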
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: minor release alert! Update to 0.20.5 for some bug fixes. Changelog here
    Also, here's how I'm thinking about including interval censoring for a future 0.21 release: CamDavidsonPilon/lifelines#700
    Dan Turkel
    @daturkel_twitter

    Hello. I'm reading the docs on coxph regression (https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html).

    a) the badfit image appears to be broken
    b) in the rossi dataset example provided, plotting the KM curve against the baseline survival appears not to show a good spread. Is there an included dataset that could be used for this example that would show a bigger spread, like the one in the goodfit picture?

    I basically just followed the docs and then did
    import matplotlib.pyplot as plt
    import lifelines

    # cox_prop_hazards is the CoxPHFitter already fit on rossi_dataset, per the docs
    kmf = lifelines.KaplanMeierFitter()
    kmf.fit(rossi_dataset['week'], rossi_dataset['arrest'])
    fig, ax = plt.subplots()
    ax.plot(cox_prop_hazards.baseline_survival_, color='b')
    ax.plot(kmf.survival_function_, color='r')
    this was the resulting plot (excuse my vanilla matplotlib)
    just want to get a decently interpretable proof of concept going before I start playing with my own data, but I feel like I can't really use rossi end-to-end for the regression documentation here, unless I'm misjudging the plot and this really is a good fit
    FL512
    @FL512

    I do not know how I can modify the output image provided by lifelines, since I am unfamiliar with "cph.plot_covariate_groups". Unfortunately, there seems to be no detailed description of it in the link here - https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html .

    What I am looking for is: (1) how to shorten the event days (x-axis) - I do not want to show such a long range for the survival curve; ideally, capping it at 4000 would be best. (2) If possible, I would also like to remove the baseline survival curve from my image. (3) I am also hoping I can change the color of the survival curves from orange/blue to other colors.

    Can anyone give me some kind feedback, please?
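    (A minimal sketch addressing (1)-(3), assuming a fitted CoxPHFitter named cph and a covariate named 'age', which is hypothetical here; plot_covariate_groups returns a matplotlib axis, so standard matplotlib calls apply:)

    import matplotlib.pyplot as plt

    # plot_baseline=False hides the baseline curve, if your lifelines version supports it  (2)
    ax = cph.plot_covariate_groups('age', [30, 50, 70], plot_baseline=False)
    ax.set_xlim(0, 4000)  # (1) cap the x-axis at 4000 days
    for line, color in zip(ax.get_lines(), ['green', 'purple', 'brown']):
        line.set_color(color)  # (3) recolor the survival curves
    ax.legend()  # rebuild the legend so it picks up the new colors
    plt.show()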

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    @daturkel_twitter the rossi fit is so-so, partly because the model is so simple (no interaction terms, no higher-order terms, and some variables fail the proportional hazards test). Generally we shouldn't expect huge separation. That visual test is just one test you can use; predictive performance (looking at the c-index) is another, and so is the log-likelihood ratio test.

    What I suggest is to start with a baseline model, and then ask questions of it to see what improves the fit. Ex: do I satisfy the proportional hazards assumption? Does adding a quadratic term improve the fit (and make sense)?
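    (A minimal sketch of the first check, assuming a fitted CoxPHFitter named cph and its training DataFrame rossi:)

    # prints which covariates violate the proportional hazards assumption,
    # along with suggested remedies
    cph.check_assumptions(rossi, p_value_threshold=0.05)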

    Dan Turkel
    @daturkel_twitter
    @CamDavidsonPilon thanks for the quick response. when you say adding higher-order terms, do you mean simply adding columns that are powers of some of your initial covariates?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    yup, could be that - or it could be adding splines, interactions, etc.
    I'll improve that section of the doc in the near future
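    (A minimal sketch of what that looks like with the rossi dataset; the choice of age and prio is illustrative, not a recommendation:)

    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    rossi = load_rossi()
    rossi['age**2'] = rossi['age'] ** 2               # higher-order (quadratic) term
    rossi['age*prio'] = rossi['age'] * rossi['prio']  # interaction term

    cph = CoxPHFitter()
    cph.fit(rossi, duration_col='week', event_col='arrest')
    cph.print_summary()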
    Dan Turkel
    @daturkel_twitter
    awesome, keep up the good work!
    FL512
    @FL512
    @CamDavidsonPilon thank you very much! I found your response on stackoverflow!