    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Also, here's how I'm thinking about including interval censoring for a future 0.21 release: CamDavidsonPilon/lifelines#700
    Dan Turkel
    @daturkel_twitter

    Hello. I'm reading the docs on coxph regression (https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html).

    a) the badfit image appears to be broken
    b) in the rossi dataset example provided, plotting the KM curve against the baseline survival doesn't show much spread. Is there an included dataset that could be used for this example that would show a bigger spread, like the one in the goodfit picture?

    I basically just followed the docs and then did
    kmf = lifelines.KaplanMeierFitter()
    kmf.fit(rossi_dataset['week'], event_observed=rossi_dataset['arrest'])
    fig, ax = plt.subplots()
    ax.plot(cox_prop_hazards.baseline_survival_, color='b')
    ax.plot(kmf.survival_function_, color='r')
    this was the resulting plot (excuse my vanilla matplotlib)
    just want to get a decently interpretable proof of concept going before I start playing with my own data, but I feel like I can't really use rossi end-to-end for the regression documentation here, unless I'm misjudging the plot and this really is a good fit
    FL512
    @FL512

    I do not know how I can modify the output image produced by lifelines, since I am unfamiliar with "cph.plot_covariate_groups". Unfortunately, there seems to be no detailed description of it in the link here - https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html .

    What I am looking for is: (1) how to shorten the event days (x-axis), since I do not want to show such a long range of days for the survival curve; ideally, the axis would end at 4000. (2) Also, if possible, I would like to remove the baseline survival curve from my image. (3) I am also hoping I could change the color of the survival curves from orange/blue to something else.

    Can anyone kindly give me some feedback, please?

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    @daturkel_twitter the rossi fit is so-so, partly because the model is so simple (no interaction terms, no higher-order terms, and some variables fail the proportional hazards test). Generally we shouldn't expect huge separation. That visual test is just one test you can use. Predictive performance (e.g. the c-index) is another, as is the log-likelihood ratio test.

    What I suggest is to start with a baseline model, and then ask it questions to see if it improves the fit. Ex: do I satisfy the proportional hazards assumption? Does adding a quadratic term improve fit (and make sense)?
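    A minimal sketch of what "adding a higher-order term" means in practice: it's just feature engineering on the covariate matrix before fitting. The toy DataFrame below stands in for the rossi dataset (column names assumed for illustration):

    ```python
    import pandas as pd

    # Toy stand-in for a few rossi-style covariates.
    df = pd.DataFrame({"week": [20, 17, 25],
                       "arrest": [1, 1, 0],
                       "age": [27, 18, 19],
                       "prio": [3, 8, 1]})

    # A quadratic (higher-order) term is a new column: age squared.
    df["age**2"] = df["age"] ** 2

    # An interaction term is a product of two existing columns.
    df["age*prio"] = df["age"] * df["prio"]

    print(df["age**2"].tolist())   # [729, 324, 361]
    print(df["age*prio"].tolist()) # [81, 144, 19]
    ```

    The augmented DataFrame is then passed to the regression fitter as usual; whether the extra terms improve fit is what the log-likelihood ratio test and c-index help judge.
    
    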

    Dan Turkel
    @daturkel_twitter
    @CamDavidsonPilon thanks for the quick response. when you say adding higher-order terms, do you mean simply adding columns that are powers of some of your initial covariates?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    yup, could be that - or it could be adding splines, interactions, etc.
    I'll improve that section of the doc in the near future
    Dan Turkel
    @daturkel_twitter
    awesome, keep up the good work!
    FL512
    @FL512
    @CamDavidsonPilon thank you very much! I found your response on stackoverflow!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: new lifelines release 0.21.0 is out. Some important bug fixes, and API changes if you have been doing any left-censoring inference. Full change log here: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.21.0
    Manon Wientjes
    @manonww
    @CamDavidsonPilon Is there a way we can input initial values to our fitter object?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hello! Yes, in the regression fitters there is initial_point
    In the fit method
    is that what you are looking for?
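    For intuition, initial_point is just the starting value handed to the optimizer that maximizes the likelihood. This is not lifelines' actual optimizer, only a hand-rolled sketch of the concept using an exponential model with right-censoring, where the MLE has the known closed form d / sum(t):

    ```python
    # Exponential survival model with right-censoring:
    # NLL(lam) = lam * sum(t) - d * log(lam), where d = number of events.
    # The MLE is lam = d / sum(t), so we can check convergence against it.
    durations = [5.0, 8.0, 3.0, 9.0]
    events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored

    d = sum(events)
    total = sum(durations)

    def fit(initial_point, lr=1e-3, steps=5000):
        """Gradient descent on the NLL, started from initial_point."""
        lam = initial_point
        for _ in range(steps):
            grad = total - d / lam   # d(NLL)/d(lam)
            lam -= lr * grad
        return lam

    # Different starting points converge to the same MLE, d / total = 0.12
    print(fit(initial_point=1.0), fit(initial_point=0.05))
    ```

    A good initial point mostly affects how fast (and whether) the optimizer converges, which is why it matters when porting results from another package.
    
    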
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww I'm curious why you are interested in this? Is it because of convergence problems?
    Manon Wientjes
    @manonww
    I have to rewrite code originally written in R in Python. I'm using your package to do so and needed to input initial parameters to the optimization. I do not have convergence problems (and if I do, I scale the data and everything is fine).
    Bojan Kostic
    @bkos
    Hi, I've started using lifelines recently. It seems there is a small bug when plotting the label for the log-logistic fitter: instead of the name, it shows a whole sentence. Could you confirm? Also, I'm unable to plot the Q-Q plot for the piecewise exponential fitter; it gives a TypeError in create_scipy_stats_model_from_lifelines_model(model). I'm using the latest version.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hey @bkos, I can't reproduce the loglogistic bug you mention. Can you provide a code example?
    For the piecewise exp. and qq-plot - yeah, I'm not surprised this fails, as the piecewise model isn't part of scipy. However, I should return a better error message
    Bojan Kostic
    @bkos
    Thanks @CamDavidsonPilon, it was my bad for the label: I mistakenly passed the model itself as the label argument. It didn't crash, but the label was "<lifelines.LogLogisticFitter: fitted with 111 observations, 9 censored>" :)
    Paul Zivich
    @pzivich
    Hi @CamDavidsonPilon do the AFT models in lifelines support time-varying covariates? Or are they assumed to be fixed at baseline?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @pzivich Fixed, but time-varying is on the roadmap
    sam
    @veggiet_gitlab
    I'm confused by this page: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#goodness-of-fit
    The paragraph states that the first image has a better fit than the second image, but the two lines seem to correlate nearly perfectly in the second image... I also don't understand how to generate the "baseline survival" metric.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hi @veggiet_gitlab, you know, today I was thinking of removing that blurb from my docs, so I suggest ignoring it completely. If you are interested in model selection, I think a more appropriate way is using the log-likelihood test in the print_summary() output.
    sam
    @veggiet_gitlab
    Thank you
    Ossama Alshabrawy
    @OssamaAlshabrawy
    Hi I was just wondering whether I could use CoxPHFitter() with more than one duration column. Is that possible?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @OssamaAlshabrawy no, but I'm confused what you are trying to model? What might more than one duration column represent?
    sam
    @veggiet_gitlab

    I'd like some advice. I really like the Cox model and its ability to quantify how different factors might influence the lifetime. But I have a lot of unknowns: my data started getting collected at a certain point years ago, and we have people in the system without a known "start point." I read that this is what left censoring is for, but in all the models I've looked at, either there is no left-censoring parameter, or there is a "starting point" parameter, which suggests I know how long a person was involved before data began to be collected. I don't, and even if I crawl through old record books I won't have complete knowledge of everyone.

    Is there a guideline for creating a probable starting point?

    sam
    @veggiet_gitlab

    ok I read a different description of left censoring, and I guess I was wrong: left censoring is for when you don't know exactly when the event happened, only that it happened before a certain point? Is that right?

    If so, my original question still stands: what do we do with people whose start point we don't know?

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Yea, left censoring is best described with an example: a sensor can't detect values less than 0.05, so we know that some observations are less than 0.05, but we don't know their exact value.
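    A small sketch of how the sensor example gets encoded as data (the representation here, a value plus a 0/1 "observed" flag, is an assumption for illustration; fitter APIs vary):

    ```python
    # Readings below the detection limit are left-censored: we only know
    # the true value is somewhere below LIMIT.
    LIMIT = 0.05
    raw = [0.21, None, 0.07, None, 0.55]   # None = below detection limit

    values, observed = [], []
    for r in raw:
        if r is None:              # left-censored observation
            values.append(LIMIT)   # record the limit, not the true value
            observed.append(0)
        else:
            values.append(r)       # exact measurement
            observed.append(1)

    print(values)    # [0.21, 0.05, 0.07, 0.05, 0.55]
    print(observed)  # [1, 0, 1, 0, 1]
    ```
    
    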

    In your case, you actually have right-censoring! Let me explain (I hope I understand your problem well enough). You are trying to model lifetimes; let's call these durations. For the first type of subject, where we do know their start date, the observed duration is end_date (or now) - start_date, along with a 0/1 flag for whether we observed their end date.
    For the second type of subject, where we don't know their start date, the observed duration is end_date (or now) - first observed date, and the flag is always 0 (since we know they lived longer than what we observed).

    However, there are going to be some restrictions on your model. You can't use "year_joined" as a baseline covariate, since you don't know that for some subjects. Similarly, I don't know how to extend this to time-varying covariates (if you were interested in that).

    Also, what do baseline covariates even mean in this context? I don't know, since the second type of subject may have evolving covariates that don't reflect the subject's state when they initially started.

    So, I think you can model it, but you'll need to be careful with what variables you include.
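    A sketch of the encoding Cameron describes, with made-up dates. The dict keys ("start", "first_seen", "end") are hypothetical names for illustration:

    ```python
    from datetime import date

    today = date(2019, 5, 1)

    subjects = [
        {"start": date(2015, 3, 1), "end": date(2018, 7, 1)},  # start known, event seen
        {"start": date(2016, 1, 1), "end": None},              # start known, still active
        {"first_seen": date(2014, 6, 1), "end": None},         # start unknown
    ]

    durations, observed = [], []
    for s in subjects:
        origin = s.get("start") or s.get("first_seen")
        end = s["end"] or today
        durations.append((end - origin).days)
        # Flag is 1 only when both the true start and the end were seen;
        # unknown-start subjects are always censored (flag 0), because
        # their duration is only a lower bound on the true lifetime.
        observed.append(1 if (s.get("start") and s["end"]) else 0)

    print(durations)  # [1218, 1216, 1795]
    print(observed)   # [1, 0, 0]
    ```

    These two columns (durations plus the 0/1 flag) are exactly the shape a right-censoring fitter expects.
    
    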

    quanthubscl
    @quanthubscl
    Hi, I have a question about the Python lifelines software. I am new to survival analysis, so please bear with me. To get the survival function, you integrate the hazard function and take the negative exponent, right? If so, how does lifelines handle the integration? I am using scipy's trapz, and the survival curves I get are slightly different from what lifelines predicts. I am wondering if I just have a misunderstanding of how to get survival curves from the hazard function.
    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Hi @quanthubscl, you're correct that the method you describe is one way to get the survival function. There are other ways, however, and it depends on what model you are using. For example, parametric forms often have a closed-form formula for the integral of the hazard, and lifelines uses that. Kaplan-Meier estimates the survival function directly and doesn't estimate any hazard.

    Can I ask what model you are using?
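    To see why the two routes give slightly different numbers, here is a sketch comparing the closed-form cumulative hazard of a Weibull model against trapezoid-rule integration of its hazard (lam and rho are made-up parameter values):

    ```python
    import numpy as np

    # Weibull hazard h(t) = (rho/lam) * (t/lam)**(rho - 1). Its integral
    # has the closed form H(t) = (t/lam)**rho, so S(t) = exp(-H(t))
    # needs no numerical quadrature in a parametric fitter.
    lam, rho = 2.0, 1.5
    t = np.linspace(0.0, 5.0, 2001)
    hazard = (rho / lam) * (t / lam) ** (rho - 1)

    # Trapezoid rule over [0, 5], as scipy's trapz would do it:
    H_numeric = np.sum(np.diff(t) * (hazard[1:] + hazard[:-1]) / 2)
    H_closed = (5.0 / lam) ** rho

    print(H_numeric, H_closed)  # nearly equal; tiny discretization error
    print(np.exp(-H_closed))    # survival at t = 5
    ```

    The small gap between the two is the discretization error of the quadrature, which is the kind of discrepancy you would see comparing trapz output to lifelines' closed-form curves.
    
    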

    quanthubscl
    @quanthubscl
    @CamDavidsonPilon, I am using the Cox Proportional Hazards model. I am actually following the examples given for the rossi dataset. Mostly, I am trying to make sure that I am doing and understanding things correctly. I take the baseline hazard function, multiply it by the partial hazard for a sample, then integrate this function with scipy and take the negative exponent.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @quanthubscl in the case of the Cox model, we can just cumulatively sum the baseline hazard to get the cumulative baseline hazard. Why? In the Cox model, we actually estimate the cumulative hazard first (see https://stats.stackexchange.com/questions/46532/cox-baseline-hazard), and then take the .diff to recover the baseline hazard, so .cumsum recovers the original cumulative hazard
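    The diff/cumsum round trip above is exact, not an approximation, which is why summing beats re-integrating with quadrature here. A minimal numpy sketch (the cum_hazard values are made up):

    ```python
    import numpy as np

    # A nondecreasing cumulative baseline hazard, as the Cox model
    # estimates it (made-up values).
    cum_hazard = np.array([0.0, 0.1, 0.25, 0.25, 0.6, 1.1])

    # diff gives the per-interval baseline hazard ...
    baseline_hazard = np.diff(cum_hazard, prepend=0.0)

    # ... and cumsum recovers the cumulative hazard exactly.
    recovered = np.cumsum(baseline_hazard)

    print(np.allclose(recovered, cum_hazard))  # True

    # The survival curve then follows directly: S(t) = exp(-H(t)).
    survival = np.exp(-cum_hazard)
    ```
    
    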
    davidrindt
    @davidrindt
    Hi! How do we access the p-value of the likelihood ratio test? I can see it printed after cph.print_summary(), but I don't know where it is stored.
    @CamDavidsonPilon
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @davidrindt ah, yea this is something I'm thinking of exposing differently. ATM you can access it using _, _, log2p = cph._compute_likelihood_ratio_test()
    sam
    @veggiet_gitlab
    [image: Schoenfeld residuals plot for the 'giving' variable]
    So, I'm finally at the place where I'm using check_assumptions on my cph model. I've got a few variables reported as "failed the non-proportional test" that I can see clearly do fail, but there are a couple where visual inspection suggests they don't... As I understand it, to pass the test the variable needs to produce a straight line? And in this image the "giving" parameter is clearly showing a straight line, but I'm getting: 1. Variable 'giving' failed the non-proportional test: p-value is <5e-05.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @veggiet_gitlab the left-hand graph does dip in the right tail, which is probably the violation. However, it's very very minor, and because you have so much data, the test has enough power to detect even this minor violation. It's safe to ignore this minor violation.
    fredrichards72
    @fredrichards72

    @CamDavidsonPilon Thanks so much for your work on lifelines. Very cool. I was particularly excited to see your last post on SaaS churn. https://dataorigami.net/blogs/napkin-folding/churn.

    I've been trying to follow along but have run into a couple of issues. First, I don't see that 'PiecewiseExponentialRegressionFitter' exists in lifelines. I do see 'PiecewiseExponentialFitter', however. If I use 'PiecewiseExponentialFitter' I get an error:
    object has no attribute 'predict_cumulative_hazard'

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hey @fredrichards72, the model isn't in lifelines yet (I should have added that to the blog article). It's in a PR right now, and I should merge it soon. CamDavidsonPilon/lifelines#715
    fredrichards72
    @fredrichards72
    Got it. Thanks!