    Manon Wientjes
    @manonww
    Hi @CamDavidsonPilon, how do you ensure that lambda and rho are greater than 0 when you fit a Weibull distribution using the WeibullFitter? You do not set the bounds as in the LogNormalFitter?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww good observation. The bounds, when not specified, are set to be always positive
    hgfabc
    @hgfabc
    Hi, I'm quite new to using lifelines and I stumbled upon errors while executing. I was wondering, when using the cph.fit() method, does it omit missing values/NaNs? Or do I have to reformat the dataframe? Thanks
    Manon Wientjes
    @manonww
    @CamDavidsonPilon Thanks! I was also wondering why the Weibull distribution is not defined at 0. According to Wikipedia it is defined? https://en.m.wikipedia.org/wiki/Weibull_distribution
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww it is defined at 0, but the probability of an event there is nil (hence why we reject any 0 durations - probably it's malformed data). Is there a place in lifelines where the Weibull is not defined at 0? (Maybe my docs?)
    @hgfabc welcome! It does not omit or drop NaNs, that's up to you to handle first
    hgfabc
    @hgfabc
    @CamDavidsonPilon so if my original dataframe contains NaNs in it, does it not raise errors and just proceed with them?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    It will raise an error, which you must fix @hgfabc
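    A minimal sketch of handling this before fitting, using the bundled rossi dataset with a missing value injected purely for illustration:
    import numpy as np
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    df.loc[0, 'age'] = np.nan                # pretend one covariate value is missing
    cph = CoxPHFitter()
    cph.fit(df.dropna(), duration_col='week', event_col='arrest')  # drop incomplete rows first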
    hgfabc
    @hgfabc
    My mistake, I didn't read through. Thank you! @CamDavidsonPilon
    Typo sorry @CamDavidsonPilon
    Manon Wientjes
    @manonww
    @CamDavidsonPilon No, sorry, I didn't read the error message properly. I have another question :). To determine rho and lambda of a Weibull distribution, do you use scipy.optimize.minimize with the L-BFGS-B method?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww yup that is correct!
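    Not lifelines' internal code, just a sketch of the idea: a Weibull maximum-likelihood fit via scipy.optimize.minimize with L-BFGS-B, using bounds to keep lambda and rho strictly positive (the durations and starting point below are made up):
    import numpy as np
    from scipy.optimize import minimize

    # Negative log-likelihood of a Weibull model with right censoring,
    # lifelines-style parameterization: S(t) = exp(-(t / lambda_) ** rho_)
    def neg_log_likelihood(params, T, E):
        lambda_, rho_ = params
        log_hazard = np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(T) - np.log(lambda_))
        cumulative_hazard = (T / lambda_) ** rho_
        return -(E * log_hazard - cumulative_hazard).sum()

    T = np.array([5.0, 8.0, 12.0, 3.0, 9.0, 7.0])   # durations
    E = np.array([1, 0, 1, 1, 1, 0])                # 1 = event observed, 0 = censored
    result = minimize(neg_log_likelihood, x0=(1.0, 1.0), args=(T, E),
                      method='L-BFGS-B', bounds=((1e-9, None), (1e-9, None)))
    print(result.x)                                 # estimated (lambda, rho)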
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: minor release alert! Update to 0.20.5 for some bug fixes. Changelog here
    Also, here's how I'm thinking about including interval censoring for a future 0.21 release: CamDavidsonPilon/lifelines#700
    Dan Turkel
    @daturkel_twitter

    Hello. I'm reading the docs on CoxPH regression (https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html).

    a) The badfit image appears to be broken.
    b) In the rossi dataset example provided, plotting the KM curve against the baseline survival does not appear to have a good spread. Is there an included dataset that could be used for this example that would show a bigger spread, like the one in the goodfit picture?

    I basically just followed the docs and then did:
    import matplotlib.pyplot as plt
    import lifelines

    # rossi_dataset and cox_prop_hazards are the dataset and fitted CoxPHFitter from the docs
    kmf = lifelines.KaplanMeierFitter()
    kmf.fit(rossi_dataset['week'], rossi_dataset['arrest'])
    fig, ax = plt.subplots()
    ax.plot(cox_prop_hazards.baseline_survival_, color='b')
    ax.plot(kmf.survival_function_, color='r')
    This was the resulting plot (excuse my vanilla matplotlib).
    I just want to get a decently interpretable proof of concept going before I start playing with my own data, but I feel like I can't really use rossi end-to-end for the regression documentation here, unless I'm misjudging the plot and this really is a good fit.
    FL512
    @FL512

    I do not know how I can modify the output image provided by lifelines since I am unfamiliar with "cph.plot_covariate_groups". Unfortunately, there seems to be no detailed description of it in the link here - https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html .

    What I am looking for is: (1) how to shorten the event days (x axis); I do not want to show such a long range of days for the survival curve, and ideally 4000 would be the maximum. (2) Also, if possible, I would like to remove the baseline survival curve from my image. (3) I am also hoping I could change the color of the survival curves from orange/blue to others.

    Can anyone give me some kind feedback, please?
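    A rough sketch of (1) and (3) using the bundled rossi data ('prio' and the colors below are arbitrary choices): plot_covariate_groups returns a matplotlib Axes, so the usual matplotlib calls apply afterwards, and depending on your version the method may also accept a plot_baseline=False keyword for (2).
    import matplotlib.pyplot as plt
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    cph = CoxPHFitter().fit(load_rossi(), duration_col='week', event_col='arrest')
    ax = cph.plot_covariate_groups('prio', [0, 5, 10])   # 3 curves plus the baseline
    ax.set_xlim(0, 40)                   # (1) cap the x axis (e.g. 4000 for data in days)
    for line, color in zip(ax.get_lines(), ['green', 'purple', 'grey', 'black']):
        line.set_color(color)            # (3) recolor the curves
    ax.legend()                          # rebuild the legend so it picks up the new colors
    plt.show()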

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    @daturkel_twitter the rossi fit is so-so, partly because the model is so simple (no interaction terms, no higher-order terms, and some variables fail the proportional hazards test). Generally we shouldn't expect huge separation. That visual test is just one test you can use; predictive performance (looking at the c-index) is another, as is the log-likelihood ratio test.

    What I suggest is to start with a baseline model, and then ask it questions to see if it improves the fit. Ex: do I satisfy the proportional hazards assumption? Does adding a quadratic term improve fit (and make sense)?
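    In lifelines terms, a sketch of those checks on the rossi baseline model might look like this (the summary output reports the concordance and the log-likelihood ratio test):
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    cph = CoxPHFitter().fit(df, duration_col='week', event_col='arrest')
    cph.print_summary()        # concordance (c-index) and log-likelihood ratio test
    cph.check_assumptions(df)  # flags covariates that appear to violate proportional hazards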

    Dan Turkel
    @daturkel_twitter
    @CamDavidsonPilon thanks for the quick response. When you say adding higher-order terms, do you mean simply adding columns that are powers of some of your initial covariates?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    yup, could be that - or it could be adding splines, interactions, etc.
    I'll improve that section of the doc in the near future
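    For example, a sketch of adding a (centered) quadratic term and an interaction term to the rossi model; whether they help is then judged against the baseline fit:
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    df['age_sq'] = (df['age'] - df['age'].mean()) ** 2   # centered quadratic term
    df['age_x_prio'] = df['age'] * df['prio']            # interaction term
    cph = CoxPHFitter().fit(df, duration_col='week', event_col='arrest')
    cph.print_summary()   # compare c-index / log-likelihood with the simpler model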
    Dan Turkel
    @daturkel_twitter
    awesome, keep up the good work!
    FL512
    @FL512
    @CamDavidsonPilon thank you very much! I found your response on stackoverflow!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: new lifelines release 0.21.0 is out. Some important bug fixes, and API changes if you have been doing any left-censoring inference. Full change log here: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.21.0
    Manon Wientjes
    @manonww
    @CamDavidsonPilon Is there a way we can input initial values to our fitter object?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hello! Yes, in the regression fitters there is initial_point
    In the fit method
    is that what you are looking for?
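    A sketch with the Weibull AFT regression fitter on the rossi data; the length of initial_point must match the number of model parameters (here assumed to be the 7 covariate coefficients plus the lambda_ intercept, plus the rho_ intercept), and zeros are used just to show the mechanics:
    import numpy as np
    from lifelines import WeibullAFTFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    aft = WeibullAFTFitter()
    # one starting value per parameter: 8 for lambda_ (7 covariates + intercept), 1 for rho_
    aft.fit(df, duration_col='week', event_col='arrest', initial_point=np.zeros(9))
    aft.print_summary()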
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @manonww I'm curious why you are interested in this? Is it because of convergence problems?
    Manon Wientjes
    @manonww
    I have to rewrite code originally written in R into Python. I'm using your package to do so and needed to input initial parameters to the optimization. I don't have convergence problems (if I do, I scale the data and everything is fine).
    Bojan Kostic
    @bkos
    Hi, I've started using lifelines recently. It seems that there is a small bug when plotting the label for the log-logistic fitter: instead of the name, it gives a whole sentence. Could you confirm? Also, I'm unable to plot the Q-Q plot for the piecewise exponential fitter; it gives a TypeError in create_scipy_stats_model_from_lifelines_model(model) ... I'm using the latest version.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hey @bkos, I can't reproduce the loglogistic bug you mention. Can you provide a code example?
    For the piecewise exp. and qq-plot - yea I'm not surprised this fails, as the piecewise model isn't part of scipy. However, I should return a better error message
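    For reference, a sketch of the Q-Q plot with a fitter that does have a scipy counterpart (the Weibull), using the bundled waltons data:
    from lifelines import WeibullFitter
    from lifelines.plotting import qq_plot
    from lifelines.datasets import load_waltons

    df = load_waltons()
    wf = WeibullFitter().fit(df['T'], df['E'])
    qq_plot(wf)   # works for models backed by a scipy distribution; the piecewise exponential isn't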
    Bojan Kostic
    @bkos
    Thanks @CamDavidsonPilon, the label issue was my bad: I mistakenly passed the model itself as the label argument. It didn't crash, but the label was "<lifelines.LogLogisticFitter: fitted with 111 observations, 9 censored>" :)
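    The fix is just to pass a plain string as the label, e.g. (sketched here on the bundled waltons data):
    from lifelines import LogLogisticFitter
    from lifelines.datasets import load_waltons

    df = load_waltons()
    llf = LogLogisticFitter()
    llf.fit(df['T'], df['E'], label='log-logistic')   # a string, not the fitter object itself
    llf.plot()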
    Paul Zivich
    @pzivich
    Hi @CamDavidsonPilon do the AFT models in lifelines support time-varying covariates? Or are they assumed to be fixed at baseline?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @pzivich Fixed, but time-varying is on the roadmap
    sam
    @veggiet_gitlab
    I'm confused by this page: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#goodness-of-fit
    The paragraph states that the first image has a better fit than the second image, but the two lines seem to correlate near perfectly in the second image... I also don't understand how to generate the "baseline survival" metric.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hi @veggiet_gitlab, you know, today I was thinking of removing that blurb from my docs, so I suggest ignoring it completely. If you are interested in model selection, I think a more appropriate way is to use the log-likelihood ratio test in the print_summary() output.
    sam
    @veggiet_gitlab
    Thank you
    Ossama Alshabrawy
    @OssamaAlshabrawy
    Hi, I was just wondering whether I could use CoxPHFitter() with more than one duration column. Is that possible?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @OssamaAlshabrawy no, but I'm confused about what you are trying to model. What might more than one duration column represent?
    sam
    @veggiet_gitlab

    I'd like some advice. I really like the Cox model and its ability to quantify how different factors might influence the lifetime. But I have a lot of unknowns: my data started getting collected at a certain point years ago and we have people in the system without a known "start point." Now, I read that this is what left censoring is for, but in all the models I've looked at, either there is no left censoring parameter OR there is a "starting point" parameter, which suggests that I know how long a person has been involved before data began to be collected, which I don't, and even if I crawl through old record books I won't have complete knowledge of everyone.

    Is there a guideline for creating a probable starting point?

    sam
    @veggiet_gitlab

    OK, I read a different description of left censoring, and I guess I was wrong: left censoring is for when you don't know exactly when the event happened, but you know it happened before a certain point? Is this true?

    If so, my original question is still valid: what do I do with people whose start point we don't know?

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Yea, left censoring is best described with an example: a sensor can't detect values less than 0.05, so we know that some observations are less than 0.05, but we don't know their exact value.

    In your case, you actually have right-censoring! Let me explain (I hope I understand your problem well enough). You are trying to model lifetimes; let's call these durations. For the first type of subject, where we do know their start date, their observed duration is end_date (or now) - start_date, with a 0/1 flag for whether we observed their end date or not.
    For the second type of subject, where we don't know their start date, their observed duration is end_date (or now) - first observed date, and the flag is always 0 (since we know they lived longer than what we observed).

    However, there are going to be some restrictions on your model. You can't use "year_joined" as a baseline covariate, since you don't know that for some subjects. Similarly, I don't know how to extend this to time-varying covariates (if you were interested in that).
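    In pandas terms, that duration/flag construction might look like this (column names and dates below are made up for illustration):
    import pandas as pd

    # 'start' is NaT when the true start date is unknown; 'first_observed' is when the
    # subject first appears in the system; 'end' is NaT if they are still active.
    df = pd.DataFrame({
        'start':          pd.to_datetime(['2012-03-01', None]),
        'first_observed': pd.to_datetime(['2012-03-01', '2014-06-01']),
        'end':            pd.to_datetime(['2015-01-01', None]),
    })
    now = pd.Timestamp('2019-05-01')

    origin = df['start'].fillna(df['first_observed'])
    df['duration'] = (df['end'].fillna(now) - origin).dt.days
    # event observed only when we saw the end and we know the true start; otherwise censored
    df['event'] = (df['end'].notna() & df['start'].notna()).astype(int)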