this was the resulting plot (excuse my vanilla matplotlib)

just want to get a decently interpretable proof of concept going before I start playing on my own data, but feel like i can't really use `rossi` end-to-end for the regression documentation here, unless i'm misjudging the plot and this really *is* a good fit

I do not know how I can modify the output image provided by lifelines since I am unfamiliar with "cph.plot_covariate_groups". Unfortunately, there seems to be no detailed description of it in the docs here: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html

What I am looking for is: (1) how to shorten the range of event days (X axis). I do not want to show such a long span of days for the survival curve; ideally it would stop at 4000. (2) If possible, I would also like to remove the baseline survival curve from my image. (3) I would also like to change the color of the survival curves from orange/blue to something else.

Could anyone kindly give me some feedback?

@daturkel_twitter the rossi fit is so-so, partly because the model is so simple (no interaction terms, no higher-order terms, and some variables fail the proportional hazards test). Generally we shouldn't expect *huge* separation. That visual test is just one test you can use. Predictive performance is another (looking at the c-index), and looking at the log-likelihood ratio test as well.

What I suggest is to start with a baseline model, and then ask it questions to see if it improves the fit. Ex: do I satisfy the proportional hazards assumption? Does adding a quadratic term improve fit (and make sense)?
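One way to put a number on "does adding a quadratic term improve fit" is a likelihood-ratio test between the two nested models. Below is a minimal sketch, assuming you have already read off each model's log-likelihood. The values are made up for illustration, and `likelihood_ratio_test` is a hypothetical helper, not a lifelines function. It only covers the case where the bigger model adds exactly one parameter, because the chi-squared tail then has a closed form.

```python
import math


def likelihood_ratio_test(ll_small, ll_big):
    """LR test for two nested models that differ by exactly one parameter.

    Returns (statistic, p_value). With 1 degree of freedom the chi-squared
    survival function equals erfc(sqrt(x / 2)), so SciPy is not required.
    """
    stat = 2.0 * (ll_big - ll_small)
    if stat < 0:
        raise ValueError("the bigger model should not have a lower log-likelihood")
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods: baseline model vs. baseline + one quadratic term.
stat, p = likelihood_ratio_test(ll_small=-660.0, ll_big=-657.5)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value suggests the extra term improves the fit; whether the term also makes sense for the problem is a separate question.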

@FL512 I answered your question here: https://stackoverflow.com/questions/55629968/how-to-modify-the-output-of-my-coxph-image-drawn-by-cph-plot-covariate-groups/55634181#55634181

I'll improve that section of the doc in the near future

:wave: new lifelines release 0.21.0 is out. Some important bug fixes, and API changes if you have been doing any left-censoring inference. Full change log here: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.21.0

In the fit method

is that what you are looking for?

Hi, I've started using lifelines recently. It seems that there is a small bug when plotting the label for the log-logistic fitter: instead of the name it gives the whole sentence. Could you confirm? Also, I'm unable to plot the Q-Q plot for the piece-wise exp. fitter; it gives a TypeError in create_scipy_stats_model_from_lifelines_model(model) ... I'm using the latest version.

For the piecewise exp. and qq-plot - yea I'm not surprised this fails, as the piecewise model isn't part of scipy. However, I should return a better error message

I'm confused by this page: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#goodness-of-fit

The paragraph states that the first image has a better fit than the second image, but the two lines seem to correlate in the second image near perfectly... I also don't understand how to generate the "baseline survival" metric.

Hi @veggiet_gitlab, you know, today I was thinking of removing that blurb from my docs, so I suggest ignoring it completely. If you are interested in model selection, I think a more appropriate way is using the log-likelihood test in the `print_summary()` output.

I'd like some advice. I really like the Cox model and its ability to quantify how different factors might influence the lifetime. But I have a lot of unknowns: my data started getting collected at a certain point years ago, and we have people in the system without a known "start point." I read that this is what left censoring is for, but every model I've looked at either has no left censoring parameter or has a "starting point" parameter, which suggests that I know how long a person has been involved before data began to be collected. I don't, and even if I crawl through old record books I won't have complete knowledge of everyone.

Is there a guideline for creating a probable starting point?

ok I read a different description of left censoring, and I guess I was wrong: left censoring is for when you don't know exactly when the event happened but you know it happened before a certain point? Is this true?

If so, my original question is still valid: what to do with people whose start point we don't know?

Yea, left censoring is best described with an example: a sensor can't detect values less than 0.05, so we know that some observations are less than 0.05, but we don't know their exact value.

In your case, you actually have right-censoring! Let me explain (I hope I understand your problem well enough). You are trying to model lifetimes, let's call this durations. For the first type of subject, where we *do* know their start date, then their observed duration is `end_date (or now) - start_date` and a 0/1 for if we observed their end date or not.

For the second type of subject, where we *don't* know their start date, then their observed duration is `end_date (or now) - first observed date` and then *always* have a 0 (since we know they lived longer than what we observed).

However, there are going to be some restrictions on your model. You can't use "year_joined" as a baseline covariate, since you don't know that for some subjects. Similarly, I don't know how to extend this to time-varying covariates (if you were interested in that).

Also, what do baseline covariates even mean in this context? I don't know, since the second type of subject may have evolving covariates that don't reflect the subject's state when they initially started.

So, I think you can model it, but you'll need to be careful with what variables you include.
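The duration/event encoding described above can be sketched as follows. This is only an illustration of that advice: the function name, the argument names, and the dates are all hypothetical, and it uses plain `datetime` rather than any lifelines helper.

```python
from datetime import date


def to_duration_and_event(start, first_seen, end, today):
    """Return (duration_in_days, event_observed) for one subject.

    start      -- known start date, or None if the subject predates data collection
    first_seen -- first date the subject appears in the records
    end        -- date the subject left, or None if still active
    today      -- the date the data was extracted ("now")
    """
    origin = start if start is not None else first_seen
    stop = end if end is not None else today
    duration = (stop - origin).days
    if start is None:
        # Unknown start: the true lifetime is longer than what was observed,
        # so, following the advice above, always mark the subject as censored.
        event = 0
    else:
        event = 1 if end is not None else 0
    return duration, event


today = date(2019, 5, 1)  # hypothetical extraction date

# Known start, observed end -> full duration, event = 1
print(to_duration_and_event(date(2017, 1, 1), date(2017, 1, 1), date(2018, 1, 1), today))
# Known start, still active -> duration up to now, event = 0
print(to_duration_and_event(date(2018, 6, 1), date(2018, 6, 1), None, today))
# Unknown start -> duration from first observed date, event always 0
print(to_duration_and_event(None, date(2016, 1, 1), date(2017, 3, 1), today))
```

The resulting pairs are what you would pass as the duration and event columns when fitting.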

Hi, I have a question on the python lifelines software. I am new to survival analysis so please bear with me. If you want the survival function, should you integrate the hazard function and take the exponential of the negative? If this is true, how does lifelines handle the integration? I am using scipy trapz and the survival curves I get are slightly different from what lifelines predicts. I am wondering if maybe I just have a misunderstanding on how to get survival curves from the hazard function.

Hi @quanthubscl, you're correct that the method you describe is a way to get the survival function. There are other ways however, and it depends on what model you are using. For example, parametric forms often have a closed-form formula for the integral of the hazard, and lifelines uses that. Kaplan Meier estimates the survival function directly, and doesn't estimate any hazard.
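As a small illustration of the closed-form point, here is a sketch comparing numerical integration of a hazard against the closed-form cumulative hazard. It assumes a Weibull form with cumulative hazard H(t) = (t/λ)^ρ; the parameter values and the hand-rolled `trapezoid` helper are made up for the example and are not lifelines code.

```python
import math

lambda_, rho = 10.0, 1.5  # hypothetical Weibull scale and shape


def hazard(t):
    """h(t) = (rho / lambda) * (t / lambda)^(rho - 1)."""
    return (rho / lambda_) * (t / lambda_) ** (rho - 1)


def cumulative_hazard(t):
    """Closed-form integral of the hazard: H(t) = (t / lambda)^rho."""
    return (t / lambda_) ** rho


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b] with n panels."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * step)
    return total * step


t = 5.0
closed_form = math.exp(-cumulative_hazard(t))  # S(t) = exp(-H(t))
for n in (5, 50, 5000):
    numeric = math.exp(-trapezoid(hazard, 0.0, t, n))
    print(f"n={n:5d}  numeric S(t)={numeric:.6f}  closed form S(t)={closed_form:.6f}")
```

The numerical answer approaches the closed form as the grid gets finer, so small discrepancies on a coarse grid are expected.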

Can I ask what model you are using?

@CamDavidsonPilon, I am using the Cox Proportional Hazard Model. I am actually following the examples given for the rossi dataset. Mostly, I am trying to make sure that I am doing things and understanding things correctly. I take the baseline hazard function then multiply it by the partial hazard function for a sample. I then integrate this function with scipy and take the negative exponent.

@quanthubscl in the case of the cox model, we can just cumulatively sum the baseline hazard to get the cumulative baseline hazard. Why? In the Cox model, we actually estimate the cumulative hazard first (using https://stats.stackexchange.com/questions/46532/cox-baseline-hazard), and then take the `.diff` to recover the baseline hazard, so `.cumsum` recovers the original cumulative hazard

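To make the diff/cumsum relationship concrete, here is a sketch in plain Python with made-up numbers (the times, cumulative hazard values, and partial hazard are hypothetical, not rossi output). It also applies a trapezoid rule to the same per-time hazards, which is one plausible reason a `trapz`-based curve would come out slightly different.

```python
import math
from itertools import accumulate

# Hypothetical baseline cumulative hazard at the observed event times.
times = [1, 2, 3, 4]
baseline_cumulative_hazard = [0.00, 0.01, 0.03, 0.045]

# "diff": the baseline hazard is the jump in cumulative hazard at each time.
baseline_hazard = [baseline_cumulative_hazard[0]] + [
    b - a
    for a, b in zip(baseline_cumulative_hazard, baseline_cumulative_hazard[1:])
]

# "cumsum": summing the jumps gives back the cumulative hazard.
recovered = list(accumulate(baseline_hazard))

partial_hazard = 1.8  # hypothetical exp(x . beta) for one subject
survival = [math.exp(-h * partial_hazard) for h in recovered]

# Trapezoid integration of the same jumps, for comparison only.
trapz = [0.0]
for i in range(1, len(times)):
    width = times[i] - times[i - 1]
    area = 0.5 * (baseline_hazard[i] + baseline_hazard[i - 1]) * width
    trapz.append(trapz[-1] + area)
survival_trapz = [math.exp(-h * partial_hazard) for h in trapz]

for row in zip(times, survival, survival_trapz):
    print("t=%d  cumsum S(t)=%.4f  trapezoid S(t)=%.4f" % row)
```

The trapezoid rule averages neighbouring jumps instead of adding each one in full, so it accumulates less hazard and gives a slightly higher survival curve here.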
@CamDavidsonPilon So, I'm finally at the place where I'm using "check assumptions" on my cph model. And I've got a few variables that are reported as "failed the non-proportional test" that I can see clearly do, but then there are a couple that visual inspection doesn't seem like they do... As I understand it to pass the test the variable needs to produce a straight line? And in this image the "giving" parameter is clearly showing a straight line, but I'm getting: 1. Variable 'giving' failed the non-proportional test: p-value is <5e-05.

@veggiet_gitlab the left-hand graph does dip in the right tail, which is probably the violation. *However*, it's very very minor, and because you have so much data, the test has enough power to detect even this minor violation. It's safe to ignore this minor violation.

@CamDavidsonPilon Thanks so much for your work on lifelines. Very cool. I was particularly excited to see your last post on SaaS churn. https://dataorigami.net/blogs/napkin-folding/churn.

I've been trying to follow along but have run into a couple of issues. First, I don't see that 'PiecewiseExponentialRegressionFitter' exists in lifelines. I do see 'PiecewiseExponentialFitter', however. If I use 'PiecewiseExponentialFitter' I get an error:

`object has no attribute 'predict_cumulative_hazard'`

Hey @fredrichards72, the model isn't in lifelines yet (I should have added that to the blog article). It's in a PR right now, and I should merge it soon. CamDavidsonPilon/lifelines#715

One other question: I'm dealing with subscriber data which is right censored (we still have lots of active subscribers whose death events have not been observed). In addition, we have acquired companies with active subscribers over the past few years and their start dates are unknown. If we acquired a company on Jan 1, 2018, we know that the subscriber start date (birth) was at least that early, but it could have been years before. I believe that would be left censored. Any suggestion for how to handle that?

That's similar to a previous situation talked about in this room: https://gitter.im/python-lifelines/Lobby?at=5ccb47e6375bac74704463e3