Hello. I'm reading the docs on coxph regression (https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html).
a) The badfit image appears to be broken.
b) In the rossi dataset example provided, plotting the KM curve against the baseline survival does not appear to show a good spread. Is there an included dataset that could be used for this example and would show a bigger spread, like the one in the goodfit picture?
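import matplotlib.pyplot as plt
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.datasets import load_rossi
rossi_dataset = load_rossi()
cox_prop_hazards = CoxPHFitter().fit(rossi_dataset, duration_col='week', event_col='arrest')
kmf = KaplanMeierFitter().fit(rossi_dataset['week'], rossi_dataset['arrest'])
fig, ax = plt.subplots()
ax.plot(cox_prop_hazards.baseline_survival_, color='b')  # baseline survival from the Cox model
ax.plot(kmf.survival_function_, color='r')  # Kaplan-Meier estimate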
Ideally the replacement would work end-to-end for the regression documentation here, as rossi does, unless I'm misjudging the plot and this really is a good fit.
I do not know how to modify the output image produced by lifelines, since I am unfamiliar with "cph.plot_covariate_groups". Unfortunately, there seems to be no detailed description of it at https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html .
What I am looking for is: (1) how to shorten the event days (x axis), since I do not want to show such a long range of days on the survival curve (ideally only up to 4000); (2) if possible, how to remove the baseline survival curve from my image; and (3) how to change the colors of the survival curves from orange/blue to something else.
Could anyone give me some feedback, please?
@daturkel_twitter the rossi fit is so-so, partly because the model is so simple (no interaction terms, no higher-order terms, and some variables fail the proportional hazards test). Generally we shouldn't expect huge separation. That visual test is just one test you can use. Predictive performance (the c-index) is another, and the log-likelihood ratio test is a third.
What I suggest is starting with a baseline model and then asking questions of it to see whether the fit can be improved. For example: do I satisfy the proportional hazards assumption? Does adding a quadratic term improve the fit (and make sense)?
Both the c-index and the log-likelihood ratio test are reported in the print_summary() output.
I'd like some advice. I really like the Cox model and its ability to quantify how different factors might influence lifetime. But I have a lot of unknowns: my data started being collected at a certain point years ago, and some people were already in the system without a known "start point." I read that this is what left censoring is for, but every model I've looked at either has no left-censoring parameter or has a "starting point" parameter, which assumes I know how long a person was involved before data collection began. I don't, and even if I crawl through old record books I won't have complete knowledge of everyone.
Is there a guideline for creating a probable starting point?