Cameron Davidson-Pilon
@CamDavidsonPilon
mm sorry try this: np.outer(cph.predict_partial_hazard(rossi), cph.baseline_hazard_['baseline hazard'])
Jane Wayne
@jwayne2978
ok, so predict_survival_function gives me a dataframe of dimensions 57 x 59, and doing that operation gives me an array of 59 x 57. it seems that in the latter operation the rows correspond to the individuals (unit of analysis), whereas in the former the columns correspond to the individuals. when i inspect one row of the latter result, i see values like 1.63576020e+01, which is over 100% (i.e. 16.36 * 100%). don't the former and the latter both give me a value in [0, 1]? the former is the probability of surviving past time t, and the latter is the probability of death occurring at time t (given death has not yet occurred)?
Cameron Davidson-Pilon
@CamDavidsonPilon
So, the hazard is only a probability in discrete models. In general, the hazard is a rate, which can exceed 1.
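To make the rate-versus-probability distinction concrete, here is a minimal sketch using an exponential model with an assumed constant hazard of 2.0 per unit time (an illustrative value, not taken from the data discussed here). The hazard exceeds 1, yet every survival probability stays within [0, 1]; only the product of the hazard and a small interval length approximates a probability.

```python
import math

# Assumed constant hazard (events per unit time); illustrative only.
hazard_rate = 2.0

def survival(t, rate=hazard_rate):
    """S(t) = exp(-rate * t) for an exponential model."""
    return math.exp(-rate * t)

def prob_event_in_interval(t, dt, rate=hazard_rate):
    """P(t <= T < t + dt | T >= t), the conditional event probability."""
    return 1.0 - survival(t + dt, rate) / survival(t, rate)

times = [0.0, 0.25, 0.5, 1.0, 2.0]
survival_probs = [survival(t) for t in times]

# For a small interval, hazard * dt approximates the conditional probability.
dt = 0.001
approx = hazard_rate * dt
exact = prob_event_in_interval(0.5, dt)
print(survival_probs, approx, exact)
```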
Jane Wayne
@jwayne2978
what's strange is that the last couple of values show a huge jump: 4.63891229e-01, 6.01481490e-01, 6.62723996e-01, 1.13880326e+00, 1.63576020e+01
Cameron Davidson-Pilon
@CamDavidsonPilon
well, the baseline hazard is really noisy, especially in the tail where there is little data
Jane Wayne
@jwayne2978
my data has all observed events (no right-censoring)
and it's the tail end that is the most interesting
Cameron Davidson-Pilon
@CamDavidsonPilon
Sure, but not everyone dies in the tail, so there are only a few individuals left to die in the tail. The sample size, 59, is small too
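The tail effect can be illustrated without fitting any model. The sketch below assumes 59 subjects with distinct event times and no censoring, and it ignores covariates, in which case the hazard increment at each event time reduces to the Nelson-Aalen form 1 / (number still at risk). This is a simplification of what a Cox model estimates, not the lifelines implementation.

```python
n_subjects = 59  # sample size mentioned in the conversation

# With no censoring and no ties, the risk set shrinks by one at each event:
# 59 at risk at the first event, 58 at the second, ..., 1 at the last.
at_risk = list(range(n_subjects, 0, -1))

# Nelson-Aalen hazard increment at each event time: (events) / (at risk).
hazard_increments = [1.0 / n for n in at_risk]

print("first 3:", [round(h, 4) for h in hazard_increments[:3]])
print("last 3: ", [round(h, 4) for h in hazard_increments[-3:]])
```

Each of the final increments rests on only one to three remaining subjects, so the estimate there is dominated by noise even though every event was observed.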
Jane Wayne
@jwayne2978
what can i do about the sample size? this data is all i have. should i boost it (e.g. sample with replacement)?
Cameron Davidson-Pilon
@CamDavidsonPilon
In your case, since you are interested in the hazard in the tail, and you have low sample size, I think the semi-parametric model isn't going to work. I suggest you use a parametric model (of which there are Cox flavours): https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#modeling-baseline-hazard-and-survival-with-parametric-models
They even have a predict_hazard function
Jane Wayne
@jwayne2978
for a cox model with base estimator spline, what's the difference between predict_cumulative_hazard and predict_hazard? i would suspect that cumulative hazard is just the cumulative sum of hazard. however, when i try to match them up with cph_spline.predict_hazard(sdf).iloc[:, 0].cumsum(), the outputs do not match
Cameron Davidson-Pilon
@CamDavidsonPilon
the former is the integral of the latter. cumsum is an approximation (and something we only do for semi-parametric models)
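A small sketch of why a plain cumulative sum can disagree with the integral. It assumes a hazard h(t) = 2t, chosen only because its cumulative hazard H(t) = t² is known exactly, and a time grid with spacing 0.5. Whether the grid returned by predict_hazard has unit spacing is not stated in this conversation, so treat the spacing as one plausible source of the mismatch rather than a diagnosis.

```python
from itertools import accumulate

# Assumed hazard for illustration: h(t) = 2t, so H(t) = t**2 exactly.
def hazard(t):
    return 2.0 * t

step = 0.5
times = [i * step for i in range(11)]  # 0.0, 0.5, ..., 5.0
hazards = [hazard(t) for t in times]

# Naive running sum: ignores the grid spacing entirely.
naive_cumsum = list(accumulate(hazards))

# Trapezoidal rule: accounts for the spacing between time points.
trapezoid = [0.0]
for i in range(1, len(times)):
    width = times[i] - times[i - 1]
    trapezoid.append(trapezoid[-1] + 0.5 * (hazards[i] + hazards[i - 1]) * width)

exact = [t ** 2 for t in times]
print(naive_cumsum[-1], trapezoid[-1], exact[-1])
```

The running sum only matches the integral when the grid spacing is 1 and the hazard changes slowly between grid points.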
Jane Wayne
@jwayne2978
so ignore predict_cumulative_hazard for spline base estimator?
Cameron Davidson-Pilon
@CamDavidsonPilon
or use it - it's up to your application
Jane Wayne
@jwayne2978
my application is that once i fit a cox model, i want to predict the time to event. the survival function doesn't get me there, as it time-phases the probability of surviving past time t. the hazard seems to get me there, as it is the probability of experiencing the event at time t, given the event has not happened.
Cameron Davidson-Pilon
@CamDavidsonPilon
In that case, I would suggest predict_median or predict_expectation - these give you point estimates for survival time
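Both point estimates are summaries of a survival curve: the median is the time at which the curve first drops to 0.5, and the expectation is the area under the curve. The sketch below computes both by hand for an exponential curve with an assumed rate of 0.1; it illustrates the definitions and is not how lifelines computes them internally.

```python
import math

rate = 0.1  # assumed hazard rate; illustrative only
step = 0.01
times = [i * step for i in range(20001)]  # grid from 0 to 200
surv = [math.exp(-rate * t) for t in times]

# Median survival time: first time the curve drops to 0.5 or below.
median_time = next(t for t, s in zip(times, surv) if s <= 0.5)

# Expected survival time: area under the survival curve (trapezoidal rule).
expected_time = sum(
    0.5 * (surv[i] + surv[i - 1]) * (times[i] - times[i - 1])
    for i in range(1, len(times))
)

print(round(median_time, 2), round(expected_time, 2))
```

If an estimated survival curve never reaches zero within the observed time range, the area under it is a truncated value and underestimates the true expectation.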
Jane Wayne
@jwayne2978
ok thanks @CamDavidsonPilon i will try to experiment with those in a few hours
i appreciate the help
Jane Wayne
@jwayne2978
is it not possible yet to pickle CoxPHFitter with base estimator spline or piecewise?
with 0.25.4?
but i'm not using formulas
Cameron Davidson-Pilon
@CamDavidsonPilon
@jwayne2978 yea, under the hood, the spline models use formulas
Daniel
@dan-r95
hey, thanks for this awesome library! I have a question tho about how to formulate a problem. Basically what I want to do is use the nasa turbofan dataset for survival analysis. I have read the section about survival regression but am unsure whether I can use dynamic sensor data as covariates, since the examples use static features like in the rossi dataset, or whether I should use some aggregation over the whole lifecycle of each machine's sensor readings.
I have extracted the lifetime of 100 machines and whether or not they broke down by their last recorded observation. the only other static feature i have is an operational setting
Daniel
@dan-r95
on another dataset I would have similar data but also the overall age of the machine.
I would greatly appreciate some direction or suggestions :)
Cameron Davidson-Pilon
@CamDavidsonPilon
Hey Daniel, I've seen that dataset, and unfortunately, I don't think lifelines is set up to handle it yet. There is a broader class of models, called renewal models, that are probably the correct approach, see https://arxiv.org/pdf/0708.0362.pdf - though I don't know a Python lib for this
@dan-r95 ^
Daniel
@dan-r95
Thanks a lot for your suggestion!
griffiri
@griffiri
hi @CamDavidsonPilon - any update on when persistence of spline models with version 0.25+ will be available or somewhere i can watch for updates? thanks
Cameron Davidson-Pilon
@CamDavidsonPilon
@griffiri I'm waiting on the next release of formulaic - you can also pip install from my branch too: pip install https://github.com/CamDavidsonPilon/lifelines/archive/try-formulaic.zip
(sorry about the delay on this msg!)
NewsJunkie8590

Hi @CamDavidsonPilon - I am looking to add time varying sensor data as covariates to my existing CoxPH survival regression model (currently with static time to event data). I can do this following the instructions in "Time Varying Survival Regression" in lifelines documentation, correct? The only change I would need to make is the conversion of dataframe to the "long" format - is that right?

I also have a related question about predictions for this problem formulation. Does using rolling/lagged features for time varying covariates help in predictions at future times? Thanks for your time!

Cameron Davidson-Pilon
@CamDavidsonPilon

@NewsJunkie8590_twitter that's right, the dataset will need to be "long" (read the docs carefully though, as adding time-varying covariates is tricky).

Prediction with lagged features makes sense, up to the length of the lag. The trouble comes in what does your covariate matrix look like beyond known/observed times
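For readers unfamiliar with the "long" layout: each subject becomes several rows, one per interval over which the covariates are constant, and the event flag is set only on the final interval. The helper below, `to_long_rows`, is hypothetical and uses made-up sensor values purely to show the shape of the data; lifelines provides its own utilities for this (such as `to_long_format` and `add_covariate_to_timeline` in `lifelines.utils`), which the documentation covers.

```python
def to_long_rows(subject_id, duration, event, readings):
    """Split one subject into (start, stop] intervals, one per reading.

    readings: list of (time, value) pairs sorted by time, starting at 0.
    Hypothetical helper for illustration; not part of lifelines.
    """
    rows = []
    for i, (start, value) in enumerate(readings):
        is_last = i == len(readings) - 1
        stop = duration if is_last else readings[i + 1][0]
        rows.append({
            "id": subject_id,
            "start": start,
            "stop": stop,
            "sensor": value,
            # The event can only occur in the subject's final interval.
            "event": bool(event) and is_last,
        })
    return rows

# Made-up subject: observed for 10 time units, failed at the end,
# with sensor readings taken at t = 0, 4, 7.
long_rows = to_long_rows(1, 10, True, [(0, 0.2), (4, 0.5), (7, 0.9)])
for row in long_rows:
    print(row)
```

The last row ends at the subject's observed duration; what the sensor column would contain beyond that point is unknown, which is the prediction difficulty raised in the reply.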

NewsJunkie8590
Thank you so much for your response, @CamDavidsonPilon !
NewsJunkie8590
@CamDavidsonPilon - In our previous correspondence, you had mentioned that you wanted to re-write the scikit-learn wrapper. I know you had mentioned that it's not currently on the horizon, but it's a feature that I'd love to see. My work involves using lifelines in production, and having the scikit-learn wrapper would be great. Wondering if it has perhaps moved up in priority for you? Thank you!
griffiri
@griffiri
Hi @CamDavidsonPilon checking open issues for formulaic https://github.com/matthewwardrop/formulaic/issues , is it one of these that you are waiting on? Not clear from the descriptions. Trying to get a better idea when formulaic release will happen, last release was nov 19.
Cameron Davidson-Pilon
@CamDavidsonPilon
@griffiri I suspect a new release is soon, as there has been a flurry of activity over the last few months. For the current version of formulaic, all my lifelines tests pass. What is lacking from the current version of formulaic is bs (basis splines), and the next version has that. bs was the biggest motivation for me to include formulas in lifelines in the first place - so I don't want to regress that feature for users.
griffiri
@griffiri
thanks for info @CamDavidsonPilon , will keep an eye on formulaic
Jane Wayne
@jwayne2978
for the CoxTimeVaryingFitter there are only 2 predict methods, predict_partial_hazard and predict_log_partial_hazard. how do i get the survival function like with CoxPHFitter.predict_survival_function()?
Jane Wayne
@jwayne2978
@CamDavidsonPilon any thoughts on the question above?
Jane Wayne
@jwayne2978
@CamDavidsonPilon i've posted the question on stack: https://stackoverflow.com/questions/63942882/how-do-i-compute-the-survival-function-from-coxtimevaryingfitter-in-pythons-lif (might be better to have it there so others may also benefit)
Cameron Davidson-Pilon
@CamDavidsonPilon
@jwayne2978 you can't get the survival function. The problem is we don't know the covariates at all times (in particular: the future) so we can't know the hazard -> can't know the survival function. Note too: if we did know the covariates at some specific time, well, this means the subject is alive, so we don't need the survival function.
Jane Wayne
@jwayne2978
@CamDavidsonPilon i've been playing with the outputs and also what you've said to me before. what does this operation give me? np.exp(-pd.DataFrame(np.outer(ctv.predict_partial_hazard(base_df), ctv.baseline_cumulative_hazard_['baseline hazard'])))
it seems to give me a survival function for each pseudo observation
Cameron Davidson-Pilon
@CamDavidsonPilon
It is a survival function, but it doesn't make sense: It says that there is a non-zero probability that the subject is dead, but yet I have observations on it - so it defs is not dead!
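For reference, the operation in question evaluates the Cox relation S(t | x) = exp(-H0(t) · exp(x·β)) for every combination of subject and time point. The sketch below reproduces it in plain Python with made-up partial hazards and a made-up baseline cumulative hazard, so the numbers carry no meaning. It treats each row's covariates as fixed for all time, which is exactly the assumption that fails for time-varying covariates.

```python
import math

# Made-up inputs for illustration.
partial_hazards = [0.5, 1.0, 2.0]           # exp(x_i . beta), one per row
baseline_cum_hazard = [0.0, 0.1, 0.3, 0.7]  # H0(t) on a time grid, non-decreasing

# Outer product followed by exp(-.): rows are subjects, columns are time points.
survival = [
    [math.exp(-ph * h0) for h0 in baseline_cum_hazard]
    for ph in partial_hazards
]

for row in survival:
    print([round(s, 3) for s in row])
```

As with np.outer, the first argument determines the rows, so each row is one pseudo observation and each column is one time point.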
Tal Gutman
@Talgutman_gitlab
Hello :) I have a dataset with 4,200 features and about 9,000 patients. Can any of the lifelines models handle a dataset with so many features in a reasonable time frame? Do any of them use multiple CPUs in parallel? Another question: what are the stopping criteria for the CoxPHFitter? Number of iterations? Delta smaller than X? Thank you!