
ok, so `predict_survival_function` gives me a dataframe of dimensions `57 x 59`, and doing that operation gives me an array of `59 x 57`. it seems in the latter operation, the rows now correspond to the individuals (unit of analysis), and in the former, the columns correspond to the individuals. when i inspect (one row of) the result of the latter, i see values like `1.63576020e+01`, which is over 100% (e.g. 16.36 * 100%). don't the former and the latter both give me a value in [0, 1]? the former is the probability of surviving past time t, and the latter the probability of death occurring at time t (given death has not occurred)?
and it's the tail end that is the most interesting
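For what it's worth, a value like `1.63576020e+01` from a hazard prediction is not necessarily a bug: the survival function is a probability in [0, 1], but the hazard h(t) = f(t)/S(t) is a *rate* and can exceed 1. A minimal numpy sketch (independent of lifelines) with a constant-hazard exponential lifetime:

```python
import numpy as np

# An exponential lifetime with mean 1/16 has a constant hazard of 16:
# the survival function stays in [0, 1], but the hazard does not.
rate = 16.0
t = np.linspace(0.01, 1.0, 100)

survival = np.exp(-rate * t)         # S(t), always in [0, 1]
density = rate * np.exp(-rate * t)   # f(t)
hazard = density / survival          # h(t) = f(t) / S(t)

print(hazard[:3])                    # constant 16.0 -- a rate well above 1
print(survival.min(), survival.max())
```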

In your case, since you are interested in the hazard in the tail, and you have low sample size, I think the semi-parametric model isn't going to work. I suggest you use a parametric model (of which there are Cox flavours): https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#modeling-baseline-hazard-and-survival-with-parametric-models

They even have a `predict_hazard` function

for a cox model with baseline estimator `spline`, what's the difference between `predict_cumulative_hazard` and `predict_hazard`? i would suspect that cumulative hazard is just the cumulative sum of hazard, however, when i try to match them up with `cph_spline.predict_hazard(sdf).iloc[:, 0].cumsum()` the outputs do not match
my application is that once i fit a cox model, i want to predict the time to event. the survival function doesn't get me there, as it time-phases the probability of surviving past time t. the hazard seems to get me there, as it is the probability of experiencing the event at time t, given the event has not happened.
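One plausible source of the cumsum mismatch (a sketch of the math, not of lifelines internals): the cumulative hazard is H(t) = ∫₀ᵗ h(s) ds, so each hazard value has to be weighted by the width of its time step. A plain `cumsum` ignores the spacing of the timeline index:

```python
import numpy as np

# Constant hazard h(t) = 0.5 on an uneven time grid, so the exact
# cumulative hazard is H(t) = 0.5 * t.
rate = 0.5
t = np.array([0.0, 1.0, 3.0, 7.0, 15.0])  # uneven grid, like a timeline index
h = np.full_like(t, rate)
H_true = rate * t

plain_cumsum = np.cumsum(h)        # ignores step widths -> wrong
dt = np.diff(t, prepend=t[0])
weighted = np.cumsum(h * dt)       # left-Riemann approximation of the integral

print(np.allclose(weighted, H_true))      # matches for a constant hazard
print(np.allclose(plain_cumsum, H_true))  # does not match
```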

i appreciate the help

with `0.25.4`? i see something in the docs about inability to pickle: https://lifelines.readthedocs.io/en/latest/Changelog.html#api-changes-2 but i'm not using formulas

hey, thanks for this awesome library! I have a question tho about how to formulate a problem. Basically what I want to do is use the nasa turbofan dataset for survival analysis. I have read the section about survival regression but am unsure if I can use dynamic sensor data as covariates, since the examples use static features like in the `rossi` dataset, or if I should use some aggregation over the whole lifecycle of the machine's sensors.
I have extracted the lifetime of 100 machines and whether they broke down by their last known recorded observation or not. the only other static feature i have is an operational setting

Hey Daniel, I've seen that dataset, and unfortunately, I don't think lifelines is set up to handle it yet. There is a broader class of models, called renewal models, that is probably the correct approach, see https://arxiv.org/pdf/0708.0362.pdf - though I don't know of a Python lib for this

@dan-r95 ^

(sorry about the delay on this msg!)

Hi @CamDavidsonPilon - I am looking to add time varying sensor data as covariates to my existing CoxPH survival regression model (currently with static time to event data). I can do this following the instructions in "Time Varying Survival Regression" in lifelines documentation, correct? The only change I would need to make is the conversion of dataframe to the "long" format - is that right?

I also have a related question about predictions for this problem formulation. Does using rolling/lagged features for time varying covariates help in predictions at future times? Thanks for your time!

@NewsJunkie8590_twitter that's right, the dataset will need to be "long" (read the docs carefully though, as adding time-varying covariates is tricky).

Prediction with lagged features makes sense, *up to the length of the lag*. The trouble is what your covariate matrix looks like beyond known/observed times.
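To make the "long" layout concrete, here is a hedged pandas sketch of what a start/stop-format frame with a time-varying sensor covariate and a lagged feature might look like; the column names and values are purely illustrative:

```python
import pandas as pd

# Illustrative "long" (start/stop) layout: one row per subject-interval,
# with the covariate value that held over that interval, and the event
# flagged on the subject's final row.
long_df = pd.DataFrame(
    {
        "id":     [1, 1, 1, 2, 2],
        "start":  [0, 10, 20, 0, 15],
        "stop":   [10, 20, 25, 15, 30],
        "sensor": [0.4, 0.7, 1.2, 0.3, 0.9],
        "event":  [0, 0, 1, 0, 0],
    }
)

# A lagged feature is just the previous interval's reading, per subject:
long_df["sensor_lag1"] = long_df.groupby("id")["sensor"].shift(1)
print(long_df)
```

lifelines also ships `to_long_format` and `add_covariate_to_timeline` helpers in `lifelines.utils` for building frames like this from a static base table.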

@CamDavidsonPilon - In our previous correspondence, you had mentioned that you wanted to re-write the scikit-learn wrapper. I know you had mentioned that it's not currently on the horizon, but it's a feature that I'd love to see. My work involves using lifelines in production, and having the scikit-learn wrapper would be great. Wondering if it has perhaps moved up in priority for you? Thank you!

Hi @CamDavidsonPilon checking open issues for formulaic https://github.com/matthewwardrop/formulaic/issues , is it one of these that you are waiting on? Not clear from the descriptions. Trying to get a better idea when formulaic release will happen, last release was nov 19.

@griffiri I suspect a new release is soon, as there has been a flurry of activity over the last few months. For the current version of formulaic, all my lifelines tests pass. What is lacking from the current version of formulaic is

`bs`

(basis splines), and the next version has that. `bs`

was the biggest motivation for me to include formulas in lifelines in the first place - so I don't want to regress that feature for users.
@CamDavidsonPilon i've posted the question on stack: https://stackoverflow.com/questions/63942882/how-do-i-compute-the-survival-function-from-coxtimevaryingfitter-in-pythons-lif (might be better to have it there so others may also benefit)

@jwayne2978 you can't get the survival function. The problem is we don't know the covariates at all times (in particular: the future), so we can't know the hazard -> can't know the survival function. Note too: if we *did* know the covariates at some specific time, well, this means the subject is alive, so we don't need the survival function.

it seems to give me a survival function for each `pseudo` observation
Hello :) I have a dataset with 4,200 features and about 9,000 patients. Can any of the lifelines models handle a dataset with so many features in a reasonable time frame? Do any of them use multiple CPUs in parallel? Another question: what is the stopping criteria for the CoxPHFitter? Number of iterations? Delta smaller than X? thank you!


Hi @CamDavidsonPilon! First of all, thanks for this awesome package.

I am running Cox PH models trying to evaluate potential interactions between covariates and treatment. Initially I was using a likelihood ratio test (model with interaction vs. model without interaction) to decide its importance. But I was concerned about overfitting the model (I don't have a lot of subjects) and decided to repeat the analysis doing monte carlo CV and measure the importance again. I feel that the permutation analysis should be more robust; what do you think? Thanks a lot for your time and sorry if the question is a bit general.