    Cameron Davidson-Pilon
    @CamDavidsonPilon
    mm sorry try this: np.outer(cph.predict_partial_hazard(rossi), cph.baseline_hazard_['baseline hazard'])
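    (A plain-numpy sketch of what this outer product computes; the array values below are made up for illustration, standing in for cph.predict_partial_hazard(rossi) and cph.baseline_hazard_['baseline hazard']:)

```python
import numpy as np

# Hypothetical stand-ins for the two lifelines outputs -- values made up.
partial_hazards = np.array([0.8, 1.0, 1.5])           # one entry per subject
baseline_hazard = np.array([0.05, 0.10, 0.20, 0.40])  # one entry per event time

# np.outer puts subjects in rows and event times in columns:
# hazard[i, j] = partial_hazards[i] * baseline_hazard[j]
hazard = np.outer(partial_hazards, baseline_hazard)
print(hazard.shape)  # (3, 4)
```

    (Note this is transposed relative to predict_survival_function, which puts subjects in columns.)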
    Jane Wayne
    @jwayne2978
    ok, so predict_survival_function gives me a dataframe of dimensions 57 x 59, and doing that operation gives me an array of 59 x 57. it seems in the latter operation the rows correspond to the individuals (unit of analysis), and in the former the columns correspond to the individuals. when i inspect one row of the result of the latter, i see values like 1.63576020e+01, which is over 100% (i.e. 16.36 * 100%). don't the former and the latter both give me values in [0, 1]? the former is the probability of surviving past time t, and the latter the probability of death occurring at time t (given death has not occurred)?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    So, the hazard is only a probability in discrete models. In general, the hazard is a rate, which can exceed 1.
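    (A tiny numerical illustration, using an exponential model chosen just for this example: the hazard rate can sit above 1 while all survival probabilities stay in [0, 1].)

```python
import numpy as np

lam = 2.0                      # exponential rate: hazard h(t) = lam for all t
t = np.linspace(0.0, 3.0, 100)
survival = np.exp(-lam * t)    # S(t) = exp(-lam * t), a true probability
hazard = np.full_like(t, lam)  # the hazard is a rate and exceeds 1 here

print(hazard.max())                                           # 2.0
print(bool(survival.min() >= 0.0 and survival.max() <= 1.0))  # True
```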
    Jane Wayne
    @jwayne2978
    what's strange is that the last couple of values show a huge jump: 4.63891229e-01, 6.01481490e-01, 6.62723996e-01, 1.13880326e+00, 1.63576020e+01
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    well, the baseline hazard is really noisy, especially in the tail where there is little data
    Jane Wayne
    @jwayne2978
    my data has all observed events (no right-censoring)
    and it's the tail end that is the most interesting
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Sure, but most individuals die before the tail, so only a few are left at risk there. The sample size, 59, is small too
    Jane Wayne
    @jwayne2978
    what can i do about the sample size? this data is all i have. should i bootstrap it (e.g. sample with replacement)?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    In your case, since you are interested in the hazard in the tail, and you have low sample size, I think the semi-parametric model isn't going to work. I suggest you use a parametric model (of which there are Cox flavours): https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#modeling-baseline-hazard-and-survival-with-parametric-models
    They even have a predict_hazard function
    Jane Wayne
    @jwayne2978
    for a cox model with base estimator spline, what's the difference between predict_cumulative_hazard and predict_hazard? i would suspect that cumulative hazard is just the cumulative sum of hazard, however, when i try to match them up with cph_spline.predict_hazard(sdf).iloc[:, 0].cumsum(), the outputs do not match
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    the former is the integral of the latter. cumsum is an approximation (and something we only do for semi-parametric models)
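    (A numpy sketch of why a raw cumsum differs from the true integral, using a made-up smooth hazard h(t) = 0.5 t whose exact cumulative hazard is H(t) = 0.25 t^2: cumsum ignores the spacing of the time grid, while a proper numerical integral does not.)

```python
import numpy as np

t = np.linspace(0.0, 4.0, 401)   # fine time grid, spacing dt = 0.01
h = 0.5 * t                      # made-up instantaneous hazard
H_true = 0.25 * t**2             # its exact integral

# Trapezoid-rule integral of h vs. a raw cumulative sum
dt = t[1] - t[0]
H_trapz = np.concatenate([[0.0], np.cumsum((h[1:] + h[:-1]) / 2.0 * dt)])
H_cumsum = np.cumsum(h)          # no dt factor: off by ~1/dt on this grid

print(np.allclose(H_trapz, H_true))   # True
print(np.allclose(H_cumsum, H_true))  # False
```

    (For semi-parametric models the baseline hazard is already a set of jumps at the event times, so a plain cumsum is a natural integral there; for a smooth spline hazard it is not.)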
    Jane Wayne
    @jwayne2978
    so ignore predict_cumulative_hazard for spline base estimator?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    or use it - it's up to your application
    Jane Wayne
    @jwayne2978
    my application is that once i fit a cox model, i want to predict the time to event. the survival function doesn't get me there, as it time-phases the probability of surviving past time t. the hazard seems to get me there, as it is the probability of experiencing the event at time t, given the event has not happened.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    In that case, I would suggest predict_median or predict_expectation - these give you point estimates for survival time
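    (Both point estimates can be read straight off a survival curve. A numpy sketch with a toy curve S(t) = exp(-0.1 t), not lifelines output: the median is the first time S(t) drops to 0.5, and the expectation is the area under S(t).)

```python
import numpy as np

t = np.linspace(0.0, 60.0, 6001)  # time grid
S = np.exp(-0.1 * t)              # toy survival curve

# Median survival time: first t where S(t) <= 0.5
median = t[np.searchsorted(-S, -0.5)]  # close to ln(2)/0.1 ~= 6.93

# Expected survival time: area under S(t), via the trapezoid rule
# (close to 1/0.1 = 10; slightly low because the grid stops at t = 60)
expectation = ((S[1:] + S[:-1]) / 2.0 * np.diff(t)).sum()
```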
    Jane Wayne
    @jwayne2978
    ok thanks @CamDavidsonPilon i will try to experiment with those in a few hours
    i appreciate the help
    Jane Wayne
    @jwayne2978
    is it not possible yet to pickle CoxPHFitter with base estimator spline or piecewise?
    with 0.25.4?
    i see something in the docs about inability to pickle: https://lifelines.readthedocs.io/en/latest/Changelog.html#api-changes-2
    but i'm not using formulas
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @jwayne2978 yea, under the hood, the spline models use formulas
    Daniel
    @dan-r95
    hey, thanks for this awesome library! I have a question though about how to formulate a problem. Basically what I want to do is use the NASA turbofan dataset for survival analysis. I have read the section about survival regression but am unsure whether I can use dynamic sensor data as covariates, since the examples use static features like in the rossi dataset, or whether I should use some aggregation over the whole lifecycle of each machine's sensors.
    I have extracted the lifetimes of 100 machines and whether they broke down by their last recorded observation or not. the only other static feature i have is an operational setting
    Daniel
    @dan-r95
    on another dataset I would have similar data but also the overall age of the machine.
    I would greatly appreciate some direction or suggestions :)
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hey Daniel, I've seen that dataset, and unfortunately, I don't think lifelines is set up to handle it yet. There is a broader class of models, called renewal models, that are probably the correct approach, see https://arxiv.org/pdf/0708.0362.pdf - though I don't know of a Python lib for this
    @dan-r95 ^
    Daniel
    @dan-r95
    Thanks a lot for your suggestion!
    griffiri
    @griffiri
    hi @CamDavidsonPilon - any update on when persistence of spline models with version 0.25+ will be available or somewhere i can watch for updates? thanks
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @griffiri I'm waiting on the next release of formulaic - you can also pip install from my branch too: pip install https://github.com/CamDavidsonPilon/lifelines/archive/try-formulaic.zip
    (sorry about the delay on this msg!)
    NewsJunkie8590
    @NewsJunkie8590_twitter

    Hi @CamDavidsonPilon - I am looking to add time varying sensor data as covariates to my existing CoxPH survival regression model (currently with static time to event data). I can do this following the instructions in "Time Varying Survival Regression" in lifelines documentation, correct? The only change I would need to make is the conversion of dataframe to the "long" format - is that right?

    I also have a related question about predictions for this problem formulation. Does using rolling/lagged features for time varying covariates help in predictions at future times? Thanks for your time!

    Cameron Davidson-Pilon
    @CamDavidsonPilon

    @NewsJunkie8590_twitter that's right, the dataset will need to be "long" (read the docs carefully though, as adding time-varying covariates is tricky).

    Prediction with lagged features makes sense, up to the length of the lag. The trouble is what your covariate matrix looks like beyond known/observed times.
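    (A pandas-only sketch of the "long" reshape being discussed; the column names here are hypothetical, and lifelines also ships helpers such as to_long_format and add_covariate_to_timeline in lifelines.utils:)

```python
import pandas as pd

# Hypothetical static dataset: one row per subject
static = pd.DataFrame({
    "id": [1, 2],
    "duration": [5, 3],
    "event": [1, 0],
    "age": [40, 55],
})

# Long format: one row per (subject, interval) with start/stop columns.
# With no time-varying covariates yet, each subject is a single interval.
long_df = static.rename(columns={"duration": "stop"}).assign(start=0)
long_df = long_df[["id", "start", "stop", "event", "age"]]

# Time-varying sensor readings would then be merged in by splitting each
# subject's (start, stop] interval at every measurement time.
```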

    NewsJunkie8590
    @NewsJunkie8590_twitter
    Thank you so much for your response, @CamDavidsonPilon !
    NewsJunkie8590
    @NewsJunkie8590_twitter
    @CamDavidsonPilon - In our previous correspondence, you had mentioned that you wanted to re-write the scikit-learn wrapper. I know you had mentioned that it's not currently on the horizon, but it's a feature that I'd love to see. My work involves using lifelines in production, and having the scikit-learn wrapper would be great. Wondering if it has perhaps moved up in priority for you? Thank you!
    griffiri
    @griffiri
    Hi @CamDavidsonPilon checking open issues for formulaic https://github.com/matthewwardrop/formulaic/issues , is it one of these that you are waiting on? Not clear from the descriptions. Trying to get a better idea when formulaic release will happen, last release was nov 19.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @griffiri I suspect a new release is soon, as there has been a flurry of activity over the last few months. For the current version of formulaic, all my lifelines tests pass. What is lacking from the current version of formulaic is bs (basis splines), and the next version has that. bs was the biggest motivation for me to include formulas in lifelines in the first place - so I don't want to regress that feature for users.
    griffiri
    @griffiri
    thanks for info @CamDavidsonPilon , will keep an eye on formulaic
    Jane Wayne
    @jwayne2978
    for the CoxTimeVaryingFitter there are only 2 predict methods, predict_partial_hazard and predict_log_partial_hazard. how do i get the survival function like with CoxPHFitter.predict_survival_function()?
    Jane Wayne
    @jwayne2978
    @CamDavidsonPilon any thoughts on the question above?
    Jane Wayne
    @jwayne2978
    @CamDavidsonPilon i've posted the question on stack: https://stackoverflow.com/questions/63942882/how-do-i-compute-the-survival-function-from-coxtimevaryingfitter-in-pythons-lif (might be better to have it there so others may also benefit)
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @jwayne2978 you can't get the survival function. The problem is we don't know the covariates at all times (in particular: the future), so we can't know the hazard -> can't know the survival function. Note too: if we did know the covariates at some specific time, well, that means the subject is alive, so we don't need the survival function.
    Jane Wayne
    @jwayne2978
    @CamDavidsonPilon i've been playing with the outputs and also what you've said to me before. what does this operation give me? np.exp(-pd.DataFrame(np.outer(ctv.predict_partial_hazard(base_df), ctv.baseline_cumulative_hazard_['baseline hazard'])))
    it seems to give me a survival function for each pseudo observation
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    It is a survival function, but it doesn't make sense: it says there is a nonzero probability that the subject is already dead, yet I have observations on it - so it definitely is not dead!
    Tal Gutman
    @Talgutman_gitlab
    Hello :) I have a dataset with 4,200 features and about 9,000 patients. Can any of the lifelines models handle a dataset with so many features in a reasonable time frame? Do any of them use multiple CPUs in parallel? Another question - what is the stopping criterion for CoxPHFitter? number of iterations? delta smaller than X? thank you!
    Damian Bikiel
    @dbikiel

    Hi @CamDavidsonPilon! First of all, thanks for this awesome package.

    I am running Cox PH models trying to evaluate potential interactions between covariates and treatment. Initially I was using a likelihood ratio test (model with interaction vs. model without interaction) to decide its importance. But I was concerned about overfitting the model (I don't have a lot of subjects) and decided to repeat the analysis doing Monte Carlo CV and measure the importance again. I feel that the permutation analysis should be more robust - what do you think? Thanks a lot for your time, and sorry if the question is a bit general.

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @dbikiel the latter is definitely more robust, but I suspect you will also see smaller effects. I think that's normal though, and more "correct"