Rob deCarvalho
@robdmc
Here is a crude sketch of what I'd like to do.
```python
import numpy as np
from lifelines.fitters import ParametricUnivariateFitter


class SeasonalHazardFitter(ParametricUnivariateFitter):
    """
    The idea of this class would be to fit custom seasonality to an
    exponential-like hazard model.
    """

    _fitted_parameter_names = ['a_q1_', 'a_q2_', 'a_q3_', 'a_q4_', 'dates']

    def _cumulative_hazard(self, params, times):
        # Pull out fiscal quarters and dates corresponding to times.
        # Each element of the dates array corresponds to an element of the
        # times array.
        a_q1_, a_q2_, a_q3_, a_q4_, dates = params

        # Call a function that associates fiscal quarter with date
        # (get_fiscal_quarters is a hypothetical helper, not defined here).
        quarters = get_fiscal_quarters(dates)

        # Get the hazard for each time
        q_lookup = {1: a_q1_, 2: a_q2_, 3: a_q3_, 4: a_q4_}
        hazards = np.array([q_lookup[quarter] for quarter in quarters])

        # Return the cumulative hazard
        # You'd have to be more careful to actually do the
        # integration properly, but you get the idea.
        return np.cumsum(hazards)
```
Cameron Davidson-Pilon
@CamDavidsonPilon

@robdmc, but dates isn't an unknown, is it? If not, it could be a global variable. If it is unknown, then I think you'll need to "flatten" it, i.e. one parameter for each element of the list.

Rob deCarvalho
@robdmc
@CamDavidsonPilon You are correct. dates are not an unknown. They are known constants. It makes sense that everything that goes into params should be unknown. Not sure what I was thinking there. Putting it in a global/class/instance variable makes sense. I just want to be sure I understand how _cumulative_hazard() is called.
params: get tweaked by the optimization
times: the times passed into the fitter as "durations"
return: The cumulative hazard encountered over the duration represented by each time
Is that right?
Rob deCarvalho
@robdmc
The process I am trying to model consists of two competing kinds of events. The hazard for each event is a function of date. So the cumulative hazard for each time would be the integral of the hazard from the "start_date" to the "end_date" (where these can be derived from an element of times and its corresponding date). What I really care about is the cumulative incidence function (CIF) for each kind of event. If the idea of getting dates into the _cumulative_hazard function works, then I was hoping to use this technique to model the CIF for one of the competing event types.
Is this making sense?
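The integral described above can be sketched for a hazard that is constant within each quarter. This is a minimal sketch under stated assumptions: calendar quarters stand in for the hypothetical `get_fiscal_quarters`, durations are whole days, and rates are hazards per day. All function names here are illustrative, not lifelines API.

```python
import math
from datetime import date, timedelta


def quarter_of(d):
    """Calendar quarter (1-4) of a date; fiscal quarters would need a different mapping."""
    return (d.month - 1) // 3 + 1


def next_quarter_start(d):
    """First day of the calendar quarter following the one that contains d."""
    q = quarter_of(d)
    if q == 4:
        return date(d.year + 1, 1, 1)
    return date(d.year, 3 * q + 1, 1)


def days_per_quarter(start, duration_days):
    """Days spent in each quarter during [start, start + duration_days)."""
    end = start + timedelta(days=duration_days)
    exposure = {1: 0, 2: 0, 3: 0, 4: 0}
    current = start
    while current < end:
        # Walk forward one quarter boundary at a time, stopping at `end`.
        boundary = min(next_quarter_start(current), end)
        exposure[quarter_of(current)] += (boundary - current).days
        current = boundary
    return exposure


def seasonal_cumulative_hazard(start, duration_days, rates):
    """Integral of a piecewise-constant quarterly hazard from start_date to end_date.

    rates maps quarter -> hazard per day, so H = sum_k rates[k] * days_in_quarter_k.
    """
    exposure = days_per_quarter(start, duration_days)
    return sum(rates[q] * days for q, days in exposure.items())


if __name__ == "__main__":
    rates = {1: 0.01, 2: 0.02, 3: 0.03, 4: 0.04}
    # 2019-03-30 plus 5 days: 2 days in Q1 (Mar 30, 31) and 3 days in Q2.
    H = seasonal_cumulative_hazard(date(2019, 3, 30), 5, rates)
    print(H, math.exp(-H))  # cumulative hazard and the matching survival probability
```

The key point is that the result depends on the start date as well as the duration, which is exactly the information a `_cumulative_hazard(params, times)` call does not carry on its own.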
Cameron Davidson-Pilon
@CamDavidsonPilon

Your explanation of _cumulative_hazard is correct. But you can also see it as simply the cumulative hazard you wish to implement (i.e., it's not necessary to think about "durations" or "unknowns").

I was thinking about your seasonal model, and actually tried to code something up, but there is a problem, I think. _cumulative_hazard is invoked for both the censored and uncensored data, so your code needs to handle that (and you won't know which until you see the shapes of the input data).
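Why the input shapes vary can be seen from the textbook right-censored log-likelihood, $\sum_i [\delta_i \log h(t_i) - H(t_i)]$. The sketch below is written from that formula, not from lifelines' internal code; `hazard` and `cumulative_hazard` are user-supplied callables and the function names are illustrative. The hazard term only ever sees the observed subset of the times, so a per-subject array such as `dates` would have to be subset in exactly the same way.

```python
import numpy as np


def right_censored_log_likelihood(params, times, observed, hazard, cumulative_hazard):
    """Generic right-censored log-likelihood: sum_i [d_i * log h(t_i) - H(t_i)].

    `hazard` is only evaluated on the observed subset of `times`, while
    `cumulative_hazard` sees every subject, so any per-subject side
    information (e.g. start dates) would need to be subset the same way.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    log_h = np.log(hazard(params, times[observed]))
    return np.sum(log_h) - np.sum(cumulative_hazard(params, times))


# Exponential model with rate lam: h(t) = lam, H(t) = lam * t.
def exp_hazard(params, t):
    return np.full_like(t, params[0])


def exp_cumulative_hazard(params, t):
    return params[0] * t
```

For the exponential case the maximum is at events / total time, which makes a convenient sanity check for the sketch.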

Cameron Davidson-Pilon
@CamDavidsonPilon
yea I don't know if this can be done... I'm playing with it locally, and having some trouble
I'll think more about it. Try to write down the hazard mathematically - I think the problem is that it is clock-time dependent.
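Writing the seasonal hazard down as suggested, with the four quarterly rates $a_1, \dots, a_4$ from the sketch above (`a_q1_` to `a_q4_`) and a subject's calendar start date $s_i$ (notation introduced here), makes the clock-time dependence explicit:

$$
h_i(u) = a_{q(s_i + u)}, \qquad q(\cdot) \in \{1, 2, 3, 4\}
$$

$$
H_i(t) = \int_0^t a_{q(s_i + u)}\,du = \sum_{k=1}^{4} a_k\,\ell_k(s_i, t), \qquad S_i(t) = e^{-H_i(t)}
$$

where $\ell_k(s_i, t)$ is the time subject $i$ spends in quarter $k$ during $[s_i, s_i + t)$. Because $H_i$ depends on $s_i$ as well as $t$, two subjects with the same duration can have different cumulative hazards.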
Rob deCarvalho
@robdmc
Thank you for thinking about it. Clock-dependent hazards I think are actually pretty common
I love this interface you have for arbitrary models. If there was a way to hack that, it could be pretty useful.
Rob deCarvalho
@robdmc
maybe with (..., *args, **kwargs) to the _cumulative_hazard? I actually don't understand very well how _cumulative_hazard is used under the hood, so perhaps I'm spouting nonsense.
Cameron Davidson-Pilon
@CamDavidsonPilon
> (..., *args, **kwargs)

I was thinking about this, too

> Clock-dependent hazards I think are actually pretty common

Agree, but I feel like the common strategy is to use a regression model or fit N univariate models (i.e. partition the data)

I think a seasonal model is a great idea, so I want this to work.
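One way to "partition the data" for a clock-time hazard is to split each subject's follow-up at calendar-quarter boundaries (episode splitting, i.e. a piecewise-exponential model) and estimate one constant hazard per quarter. For a constant hazard, the maximum-likelihood estimate on each quarter's pieces has the closed form events / exposure, so no optimizer is needed. This is a swapped-in technique rather than a lifelines API; the calendar-quarter mapping, whole-day durations, and function names are assumptions.

```python
import math
from datetime import date, timedelta


def quarter_of(d):
    return (d.month - 1) // 3 + 1


def next_quarter_start(d):
    q = quarter_of(d)
    return date(d.year + 1, 1, 1) if q == 4 else date(d.year, 3 * q + 1, 1)


def fit_quarterly_rates(starts, durations, events):
    """Per-quarter constant hazards estimated from split follow-up time.

    starts: calendar start dates; durations: whole days (>= 1); events: 1/0 flags.
    Each subject's follow-up is cut at quarter boundaries; for a constant hazard
    the MLE in quarter k is (events in k) / (days at risk in k).
    """
    exposure = {q: 0 for q in range(1, 5)}
    n_events = {q: 0 for q in range(1, 5)}
    for start, duration, observed in zip(starts, durations, events):
        end = start + timedelta(days=duration)
        current = start
        while current < end:
            boundary = min(next_quarter_start(current), end)
            exposure[quarter_of(current)] += (boundary - current).days
            current = boundary
        if observed:
            # Attribute the event to the quarter of the last day at risk.
            n_events[quarter_of(end - timedelta(days=1))] += 1
    # Quarters with no exposure carry no information, so return NaN for them.
    return {q: (n_events[q] / exposure[q] if exposure[q] else math.nan) for q in exposure}
```

Because the cumulative hazard is linear in the quarterly rates, these per-quarter estimates are the same ones a joint fit of that piecewise-constant model would give.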

Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: new lifelines release: 0.22.0. Some important API changes to take a look at, but some really powerful new regression models: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.0
Julian Späth
@julianspaeth
Hi all, does lifelines offer a Random Survival Forest? Or is there a specific reason why not? Since there is no real Python implementation of RSF and I want to implement one for my Master's thesis, I was wondering whether you would be interested in including it in lifelines.
Cameron Davidson-Pilon
@CamDavidsonPilon
Hi Julian, lifelines does not have an RF model. Maybe scikit-survival does, though.
lifelines has focused less on purely predictive models, and more on inference
Julian Späth
@julianspaeth
Hi Cameron, thank you for your answer. As far as I can see, scikit-survival does not have an RF model, so I guess I need to implement it from scratch to use it in Python. Thank you anyway 🙂
Cameron Davidson-Pilon
@CamDavidsonPilon
hm, I thought it did. Okay, have fun!
Pedro Sola
@pedrosola
Hi everyone, I'm trying to fit a model to a recurrent process, e.g. a patient returning to a doctor. Is there a way to do so using lifelines? So far the closest I've found is this repo: https://github.com/dunan/MultiVariatePointProcess
Cameron Davidson-Pilon
@CamDavidsonPilon
Unfortunately, lifelines has limited support for recurring events.
mohit
@mohit-shrma
Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.
mohit
@mohit-shrma

@CamDavidsonPilon Let me know if you have any suggestions for my question above.

Cameron Davidson-Pilon
@CamDavidsonPilon
hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.
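A sketch of the subsampling suggestion, assuming the data sit in a pandas DataFrame; the helper names are illustrative. As a rule of thumb, standard errors of maximum-likelihood estimates shrink roughly like $1/\sqrt{n}$, so going from one million rows to 50k should inflate them by about $\sqrt{20} \approx 4.5$. After fitting CoxPHFitter on the subsample, the standard errors in `cph.summary` can be checked against that expectation.

```python
import math

import pandas as pd


def subsample(df, n=50_000, seed=0):
    """Uniform random subsample without replacement (the whole frame if it is smaller than n)."""
    return df.sample(n=min(n, len(df)), random_state=seed)


def expected_se_inflation(n_full, n_sub):
    """Approximate factor by which standard errors grow when fitting on n_sub of n_full rows.

    Standard errors shrink roughly like 1/sqrt(n), so the ratio is about sqrt(n_full / n_sub).
    """
    return math.sqrt(n_full / n_sub)


if __name__ == "__main__":
    df = pd.DataFrame({"x": range(1_000)})
    print(len(subsample(df, n=100)))
    print(expected_se_inflation(1_000_000, 50_000))  # roughly 4.47
```

If the inflated standard errors are still small relative to the coefficients, the subsample is likely enough for inference.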
Cameron Davidson-Pilon
@CamDavidsonPilon
@mohit-shrma, another suggestion is to "collapse" identical rows and use weights. Ex: with only 6 variables, the same row likely appears more than once. You can group these, assign each unique row an integer count, and use the weights_col argument in fit.
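A minimal pandas sketch of the collapsing idea; the function name and the `n_rows` column are illustrative. Since the rows here already carry IPS weights, the sketch only merges rows that also share the same weight and multiplies that weight by the count, which keeps the total weight unchanged.

```python
import pandas as pd


def collapse_duplicate_rows(df, weights_col=None):
    """Merge identical rows and return (collapsed_frame, weight_column_name).

    Without existing weights, the new "n_rows" column counts how many times each
    row appeared. With existing weights (e.g. IPS weights), rows must also share
    the same weight to be merged, and that weight is multiplied by the count.
    Note: groupby drops rows containing NaN by default.
    """
    collapsed = df.groupby(list(df.columns)).size().reset_index(name="n_rows")
    if weights_col is None:
        return collapsed, "n_rows"
    collapsed[weights_col] = collapsed[weights_col] * collapsed["n_rows"]
    return collapsed.drop(columns=["n_rows"]), weights_col


if __name__ == "__main__":
    df = pd.DataFrame({"T": [5, 5, 3], "E": [1, 1, 0], "x": [0.2, 0.2, 1.0]})
    collapsed, w = collapse_duplicate_rows(df)
    print(collapsed)
```

The collapsed frame would then go to something like `CoxPHFitter().fit(collapsed, duration_col="T", event_col="E", weights_col=w)` (column names illustrative). With `robust=True`, sandwich standard errors computed on collapsed rows may not match those from the original rows, so it is worth comparing the two on a subsample.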
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: minor version of lifelines released: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.1
Robert Green
@rgreen13_gitlab

Hi all. I'm somewhat new to using lifelines, and when I run check_assumptions on a CoxPHFitter, I end up with a warning that reads as follows: `RuntimeWarning: overflow encountered in exp`, raised at `scores = weights * np.exp(np.dot(X, self.params_))`

Any suggestions on dealing with this issue? I'm starting down the road of normalization, but I'm not sure if that's 100% correct.

Cameron Davidson-Pilon
@CamDavidsonPilon
@rgreen13_gitlab hi, thanks for reporting this. I'll create a bug issue around it. For now, you can try scaling and normalizing your matrix before calling .fit
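A pandas sketch of the scaling workaround; the function name and column names are illustrative. The overflow comes from `np.exp(np.dot(X, self.params_))`: float64 `exp` overflows once its argument exceeds about 709.78 (the log of the largest float64), which covariates on large raw scales reach easily. Z-scoring keeps $X\beta$ on a moderate scale, and the returned means and standard deviations let each coefficient be mapped back to the raw scale as $\beta_{\text{raw}} = \beta / \sigma$.

```python
import pandas as pd


def standardize_covariates(df, exclude=()):
    """Z-score every column not listed in `exclude` (e.g. the duration and event columns).

    Returns the scaled frame plus the means and standard deviations used, so a
    coefficient fitted on the scaled data can be mapped back as beta_raw = beta / std.
    """
    excluded = set(exclude)
    cols = [c for c in df.columns if c not in excluded]
    means = df[cols].mean()
    stds = df[cols].std(ddof=0)
    # Constant columns get std 1 so they become all zeros instead of NaN.
    stds = stds.where(stds > 0, 1.0)
    out = df.copy()
    out[cols] = (df[cols] - means) / stds
    return out, means, stds


if __name__ == "__main__":
    df = pd.DataFrame({"T": [1, 2, 3, 4], "E": [1, 0, 1, 1], "income": [1e4, 2e4, 3e4, 4e4]})
    scaled, means, stds = standardize_covariates(df, exclude=["T", "E"])
    print(scaled)
```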
Robert Green
@rgreen13_gitlab
Thanks!
sharmarishika
@sharmarishika
Hi! I am currently trying to create mixed cure models using the lifelines fitter. I saw that there is example code in the GitHub repo under experiments. I was going to use this as a starting point and then adjust accordingly, but when I run that code I get an error saying: "AttributeError: 'CureModel' object has no attribute '_primary_parameter_name'"
I don't have a full understanding of the input arguments for _cumulative_hazard so I am not sure what is causing this error. Thank you!
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika hi there. Are you using lifelines >= 0.22.0?
If not, try upgrading. Otherwise, if you are still getting the error, can you post the entire stack trace?
Cameron Davidson-Pilon
@CamDavidsonPilon
(Also, make sure you are subclassing ParametricRegressionFitter, and not ParametericAFTRegressionFitter)
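For the cure model itself, the quantity a custom `_cumulative_hazard` has to return can be written in closed form. A sketch as a plain function, assuming a Weibull distribution for the uncured fraction; the parameter names are illustrative. In a lifelines subclass these values would be unpacked from the fitted parameters, and the exact method signature differs between lifelines versions, so the docs for the installed release are the reference.

```python
import numpy as np


def mixture_cure_cumulative_hazard(times, cure_prob, lambda_, rho):
    """Cumulative hazard of a Weibull mixture cure model.

    A fraction `cure_prob` never experiences the event; the rest follow a
    Weibull survival curve, so S(t) = c + (1 - c) * exp(-(t / lambda_) ** rho)
    and H(t) = -log S(t). H(t) levels off at -log(c) instead of growing forever.
    """
    times = np.asarray(times, dtype=float)
    survival = cure_prob + (1.0 - cure_prob) * np.exp(-((times / lambda_) ** rho))
    return -np.log(survival)
```

With `cure_prob = 0` this reduces to the ordinary Weibull cumulative hazard $(t/\lambda)^\rho$.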
sharmarishika
@sharmarishika
@CamDavidsonPilon ah - I think I'm using version 0.19.5! when i install through pip it says 'requirement already satisfied' - would you recommend a different way of upgrading?
in reference to the subclass my computer doesn't recognize ParametricRegressionFitter as an option but it does recognize ParametericRegressionFitter - perhaps also because of the version?
mohit
@mohit-shrma
> hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.

@CamDavidsonPilon thanks for the advice, I will try that idea.
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika try pip install -U lifelines or pip install lifelines==0.21.1
sharmarishika
@sharmarishika
@CamDavidsonPilon i get an error saying that there are no matching distributions for lifelines 0.21.1 - in the list provided the most recent version is 0.19.5
Cameron Davidson-Pilon
@CamDavidsonPilon
is this conda?
sharmarishika
@sharmarishika
i was using pip, in my command line typed exactly what i mentioned above - sorry i'm quite new to this!
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika ah, are you on Python2? 0.19.5 was the latest Py2 release. It's only Python3 now
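When `pip` keeps answering "requirement already satisfied" at 0.19.5, it is usually attached to a Python 2 interpreter. A small check (the function name is illustrative) shows which interpreter is running; once on Python 3, `python3 -m pip install -U lifelines` upgrades lifelines for that specific interpreter.

```python
import sys


def can_install_recent_lifelines(version_info=sys.version_info):
    """True if this interpreter is Python 3 or newer.

    Per this thread, lifelines 0.19.5 was the last release that supported Python 2.
    """
    return tuple(version_info[:2]) >= (3, 0)


if __name__ == "__main__":
    # sys.executable shows which interpreter (and therefore which pip) is in use.
    print(sys.executable, sys.version.split()[0])
    print("Python 3 available for newer lifelines:", can_install_recent_lifelines())
```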
sharmarishika
@sharmarishika
oh i see! ya i'm using python2.7 - i should look into doing the upgrade! thanks for the help!
Charlene Chambliss
Hi folks, does anyone have experience explaining concordance index to a nontechnical audience (like execs), or even devising an alternative method of presenting model accuracy? I don't think describing the model's predictions in terms of ordered pairs is likely to be of interest - they just want to know how accurate the model is in terms of customer retention/LTV.
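The ordered-pairs definition can be made concrete with a from-scratch version of Harrell's C. In plain terms: take any two customers where we know one churned first; the C-index is the share of such pairs in which the model predicted a shorter survival for the earlier churner. 0.5 is a coin flip and 1.0 is perfect ranking. This sketch simplifies tie handling (pairs with tied observed times are skipped); for real use, `lifelines.utils.concordance_index(event_times, predicted_scores, event_observed)` is the library version.

```python
def concordance_index_simple(times, predicted_times, events):
    """Fraction of comparable pairs whose predicted survival order matches the observed order.

    A pair (i, j) is comparable when i had an observed event before j's time.
    Higher predicted_times mean longer predicted survival; tied predictions count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```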
Pedro Sobreiro
@pesobreiro
@blissfulchar_twitter personally I like using the survival probabilities to calculate the CLV assuming contractual settings (e.g. Berry & Linoff, 2004). I don't like concordance for a non-technical audience; for execs I normally try to link survival to CLV. Anyone correct me if I am wrong, but concordance is a "global" index for validating the predictive ability of a survival model, representing how well the variables predict survival, e.g. observations with longer survival times should have higher predicted survival probabilities from your model.
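Linking survival to CLV in a contractual setting can be sketched as a discounted sum of margins weighted by the probability that the customer is still active. This is one common formulation, assumed here rather than taken from Berry & Linoff; the function name and arguments are illustrative. The survival curve could come from, e.g., `KaplanMeierFitter.survival_function_` or `CoxPHFitter.predict_survival_function`, evaluated at the start of each billing period.

```python
def contractual_clv(survival, margin_per_period, discount_rate=0.0):
    """Expected customer value over the horizon covered by `survival`.

    survival[t] is the probability a customer is still active at the start of
    period t (survival[0] is normally 1.0); each active period earns
    margin_per_period, discounted by (1 + discount_rate) ** t.
    """
    return sum(
        margin_per_period * s / (1.0 + discount_rate) ** t
        for t, s in enumerate(survival)
    )
```

A figure like "expected value per customer over the next N periods" is often easier to discuss with a non-technical audience than a ranking statistic.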
Charlene Chambliss