Rob deCarvalho
@robdmc
Is this making sense?
Cameron Davidson-Pilon
@CamDavidsonPilon

Your explanation of _cumulative_hazard is correct. But you can also see it as simply the cumulative hazard you wish to implement (i.e., not necessary to think about "durations" or "unknowns")

I was thinking about your seasonal model, and actually tried to code something up, but there is a problem I think. The _cumulative_hazard is invoked for both the censored and uncensored data, so your code needs to handle that (and you won't know which until you see the shapes of the input data)

Cameron Davidson-Pilon
@CamDavidsonPilon
yea I don't know if this can be done... I'm playing with it locally, and having some trouble
I'll think more about it. Try to write down the hazard mathematically - I think the problem is that it is clock-time dependent.
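
A sketch of what "writing the hazard down" might look like. The sinusoidal form and the symbols $s$ (calendar time at which a subject enters), $\lambda$, $a$, and $P$ (season length) are illustrative assumptions, not the model discussed in the thread. With clock time $c = s + t$ and duration $t$ since entry:

```latex
h(t \mid s) = \lambda \left( 1 + a \sin\!\left( \frac{2\pi (s + t)}{P} \right) \right),
\qquad \lambda > 0,\ |a| < 1

H(t \mid s) = \int_0^t h(u \mid s)\, du
            = \lambda t + \frac{\lambda a P}{2\pi}
              \left[ \cos\!\left( \frac{2\pi s}{P} \right)
                   - \cos\!\left( \frac{2\pi (s + t)}{P} \right) \right]
```

Under these assumptions the cumulative hazard depends on the entry time $s$ as well as the duration $t$, so a function that only receives durations cannot evaluate it without extra per-subject information.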
Rob deCarvalho
@robdmc
Thank you for thinking about it. Clock-dependent hazards I think are actually pretty common
I love this interface you have for arbitrary models. If there was a way to hack that, it could be pretty useful.
Rob deCarvalho
@robdmc
maybe with (..., *args, **kwargs) to the _cumulative_hazard? I actually don't understand very well how _cumulative_hazard is used under the hood, so perhaps I'm spouting nonsense.
Cameron Davidson-Pilon
@CamDavidsonPilon
(..., *args, **kwargs) I was thinking about this, too

> Clock-dependent hazards I think are actually pretty common

Agree, but I feel like the common strategy is to use a regression model or fit N univariate models (i.e. partition the data)

I think a seasonal model is a great idea, so I want this to work.

Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: new lifelines release: 0.22.0. Some important API changes to take a look at, but some really powerful new regression models: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.0
Julian Späth
@julianspaeth
Hi all, does lifelines somehow offer a Random Survival Forest? Or is there a specific reason why not? As there is no real python implementation of RSF and I want to implement it for my Master thesis, I was wondering if you are interested in including it into lifelines?
Cameron Davidson-Pilon
@CamDavidsonPilon
Hi Julian, lifelines does not have an RF model. Maybe scikit-survival does, though.
lifelines has focused less on purely predictive models, and more on inference
Julian Späth
@julianspaeth
Hi Cameron, thank you for your answer. As far as I can see scikit-survival does not have a RF model. So I guess I need to implement it from scratch to use it in python. Thank you anyways 🙂
Cameron Davidson-Pilon
@CamDavidsonPilon
hm, I thought it did. Okay, have fun!
Pedro Sola
@pedrosola
Hi everyone, I'm trying to fit a model to a recurrent process, i.e. a patient returning to a doctor. Is there a way to do so using lifelines? So far the closest that I've got was this repo: https://github.com/dunan/MultiVariatePointProcess
Cameron Davidson-Pilon
@CamDavidsonPilon
Lifelines has limited support for recurring events.
Unfortunately
mohit
@mohit-shrma
Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.
mohit
@mohit-shrma

@CamDavidsonPilon Let me know if you have any suggestions for question below:

> Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.

Cameron Davidson-Pilon
@CamDavidsonPilon
hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.
Cameron Davidson-Pilon
@CamDavidsonPilon
@mohit-shrma, another suggestion is to "collapse" identical rows and use weights. For example, with only 6 variables, the same row likely appears more than once. You can group these, assign each distinct row an integer count, and pass that count column via the weights_col argument in fit
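
A minimal sketch of the collapsing step, using only the standard library. The toy rows and the helper name `collapse_rows` are hypothetical; rows are merged only when every column, including duration and event, is identical.

```python
from collections import Counter


def collapse_rows(rows):
    """Return each distinct row once, with an integer count appended."""
    counts = Counter(tuple(row) for row in rows)
    return [row + (count,) for row, count in counts.items()]


# Hypothetical toy data: (duration, event, age_group, plan)
rows = [
    (5, 1, 0, 1),
    (5, 1, 0, 1),
    (3, 0, 1, 0),
    (5, 1, 0, 1),
    (3, 0, 1, 0),
    (8, 1, 1, 1),
]

collapsed = collapse_rows(rows)
for row in collapsed:
    print(row)  # the last element is the count to use as the row weight
```

The counts sum to the original number of rows, so the weighted data carries the same information in fewer rows. With continuous covariates few rows will match exactly, so the saving depends on how discrete the features are.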
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: minor version of lifelines released: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.1
Robert Green
@rgreen13_gitlab

Hi all. I'm somewhat new to using lifelines, and in using the CoxPHFitter, when I run check_assumptions, I end up with an error that reads as follows: RuntimeWarning: overflow encountered in exp, raised at the line scores = weights * np.exp(np.dot(X, self.params_))

Any suggestions on dealing with this issue? I'm starting down the road of normalization, but I'm not sure if that's 100% correct.

Cameron Davidson-Pilon
@CamDavidsonPilon
@rgreen13_gitlab hi, thanks for reporting this. I'll create a bug issue around it. For now, you can try scaling and normalizing your matrix before calling .fit
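
A rough illustration of why scaling helps, assuming the overflow comes from a covariate on a large raw scale. The `income` values and the coefficient are made up, and `standardize` and `overflows` are hypothetical helpers, not lifelines functions.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mu) / sigma for x in column]


def overflows(value):
    """Return True if exp(value) is too large for a float."""
    try:
        math.exp(value)
    except OverflowError:
        return True
    return False


income = [20_000.0, 45_000.0, 60_000.0, 120_000.0]  # hypothetical raw covariate
coefficient = 0.05                                   # hypothetical coefficient

scaled = standardize(income)

print(overflows(coefficient * max(income)))  # raw scale: exp(6000) is too large
print(overflows(coefficient * max(scaled)))  # standardized scale: small exponent
```

The fitted coefficient changes meaning after rescaling (it becomes the effect per standard deviation), so results should be interpreted on the new scale.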
Robert Green
@rgreen13_gitlab
Thanks!
sharmarishika
@sharmarishika
Hi! I am currently trying to create mixed cure models using the lifelines fitter. I saw that there is example code on GitHub under experiments. I was going to use this as a starting point and then adjust accordingly, but I am getting an error when I run that code saying: "AttributeError: 'CureModel' object has no attribute '_primary_parameter_name'"
I don't have a full understanding of the input arguments for _cumulative_hazard so I am not sure what is causing this error. Thank you!
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika hi there. Are you using lifelines >= 0.22.0?
If not, try upgrading. Otherwise, if you are still getting the error, can you post the entire stack trace?
Cameron Davidson-Pilon
@CamDavidsonPilon
(Also, make sure you are subclassing ParametricRegressionFitter, and not ParametericAFTRegressionFitter)
sharmarishika
@sharmarishika
@CamDavidsonPilon ah - I think I'm using version 0.19.5! when i install through pip it says 'requirement already satisfied' - would you recommend a different way of upgrading?
in reference to the subclass my computer doesn't recognize ParametricRegressionFitter as an option but it does recognize ParametericRegressionFitter - perhaps also because of the version?
mohit
@mohit-shrma
> hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.
@CamDavidsonPilon thanks for the advice, I will try that idea.
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika try pip install -U lifelines or pip install lifelines==0.21.1
sharmarishika
@sharmarishika
@CamDavidsonPilon i get an error saying that there are no matching distributions for lifelines 0.21.1 - in the list provided the most recent version is 0.19.5
Cameron Davidson-Pilon
@CamDavidsonPilon
is this conda?
sharmarishika
@sharmarishika
i was using pip, in my command line typed exactly what i mentioned above - sorry i'm quite new to this!
Cameron Davidson-Pilon
@CamDavidsonPilon
@sharmarishika ah, are you on Python2? 0.19.5 was the latest Py2 release. It's only Python3 now
sharmarishika
@sharmarishika
oh i see! ya i'm using python2.7 - i should look into doing the upgrade! thanks for the help!
Charlene Chambliss
@blissfulchar_twitter
Hi folks, does anyone have experience explaining concordance index to a nontechnical audience (like execs), or even devising an alternative method of presenting model accuracy? I don't think describing the model's predictions in terms of ordered pairs is likely to be of interest - they just want to know how accurate the model is in terms of customer retention/LTV.
Pedro Sobreiro
@pesobreiro
@blissfulchar_twitter personally I like using the survival probabilities to calculate the CLV, assuming contractual settings (e.g. Berry & Linoff, 2004). I don't like concordance for a non-technical audience; I normally try to link survival to CLV for execs. Anyone correct me if I am wrong, but concordance is a "global" index for validating the predictive ability of a survival model. It represents how well the variables predict survival, e.g. whether observations with longer survival times also get a higher predicted probability of survival from your model.
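
To make the pairwise idea concrete, here is a simplified sketch of a concordance calculation. It is not the lifelines implementation (lifelines ships its own `lifelines.utils.concordance_index`), it skips pairs with tied durations, and the toy data are made up.

```python
from itertools import combinations


def concordance_index(durations, predicted_scores, events):
    """Fraction of comparable pairs that the model ranks in the right order.

    predicted_scores: higher means the model expects longer survival.
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        # Let `a` be the subject with the shorter observed duration.
        a, b = (i, j) if durations[i] <= durations[j] else (j, i)
        if durations[a] == durations[b]:
            continue  # tied durations: skipped in this simplified sketch
        if not events[a]:
            continue  # shorter duration is censored, so the true order is unknown
        comparable += 1
        if predicted_scores[a] < predicted_scores[b]:
            concordant += 1.0
        elif predicted_scores[a] == predicted_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [2, 4, 6, 8]          # hypothetical months until churn
events = [1, 1, 0, 1]             # third customer is still active (censored)
scores = [1.0, 3.0, 2.5, 4.0]     # hypothetical predicted lifetimes

c_index = concordance_index(durations, scores, events)
print(c_index)
```

For a non-technical audience this could be phrased as: of the 5 pairs of customers that can be compared here, the model puts 4 in the right order.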
Charlene Chambliss
@blissfulchar_twitter
@pesobreiro This is a nice idea! And definitely in line with how we're currently asking retention questions at my co. Did you do that by predicting the survival function for each individual and then multiplying by customer LTV accordingly?
Pedro Sobreiro
@pesobreiro
@blissfulchar_twitter we used the survival probabilities under each curve (cohort) and the monthly payment to calculate CLV. We didn't use individual customers, but customers grouped in the survival curves. This option has some limitations but gives us an idea of the estimated CLV. What you describe should be very interesting. I think there are other approaches for predicting individual CLV.
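
One way the cohort calculation could look, as a sketch: expected revenue per customer is the sum over months of the probability of still being active times the monthly payment. The function name, the optional discount rate, and the numbers are assumptions for illustration, not the method used in the thread.

```python
def cohort_clv(survival_probabilities, monthly_payment, monthly_discount_rate=0.0):
    """Sum of S(month) * payment over the months covered by the curve.

    survival_probabilities[k] is the cohort's probability of still being a
    customer at month k (so the first entry is normally 1.0).
    """
    return sum(
        s * monthly_payment / (1.0 + monthly_discount_rate) ** month
        for month, s in enumerate(survival_probabilities)
    )


survival = [1.0, 0.9, 0.8, 0.7]  # hypothetical cohort survival curve, months 0..3
clv = cohort_clv(survival, monthly_payment=10.0)
print(round(clv, 2))  # roughly 10 + 9 + 8 + 7
```

The sum stops at the last month of the curve, so this understates CLV for cohorts that have not yet been observed for long; that is one of the limitations of the cohort approach.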
Cameron Davidson-Pilon
@CamDavidsonPilon
:wave: minor lifelines release. Important thing is that scipy 1.3 can be used with it now: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.2
Charlene Chambliss
@blissfulchar_twitter
Thanks @pesobreiro, this is helpful :)
A N
@aleva85
Hi, what is the best way to retrieve log likelihood of a fit? it is shown via 'model.print_summary()' but not via 'model.summary', which only shows a summary of the parameters.
I managed to get it via model._log_likelihood, but had to look into the source code for that.
Thanks and kudos for the library!
Cameron Davidson-Pilon
@CamDavidsonPilon
Hi @aleva85, that currently is the best way, but you bring up a good observation that it’s not easy to find. Maybe in a future release I’ll promote it and document it well
Cameron Davidson-Pilon
@CamDavidsonPilon
FYI I've been playing around with pure-python & autograd neural nets for better prediction. I may make this its own experimental package. Need a good name for it though
Cameron Davidson-Pilon
@CamDavidsonPilon
lifelike, lifenets?