    Cameron Davidson-Pilon
    @CamDavidsonPilon

    @robdmc, but dates isn't an unknown, is it? If not, it could be a global variable. If it is unknown, then I think you'll need to "flatten" it, i.e. one parameter for each element of the list.

    Can you tell me more about this seasonal model?

    Rob deCarvalho
    @robdmc
    @CamDavidsonPilon You are correct. dates are not an unknown. They are known constants. It makes sense that everything that goes into params should be unknown. Not sure what I was thinking there. Putting it in a global/class/instance variable makes sense. I just want to be sure I understand how _cumulative_hazard() is called.
    params: get tweaked by the optimization
    times: the times passed into the fitter as "durations"
    return: The cumulative hazard encountered over the duration represented by each time
    Is that right?
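That contract can be sketched in plain Python (a hypothetical illustration, not lifelines code; `lambda_` and `rho` are made-up parameter names):

```python
# Hypothetical sketch of the _cumulative_hazard contract described above.
# `params` holds the unknowns the optimizer tweaks; `times` are the durations
# passed to the fitter; the return value is the cumulative hazard H(t) at
# each time.
def cumulative_hazard(params, times):
    lambda_, rho = params                       # unknowns, tuned by the optimizer
    # Weibull-style cumulative hazard: H(t) = (t / lambda_) ** rho
    return [(t / lambda_) ** rho for t in times]

H = cumulative_hazard((2.0, 1.5), [1.0, 2.0, 4.0])
```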
    Rob deCarvalho
    @robdmc
    The process I am trying to model consists of two competing kinds of events. The hazard for each event is a function of date. So the cumulative hazard for each time would be the integral of the hazard from the "start_date" to the "end_date". (where these can be derived from an element of time and its corresponding date.) What I really care about is the cumulative incidence function (CIF) for each kind of event. If the idea of getting dates into the _cumulative_hazard function works, then I was hoping to use this technique to model the CIF for one of the competing event types.
    Is this making sense?
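As a toy sketch of that setup (made-up hazard functions, not lifelines code): with two competing hazards h1 and h2, the CIF for event 1 is the integral of h1(s) * S(s), where S(s) = exp(-(H1(s) + H2(s))) is overall survival across both event types:

```python
import math

def h1(s):
    """Seasonal, date-dependent hazard for event type 1 (made up)."""
    return 0.1 * (1 + 0.5 * math.sin(2 * math.pi * s / 365))

def h2(s):
    """Constant hazard for the competing event type 2 (made up)."""
    return 0.05

def cif1(t, steps=10_000):
    """Cumulative incidence of event 1 by time t, via midpoint integration."""
    dt = t / steps
    total = cum_hazard = 0.0
    for i in range(steps):
        s = (i + 0.5) * dt
        cum_hazard += (h1(s) + h2(s)) * dt        # running H1(s) + H2(s)
        total += h1(s) * math.exp(-cum_hazard) * dt
    return total
```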
    Cameron Davidson-Pilon
    @CamDavidsonPilon

    Your explanation of _cumulative_hazard is correct. But you can also see it as simply the cumulative hazard you wish to implement (i.e., not necessary to think about "durations" or "unknowns")

    I was thinking about your seasonal model, and actually tried to code something up, but there is a problem I think. The _cumulative_hazard is invoked for both the censored and uncensored data, so your code needs to handle that (and you won't know which until you see the shapes of the input data)

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    yea I don't know if this can be done... I'm playing with it locally, and having some trouble
    I'll think more about it. Try to write down the hazard mathematically - I think the problem is that it is clock-time dependent.
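One hypothetical way to write such a seasonal, clock-time-dependent hazard down, with t_0 the calendar start date and made-up constants lambda and alpha:

```latex
h(t \mid t_0) = \lambda \left( 1 + \alpha \sin\frac{2\pi (t_0 + t)}{365} \right),
\qquad
H(t \mid t_0) = \int_0^t h(s \mid t_0)\, ds
= \lambda t + \frac{365\,\lambda\,\alpha}{2\pi}
  \left[ \cos\frac{2\pi t_0}{365} - \cos\frac{2\pi (t_0 + t)}{365} \right].
```

The dependence on t_0 is exactly the problem: the durations alone don't determine the hazard without the start date.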
    Rob deCarvalho
    @robdmc
    Thank you for thinking about it. Clock-dependent hazards I think are actually pretty common
    I love this interface you have for arbitrary models. If there was a way to hack that, it could be pretty useful.
    Rob deCarvalho
    @robdmc
    maybe with (..., *args, **kwargs) to the _cumulative_hazard? I actually don't understand very well how _cumulative_hazard is used under the hood, so perhaps I'm spouting nonsense.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    (..., *args, **kwargs) I was thinking about this, too

    Clock-dependent hazards I think are actually pretty common

    Agree, but I feel like the common strategy is to use a regression model or fit N univariate models (i.e. partition the data)

    I think a seasonal model is a great idea, so I want this to work.

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: new lifelines release: 0.22.0. Some important API changes to take a look at, but some really powerful new regression models: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.0
    Julian Späth
    @julianspaeth
    Hi all, does lifelines somehow offer a Random Survival Forest? Or is there a specific reason why not? As there is no real Python implementation of RSF and I want to implement it for my Master's thesis, I was wondering if you are interested in including it in lifelines?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Hi Julian, lifelines does not have a RF model. Maybe scikit-learn survival does though.
    lifelines has focused less on purely predictive models, and more on inference
    Julian Späth
    @julianspaeth
    Hi Cameron, thank you for your answer. As far as I can see scikit-survival does not have a RF model. So I guess I need to implement it from scratch to use it in python. Thank you anyways 🙂
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    hm, I thought it was, okay - have fun!
    Pedro Sola
    @pedrosola
    Hi everyone, I'm trying to fit a model onto a recurrent process, e.g. a patient returning to a doctor. Is there a way to do so using lifelines? So far the closest that I've got was this repo: https://github.com/dunan/MultiVariatePointProcess
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    Lifelines has limited support for recurring events.
    Unfortunately
    mohit
    @mohit-shrma
    Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.
    mohit
    @mohit-shrma

    @CamDavidsonPilon Let me know if you have any suggestions for question below:

    Hi, I am using CoxPHFitter with IPS weights and the robust=True flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @mohit-shrma, another suggestion is to "collapse" similar rows and use weights. Ex: with only 6 variables, you likely have the same row appear twice. You can group these, assign that row an integer count, and use the weight_col argument in fit
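The collapsing idea can be sketched without lifelines (made-up data; the resulting count would then go in the weights column passed to fit):

```python
from collections import Counter

# Made-up dataset: each row is (covariate_1, covariate_2, duration, event).
rows = [
    (0, 1, 5.0, 1),
    (0, 1, 5.0, 1),   # exact duplicate of the row above
    (1, 0, 3.0, 0),
]

# Group identical rows; each row's count becomes its weight.
counts = Counter(rows)
collapsed = [row + (weight,) for row, weight in counts.items()]
```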
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: minor version of lifelines released: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.1
    Robert Green
    @rgreen13_gitlab

    Hi all. I'm somewhat new to using lifelines, and in using the CoxPHFitter, when I run check_assumptions, I end up with an error that reads: `RuntimeWarning: overflow encountered in exp` at `scores = weights * np.exp(np.dot(X, self.params_))`

    Any suggestions on dealing with this issue? I'm starting down the road of normalization, but I'm not sure if that's 100% correct.
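A minimal sketch of the scaling idea (plain Python, hypothetical data): z-scoring each covariate column keeps the linear predictor fed to exp small, which is what avoids the overflow:

```python
def standardize(column):
    """Center a column to mean 0 and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5 or 1.0
    return [(x - mean) / std for x in column]

# A large-scale covariate (made up) that could overflow exp() unscaled.
incomes = [30_000.0, 60_000.0, 90_000.0, 120_000.0]
z = standardize(incomes)
```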

    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @rgreen13_gitlab hi, thanks for reporting this. I'll create a bug issue around it. For now, you can try scaling and normalizing your matrix before calling .fit
    Robert Green
    @rgreen13_gitlab
    Thanks!
    sharmarishika
    @sharmarishika
    Hi! I am currently trying to create mixed cure models using the lifelines fitter. I saw that there is an example code in the GitHub under experiments. I was going to use this as a starting point and then adjust accordingly, but I am getting an error when I run that code saying: "AttributeError: 'CureModel' object has no attribute '_primary_parameter_name'"
    I don't have a full understanding of the input arguments for _cumulative_hazard so I am not sure what is causing this error. Thank you!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @sharmarishika hi there. Are you using lifelines >= 0.22.0?
    If not, try upgrading. Otherwise, if you are still getting the error, can you post the entire stack trace?
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    (Also, make sure you are subclassing ParametricRegressionFitter, and not ParametericAFTRegressionFitter)
    sharmarishika
    @sharmarishika
    @CamDavidsonPilon ah - I think I'm using version 0.19.5! when i install through pip it says 'requirement already satisfied' - would you recommend a different way of upgrading?
    in reference to the subclass my computer doesn't recognize ParametricRegressionFitter as an option but it does recognize ParametericRegressionFitter - perhaps also because of the version?
    mohit
    @mohit-shrma
    hello! A million is a lot, much more than needed for only 6 features. I would suggest subsampling to 50k or even less, and checking the std. errors.
    @CamDavidsonPilon thanks for the advice, I will try that idea.
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @sharmarishika try pip install -U lifelines or pip install lifelines==0.21.1
    sharmarishika
    @sharmarishika
    @CamDavidsonPilon i get an error saying that there are no matching distributions for lifelines 0.21.1 - in the list provided the most recent version is 0.19.5
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    is this conda?
    sharmarishika
    @sharmarishika
    i was using pip, in my command line typed exactly what i mentioned above - sorry i'm quite new to this!
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    @sharmarishika ah, are you on Python2? 0.19.5 was the latest Py2 release. It's only Python3 now
    sharmarishika
    @sharmarishika
    oh i see! ya i'm using python2.7 - i should look into doing the upgrade! thanks for the help!
    Charlene Chambliss
    @blissfulchar_twitter
    Hi folks, does anyone have experience explaining concordance index to a nontechnical audience (like execs), or even devising an alternative method of presenting model accuracy? I don't think describing the model's predictions in terms of ordered pairs is likely to be of interest - they just want to know how accurate the model is in terms of customer retention/LTV.
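For reference, here is a toy restatement of what the concordance index measures (a deliberate simplification that ignores censoring): the fraction of comparable customer pairs where the model assigns higher risk to the customer who churned sooner:

```python
from itertools import combinations

def concordance(observed_times, predicted_risks):
    """Toy concordance index for fully observed (uncensored) data."""
    concordant = comparable = 0
    for i, j in combinations(range(len(observed_times)), 2):
        if observed_times[i] == observed_times[j]:
            continue   # ties skipped here; real implementations handle them
        comparable += 1
        shorter, longer = (i, j) if observed_times[i] < observed_times[j] else (j, i)
        if predicted_risks[shorter] > predicted_risks[longer]:
            concordant += 1
    return concordant / comparable
```

For an exec audience this reads as "the model correctly orders X% of customer pairs by retention", which may still be too pair-centric, as noted above.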
    Pedro Sobreiro
    @pesobreiro
    @blissfulchar_twitter personally I like using the survival probabilities to calculate the CLV, assuming contractual settings (e.g. Berry & Linoff, 2004). I don't like it for a non-technical audience, though; I normally try to link survival to CLV for execs. Anyone correct me if I am wrong, but concordance is a "global" index for validating the predictive ability of a survival model: it represents how well the variables predict survival, e.g. observations with a higher survival time get a higher survival probability predicted by your model.
    Charlene Chambliss
    @blissfulchar_twitter
    @pesobreiro This is a nice idea! And definitely in line with how we're currently asking retention questions at my co. Did you do that by predicting the survival function for each individual and then multiplying by customer LTV accordingly?
    Pedro Sobreiro
    @pesobreiro
    @blissfulchar_twitter we used the survival probabilities under each curve (cohort) and the monthly payment to calculate CLV. We didn't use individual customers, but customers grouped in the survival curves. This option has some limitations, but it gives us an idea of the estimated CLV. What you suggest would be very interesting. I think there are other approaches to calculating predictions of individual CLV.
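The cohort-level calculation described above can be sketched with made-up numbers: each month's payment is discounted by the probability the cohort is still subscribed at that month.

```python
# Hypothetical cohort: fixed monthly payment weighted by the cohort's
# survival probability at months 0..5 (all numbers made up).
monthly_payment = 30.0
survival_probs = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

estimated_clv = sum(monthly_payment * s for s in survival_probs)
```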
    Cameron Davidson-Pilon
    @CamDavidsonPilon
    :wave: minor lifelines release. Important thing is that scipy 1.3 can be used with it now: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.2
    Charlene Chambliss
    @blissfulchar_twitter
    Thanks @pesobreiro, this is helpful :)
    Alessandro Nesti
    @aleva85
    Hi, what is the best way to retrieve the log likelihood of a fit? It is shown via 'model.print_summary()' but not via 'model.summary', which only shows a summary of the parameters.
    I managed to get it via model._log_likelihood, but had to look into the source code for that.
    Thanks and kudos for the library!