`times` parameter. If this were possible, I think it might allow me to add seasonality to a competing-risk model that captures the cumulative hazard of the outcome of interest. So I guess my question is two-fold: a) Is that possible with lifelines? b) Does that make sense for modeling competing risks?
import numpy as np

class SeasonalHazardFitter(ParametericUnivariateFitter):
    """
    The idea of this class would be to fit custom seasonality to an
    exponential-like hazard model.
    """
    _fitted_parameter_names = ['a_q1_', 'a_q2_', 'a_q3_', 'a_q4_', 'dates']

    def _cumulative_hazard(self, params, times):
        # Pull out fiscal quarters and dates corresponding to times.
        # Each element of the dates array corresponds to an element of the
        # times array.
        a_q1_, a_q2_, a_q3_, a_q4_, dates = params
        # Call a function that associates fiscal quarter with date
        quarters = get_fiscal_quarters(dates)
        # Get the hazard for each time
        q_lookup = {1: a_q1_, 2: a_q2_, 3: a_q3_, 4: a_q4_}
        hazards = np.array([q_lookup[quarter] for quarter in quarters])
        # Return the cumulative hazard
        # You'd have to be more careful to actually do the
        # integration properly, but you get the idea.
        return np.cumsum(hazards)
`dates` are not an unknown; they are known constants. It makes sense that everything that goes into params should be unknown. Not sure what I was thinking there. Putting it in a global/class/instance variable makes sense. I just want to be sure I understand how `_cumulative_hazard()` is called.
`params`: get tweaked by the optimization
`times`: the times passed into the fitter as "durations"
`return`: the cumulative hazard encountered over the duration represented by each time and its corresponding `date`
So the cumulative hazard for each time would be the integral of the hazard from the "start_date" to the "end_date" (where these can be derived from an element of `time` and its corresponding date). What I really care about is the cumulative incidence function (CIF) for each kind of event. If the idea of getting `dates` into the `_cumulative_hazard` function works, then I was hoping to use this technique to model the CIF for one of the competing event types.
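To make the "integral from start_date to end_date" idea concrete, here is a minimal sketch that is independent of lifelines. It assumes calendar quarters (not fiscal ones), whole-day durations, and a hazard that is constant within each quarter; the names `days_per_quarter` and `cumulative_hazard` are made up for illustration. Under those assumptions the integral reduces to counting the days spent in each quarter and weighting them by the per-quarter rates.

```python
from datetime import date, timedelta

def days_per_quarter(start, duration_days):
    """Count how many days of the interval fall in each calendar quarter.

    The interval covers `duration_days` whole days beginning at `start`.
    Returns a list [days_in_q1, days_in_q2, days_in_q3, days_in_q4].
    """
    counts = [0, 0, 0, 0]
    for offset in range(duration_days):
        day = start + timedelta(days=offset)
        counts[(day.month - 1) // 3] += 1  # months 1-3 -> index 0, etc.
    return counts

def cumulative_hazard(counts, daily_rates):
    """Integrate a piecewise-constant hazard: sum of (days in quarter * rate)."""
    return sum(c * a for c, a in zip(counts, daily_rates))

# Hypothetical per-day hazard rates for Q1..Q4 (the unknowns a fitter would tune).
rates = [0.01, 0.02, 0.03, 0.04]

# The day counts depend only on known dates, so they can be computed once up front.
counts = days_per_quarter(date(2020, 1, 1), 92)
print(counts, cumulative_hazard(counts, rates))
```

Because the day counts are known constants, they could be computed once and stored on the instance, leaving only the four rates as unknowns.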
Your explanation of `_cumulative_hazard` is correct. But you can also see it as simply the cumulative hazard you wish to implement (i.e., it's not necessary to think about "durations" or "unknowns").
I was thinking about your seasonal model, and actually tried to code something up, but I think there is a problem. `_cumulative_hazard` is invoked for both the censored and uncensored data, so your code needs to handle that (and you won't know which until you see the shapes of the input data).
Clock-dependent hazards I think are actually pretty common
Agree, but I feel like the common strategy is to use a regression model or fit N univariate models (i.e. partition the data)
I think a seasonal model is a great idea, so I want this to work.
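For comparison, the "partition the data and fit N univariate models" strategy can be sketched as follows. This is only an illustration under assumptions: each record is assigned to a single group (say, the quarter it started in), and each group gets a constant-hazard (exponential) model, for which the maximum-likelihood rate is the number of observed events divided by the total observed time. It uses that closed form rather than any lifelines fitter, and `fit_exponential_by_group` is a made-up name.

```python
from collections import defaultdict

def fit_exponential_by_group(records):
    """Fit one constant-hazard model per group.

    `records` is an iterable of (group, duration, observed) tuples, where
    `observed` is True for an event and False for a censored record.
    Returns {group: estimated hazard rate}.
    """
    events = defaultdict(int)
    exposure = defaultdict(float)
    for group, duration, observed in records:
        events[group] += int(observed)  # censored records add no events
        exposure[group] += duration     # but they still add time at risk
    return {g: events[g] / exposure[g] for g in exposure if exposure[g] > 0}

# Hypothetical data: (starting quarter, duration, event observed?)
records = [
    (1, 10.0, True),
    (1, 30.0, False),
    (2, 5.0, True),
    (2, 5.0, True),
]
print(fit_exponential_by_group(records))
```

Note the limitation relative to a seasonal model: a record that spans several quarters is attributed entirely to one group.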
@CamDavidsonPilon Let me know if you have any suggestions for question below:
Hi, I am using CoxPHFitter with IPS weights and the `robust=True` flag. However, the fit is taking a really long time to finish. I have about a million instances and 6 features in my dataset. Let me know if a slower runtime is expected in the weighted version and what can be done to speed it up.
Hi all. I'm somewhat new to using lifelines, and in using the CoxPHFitter, when I run `check_assumptions`, I end up with an error that reads as follows: /RuntimeWarning: overflow encountered in exp scores = weights * np.exp(np.dot(X, self.params_))
Any suggestions on dealing with this issue? I'm starting down the road of normalization, but I'm not sure if that's 100% correct.