
Hi @dmitryuk, sure, that can be done. Queue times fit perfectly into survival analysis. Since you suggest that the time at which the client uploaded the doc is important, I would suggest using that feature (mapped to a cyclic variable¹) in a regression model. Ex:

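```
import numpy as np
from lifelines import WeibullAFTFitter

# df: your DataFrame with a 'start_time' column and a 'duration' column
# map_to_seconds: your own helper that converts a timestamp to seconds past midnight
seconds_in_day = 24 * 60 * 60

df['start_time'] = df['start_time'].map(map_to_seconds)
# put time-of-day on a circle, so 23:59 and 00:01 end up close together
df['sin_start_time'] = np.sin(2 * np.pi * df['start_time'] / seconds_in_day)
df['cos_start_time'] = np.cos(2 * np.pi * df['start_time'] / seconds_in_day)
df = df.drop('start_time', axis=1)

wf = WeibullAFTFitter().fit(df, "duration")
wf.predict_survival_function(df)
wf.predict_median(df)
```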
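Why the sin/cos pair rather than raw seconds: a raw time-of-day puts 23:59 and 00:01 almost a full day apart, while the circular encoding puts them next to each other. A minimal stdlib sketch of that idea (the helper names `cyclic_encode` and `circle_distance` are hypothetical, just for illustration):

```python
import math

SECONDS_IN_DAY = 24 * 60 * 60


def cyclic_encode(seconds, period=SECONDS_IN_DAY):
    """Map a position within a cycle of length `period` to a point on the unit circle."""
    angle = 2 * math.pi * (seconds % period) / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b):
    """Euclidean distance between two encoded (sin, cos) points."""
    return math.dist(a, b)


late = cyclic_encode(SECONDS_IN_DAY - 60)   # 23:59
early = cyclic_encode(60)                   # 00:01
noon = cyclic_encode(SECONDS_IN_DAY // 2)   # 12:00

print(circle_distance(late, early))  # small: the two times are 2 minutes apart
print(circle_distance(late, noon))   # close to 2: about half a day apart
```

If the feature is "seconds since the start of the week" instead, the same encoding applies with `period=7 * SECONDS_IN_DAY`.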

Since you want *how long is left to wait*, you probably want to use the `conditional_after` kwarg in the `predict_*` methods as well.
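What `conditional_after` computes is conditional survival: given a subject has already waited `w`, the remaining-wait curve is S(w + t) / S(w). A sketch for a single Weibull with S(t) = exp(-(t/λ)^ρ), assuming made-up values for λ and ρ rather than fitted ones; solving S(w + u) / S(w) = 0.5 gives the median remaining wait in closed form:

```python
import math


def weibull_survival(t, lam, rho):
    """S(t) = exp(-(t / lam) ** rho) for a Weibull with scale lam and shape rho."""
    return math.exp(-((t / lam) ** rho))


def conditional_survival(t, waited, lam, rho):
    """P(T > waited + t | T > waited) = S(waited + t) / S(waited)."""
    return weibull_survival(waited + t, lam, rho) / weibull_survival(waited, lam, rho)


def remaining_median(waited, lam, rho):
    """Median of the remaining wait, given `waited` time units have already passed.

    From ((waited + u) / lam) ** rho - (waited / lam) ** rho = ln 2.
    """
    return (waited ** rho + lam ** rho * math.log(2)) ** (1 / rho) - waited


lam, rho = 3600.0, 1.5  # illustrative scale (seconds) and shape, not fitted values
for waited in (0.0, 1800.0, 7200.0):
    print(waited, remaining_median(waited, lam, rho))
```

With ρ = 1 (exponential) the remaining median is λ ln 2 no matter how long you have waited (memorylessness); with ρ > 1, as here, it shrinks the longer the doc has been in the queue.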
@CamDavidsonPilon Thank you for your answer!

This is how I prepared the data:

| id | start_from_week_seconds | duration |
|---|---|---|
| doc id | seconds elapsed from the start of the week until the client uploaded the doc | seconds spent checking the doc |

After running

`wf = WeibullAFTFitter().fit(df, "duration")`

I get this warning:

`StatisticalWarning: The diagonal of the variance_matrix_ has negative values. This could be a problem with WeibullFitter's fit to the data.`

Could you help me understand what is wrong in the code?

Simple code with data: https://github.com/dmitryuk/lifetime_predict_queue

@dmitryuk ah, ignore it, I need to suppress that. Also, make sure to drop the `id` col in your model.
:wave: minor lifelines release. Better support for pickling! https://github.com/CamDavidsonPilon/lifelines/releases

Hello! I'm trying to predict failure of a few robots with a pretty substantial time-series dataset, and I've been looking at lifelines as a potential method for doing so. The time-series data has a few instances of failure, and I'm trying to correlate a number of other variables we have data on (such as forward velocity, number of stationary hours, etc.) with failure. In short, I'm trying to get a window in which to predict possible failure based on historical data. Should I be using survival regression for this?

@nravic I think you can use lifelines, but you're in the realm of recurrent events, which lifelines has only a little support for (there may be another package out there?). Since you have daily snapshots, you probably want to use time-varying regression: https://lifelines.readthedocs.io/en/latest/Time%20varying%20survival%20regression.html
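Time-varying regression wants the data in "long" start/stop format: one row per subject per interval, with the covariates valid over that interval and `event = 1` only on the interval that ends in failure. lifelines' `CoxTimeVaryingFitter` takes a DataFrame shaped like this (with `id_col`, `event_col`, `start_col`, `stop_col`). Below is a stdlib sketch of turning daily snapshots into such rows; the `Snapshot` fields are hypothetical, and to stay within single-event survival it keeps only the time to each robot's *first* failure (true recurrent events need other tooling):

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    robot_id: str
    day: int          # days since the robot was first observed
    velocity: float   # example covariate measured at the start of the day (hypothetical)
    failed: bool      # did the robot fail during (day, day + 1]?


def to_start_stop(snapshots):
    """Turn daily snapshots into (id, start, stop, covariates..., event) rows."""
    rows = []
    already_failed = set()
    for s in sorted(snapshots, key=lambda s: (s.robot_id, s.day)):
        if s.robot_id in already_failed:
            continue  # drop history after the first failure
        rows.append({
            "id": s.robot_id,
            "start": s.day,
            "stop": s.day + 1,
            "velocity": s.velocity,
            "event": int(s.failed),
        })
        if s.failed:
            already_failed.add(s.robot_id)
    return rows


snaps = [Snapshot("A", d, 1.0, d == 2) for d in range(4)]
snaps += [Snapshot("B", d, 0.5, False) for d in range(2)]
for row in to_start_stop(snaps):
    print(row)
```

With pandas installed, `pd.DataFrame(rows)` would give a frame in the shape the time-varying fitter expects.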

Hey, I have a question about the concordance_index. I want to use my predicted cumulative hazard functions (CHFs) to compute the concordance_index, using them as the predicted_scores. Is it right to sum up the CHF of each sample and take the negative of it to compute the concordance_index from the cumulative hazard functions?

Hello. I'm trying to replicate the Weibull AFT model prediction section in the lifelines docs, but `predict_survival_function` returns all NaNs. Any thoughts on this? The code I used is:

```
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi
rossi_dataset = load_rossi()
aft = WeibullAFTFitter()
aft.fit(rossi_dataset, duration_col='week', event_col='arrest')
X = rossi_dataset.loc[:10]
aft.predict_survival_function(X)
```

@d-seki yes, that's right: the null hypothesis (NH) is that beta == 0.

@julianspaeth depends on the model. Recall that the c-index *only* depends on the ranking of values. For the Cox model, summing the cumulative hazard won't change the ranking (each subject's CHF is the baseline CHF times a constant, so the curves never cross), so it won't matter what you use. For an AFT model, it may change the ranking.

Alternatively, you can choose a point in time, and use the negative of the CHF at that time as the predicted score.
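Why the sign matters: the c-index counts comparable pairs (the shorter time is an observed event) whose scores are ordered the same way as the times, and lifelines' convention is that a *higher* score means *longer* survival. A higher CHF means higher risk, hence the negation. A stdlib sketch of Harrell's C (a simplified version that skips pairs tied on duration; the CHF values are made up):

```python
def concordance_index(durations, scores, events):
    """Harrell's C: fraction of comparable pairs that the scores order correctly.

    Higher score = predicted to survive longer. Ties in score count as 0.5.
    Pairs tied on duration are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # the shorter time in a comparable pair must be an observed event
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]
chf_at_t = [1.2, 0.8, 0.5, 0.3]  # hypothetical CHF of each subject at a fixed time t*

print(concordance_index(durations, [-h for h in chf_at_t], events))  # negated CHF
print(concordance_index(durations, chf_at_t, events))                # raw CHF: inverted
```

For real use, `lifelines.utils.concordance_index(event_times, predicted_scores, event_observed)` follows the same higher-is-longer convention.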

@zxclcsq not good! Looks like I broke something...

I'll investigate asap

@zxclcsq for now, you must specify the `times` argument in `predict_survival_function`.

:wave: Also, new minor release with some useful bug fixes: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.9

@CamDavidsonPilon I see there are estimators for the cumulative hazard function, and it also appears in your diagram of the mathematical links between entities (nice one, BTW). What's the point (/advantage?) of introducing/estimating the CHF in our survival analysis? It seems that all we need are the hazard and survival functions, which have a direct transform. I can't explain the meaning of the CHF; it doesn't seem to bring anything and seems redundant... I'm reading about deep survival models (there are lots of papers and code lately) and they hardly mention it...

@bkos good question. A few points / advantages: i) the CHF is easier to estimate (less variance) than the hazard; ii) both the CHF and the HF appear in the likelihood equation for survival models, see equation (2.5) in https://cran.r-project.org/web/packages/flexsurv/vignettes/flexsurv.pdf; iii) because differentiation is easy and integration is hard, specifying the CHF and working out the HF is easier than the other way around; iv) it's 1-1 with the SF, that is, SF = exp(-CHF).
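Points ii)-iv) in a few lines, using a Weibull as the example (the scale and shape values are made up): write down H(t), get h(t) by differentiating it, get S(t) = exp(-H(t)), and the right-censored log-likelihood is Σ [d_i log h(t_i) - H(t_i)], so a censored subject contributes only through the CHF.

```python
import math

LAM, RHO = 10.0, 1.5  # illustrative Weibull scale and shape


def chf(t):
    """Cumulative hazard H(t) = (t / LAM) ** RHO."""
    return (t / LAM) ** RHO


def hazard(t):
    """h(t) = dH/dt, obtained by differentiating the CHF analytically."""
    return (RHO / LAM) * (t / LAM) ** (RHO - 1)


def survival(t):
    """S(t) = exp(-H(t))."""
    return math.exp(-chf(t))


def loglik(durations, events):
    """Right-censored log-likelihood: sum of d_i * log h(t_i) - H(t_i)."""
    return sum(d * math.log(hazard(t)) - chf(t) for t, d in zip(durations, events))


# Sanity check: a central finite difference of the CHF recovers the hazard.
t, eps = 7.0, 1e-6
numeric_hazard = (chf(t + eps) - chf(t - eps)) / (2 * eps)
print(numeric_hazard, hazard(t))
print(loglik([3.0, 5.0, 9.0, 12.0], [1, 0, 1, 1]))
```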

We originally thought it had to do with having multiple events with the same duration, but that doesn't seem to be the case.

This seems to be the problem! Does anyone know how we would get around this until it is fixed?

@mitchgallerstein-toast hm, this sounds similar to the issue here: CamDavidsonPilon/lifelines#768

Can you confirm you're on the latest version (0.22.9 or 0.22.10)?

:wave: also minor release with some bug fixes: https://github.com/CamDavidsonPilon/lifelines/releases/tag/v0.22.10

hello! I'm somewhat new to survival analysis, and I haven't found any resources explaining why convergence would be poor when I have a variable that correlates strongly with being censored or not, even though it isn't correlated with the time to event for the uncensored data. I have a very small data set, and when I bootstrap-sample it many times, I end up with resamples where some of my boolean variables correlate with the censoring variable. The link that lifelines provides is about logistic regression, where a variable correlates strongly with the class label you are trying to predict/model, which seems different from what is happening in survival analysis... thanks for any pointers!!

I'm also curious what method CoxPHFitter uses for the baseline, but didn't see that in the documentation.

@kdkaiser for your second question, it's the Breslow method, see https://stats.stackexchange.com/questions/46532/cox-baseline-hazard
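For reference, the textbook Breslow estimator of the baseline cumulative hazard is H0(t) = Σ over event times t_i ≤ t of d_i / Σ_{j: Y_j ≥ t_i} exp(x_j β). A stdlib sketch for a single covariate with β taken as given (this is the textbook formula, not lifelines' internal code, which may differ in details such as covariate centering). With β = 0 it reduces to the Nelson-Aalen estimator:

```python
import math


def breslow_cumulative_hazard(durations, events, x, beta):
    """Breslow estimate of the baseline cumulative hazard H0(t), one covariate.

    Returns a list of (event_time, H0(event_time)) pairs.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    cumulative = 0.0
    out = []
    for t in event_times:
        # number of events at this time (handles ties)
        deaths = sum(1 for t_i, e_i in zip(durations, events) if e_i and t_i == t)
        # weighted size of the risk set: everyone still under observation at t
        risk = sum(math.exp(x_j * beta) for t_j, x_j in zip(durations, x) if t_j >= t)
        cumulative += deaths / risk
        out.append((t, cumulative))
    return out


print(breslow_cumulative_hazard([1, 2, 3], [1, 1, 1], [0, 1, 0], beta=0.0))
```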

Take a look at the Cox log-likelihood:

$\ell(\beta) = \sum_{i:C_i = 1} \left( X_i \beta - \log \sum_{j: Y_j \ge Y_i} \theta_j \right), \quad \theta_j = \exp(X_j \beta)$
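A toy illustration of why that causes trouble, assuming a binary covariate `x` that equals 1 only for censored subjects (made-up data, no tied times). Every event then has `x = 0`, so its term is just `-log(risk set)`, and the risk sets shrink as β → -∞. The partial log-likelihood keeps increasing toward a limit it never reaches, so there is no finite maximum for the optimizer to converge to:

```python
import math


def cox_partial_loglik(beta, durations, events, x):
    """Cox partial log-likelihood for one covariate (no tied event times)."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(durations, events)):
        if not e_i:
            continue  # censored subjects enter only through the risk sets
        risk = sum(math.exp(x_j * beta) for t_j, x_j in zip(durations, x) if t_j >= t_i)
        ll += x[i] * beta - math.log(risk)
    return ll


durations = [2, 3, 5, 7, 8, 10]
events = [1, 1, 0, 1, 0, 0]
x = [0, 0, 1, 0, 1, 1]  # x == 1 only for the censored subjects

betas = [0.0, -1.0, -5.0, -20.0]
lls = [cox_partial_loglik(b, durations, events, x) for b in betas]
for b, ll in zip(betas, lls):
    print(b, ll)  # increases as beta decreases, approaching -log(6) from below
```

Here the supremum is -log(3 · 2 · 1) = -log 6, the value with the `x = 1` subjects removed from every risk set. The covariate does not need to predict time to event at all; correlating with censoring is enough to push the estimate off to infinity.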