These are chat archives for thunder-project/thunder

6th Jun 2016
Nikita Vladimirov
@nvladimus
Jun 06 2016 15:57
So, with the old Thunder I was able to run my job with 15 nodes, but with the new one the job crashes even with 25 nodes, at the stage of collecting betas.tolocal(). Is anyone else having similar issues?
ERROR TaskSchedulerImpl: Lost executor 15 on h02u31.int.janelia.org: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Davis Bennett
@d-v-b
Jun 06 2016 15:58
I'm getting the same problem right now
while doing something super simple
Nikita Vladimirov
@nvladimus
Jun 06 2016 15:59
Did it work with the new Thunder previously?
Jeremy Freeman
@freeman-lab
Jun 06 2016 16:00
@nvladimus @d-v-b Is the full workflow here basically images.toseries() and then regression?
Nikita Vladimirov
@nvladimus
Jun 06 2016 16:01
yes
Davis Bennett
@d-v-b
Jun 06 2016 16:03
Nope, I'm just doing images[a_few_timepoints].mean().toarray()
Jeremy Freeman
@freeman-lab
Jun 06 2016 16:05
@d-v-b OK, so that doesn't even include a shuffle.
And you weren't seeing this before, with the same versions of everything?
Davis Bennett
@d-v-b
Jun 06 2016 16:05
I've been seeing this for the last few days.
According to the Spark application UI, there is shuffle reading and writing going on.