These are chat archives for atomix/atomix
@kuujo again, thanks for your time and detailed explanation.
If I understand you correctly, my use case (coordinating an AWS Auto Scaling group) is not supported.
In order to maintain fault tolerance, an administrator must intervene to explicitly replace a gone-forever node with a new node to preserve fault tolerance guarantees. This is just the nature of consensus.
It's not very clear to me what prevents the cluster from promoting new nodes from PASSIVE/RESERVE to ACTIVE and restoring the original quorum size. When the partition is healed, the lost nodes would need to reset their state, become PASSIVE and resync again.
But if the cluster remembers failed nodes forever, those failed nodes will accumulate over time, and this could lead to performance degradation (at least in an AWS setup, where I can't resurrect an old node with the same address; new nodes are spawned by the system).
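To make the concern above concrete, here is a rough sketch of the majority-quorum arithmetic behind it. It is not Atomix code; the function names `quorum` and `spare_failures` are made up for illustration, and it assumes a plain majority quorum counted over the configured voting members (live or dead).

```python
def quorum(members: int) -> int:
    """Majority of the configured voting members (hypothetical helper)."""
    return members // 2 + 1


def spare_failures(members: int, dead: int) -> int:
    """How many more voting members can fail before quorum is lost.

    A negative result means the cluster has already lost its quorum.
    Dead members still count toward `members` until they are removed
    from the configuration.
    """
    live = members - dead
    return live - quorum(members)


# Healthy 3-node cluster: one failure is tolerated.
print(spare_failures(members=3, dead=0))

# One node gone forever but still in the configuration: no failures left to spare.
print(spare_failures(members=3, dead=1))

# Two new nodes added, two dead nodes never removed: 5 members, 3 live.
print(spare_failures(members=5, dead=2))
```

Under these assumptions, a dead node that stays in the configuration keeps counting toward the quorum size, so fault tolerance is only restored once it is replaced or removed, not just by adding new nodes next to it.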