These are chat archives for atomix/atomix

25 Jun 2017
Jordan Halterman
@kuujo
Jun 25 2017 10:17

I'm not talking about clusters being static in general, I'm talking about clusters being generally static in consensus-based systems. Adding and removing nodes can be extremely expensive, and arbitrarily removing nodes from a cluster impacts fault tolerance. For these reasons, there's little benefit to completely removing a node from a cluster.

What Atomix 2.0 will have is slightly more dynamic clusters, but it will never forget a node without administrator intervention. Atomix can't do what Hazelcast does (use multicast for cluster management) because of the consensus protocol, but we can control which nodes participate in the consensus protocol based on liveness. Atomix 2.0 uses a phi accrual failure detector to determine when nodes become unavailable and will rebalance partitions when a node goes down, but it will never forget the node on its own, because forgetting a node removes the possibility of rebalancing partitions for better performance if the node eventually comes back online. Administrators will always have to explicitly add or remove nodes from the cluster.
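For context, here's a minimal sketch of how a phi accrual failure detector works in general, approximating heartbeat inter-arrival times with an exponential distribution. The class name and structure are my own illustration, not Atomix's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Illustrative phi accrual failure detector (Hayashibara et al.), approximating
 * heartbeat inter-arrival times with an exponential distribution. Not the
 * Atomix implementation; just the general technique.
 */
public class PhiAccrualFailureDetector {
  private final int windowSize;
  private final Deque<Long> intervals = new ArrayDeque<>();
  private long lastHeartbeat = -1;

  public PhiAccrualFailureDetector(int windowSize) {
    this.windowSize = windowSize;
  }

  /** Records a heartbeat from the monitored node. */
  public synchronized void report(long nowMillis) {
    if (lastHeartbeat > 0) {
      intervals.addLast(nowMillis - lastHeartbeat);
      if (intervals.size() > windowSize) {
        intervals.removeFirst();
      }
    }
    lastHeartbeat = nowMillis;
  }

  /**
   * Returns phi, the suspicion level for the monitored node. Higher values mean
   * the node is more likely down: phi = -log10(P(heartbeat arrives later than now)).
   */
  public synchronized double phi(long nowMillis) {
    if (lastHeartbeat < 0 || intervals.isEmpty()) {
      return 0.0;
    }
    double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
    double elapsed = nowMillis - lastHeartbeat;
    // For an exponential distribution, P(X > t) = e^(-t/mean), so
    // phi = -log10(e^(-t/mean)) = (t / mean) * log10(e).
    return (elapsed / mean) * Math.log10(Math.E);
  }
}
```

A caller would periodically check phi(System.currentTimeMillis()) against a threshold to decide when to suspect a node; a higher threshold trades slower detection for fewer false positives.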

Jordan Halterman
@kuujo
Jun 25 2017 10:31
Removing nodes without administrator intervention implies a highly available protocol, which Raft is not. If a cluster is deployed with 5-node partitions (which can tolerate two failures each and guarantee a redundancy of three nodes), automatically removing nodes from the cluster impacts those guarantees. Because Raft is not an AP algorithm, it would be difficult to find a point at which a peer can reliably be considered dead and never coming back. To preserve fault tolerance guarantees, an administrator must intervene to explicitly replace a gone-forever node with a new node. This is just the nature of consensus. Atomix 2.0 can only safely remove nodes down to the size of any one partition. If a 5-node cluster is deployed with 5 partitions of 3 nodes each, we can conceivably remove 2 nodes, since all partitions can exist on the same 3 remaining nodes. But TBH, I'm not certain what the value of aggressively removing those nodes is, particularly considering it's impossible to distinguish crashed nodes from partitioned nodes.
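To make the arithmetic behind those numbers explicit, here's a quick sketch (names are mine, purely for illustration):

```java
/**
 * Back-of-the-envelope quorum math behind the numbers above.
 */
public class QuorumMath {
  /** A Raft partition of n members needs a majority quorum of floor(n/2) + 1. */
  static int quorumSize(int partitionSize) {
    return partitionSize / 2 + 1;
  }

  /** Failures a single partition can tolerate while still forming a quorum. */
  static int tolerableFailures(int partitionSize) {
    return partitionSize - quorumSize(partitionSize);
  }

  public static void main(String[] args) {
    // 5-member partitions: quorum of 3, tolerate 2 failures each.
    System.out.println(quorumSize(5) + " / " + tolerableFailures(5)); // 3 / 2

    // 5-node cluster with partitions of 3 members each: every partition can be
    // hosted on the same 3 surviving nodes, so at most 5 - 3 = 2 nodes can be
    // removed without shrinking any partition below its configured size.
    int clusterSize = 5, partitionSize = 3;
    System.out.println("removable nodes: " + (clusterSize - partitionSize)); // 2
  }
}
```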
Jordan Halterman
@kuujo
Jun 25 2017 10:38
The reason we have to be careful with dynamically modifying the cluster is that it risks split brain. If two sides of a partition each believe the nodes on the other side to be dead and remove them from their configuration, the result is effectively two separate clusters with totally divergent state. For that reason, all configuration changes must go through the Raft protocol, and Raft has strict requirements for when and how configuration changes can occur. If a 5-node cluster is partitioned into two halves and the majority half removes the minority nodes, the quorum size will shrink and so will the fault tolerance guarantees. When the partition heals, the entire state of the remaining nodes will have to be copied again, placing the cluster under major load. Atomix 2.0 will never change the quorum size within a partition or the number of partitions (changing the partition count will eventually be supported); it will only change the nodes to which members of a partition are assigned.
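As a rough illustration of that last rule, here's a hypothetical rebalancing helper that reassigns a down node's partition memberships to live nodes while keeping each partition at its configured size. This is not Atomix's actual rebalancer, just the invariant it preserves:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: when a node is suspected down, its partition memberships
 * are reassigned to live nodes, but the partition size (and therefore the
 * quorum size) never changes.
 */
public class PartitionRebalancer {
  /** Replaces a down node in each partition with a live node that isn't already a member. */
  static List<Set<String>> rebalance(List<Set<String>> partitions, String downNode, List<String> liveNodes) {
    List<Set<String>> result = new ArrayList<>();
    for (Set<String> members : partitions) {
      Set<String> updated = new LinkedHashSet<>(members);
      if (updated.remove(downNode)) {
        // Keep the partition at its configured size: pick a live replacement.
        liveNodes.stream()
            .filter(node -> !updated.contains(node))
            .findFirst()
            .ifPresent(updated::add);
      }
      result.add(updated);
    }
    return result;
  }
}
```

In practice the resulting membership change would itself be submitted through the Raft protocol, so a minority side of a network partition can never commit it.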
Some protocols in Atomix 2.0 are highly available (gossip, anti-entropy, failure detection), and we'll eventually have client nodes that don't embed the Raft protocol and can therefore scale the eventually consistent protocols in this manner, but the core cluster can only be dynamically scaled down to the size of a single partition.
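For what it's worth, here's a minimal sketch of the anti-entropy style of replication referred to above: replicas periodically exchange timestamped entries and keep the newest value per key (last-writer-wins). Again, this illustrates the general technique, not Atomix's gossip implementation:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative anti-entropy replica: merges a peer's digest by keeping the
 * newest entry per key (last-writer-wins).
 */
public class AntiEntropyReplica {
  static final class Versioned {
    final String value;
    final long timestamp;
    Versioned(String value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  private final Map<String, Versioned> entries = new HashMap<>();

  public void put(String key, String value) {
    entries.put(key, new Versioned(value, System.currentTimeMillis()));
  }

  /** Merges a digest received from a peer, keeping the newer entry per key. */
  public void merge(Map<String, Versioned> peerEntries) {
    peerEntries.forEach((key, remote) ->
        entries.merge(key, remote, (local, other) ->
            other.timestamp > local.timestamp ? other : local));
  }

  /** Snapshot of local entries to send to a peer during an exchange. */
  public Map<String, Versioned> digest() {
    return new HashMap<>(entries);
  }
}
```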