These are chat archives for atomix/atomix

11th
Jan 2016
Jordan Halterman
@kuujo
Jan 11 2016 07:27
indeed
Richard Pijnenburg
@electrical
Jan 11 2016 07:29
Hiya
Uhg. Back to work for me after 2.5 weeks off
Jordan Halterman
@kuujo
Jan 11 2016 07:31
wow nice
sucks going back to work though
2.5 weeks makes you lazy
Richard Pijnenburg
@electrical
Jan 11 2016 07:31
Hehe yeah ;)
Richard Pijnenburg
@electrical
Jan 11 2016 07:37
One thing is for sure. I should have gone to bed way earlier than 1 am and have my alarm at 6. Lol
Jordan Halterman
@kuujo
Jan 11 2016 07:38
I did my experiment with sharding and abandoned it again. Any solution I could implement in a week would be half-assed. So instead I just finished porting some of the work I did a while back to allow clusters to dynamically scale in Atomix.
The thing that has always annoyed me about AtomixReplica is that it’s still limited to 3 or 5 nodes. You can’t just start n number of replicas and let the system figure out the replication itself. But now you can, or will be able to. Essentially, the user defines the number of Raft nodes and the number of backup nodes per Raft node, and Atomix dynamically scales the Raft cluster as nodes are added and removed. If a Raft voting member dies or is partitioned for long enough, Atomix can replace it by promoting another replica to prevent loss of availability.
Most importantly, this means you should be able to start 20 replicas and tell it you want 5 Raft nodes, and the cluster will continue operating even if the original 5 Raft nodes die. It will probably take a couple more days to get through all the testing.
It also adds a gossip-like replication protocol to Raft followers, so the cluster can potentially replicate to many more nodes very efficiently.
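A rough sketch of what configuring this could look like with the AtomixReplica builder — the withQuorumHint/withBackupCount method names and exact signatures here are my assumption about the API being described, not confirmed code:

```java
import io.atomix.AtomixReplica;
import io.atomix.catalyst.transport.Address;

public class ReplicaExample {
  public static void main(String[] args) {
    // Start a replica; Atomix decides whether this node becomes an active
    // Raft member, a passive (gossip) node, or a standby/reserve node.
    AtomixReplica replica = AtomixReplica.builder(new Address("10.0.0.1", 8700))
        .withQuorumHint(3)   // keep 3 Raft voting members at all times
        .withBackupCount(1)  // 1 backup node per Raft node
        .build();
    replica.open().join();
  }
}
```

With a setup like this you could start 20 such replicas and only 3 would actually vote in Raft; the rest would be promoted automatically if a voting member dies.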
Richard Pijnenburg
@electrical
Jan 11 2016 11:45
Hiya. sorry. went to work shortly after you sent this so haven't had time to reply
Any specific reason that Raft is limited to 3 or 5 nodes? or is that a limitation inside atomix?
Jordan Halterman
@kuujo
Jan 11 2016 18:28
@electrical That's a limitation of Raft. Well, it's not technically limited to 3 or 5 nodes; those are just typical cluster sizes. Raft requires that a write be stored on a majority of the cluster to be committed, so having an odd number of Raft nodes provides the greatest fault tolerance for the cluster size. A 3 node cluster can tolerate 1 failure, but a 2 node cluster can tolerate 0 failures. A 5 node cluster can tolerate 2 failures, but a 4 node cluster can tolerate only 1.
What this change to Atomix allows is for cluster sizes to greatly exceed that number. Atomix will ensure that the desired number of nodes participate in the Raft protocol, and the rest either participate in gossip or are just standby nodes. This in theory allows individual nodes to be more ignorant of the structure of the system. Atomix's current structure generally requires that three or five nodes create an AtomixReplica and the rest create an AtomixClient, or that you use a separate set of AtomixServers with clients. But in this case, you can just create an AtomixReplica on each node and let Atomix figure it out. The user provides a quorumHint and Atomix will ensure at least that many Raft nodes exist at all times.
More interestingly, this also provides an additional level of fault tolerance. If a Raft node dies, Atomix can replace it with a non-Raft node to bring the number of alive Raft nodes back to the quorumHint and thus return the actual fault tolerance to the desired level. This allows it to behave more closely to HA systems where you can arbitrarily add and remove nodes, though adding and removing nodes is still done in a safe manner, and too many Raft nodes failing simultaneously can still cause a loss of availability. i.e. if the quorumHint is 3 and two Raft nodes die at the same time, the cluster will become unavailable because Atomix/Copycat still use the Raft algorithm to do cluster configuration changes.
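The majority math above reduces to a one-liner: a cluster of n Raft nodes tolerates n minus a majority, i.e. floor((n - 1) / 2) failures, which is why odd sizes are preferred. A small self-contained sketch (not Atomix code):

```java
public class QuorumMath {
  // A write is committed once a majority of the cluster has stored it.
  static int majority(int clusterSize) {
    return clusterSize / 2 + 1;
  }

  // Failures tolerated = nodes the cluster can lose while a majority survives.
  static int faultTolerance(int clusterSize) {
    return clusterSize - majority(clusterSize);
  }

  public static void main(String[] args) {
    for (int n = 2; n <= 5; n++) {
      // 2 nodes -> 0, 3 -> 1, 4 -> 1, 5 -> 2
      System.out.println(n + " nodes: tolerates " + faultTolerance(n) + " failure(s)");
    }
  }
}
```

Note that going from 3 to 4 nodes buys nothing in fault tolerance, which is exactly why extra nodes are better spent as passive or reserve nodes.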
When cluster sizes exceed the quorumHint, Raft followers begin replicating state changes to additional nodes. So, if the quorumHint is 3 and you add a fourth node to the cluster, one of the Raft followers will replicate committed entries to the new node. Clients can potentially read from that additional node and still maintain sequential consistency, and in the event that a Raft node fails it can be more quickly replaced by the fourth node since that node is already up-to-date. Copycat doesn't allow a new Raft node to join the existing Raft cluster until its state is caught up, to prevent availability issues, but this ensures the process of replacing a Raft node takes only a few seconds.
Jordan Halterman
@kuujo
Jan 11 2016 18:34
So, the pattern of replication in large clusters is leader to followers, and then followers to passive nodes. You can also create nodes that don't actually maintain any state - reserve nodes - but can still replace a Raft node. Atomix allows the user to define the number of Raft nodes and the number of passive nodes per Raft node, and the rest are made into reserve nodes.
Then, you can create a cluster of 100 AtomixReplicas and let them figure it out amongst themselves, though that should not be misconstrued as scalability. You're scaling the number of replicas that can be supported, but not the amount of state or the throughput of state changes.
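Under the role-assignment rules described above (active = quorumHint, passive = quorumHint × backups per Raft node, the remainder reserve — my reading of the description, not confirmed Atomix internals), the split for a 100-replica cluster works out as:

```java
public class RoleSplit {
  // Hypothetical helper mirroring the described behavior: the user picks the
  // quorum size and backups per Raft node; everything else goes to reserve.
  static int[] roles(int totalReplicas, int quorumHint, int backupsPerNode) {
    int active = quorumHint;                 // Raft voting members
    int passive = quorumHint * backupsPerNode; // nodes fed by follower replication
    int reserve = totalReplicas - active - passive; // stateless, promotable nodes
    return new int[] { active, passive, reserve };
  }

  public static void main(String[] args) {
    // 100 replicas, quorumHint 3, 2 passive nodes per Raft node
    int[] split = roles(100, 3, 2);
    // -> active=3 passive=6 reserve=91
    System.out.println("active=" + split[0] + " passive=" + split[1] + " reserve=" + split[2]);
  }
}
```

Which illustrates the caveat above: adding replicas only grows the reserve pool; state size and write throughput are still bounded by the 3-node Raft quorum.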