These are chat archives for atomix/atomix

19th
Mar 2017
Jordan Halterman
@kuujo
Mar 19 2017 14:30 UTC

I didn't read it today, but I read it a couple weeks ago :-)

I sort of take issue with some of the claims about leader-based consensus protocols. A multi-leader cluster is not a leaderless cluster. Having multiple leaders doesn't mean you don't lose availability when a node crashes. You usually just lose partial availability instead of total availability. But Raft doesn't have any mechanism to balance leaders to ensure they won't all get elected on the same node, so theoretically you can still briefly lose total availability when a node goes down if all partitions elected leaders on that node. And if you spread the partitions across different sections of a larger cluster, then suddenly your cluster may not even be able to tolerate the failure of a minority of nodes before becoming unavailable. Many operations also necessarily must span all partitions, e.g. via 2PC, and those operations become unavailable during a leader election within any partition.
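To make the partial vs. total availability point concrete, here is a minimal sketch, not taken from Raft, Copycat, or ONOS. The partition names, node names, and the helper `fraction_leaderless` are all hypothetical; it only counts how many partitions would need a new election if a given node crashed.

```python
def fraction_leaderless(leaders, crashed_node):
    """Fraction of partitions whose leader was on crashed_node.

    `leaders` maps a (hypothetical) partition name to the node that
    currently leads it. Nothing here models a real election.
    """
    if not leaders:
        return 0.0
    lost = sum(1 for node in leaders.values() if node == crashed_node)
    return lost / len(leaders)


# Leaders happen to be spread out: a crash of "a" hits one partition in three.
balanced = {"p1": "a", "p2": "b", "p3": "c"}

# Nothing prevented every partition from electing "a": a crash hits them all.
skewed = {"p1": "a", "p2": "a", "p3": "a"}

for name, layout in (("balanced", balanced), ("skewed", skewed)):
    print(name, fraction_leaderless(layout, "a"))
```

With the skewed layout, every partition is briefly leaderless at once, which is the "total" case described above.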

I also wouldn't even equate a leader election to a loss of availability - even though I'm doing so here - at least in most Raft implementations. From the client's perspective, in systems like Copycat leader elections are indistinguishable from a slow cluster. A leader election is a period in which latency is merely increased, but requests are not rejected. The loss of availability implies a system that cannot progress. Client requests are necessarily rejected as the cluster cannot reach. This only occurs when a majority of the cluster is down, which prevents the cluster from progressing until a node comes back up. Conversely, a cluster that merely lacks a leader can reach consensus by electing one, it just takes a bit longer than usual.
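A rough sketch of that distinction, assuming a simple majority quorum. The function name `cluster_state` and its three labels are made up for illustration and are not part of any Raft implementation's API.

```python
def cluster_state(cluster_size, nodes_up, has_leader):
    """Classify a cluster under a simple majority-quorum assumption.

    Labels are illustrative only:
      "unavailable": no majority is up, so no progress is possible
      "electing":    a majority is up but there is no leader yet,
                     so requests are delayed rather than rejected
      "available":   a majority is up and a leader exists
    """
    majority = cluster_size // 2 + 1
    if nodes_up < majority:
        return "unavailable"
    return "available" if has_leader else "electing"


print(cluster_state(5, 5, True))   # healthy cluster
print(cluster_state(5, 4, False))  # leader crashed, election in progress
print(cluster_state(5, 2, False))  # majority down, cannot progress
```

Only the last case is a loss of availability in the sense used here; the middle case just shows up to clients as extra latency.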

Anyways, multiple leaders <> no leaders. This is what we do in ONOS, and while it does improve availability in some respects, it weakens it in others. The real benefit to partitioned leader-based consensus clusters is performance, not one second of availability (which partitioning doesn't technically achieve anyways). That work will eventually make its way back into Atomix.

But I just take issue with the leader-based vs. leaderless arguments. Single Decree Paxos is interesting.
reach consensus*
Jon Hall
@jhall11
Mar 19 2017 16:38 UTC
hmm, that's sort of what I was thinking: by increasing the number of leaders, you are just pushing the “unavailability”/increased latency down to a smaller subset of the cluster.