These are chat archives for atomix/atomix

13th
Jun 2016
andreas-gilbert
@andreas-gilbert
Jun 13 2016 12:44
@kuujo thanks for the information. With JGroups I can set up a cluster of two nodes; if one node fails, the other will take over (distributed lock).
Jordan Halterman
@kuujo
Jun 13 2016 16:23
@andreas-gilbert that also means that with JGroups, if a partition occurs between those two nodes, the node that doesn't hold the lock can take over the lock, while the node that already holds it retains its lock. Suddenly you have split brain, where each node believes itself to hold the lock. This is the problem that consensus solves, and it's the reason a majority is required to make changes to the state of the system.
The problem is, perfect failure detection in an asynchronous distributed system is impossible: a node that stops responding may have crashed, or its messages may just be delayed or dropped. Thus, there is no perceivable difference between a partitioned node and a failed node. So, "if one node fails" means if one node fails, or if one node is actually still alive and still doing what it thinks it's supposed to be doing but can't talk to the other node.
Jordan Halterman
@kuujo
Jun 13 2016 16:28
Consensus ensures that two sides of such a partition cannot both progress.
The side with a majority will take over, and the side with a minority will stop. In the case of a lock, the side with a majority will acquire the lock, and the side with a minority will release it.
Atomix is designed for systems where it's critical that things like locks maintain that level of consistency. JGroups and other systems are not. They are designed for high availability and sacrifice consistency in those types of cases.
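The majority rule described above can be sketched in a few lines. This is only an illustration of the arithmetic, not Atomix's or Copycat's actual code; the class name `QuorumSketch` and its methods are hypothetical.

```java
// Hypothetical helper illustrating the majority rule; not part of Atomix.
final class QuorumSketch {
    private QuorumSketch() {}

    /** Smallest number of nodes that forms a strict majority of the cluster. */
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /**
     * True if a partition side that can reach `reachable` members
     * (counting itself) is allowed to make progress.
     */
    static boolean canProgress(int reachable, int clusterSize) {
        return reachable >= majority(clusterSize);
    }

    public static void main(String[] args) {
        // Two-node cluster split 1/1: neither side has a majority.
        System.out.println(canProgress(1, 2)); // false
        // Three-node cluster split 2/1: only the two-node side continues.
        System.out.println(canProgress(2, 3)); // true
        System.out.println(canProgress(1, 3)); // false
    }
}
```

Note what this implies for a two-node cluster: a majority of two is two, so a single surviving node cannot safely take over on its own. Tolerating one failure takes at least three nodes.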
Jonathan Halterman
@jhalterman
Jun 13 2016 16:49
@andreas-gilbert I would use the word safety. Atomix provides safety in knowing that when you obtain a lock, for example, that lock will never be obtained by two processes at once, regardless of what sort of failures might happen. Aphyr's Jepsen series of articles on testing distributed datastores under failure shows that without safety guarantees, bad things can happen: https://aphyr.com/tags/Jepsen But of course, the safety comes at a cost, in that (among other things) it requires majorities for decision making.
Jordan Halterman
@kuujo
Jun 13 2016 16:56
And actually, even Raft/Copycat/Atomix can't technically guarantee that two processes never hold the same lock at the same time. Reality may be even more complex than that, and additional mechanisms are required to preserve the semantics of distributed locks. Those mechanisms are discussed in the DistributedLock documentation and in this article: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html But I think that demonstrates that safe distributed locks are not really as simple as two nodes deciding which holds a lock. As I mentioned, the difference between a crashed, partitioned, or slow node in a distributed system is imperceptible. And systems like JGroups and Hazelcast are a long way from solving those problems.
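One mechanism the linked Kleppmann article describes is a fencing token: a number that increases each time the lock is granted, which the protected resource checks on every write. A rough sketch, assuming a hypothetical `FencedStore` resource and made-up token values; this is not the Atomix DistributedLock API.

```java
// Hypothetical resource that rejects writes from stale lock holders.
final class FencedStore {
    private long highestToken = -1; // newest fencing token seen so far
    private String value;

    /** Applies the write only if its token is not older than the newest one seen. */
    synchronized boolean write(long token, String newValue) {
        if (token < highestToken) {
            return false; // caller's lock has been superseded: reject
        }
        highestToken = token;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }

    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        // Client A was granted the lock with token 33, then stalled (e.g. a long GC pause).
        // Meanwhile the lock expired and client B was granted it with token 34.
        System.out.println(store.write(34, "from B")); // true
        // Client A wakes up, still believing it holds the lock.
        System.out.println(store.write(33, "from A")); // false
        System.out.println(store.read());              // from B
    }
}
```

The point of the design is that the resource itself enforces ordering, so a paused or partitioned client that still believes it holds the lock cannot overwrite newer data.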