These are chat archives for atomix/atomix

8th
Jun 2016
Tom
@taobaorun
Jun 08 2016 02:14
@kuujo thanks, the copycat directory config was wrong: all instances used the same directory. :cry:
Jordan Halterman
@kuujo
Jun 08 2016 02:34
Common mistake... The problem is that servers are designed to recover from the log directory, so one server starts, then the next recovers from and starts writing to the first server's logs. I guess there are some ways to mitigate it by storing and checking the server address.
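A minimal sketch of the "store and check the server address" idea, assuming a marker file kept inside the log directory. The names `claim_log_directory`, `LogDirectoryInUseError`, and the `server-address` marker file are hypothetical and are not part of Copycat or Atomix:

```python
import os
import tempfile


class LogDirectoryInUseError(Exception):
    """Raised when a log directory already belongs to another server address."""


def claim_log_directory(directory, address, marker_name="server-address"):
    """Record `address` in `directory`, or fail if another address owns it."""
    os.makedirs(directory, exist_ok=True)
    marker = os.path.join(directory, marker_name)
    try:
        # Mode "x" fails if the marker already exists, so two servers
        # starting at the same time cannot both create it.
        with open(marker, "x") as f:
            f.write(address)
        return
    except FileExistsError:
        pass
    with open(marker) as f:
        owner = f.read().strip()
    if owner != address:
        raise LogDirectoryInUseError(
            f"{directory} belongs to {owner!r}, not {address!r}"
        )
    # Same address as the marker: a restart, so recovering from the logs is fine.


if __name__ == "__main__":
    logs = os.path.join(tempfile.mkdtemp(), "logs")
    claim_log_directory(logs, "10.0.0.1:8700")      # first start
    claim_log_directory(logs, "10.0.0.1:8700")      # restart of the same server
    try:
        claim_log_directory(logs, "10.0.0.2:8700")  # misconfigured second server
    except LogDirectoryInUseError as error:
        print(error)
```

With a check like this, the second server would fail fast at startup instead of recovering from and appending to the first server's logs.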
David Moravek
@dmvk
Jun 08 2016 11:41
@kuujo will try to make a code example tomorrow, so you can replicate the issue
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:47
Is it possible to use the LeaderElectionExample with only 2 nodes, for master/slave election?
David Moravek
@dmvk
Jun 08 2016 11:48
first one to join the distributed group becomes a leader?
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:48
Yep.
David Moravek
@dmvk
Jun 08 2016 11:48
so it is ;)
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:48
And if the leader leaves the group, the other one should become leader...
Do I have to specify a ClusterManager with quorumhint of 1?
Actually it is not working with only 2 nodes - or am I missing anything?
David Moravek
@dmvk
Jun 08 2016 11:50
you mean 2 nodes of atomix replica?
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:50
Yep - correct
David Moravek
@dmvk
Jun 08 2016 11:50
or of clients trying to join distributed group
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:51
I need a leader election for 2 nodes only ....
David Moravek
@dmvk
Jun 08 2016 11:51
that shouldn't be possible, because you cannot have a majority quorum
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:52
Yeah, I know that, but the atomix clustering docs state that clusters of < 3 nodes can be used ...
"Smaller clusters can be used but may lose write availability if a node fails."
David Moravek
@dmvk
Jun 08 2016 11:53
in general, I would say that all distributed consensus algorithms should have an odd number of nodes in order to work properly
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:53
But is it possible to configure a 2-node cluster to work with the atomix lib?
I 'only' need a master/slave decision
David Moravek
@dmvk
Jun 08 2016 11:56
that's not how Raft works; it needs a quorum in order to accept writes or elect a leader
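The quorum rule referred to here is plain majority arithmetic. A small sketch (the function names are illustrative, not an Atomix API):

```python
def quorum_size(nodes):
    """Smallest majority of a cluster with `nodes` voting members."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many members can be lost while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"nodes={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

For 2 nodes the quorum is 2, so losing either node leaves the survivor unable to elect a leader or accept writes. Going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any more failures, which is why odd cluster sizes are usually recommended.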
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:57
ah ok - I thought a quorum was needed only for writes, but it is also needed for leader election ...
David Moravek
@dmvk
Jun 08 2016 11:59
maybe it would start, but wouldn't be able to tolerate any failures
andreas-gilbert
@andreas-gilbert
Jun 08 2016 11:59
in case of a node failure the other should take over...
David Moravek
@dmvk
Jun 08 2016 11:59
yeah, that's not possible
andreas-gilbert
@andreas-gilbert
Jun 08 2016 12:00
ok, thanks for the explanation
David Moravek
@dmvk
Jun 08 2016 12:00
because if you split them, both members would be on the minority side of the partition
David Moravek
@dmvk
Jun 08 2016 12:09

Anyway, let's talk atomix client consistency guarantees :) Let's take DistributedGroup for example. I create a new group and join it. Then a network partition comes in and, unfortunately, my client happens to be on the minority side of the partition (or maybe cannot even connect to any node). After some time the partition recovers, but the client's session has timed out. The default recoveryStrategy for the atomix client is set to RECOVERY (and cannot be overridden right now).

Now when the client is able to connect to a quorum again, it gets a brand new session id. The problem is, it is not a member of the distributed group anymore. I guess the atomix client should guarantee that membership survives recovery. It's more of a philosophical question. I'm deciding whether to submit a pull request for the atomix client or to handle it in my application.

Another problem is that the members of the distributed group are cached. The resource does a sync() when it opens, but then updates its state based on onJoin and onLeave events. So with a new session ID it doesn't receive those events on reconnect and doesn't do any "resync" either.

What do you think? Should this be handled in the atomix client or on the "application side"? Thanks
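To make the "application side" option concrete, here is a minimal in-memory sketch of rejoining and resyncing after a session is replaced. `FakeGroupServer` and `RecoveringGroupClient` are hypothetical stand-ins written for this illustration and do not mirror the real DistributedGroup API:

```python
class FakeGroupServer:
    """In-memory stand-in for the replicated group state (hypothetical)."""

    def __init__(self):
        self.members = {}        # session id -> member name
        self._next_session = 1

    def open_session(self):
        session = self._next_session
        self._next_session += 1
        return session

    def expire_session(self, session):
        # Membership is tied to the session, so expiry removes the member.
        self.members.pop(session, None)

    def join(self, session, name):
        self.members[session] = name

    def list_members(self):
        return set(self.members.values())


class RecoveringGroupClient:
    """Remembers that it joined, so it can rejoin under a new session."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.joined = False
        self.cached_members = set()
        self.session = server.open_session()

    def join(self):
        self.joined = True
        self.server.join(self.session, self.name)
        self.resync()

    def resync(self):
        # Replace the cache wholesale instead of relying on missed events.
        self.cached_members = self.server.list_members()

    def on_session_recovered(self):
        """Call once the old session has expired and the cluster is reachable."""
        self.session = self.server.open_session()
        if self.joined:
            self.server.join(self.session, self.name)
        self.resync()
```

The same two steps (rejoin if previously joined, then refresh the cached member list) would apply whether they live in the application or inside the client library.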

andreas-gilbert
@andreas-gilbert
Jun 08 2016 14:21
In the meantime I found the jgroups toolkit - seems to work with 2 nodes ...
Madan Jampani
@madjam
Jun 08 2016 16:01
@davidmoravek This was exactly the issue I ran into. I have decided to handle it in my application (or primitive) as it is the one keeping track of listeners. But if we can, I would prefer it to be handled transparently by the framework.
Madan Jampani
@madjam
Jun 08 2016 16:10
@andreas-gilbert The problem with a 2 node consensus group is dealing with a network partition. If you expect each node to continue operating as if the other node is down then this could lead to some conflicts. For instance: if you are using this to do some distributed locking, you’ll potentially violate mutual exclusion.
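A toy simulation of that failure mode, assuming each node simply treats an unreachable peer as failed. The class `NaiveTwoNodeLock` is hypothetical and deliberately unsafe; it is not how Atomix or jgroups implement locking:

```python
class NaiveTwoNodeLock:
    """A node that treats an unreachable peer as failed (unsafe on purpose)."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.peer_reachable = True
        self.holds_lock = False

    def try_lock(self):
        # Only a reachable peer can veto the lock. During a partition the
        # peer is assumed to be down, so nothing stops this node.
        if self.peer_reachable and self.peer.holds_lock:
            return False
        self.holds_lock = True
        return True


def make_pair():
    a, b = NaiveTwoNodeLock("a"), NaiveTwoNodeLock("b")
    a.peer, b.peer = b, a
    return a, b


if __name__ == "__main__":
    a, b = make_pair()
    a.peer_reachable = b.peer_reachable = False   # network partition
    print(a.try_lock(), b.try_lock())             # both succeed: split brain
```

Without the partition the second `try_lock()` is refused; with it, both nodes hold the lock at once, which is the mutual exclusion violation described above.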
Madan Jampani
@madjam
Jun 08 2016 20:01
On the topic of resuming event delivery after a client’s session changes (post recovery), it may be possible to do it transparently if we associate listeners with a specific client (instead of a specific session). That way when it is time to publish an event, we publish as long as there is a valid session registered for that client.
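A rough sketch of that idea, keying listeners by a stable client id and looking up the current session only at publish time. `EventRegistry` and its methods are hypothetical names for illustration, not existing Atomix or Copycat classes:

```python
class EventRegistry:
    """Listeners are keyed by client id; sessions can come and go."""

    def __init__(self):
        self.listeners = {}   # client id -> list of callbacks
        self.sessions = {}    # client id -> current valid session id

    def register_session(self, client_id, session_id):
        self.sessions[client_id] = session_id

    def expire_session(self, client_id):
        # The listeners are intentionally left in place.
        self.sessions.pop(client_id, None)

    def add_listener(self, client_id, callback):
        self.listeners.setdefault(client_id, []).append(callback)

    def publish(self, event):
        """Deliver `event` to every client that has a valid session now."""
        delivered = 0
        for client_id, callbacks in self.listeners.items():
            session_id = self.sessions.get(client_id)
            if session_id is None:
                continue      # no valid session right now; skip, keep listener
            for callback in callbacks:
                callback(session_id, event)
                delivered += 1
        return delivered
```

Events published while a client has no valid session are skipped in this sketch, so a recovered client would still need a resync to learn what it missed.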