These are chat archives for atomix/atomix

8th Mar 2016
Joachim De Beule
@joachimdb
Mar 08 2016 09:52
And was @jhalterman able to solve those problems?
Joachim De Beule
@joachimdb
Mar 08 2016 10:02
Also, what's the rationale behind the fact that opening a replica can throw a "not a member of the cluster" exception? It means I need to check whether opening a replica was successful and possibly retry, etc., which ultimately puts the responsibility for reaching consensus back in the hands of the user instead of Atomix. Or am I missing something?
Richard Pijnenburg
@electrical
Mar 08 2016 11:19
How do you mean opening a replica? As in starting an AtomixReplica?
Joachim De Beule
@joachimdb
Mar 08 2016 12:11
I mean calling the open method on an AtomixReplica.
For instance, I create 5 replicas. I call open on each of them, but three out of 5 throw an exception with the message "not member of the cluster", meaning that I have to call open on them again until it succeeds, right?
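A rough sketch of the retry loop being described, assuming the 1.0-era API where AtomixReplica.open() returns a CompletableFuture; the backoff and the buildReplica() helper are illustrative and not part of Atomix:

```java
import java.util.concurrent.CompletionException;

// Illustrative only: keep calling open() until the replica successfully joins.
// Assumes open() returns a CompletableFuture, as in the 1.0-era API.
AtomixReplica replica = buildReplica(); // buildReplica() is a hypothetical helper that configures the builder
boolean opened = false;
while (!opened) {
  try {
    replica.open().join(); // blocks until the open attempt completes or fails
    opened = true;
  } catch (CompletionException e) {
    // e.g. "not a member of the cluster": back off briefly and try again
    try {
      Thread.sleep(1000);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      break;
    }
  }
}
```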
Richard Pijnenburg
@electrical
Mar 08 2016 12:13
Hmm, I believe so, yeah. Unless it retries (which it should, I think).
Joachim De Beule
@joachimdb
Mar 08 2016 12:13
I expected that the open method would retry
indeed
So is this a bug or the expected behavior?
Richard Pijnenburg
@electrical
Mar 08 2016 12:16
Feels like a bug to me, but tbh I haven't seen this issue happening yet in my tests (3-node cluster).
Jordan Halterman
@kuujo
Mar 08 2016 16:40
@joachimdb it seems like a misconfiguration of the cluster. That exception is thrown when a replica tries to join the cluster and isn't a member of the configuration. In other words, there's a cluster configuration that lists the members in the cluster, and that replica's address is not one of those members. It could be a bug or a misconfiguration. When you start the initial replicas, the list of members should be exactly the same for all the replicas, e.g. 123.456.789.0:5000, 123.456.789.1:5000, 123.456.789.2:5000. This is the list you pass to the builder along with the address of the server, which is one of the addresses in that list. This annoying tidbit is fixed in the PRs that are being merged today.
The configuration issues are partially a limitation of consensus. In order to start the cluster, there needs to be an initial configuration (a set of members) with replicas in it that can be elected leader. Once the initial cluster is started, additional replicas that aren't in the configuration can be added or removed.
The examples correctly configure the cluster.
For example, the leader election example takes a local host:port and some number of remote host:ports. The local host:port is the first argument to the builder, and the local and remote host:ports together are the second.
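For reference, a minimal sketch of that configuration using the 1.0-era builder API described above; the package names and the transport choice are recalled from memory and may differ slightly between versions:

```java
import io.atomix.AtomixReplica;
import io.atomix.catalyst.transport.Address;
import io.atomix.catalyst.transport.NettyTransport;

import java.util.Arrays;
import java.util.List;

// Every replica in the initial cluster must be given exactly the same member list.
List<Address> members = Arrays.asList(
    new Address("123.456.789.0", 5000),
    new Address("123.456.789.1", 5000),
    new Address("123.456.789.2", 5000));

// The local address (first builder argument) must be one of the addresses in that list.
AtomixReplica replica = AtomixReplica.builder(new Address("123.456.789.0", 5000), members)
    .withTransport(new NettyTransport()) // transport choice is illustrative
    .build();

replica.open().join();
```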
Jordan Halterman
@kuujo
Mar 08 2016 16:45
This was changed in Copycat to have replicas always attempt to join the cluster at startup. If there's no cluster to join, they'll attempt to get elected leader. As long as only one replica is added to the cluster at a time, there's no possibility of split brain since a single partitioned replica cannot by itself form a quorum.
That change is still being propagated to Atomix. There are some issues to be worked out such as whether replicas that join the cluster should be allowed to impact the quorum size before the leader rebalances the cluster. For the time being, the membership list is still relevant so that new replicas don't impact the size of the quorum, but that will likely change.
Jordan Halterman
@kuujo
Mar 08 2016 16:51
The exception you're seeing can be found in the join(Iterator) method in ClusterState: https://github.com/atomix/copycat/blob/master/server/src/main/java/io/atomix/copycat/server/state/ClusterState.java When a join request is successful (meaning the joining member's configuration is consistent with the leader's), if the joining node is not in the configuration, it throws that exception. I'm curious if there's something we can do to prevent this from happening aside from the already pending configuration changes.
I'll try to reproduce it. I also wonder if it is possible the configuration is correct, but the Address was serialized/deserialized inconsistently. That could result in the node perceiving itself not to be present in the configuration when in fact it is.
Joachim De Beule
@joachimdb
Mar 08 2016 17:32
Thanks for the info, kuujo! I'm pretty sure the configuration is correct. My "cluster" is actually a bunch of replica instances on different ports but running in the same JVM; I don't know if that could explain things? If you want, I can make a gist of my code throwing the exception (later, got to do some other stuff atm). It's in Clojure though.
I also haven't managed to specify "nodes" using IP addresses so far (only "localhost", not something like "123.456.789.1"), but I have to look into that more closely.
Jordan Halterman
@kuujo
Mar 08 2016 18:14
Interesting, thanks. Actually, all of the Atomix tests use LocalTransport (an in-memory queue) to speed them up, but I frequently run the tests with NettyTransport to ensure everything works as expected in that context, and it seems like that should be the same environment. I'll try it again and see if anything might have changed. A gist is always helpful too :-)
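For context, a sketch of the kind of same-JVM, localhost setup discussed above, using LocalTransport the way the tests do; the LocalServerRegistry wiring and package names are recalled from memory and may differ by version:

```java
import io.atomix.AtomixReplica;
import io.atomix.catalyst.transport.Address;
import io.atomix.catalyst.transport.LocalServerRegistry;
import io.atomix.catalyst.transport.LocalTransport;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Three replicas on localhost ports, all in one JVM, sharing a single registry
// so the in-memory transport can route messages between them.
LocalServerRegistry registry = new LocalServerRegistry();
List<Address> members = Arrays.asList(
    new Address("localhost", 5000),
    new Address("localhost", 5001),
    new Address("localhost", 5002));

List<AtomixReplica> replicas = new ArrayList<>();
for (Address address : members) {
  replicas.add(AtomixReplica.builder(address, members)
      .withTransport(new LocalTransport(registry))
      .build());
}

// Open all replicas concurrently; a single replica cannot form a quorum by itself,
// so opening them strictly one at a time would block.
CompletableFuture.allOf(replicas.stream()
    .map(AtomixReplica::open)
    .toArray(CompletableFuture[]::new)).join();
```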
Philip Lombardi
@plombardi89
Mar 08 2016 21:04
Is there any plan to support an eventual consistency mode for cases where strong consistency isn't needed?
Jordan Halterman
@kuujo
Mar 08 2016 22:32
There are definitely plans for that, and ongoing changes to some of the resources are moving us in that direction. In particular, a large PR will be merged to extend DistributedGroup to support consistent hashing and facilitate partitioning schemes. Eventually, resource state machines will be abstracted from the Raft consensus algorithm, and alternative algorithms can be plugged in to manage those state machines. The initial implementation will use consistent hashing. But there are some challenges to the state machine abstraction, in particular with server-to-client communication. Copycat facilitates event-driven algorithms, allowing for very efficient locking and leader election algorithms. That abstraction probably needs to be translated to any alternative replication algorithms. There are a lot of issues there in terms of consistency that have to be worked out when that's done.
The idea for partitioning and consistent hashing in DistributedGroup is that some alternative resources can be built on it. But even before that happens, users can build eventually consistent algorithms on it as well.
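As a very rough illustration of building on DistributedGroup, a membership sketch, assuming atomix is an already open Atomix client or replica; the getGroup/join/onJoin/onLeave calls are recalled from the 1.0-era resource API and may not match the current code exactly:

```java
// Sketch only: react to group membership as the basis for an application-level
// (eventually consistent) scheme, e.g. hashing keys across the current members.
// Method names are recalled from the 1.0-era API and may differ.
DistributedGroup group = atomix.getGroup("my-group").join();

group.join().thenAccept(member -> {
  // this process is now a member of the group
});

group.onJoin(member -> {
  // membership changed: rebalance or rehash application state here
});

group.onLeave(member -> {
  // a member left: hand off its share of the work
});
```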