These are chat archives for atomix/atomix

24th
Jun 2017
Luca Burgazzoli
@lburgazzoli
Jun 24 2017 06:50
@kuujo is there any ETA for 2.0?
Michael
@mkulak
Jun 24 2017 10:18
I have an autoscaling group on AWS with 4-8 EC2 instances. I want to coordinate their work with Atomix, and I have a couple of questions:
  1. How do I create the cluster at startup? My best guess is to start one machine, have it bootstrap the cluster, then start all the other machines and have them join. But this looks very suboptimal; I would like to do it in one step. Is there a standard recipe for this?
  2. When I run several JVM processes locally, have them join one group, and then kill one of them, I see the leader of the group trying to reconnect to the dead nodes for some time (long after they disappear from the list of group members). When will the leader "forget" about those dead nodes?
Michael
@mkulak
Jun 24 2017 10:29
  1. I start one node and bootstrap the cluster. Then I start three more nodes and have them join the cluster (by providing just the address of the first node). Then all nodes join one group. Then I kill the first node. I expect a new leader to be elected. Instead, all the nodes keep trying to reconnect to the first node, and the group membership and leader do not change (even though the leader has been dead for several minutes). How can I make them elect a new leader after the previous one dies? I use version 1.0.8.
Jordan Halterman
@kuujo
Jun 24 2017 10:57
@lburgazzoli I am hoping to have an initial release around the time of the next ONOS release, which is around the end of August IIRC. Top priority, though, for now is getting the Raft refactoring done and stable. I have four weeks to finish that and then the rest of the code (partitions, primitives, transactions, etc) will be migrated over.
I'm trying to get it released around the same time as our next ONOS release so that in the following release I can start deleting code from ONOS itself.
So it's a pretty firm deadline; otherwise I may have to wait three more months.
@mkulak
  1. You don't have to bootstrap only one node. You can bootstrap all of them if you provide the full cluster configuration. But at least one node has to be bootstrapped.
  2. It will never forget. Clusters are largely static out of necessity. It's impossible to distinguish a crashed node from a slow or partitioned node, which is why Copycat aggressively tries to reestablish communication. Often it's still too aggressive, and Atomix 2.0 is actually already addressing that issue.
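A minimal sketch of point 1, assuming every node is started with the same comma-separated `host:port` list. The `ClusterMembers` class, its `parse` helper, and the sample addresses are hypothetical and not part of Atomix. The sketch is plain Java so that it runs without the library, and the actual bootstrap call is only indicated in a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Atomix: turns one shared configuration
// string into the member list that every node would bootstrap with.
final class ClusterMembers {

    /** Parses "host:port,host:port" into a list of trimmed entries. */
    static List<String> parse(String csv) {
        List<String> members = new ArrayList<>();
        for (String entry : csv.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate trailing commas and blank input
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected host:port, got: " + trimmed);
            }
            // Validate the port; throws NumberFormatException if it is not numeric.
            Integer.parseInt(trimmed.substring(colon + 1));
            members.add(trimmed);
        }
        return members;
    }

    public static void main(String[] args) {
        // Assumed sample addresses; in practice this string would come from
        // shared configuration that is identical on every node.
        String cluster = args.length > 0
                ? args[0]
                : "10.0.0.1:8700,10.0.0.2:8700,10.0.0.3:8700";
        List<String> members = parse(cluster);
        // Each node would hand this same full list to its bootstrap call
        // instead of having one node bootstrap alone and the rest join.
        System.out.println("bootstrap with " + members);
    }
}
```

The point of the sketch is only that the member list is identical on every node, so no node needs to be started first.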
Michael
@mkulak
Jun 24 2017 16:38
@kuujo
  1. So I'll just call replica.bootstrap() on each node with the full list of addresses and they will all end up in a single cluster? Nice, thanks.
  2. The problem with AWS is that EC2 instances come and go (scale up and down), so the cluster configuration IS in fact very dynamic. I'd really like the cluster to forget failed nodes after some time. Will version 2.0 have some way to express this?
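To illustrate why members that are never forgotten matter in an autoscaling group, here is a small sketch of the arithmetic. It assumes standard Raft majority quorums counted over the configured members, not the live ones. The `Quorum` class and its method names are hypothetical and do not come from Atomix or Copycat.

```java
// Hypothetical illustration of Raft-style majority arithmetic.
final class Quorum {

    /** Smallest number of members that forms a majority of the configured set. */
    static int majority(int configured) {
        return configured / 2 + 1;
    }

    /** True if the live members alone can still form a majority. */
    static boolean canCommit(int configured, int live) {
        return live >= majority(configured);
    }

    public static void main(String[] args) {
        // Assumed scenario: the group scaled up to 8 configured members,
        // then scales down while the departed members stay in the configuration.
        int configured = 8;
        for (int live = 8; live >= 4; live--) {
            System.out.println(live + " live of " + configured
                    + " configured, majority " + majority(configured)
                    + ", can commit: " + canCommit(configured, live));
        }
    }
}
```

Under these assumptions, a cluster that grew to 8 configured members and then shrank to 4 live instances would be one short of the majority of 5, which is why removing departed members from the configuration matters when instances come and go.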