These are chat archives for atomix/atomix

13th
Dec 2017
Paweł Kamiński
@pawel-kaminski-krk
Dec 13 2017 09:45
hi, I tried to build master and it does not work on jdk9 because of the known removed dependencies. So I reverted to jdk8, but the project is not building either, and even though I fixed child-parent dependencies (e.g. in the storage module), tests are failing in a few modules. My question is which version is stable enough to use. I ran some tests with 435f6859589529363b47e65551f6c79fe90bf7a8 (Support for map getAllPresent operation. from 15th Nov) but I can see that current master has many incompatibilities with that version.
Paweł Kamiński
@pawel-kaminski-krk
Dec 13 2017 09:57
Just to clarify, I built the project on a Windows machine with the latest jdk 8 on Saturday. Now I reran the build on a Mac and it passed. I will rebuild on another machine later today to see whether it is something with that machine or maybe the last commits fixed the build :/.
Paweł Kamiński
@pawel-kaminski-krk
Dec 13 2017 18:17
ok, so there is something wrong with windows, maybe I should configure something.
[INFO] --- maven-surefire-plugin:2.20.1:test (default-test) @ atomix-cluster ---
[INFO]
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running io.atomix.cluster.impl.DefaultClusterServiceTest
19:14:58.850 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:14:58.885 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:14:58.886 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:14:59.901 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:15:02.523 [main] INFO  i.a.c.impl.DefaultClusterService - Stopped
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.84 s <<< FAILURE! - in io.atomix.cluster.impl.DefaultClusterServiceTest
[ERROR] testClusterService(io.atomix.cluster.impl.DefaultClusterServiceTest)  Time elapsed: 7.581 s  <<< FAILURE!
java.lang.AssertionError: expected:<INACTIVE> but was:<ACTIVE>
        at io.atomix.cluster.impl.DefaultClusterServiceTest.testClusterService(DefaultClusterServiceTest.java:153)

[INFO] Running io.atomix.cluster.messaging.impl.DefaultClusterEventServiceTest
19:15:05.028 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:15:05.043 [main] INFO  i.a.c.m.i.DefaultClusterCommunicationService - Started
19:15:05.075 [main] INFO  i.a.c.m.i.DefaultClusterEventService - Started
19:15:05.075 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:15:05.075 [main] INFO  i.a.c.m.i.DefaultClusterCommunicationService - Started
19:15:05.075 [main] INFO  i.a.c.m.i.DefaultClusterEventService - Started
19:15:05.075 [main] INFO  i.a.c.impl.DefaultClusterService - Started
19:15:05.075 [main] INFO  i.a.c.m.i.DefaultClusterCommunicationService - Started
19:15:05.075 [main] INFO  i.a.c.m.i.DefaultClusterEventService - Started
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.852 s - in io.atomix.cluster.messaging.impl.DefaultClusterEventServiceTest
[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   DefaultClusterServiceTest.testClusterService:153 expected:<INACTIVE> but was:<ACTIVE>
[INFO]
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0
Johno Crawford
@johnou
Dec 13 2017 21:14
@kuujo looks like Nagle's algorithm was never disabled correctly :) atomix/atomix#331
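For readers unfamiliar with the reference: Nagle's algorithm batches small TCP writes, and it is turned off per socket with the TCP_NODELAY option. The sketch below only illustrates that option on a plain `java.net.Socket` over loopback. It is not the change made in atomix/atomix#331, and the class and method names (`NoDelayExample`, `connectWithNoDelay`) are made up for illustration. In Netty-based code the equivalent knob is `ChannelOption.TCP_NODELAY`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayExample {

    /**
     * Opens a loopback connection, disables Nagle's algorithm on the client
     * socket, and returns the flag as the socket reports it afterwards.
     */
    static boolean connectWithNoDelay() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port; the connection sits in the
        // accept backlog, which is enough to set and read socket options.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setTcpNoDelay(true); // TCP_NODELAY = true disables Nagle's algorithm
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay());
    }
}
```

Reading the flag back with `getTcpNoDelay()` is a cheap way to check that the option was actually applied to the socket in use, rather than being set somewhere that has no effect.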
Jordan Halterman
@kuujo
Dec 13 2017 21:15
Ahh awesome
I think this needs to be back ported to ONOS too
ugh there’s a bug in that pom file let me fix it
not entirely sure why tests were even passing before
or you can fix it if you want
Jordan Halterman
@kuujo
Dec 13 2017 21:21
nvm fixed
I’ll wait for this build and merge it then should rebase that PR
Johno Crawford
@johnou
Dec 13 2017 21:35
@kuujo I think because travis caches artifacts locally
so it probably pulled a snapshot jar from the previous build
or not, I think that would involve the following in your travis file
cache:
  timeout: 1000
  directories:
  - $HOME/.m2
Jordan Halterman
@kuujo
Dec 13 2017 22:43
it’s merged you can update and restart that build
the build for Atomix 2.1 is clearly still pretty unreliable
it’s a work in progress… lots of testing to do for the next few weeks
Jordan Halterman
@kuujo
Dec 13 2017 23:38

Should have a PR in for membership changes today though. Then that will be the last of the refactoring work and on to testing.

Membership changes are a pretty complex topic with consensus, and even more so given that Atomix is actually many Raft clusters. We have to have an initial set of fixed nodes with which to bootstrap a cluster to prevent split brain. But beyond that, the approach I took was to write an anti-entropy protocol to replicate membership info. Atomix 2.1 has two types of nodes: data and client. Data nodes store partition data, and clients only communicate with the cluster. When a data node is added, the set of data nodes is replicated using the anti-entropy protocol, and nodes react to changes in the set of data nodes to rebalance partitions. There's some risk here in that different nodes can see changes in different orders, but so long as the set of data nodes doesn't change too quickly, this is an okay solution for now. In the future, changes to the set of data nodes should probably be non-overlapping so Raft partitions can be completely reconfigured prior to the next change.
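The anti-entropy idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Atomix's implementation: the `Member` record, its per-node version counter, the "higher version wins" merge rule, and every name in the class are hypothetical. Each node keeps a local view of the data nodes, and two nodes periodically exchange views and merge them so that both end up with the same set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MembershipAntiEntropy {

    /** Hypothetical membership record: a version counter plus a presence flag. */
    static final class Member {
        final long version;
        final boolean present;

        Member(long version, boolean present) {
            this.version = version;
            this.present = present;
        }
    }

    /** One node's local view of which data nodes exist. */
    static final class View {
        final Map<String, Member> members = new HashMap<>();

        void add(String nodeId, long version) {
            members.put(nodeId, new Member(version, true));
        }

        /** Removal is kept as a tombstone so it can be replicated too. */
        void remove(String nodeId, long version) {
            members.put(nodeId, new Member(version, false));
        }

        /** Merge rule (an assumption): per node id, the higher version wins. */
        void mergeFrom(View other) {
            for (Map.Entry<String, Member> e : other.members.entrySet()) {
                Member local = members.get(e.getKey());
                if (local == null || e.getValue().version > local.version) {
                    members.put(e.getKey(), e.getValue());
                }
            }
        }

        /** Ids of the data nodes this view currently considers present. */
        Set<String> dataNodes() {
            Set<String> result = new TreeSet<>();
            for (Map.Entry<String, Member> e : members.entrySet()) {
                if (e.getValue().present) {
                    result.add(e.getKey());
                }
            }
            return result;
        }
    }

    /** One anti-entropy round between two nodes: each merges the other's view. */
    static void exchange(View a, View b) {
        a.mergeFrom(b);
        b.mergeFrom(a);
    }

    public static void main(String[] args) {
        View a = new View();
        View b = new View();
        a.add("n1", 1);      // a has only seen n1 join
        b.add("n2", 1);      // b has seen n2 join ...
        b.remove("n1", 2);   // ... and a newer update removing n1
        exchange(a, b);
        System.out.println(a.dataNodes() + " " + b.dataNodes());
    }
}
```

After a round both views agree, regardless of which node saw which change first. What this sketch does not address is the risk mentioned above: while views are still converging, nodes may react to intermediate states in different orders, and two conflicting updates carrying the same version are not resolved here.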