These are chat archives for atomix/atomix

8th
Aug 2018
Jordan Halterman
@kuujo
Aug 08 2018 00:01
@aruanruan how frequently would you say you see exceptions on the restart?
Junbo Ruan
@aruanruan
Aug 08 2018 00:01
first time I started the cluster since the last error, after changing the limit check to length <= 0 || length > 1024 * 1024
I started the cluster correctly!!!
Jordan Halterman
@kuujo
Aug 08 2018 00:01
cool
will have to put in some limit constants and throw some exceptions for big changes
Johno Crawford
@johnou
Aug 08 2018 00:02
Hard to know what's a valid size though
Junbo Ruan
@aruanruan
Aug 08 2018 00:03
just check & limit it, I think
Jordan Halterman
@kuujo
Aug 08 2018 00:03
if there’s a limit then it’s anything between 1 and 1024 * 1024 * 1024
ZooKeeper has a 1MB limit
the Raft protocol won’t work well with large entries anyways
Any entry with a valid size is validated by the checksum
just need to make sure we’re not reading entries we know are invalid based on their size
which just requires imposing a limit
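a minimal sketch of that kind of check (EntryReader and MAX_ENTRY_SIZE are illustrative names, not the actual journal code):

import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public final class EntryReader {
    // Hypothetical limit constant; ZooKeeper-style 1MB cap.
    static final int MAX_ENTRY_SIZE = 1024 * 1024;

    // Returns the entry bytes, or null if the slot does not hold a valid entry.
    static byte[] readEntry(ByteBuffer buffer) {
        int length = buffer.getInt();
        // Reject lengths we know are invalid before allocating anything, so a
        // corrupt header can't trigger a huge allocation (or an OutOfMemoryError).
        if (length <= 0 || length > MAX_ENTRY_SIZE) {
            return null;
        }
        long checksum = buffer.getInt() & 0xFFFFFFFFL;
        byte[] bytes = new byte[length];
        buffer.get(bytes);
        // An entry with a plausible size is then validated by its checksum.
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue() == checksum ? bytes : null;
    }
}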
Junbo Ruan
@aruanruan
Aug 08 2018 00:04
what about if I've got big row data?
Jordan Halterman
@kuujo
Aug 08 2018 00:05
Have to partition them. Raft will not perform well with huge rows anyways. It may never even be able to commit a single large change. Imposing limits forces the user to handle this problem.
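purely as an illustration, that partitioning could look something like this (chunk naming and reassembly are left to the application):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class ChunkingSketch {
    // Splits an oversized value into pieces that each fit under the entry-size limit.
    static List<byte[]> chunk(byte[] value, int maxEntrySize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < value.length; offset += maxEntrySize) {
            int end = Math.min(value.length, offset + maxEntrySize);
            chunks.add(Arrays.copyOfRange(value, offset, end));
        }
        return chunks;
    }
}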
Johno Crawford
@johnou
Aug 08 2018 00:06
Something for 3.0 then?
Jordan Halterman
@kuujo
Aug 08 2018 00:06
The Raft protocol implicitly requires persistence and replication to happen within a fixed time frame, which doesn’t lend itself to arbitrary entry sizes
yeah
Junbo Ruan
@aruanruan
Aug 08 2018 00:06
maybe we should check that the length is in a valid range, then validate it with crc32
Jordan Halterman
@kuujo
Aug 08 2018 00:06
yeah that’s effectively what will be done
just by adding the upper limit. We’ll keep the checksum
Junbo Ruan
@aruanruan
Aug 08 2018 00:07
last week I tested with 20M-byte entries
Jordan Halterman
@kuujo
Aug 08 2018 00:08
It can be configurable
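e.g. something along these lines, where the property name is an assumption rather than an actual Atomix option:

public final class EntryLimits {
    static final int DEFAULT_MAX_ENTRY_SIZE = 1024 * 1024;

    // Reads the limit from a system property, falling back to the default.
    static int maxEntrySize() {
        return Integer.getInteger("atomix.maxEntrySize", DEFAULT_MAX_ENTRY_SIZE);
    }
}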
Junbo Ruan
@aruanruan
Aug 08 2018 00:09
ok, perfect!
Jordan Halterman
@kuujo
Aug 08 2018 00:09
:+1:
Junbo Ruan
@aruanruan
Aug 08 2018 00:11
I can not access the new cluster with this code:
protected Map<String, String> addresses() {
        return new HashMap<String, String>() {
            {
                put("node-1", "10.93.85.253:5000");
                put("node-2", "10.93.76.228:5000");
                put("node-3", "10.93.79.103:5000");
            }
        };
    }

protected static AtomixBuilder buildAtomix(Map<String, String> addresses) {
        Collection<Node> nodes = new ArrayList<>();

        addresses.forEach((id, address) -> {
            Node node = Node.builder().withId(String.valueOf(id)).withAddress(Address.from(address)).build();
            nodes.add(node);
        });

        ClusterConfig clusterConfig = new ClusterConfig(); 
        clusterConfig.setClusterId("atomix");
        MemberConfig memberConfig = new MemberConfig();
        memberConfig.setAddress(Address.local());
        clusterConfig.setNodeConfig(memberConfig);
        AtomixConfig config = new AtomixConfig();
        config.setClusterConfig(clusterConfig);

        return Atomix.builder(config).withMembershipProvider(new BootstrapDiscoveryProvider(nodes));
    }
this is in client mode
it reports java.util.concurrent.TimeoutException
from the cluster side, I can see the new member
08:08:32.538 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - node-3 - Member unreachable: Member{id=63e7dce1-2caa-409f-9fbd-f880587cc0b0, address=10.78.10.219:5679, properties={}}
08:08:37.609 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - 63e7dce1-2caa-409f-9fbd-f880587cc0b0 - Member reachable: Member{id=63e7dce1-2caa-409f-9fbd-f880587cc0b0, address=10.78.10.219:5679, properties={}}
08:09:09.239 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - node-3 - Member unreachable: Member{id=63e7dce1-2caa-409f-9fbd-f880587cc0b0, address=10.78.10.219:5679, properties={}}
Jordan Halterman
@kuujo
Aug 08 2018 00:15
seems like binding to Address.local() will cause connectivity problems from those nodes back to the client
in ONOS we actually look up the first non-localhost interface and use it, but that has all sorts of problems too
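roughly this kind of lookup (illustrative; not the actual ONOS code):

import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;

public final class LocalAddressSketch {
    // Returns the first usable non-loopback IPv4 address, or loopback as a fallback.
    static InetAddress firstNonLoopbackAddress() throws SocketException {
        for (NetworkInterface iface : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!iface.isUp() || iface.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(iface.getInetAddresses())) {
                if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                    return addr; // e.g. the VM's 10.x address, reachable from the cluster
                }
            }
        }
        return InetAddress.getLoopbackAddress();
    }
}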
can you copy the TimeoutException?
stack trace
bah I gotta finish up this iterator change real quick...
Junbo Ruan
@aruanruan
Aug 08 2018 00:22
java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at com.xxx.dcp.registry2.client.AbstractPrimitiveTest.atomix(AbstractPrimitiveTest.java:73)
at com.xxx.dcp.registry2.client.graph.UdoGraphTest.testAddVertex(UdoGraphTest.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Jordan Halterman
@kuujo
Aug 08 2018 00:23
oh that’s from a future when the client is trying to start?
Junbo Ruan
@aruanruan
Aug 08 2018 00:23
protected Atomix atomix() throws Exception {
        Atomix instance = createAtomix(addresses());
        clients.add(instance);
        instance.start().get(30, TimeUnit.SECONDS);
        return instance;
    }
instance.start().get(30, TimeUnit.SECONDS); <------ this line
Jordan Halterman
@kuujo
Aug 08 2018 00:23
Gotcha… yeah probably from the local Address
Junbo Ruan
@aruanruan
Aug 08 2018 00:24
not yet
protected static AtomixBuilder buildAtomix(Map<String, String> addresses) {
        Collection<Node> nodes = new ArrayList<>();

        addresses.forEach((id, address) -> {
            Node node = Node.builder().withId(String.valueOf(id)).withAddress(Address.from(address)).build();
            nodes.add(node);
        });

        ClusterConfig clusterConfig = new ClusterConfig(); 
        clusterConfig.setClusterId("atomix");
        MemberConfig memberConfig = new MemberConfig();
//        memberConfig.setAddress(Address.local());

        memberConfig.setAddress(Address.from("10.78.10.219", 5689));
        clusterConfig.setNodeConfig(memberConfig);
        AtomixConfig config = new AtomixConfig();
        config.setClusterConfig(clusterConfig);

        return Atomix.builder(config).withMembershipProvider(new BootstrapDiscoveryProvider(nodes));
    }
I changed the local address to the real IP
Jordan Halterman
@kuujo
Aug 08 2018 00:25
you have logs from that node?
should be able to see what is not starting
are those other nodes configured in the same way?
with BootstrapDiscoveryProvider?
Junbo Ruan
@aruanruan
Aug 08 2018 00:29
maybe the group names do not match?
RaftClient{system-partition-1} - OpenSessionRequest{node=1f92326c-6e11-46f4-b896-3cebdc3e3082, serviceName=atomix-primary-elector, serviceType=PRIMARY_ELECTOR, readConsistency=SEQUENTIAL, minTimeout=250, maxTimeout=30000} failed! Reason: {}
java.util.concurrent.TimeoutException: Request timed out in 5001 milliseconds
my config:

# A management group using the Raft consensus protocol.
management-group {
  type: raft
  partitions: 1
  members: [node-1, node-2, node-3]
}

# A primary-backup based management group.
#management-group {
#  type: primary-backup
#  partitions: 1
#}

# A consensus partition group.
partition-groups.consensus {
  type: raft
  partitions: 7
  members: [node-1, node-2, node-3]
}

# A primary-backup (data grid) partition group.
partition-groups.data {
  type: primary-backup
  partitions: 32
}
Jordan Halterman
@kuujo
Aug 08 2018 00:31
what about the cluster
seems fine
Junbo Ruan
@aruanruan
Aug 08 2018 00:32
# The cluster configuration
cluster {
  cluster-id: atomix
  node.id: node-3
  node.address: "10.93.79.103:5000"
}


# This is a bootstrap based discovery configuration. When bootstrap discovery is used, the node connects to
# each node in the provided set for service discovery.
cluster.discovery {
  type: bootstrap
  nodes.1 {
    id: node-1
    address: "10.93.85.253:5000"
 }
  nodes.2 {
    id: node-2
    address: "10.93.76.228:5000"
  }
  nodes.3 {
    id: node-3
    address: "10.93.79.103:5000"
 }
}

# A management group using the Raft consensus protocol.
management-group {
  type: raft
  partitions: 1
  members: [node-1, node-2, node-3]
}

# A primary-backup based management group.
#management-group {
#  type: primary-backup
#  partitions: 1
#}

# A consensus partition group.
partition-groups.consensus {
  type: raft
  partitions: 7
  members: [node-1, node-2, node-3]
}

# A primary-backup (data grid) partition group.
partition-groups.data {
  type: primary-backup
  partitions: 32
}
Jordan Halterman
@kuujo
Aug 08 2018 00:32
do you have Docker installed?
Junbo Ruan
@aruanruan
Aug 08 2018 00:32
no, just VMs
Jordan Halterman
@kuujo
Aug 08 2018 00:33
lemme try something
➜  atomix-test git:(master) ✗ atomix-test cluster foo setup strong-data-grid -n 3
2018-08-07 17:33:39,980 Setting up cluster
2018-08-07 17:33:40,031 Creating network
2018-08-07 17:33:40,275 Running container foo-1
2018-08-07 17:33:41,208 Running container foo-2
2018-08-07 17:33:41,886 Running container foo-3
2018-08-07 17:33:42,644 Waiting for cluster bootstrap
➜  atomix-test git:(master) ✗ atomix-test cluster foo add-node client
2018-08-07 17:35:05,218 Adding a node to the cluster
2018-08-07 17:35:05,247 Running container foo-4
➜  atomix-test git:(master) ✗ atomix-test cluster foo list
 NAME     NODES     PATH                                                      NETWORK     SUBNET            GATEWAY        CPUS     MEMORY     profiler
 foo      4         /Users/jordanhalterman/Projects/atomix-test/.data/foo     foo         172.18.0.0/24     172.18.0.1              0          False
➜  atomix-test git:(master) ✗ atomix-test cluster foo nodes
 ID     NAME      STATUS      PATH                                                            IP             LOCAL PORT     PROFILER PORT
 1      foo-1     running     /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-1     172.18.0.2     64254
 2      foo-2     running     /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-2     172.18.0.3     64255
 3      foo-3     running     /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-3     172.18.0.4     64256
 4      foo-4     running     /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-4     172.18.0.5     64309
question is what’s different here
➜  atomix-test git:(master) ✗ curl -XGET http://localhost:64254/v1/cluster/nodes
[{"id":"foo-1","host":"foo-1","port":5679},{"id":"foo-3","host":"172.18.0.4","port":5679},{"id":"foo-2","host":"172.18.0.3","port":5679},{"id":"foo-4","host":"172.18.0.5","port":5679}]
Jordan Halterman
@kuujo
Aug 08 2018 00:38
here are the configurations for the strong-data-grid and client nodes
➜  atomix-test git:(master) ✗ cat /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-1/strong-data-grid.conf
# Consensus based data grid configuration
cluster {
  cluster-id: ${CLUSTER_ID}
  node.id: ${NODE_ID}
  node.address: ${NODE_ADDRESS}
  discovery: {
    type: bootstrap
    nodes.1 {
      id: foo-1
      address: "foo-1:5679"
    }
    nodes.2 {
      id: foo-2
      address: "foo-2:5679"
    }
    nodes.3 {
      id: foo-3
      address: "foo-3:5679"
    }
  }
}

management-group {
  type: raft
  partitions: 1
  data-directory: ${DATA_DIR}/system
  members: [
    foo-1
    foo-2
    foo-3
  ]
}

partition-groups.data {
  type: primary-backup
  partitions: 32
}
➜  atomix-test git:(master) ✗ cat /Users/jordanhalterman/Projects/atomix-test/.data/foo/foo-4/client.conf
# Client node configuration
cluster {
  cluster-id: ${CLUSTER_ID}
  node.id: ${NODE_ID}
  node.address: ${NODE_ADDRESS}
  discovery: {
    type: bootstrap
    nodes.1 {
      id: foo-1
      address: "foo-1:5679"
    }
    nodes.2 {
      id: foo-2
      address: "foo-2:5679"
    }
    nodes.3 {
      id: foo-3
      address: "foo-3:5679"
    }
  }
}
Hmm
Junbo Ruan
@aruanruan
Aug 08 2018 00:40
I'll try something else
Jordan Halterman
@kuujo
Aug 08 2018 00:40
man the atomix-test project is awesomely useful ATM
atomix-test cluster foo setup strong-data-grid -n 3
atomix-test cluster foo add-node client
Jordan Halterman
@kuujo
Aug 08 2018 00:45
turns out to be great for testing configurations, but it uses the Docker agent to set up the clusters programmatically
Junbo Ruan
@aruanruan
Aug 08 2018 00:46
hmm
Jordan Halterman
@kuujo
Aug 08 2018 01:50
gonna go play with my son for a bit. Be back in a couple hours
Junbo Ruan
@aruanruan
Aug 08 2018 01:51
have fun
Junbo Ruan
@aruanruan
Aug 08 2018 02:30
got another "OutOfMemoryError"
Exception in thread "main" java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:991)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at io.atomix.primitive.partition.impl.DefaultPartitionService.lambda$2(DefaultPartitionService.java:150)
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2124)
        at io.atomix.primitive.partition.impl.DefaultPartitionService.start(DefaultPartitionService.java:138)
        at io.atomix.core.Atomix.lambda$6(Atomix.java:733)
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at io.atomix.storage.buffer.HeapBytes.newByteBuffer(HeapBytes.java:60)
        at io.atomix.storage.buffer.ByteBufferBytes.resize(ByteBufferBytes.java:50)
        at io.atomix.storage.buffer.AbstractBuffer.capacity(AbstractBuffer.java:209)
        at io.atomix.storage.buffer.AbstractBuffer.position(AbstractBuffer.java:234)
        at io.atomix.storage.buffer.AbstractBuffer.read(AbstractBuffer.java:452)
        at io.atomix.storage.journal.JournalSegmentWriter.reset(JournalSegmentWriter.java:87)
        at io.atomix.storage.journal.JournalSegmentWriter.<init>(JournalSegmentWriter.java:61)
        at io.atomix.storage.journal.JournalSegment.<init>(JournalSegment.java:47)
        at io.atomix.storage.journal.SegmentedJournal.newSegment(SegmentedJournal.java:378)
        at io.atomix.storage.journal.SegmentedJournal.loadDiskSegment(SegmentedJournal.java:440)
        at io.atomix.storage.journal.SegmentedJournal.loadSegment(SegmentedJournal.java:427)
        at io.atomix.storage.journal.SegmentedJournal.loadSegments(SegmentedJournal.java:489)
        at io.atomix.storage.journal.SegmentedJournal.open(SegmentedJournal.java:171)
Jordan Halterman
@kuujo
Aug 08 2018 02:34
Same problem
Junbo Ruan
@aruanruan
Aug 08 2018 02:36
when executing open session
sometimes
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) ~[na:1.8.0_102]
        at java.util.ArrayList$Itr.remove(ArrayList.java:865) ~[na:1.8.0_102]
        at io.atomix.primitive.service.impl.DefaultServiceExecutor.tick(DefaultServiceExecutor.java:110) ~[atomix-primitive-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.primitive.service.AbstractPrimitiveService.tick(AbstractPrimitiveService.java:107) ~[atomix-primitive-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.service.RaftServiceContext.tick(RaftServiceContext.java:181) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.service.RaftServiceContext.openSession(RaftServiceContext.java:294) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.applyOpenSession(RaftServiceManager.java:691) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.lambda$apply$12(RaftServiceManager.java:420) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_102]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_102]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_102]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Jordan Halterman
@kuujo
Aug 08 2018 03:42
Never seen that one. It’s a thread safety issue. What’s the name of the thread in which the exception is thrown?
And this is a Raft session?
Junbo Ruan
@aruanruan
Aug 08 2018 03:44
10:17:07.955 [raft-server-system-partition-1] INFO  i.a.protocols.raft.roles.LeaderRole - RaftServer{system-partition-1}{role=LEADER} - onOpenSession OpenSessionRequest{node=3a2c2bd2-885e-4d2c-a8f7-eb7b33673cdf, serviceName=atomix-primary-elector, serviceType=PRIMARY_ELECTOR, readConsistency=SEQUENTIAL, minTimeout=250, maxTimeout=30000}
10:17:07.956 [raft-server-system-partition-1] INFO  i.a.protocols.raft.roles.LeaderRole - RaftServer{system-partition-1}{role=LEADER} - onOpenSession ok OpenSessionRequest{node=3a2c2bd2-885e-4d2c-a8f7-eb7b33673cdf, serviceName=atomix-primary-elector, serviceType=PRIMARY_ELECTOR, readConsistency=SEQUENTIAL, minTimeout=250, maxTimeout=30000}
10:17:07.972 [raft-server-system-partition-1-state] INFO  i.a.protocols.raft.roles.LeaderRole - RaftServer{system-partition-1}{role=LEADER} - onOpenSession OpenSessionRequest{node=3a2c2bd2-885e-4d2c-a8f7-eb7b33673cdf, serviceName=atomix-primary-elector, serviceType=PRIMARY_ELECTOR, readConsistency=SEQUENTIAL, minTimeout=250, maxTimeout=30000} apply exception: {}
java.util.ConcurrentModificationException: null
        at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) ~[na:1.8.0_102]
        at java.util.ArrayList$Itr.remove(ArrayList.java:865) ~[na:1.8.0_102]
        at io.atomix.primitive.service.impl.DefaultServiceExecutor.tick(DefaultServiceExecutor.java:110) ~[atomix-primitive-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.primitive.service.AbstractPrimitiveService.tick(AbstractPrimitiveService.java:107) ~[atomix-primitive-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.service.RaftServiceContext.tick(RaftServiceContext.java:181) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.service.RaftServiceContext.openSession(RaftServiceContext.java:294) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.applyOpenSession(RaftServiceManager.java:691) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at io.atomix.protocols.raft.impl.RaftServiceManager.lambda$apply$12(RaftServiceManager.java:420) ~[atomix-raft-3.0.0-SNAPSHOT.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_102]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_102]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_102]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_102]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_102]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
Jordan Halterman
@kuujo
Aug 08 2018 03:45
nvm of course it is
Junbo Ruan
@aruanruan
Aug 08 2018 03:47
sometimes when I press ctrl+z, the atomix-agent can not exit completely
it stays here
11:42:12.162 [raft-client-system-partition-1-10] INFO  i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Closing session 2184
11:42:12.187 [raft-client-system-partition-1-4] INFO  i.a.core.impl.CorePrimitivesService - Stopped
11:42:12.191 [raft-server-system-partition-1] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Transitioning to INACTIVE
11:42:12.200 [ForkJoinPool.commonPool-worker-1] INFO  i.a.p.r.partition.RaftPartitionGroup - Stopped
11:42:12.207 [raft-server-consensus-partition-2] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-2} - Transitioning to INACTIVE
11:42:12.209 [raft-server-consensus-partition-5] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-5} - Transitioning to INACTIVE
11:42:12.214 [raft-server-consensus-partition-4] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-4} - Transitioning to INACTIVE
11:42:12.214 [raft-server-consensus-partition-3] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-3} - Transitioning to INACTIVE
11:42:12.215 [raft-server-consensus-partition-1] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-1} - Transitioning to INACTIVE
11:42:12.219 [raft-server-consensus-partition-6] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-6} - Transitioning to INACTIVE
11:42:12.222 [netty-messaging-event-epoll-client-14] WARN  i.a.c.m.impl.NettyMessagingService - Unable to dispatch message due to Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@18967220 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2bbaf999[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3372530]
11:42:12.222 [raft-server-consensus-partition-7] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-7} - Transitioning to INACTIVE
11:42:12.230 [ForkJoinPool.commonPool-worker-2] INFO  i.a.p.r.partition.RaftPartitionGroup - Stopped
when I kill it
Jordan Halterman
@kuujo
Aug 08 2018 03:48
It’s not a bug in openSession. Something else is accessing that list
Junbo Ruan
@aruanruan
Aug 08 2018 03:48
./bin/atomix-agent: line 26: 108064 Killed                  java -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Datomix.root="$ATOMIX_ROOT" -Datomix.data="$ATOMIX_ROOT/data" -Datomix.log="$ATOMIX_ROOT/log" -Datomix.config="$ATOMIX_ROOT/conf/atomix.conf" -Dlogback.configurationFile="$ATOMIX_ROOT/conf/logback.xml" -cp .:"$ATOMIX_ROOT/lib/system/*":"$ATOMIX_ROOT/lib/ext/*" io.atomix.agent.AtomixAgent --http-port 8080 "$@"
[root@SZX1000441343 atomix]# 11:42:12.162 [raft-client-system-partition-1-10] INFO  i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Closing session 2184
bash: 11:42:12.162: command not found...
[root@SZX1000441343 atomix]# 11:42:12.187 [raft-client-system-partition-1-4] INFO  i.a.core.impl.CorePrimitivesService - Stopped
bash: 11:42:12.187: command not found...
[root@SZX1000441343 atomix]# 11:42:12.191 [raft-server-system-partition-1] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{system-partition-1} - Transitioning to INACTIVE
bash: 11:42:12.191: command not found...
[root@SZX1000441343 atomix]# 11:42:12.200 [ForkJoinPool.commonPool-worker-1] INFO  i.a.p.r.partition.RaftPartitionGroup - Stopped
bash: 11:42:12.200: command not found...
[root@SZX1000441343 atomix]# 11:42:12.207 [raft-server-consensus-partition-2] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-2} - Transitioning to INACTIVE
bash: 11:42:12.207: command not found...
[root@SZX1000441343 atomix]# 11:42:12.209 [raft-server-consensus-partition-5] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-5} - Transitioning to INACTIVE
bash: 11:42:12.209: command not found...
[root@SZX1000441343 atomix]# 11:42:12.214 [raft-server-consensus-partition-4] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-4} - Transitioning to INACTIVE
bash: 11:42:12.214: command not found...
[root@SZX1000441343 atomix]# 11:42:12.214 [raft-server-consensus-partition-3] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-3} - Transitioning to INACTIVE
bash: 11:42:12.214: command not found...
[root@SZX1000441343 atomix]# 11:42:12.215 [raft-server-consensus-partition-1] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-1} - Transitioning to INACTIVE
bash: 11:42:12.215: command not found...
[root@SZX1000441343 atomix]# 11:42:12.219 [raft-server-consensus-partition-6] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-6} - Transitioning to INACTIVE
bash: 11:42:12.219: command not found...
[root@SZX1000441343 atomix]# 11:42:12.222 [netty-messaging-event-epoll-client-14] WARN  i.a.c.m.impl.NettyMessagingService - Unable to dispatch message due to Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@18967220 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@2bbaf999[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 3372530]
bash: 11:42:12.222: command not found...
[root@SZX1000441343 atomix]# 11:42:12.222 [raft-server-consensus-partition-7] INFO  i.a.protocols.raft.impl.RaftContext - RaftServer{consensus-partition-7} - Transitioning to INACTIVE
bash: 11:42:12.222: command not found...
[root@SZX1000441343 atomix]# 11:42:12.230 [ForkJoinPool.commonPool-worker-2] INFO  i.a.p.r.partition.RaftPartitionGroup - Stopped
bash: 11:42:12.230: command not found...
Jordan Halterman
@kuujo
Aug 08 2018 03:49
No idea what command not found is. The node is hanging because it’s trying to shutdown gracefully
So what about the thread?
Junbo Ruan
@aruanruan
Aug 08 2018 03:50
I will monitor the thread next time
the "command not found" is maybe from me pressing ctrl+z too many times
Johno Crawford
@johnou
Aug 08 2018 06:59
@aruanruan did you get a thread dump
Jordan Halterman
@kuujo
Aug 08 2018 07:51
ctrl+z is not really the right way to exit a process
it just sends SIGTSTP to the process
Johno Crawford
@johnou
Aug 08 2018 07:56
indeed, no shutdown hooks would be invoked
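for illustration (a hypothetical main, not the agent's actual code):

public final class ShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        // The hook runs on SIGINT (ctrl+c) or SIGTERM (kill <pid>), but ctrl+z
        // (SIGTSTP) only suspends the process, and kill -9 skips hooks entirely.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("stopping gracefully"))); // e.g. atomix.stop().join()
        Thread.currentThread().join(); // idle like a long-running agent
    }
}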
Jordan Halterman
@kuujo
Aug 08 2018 07:56
hmm
I’m not finding what other thread could be modifying that list
Gotta be somewhere
Johno Crawford
@johnou
Aug 08 2018 07:57
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901) ~[na:1.8.0_102]
at java.util.ArrayList$Itr.remove(ArrayList.java:865) ~[na:1.8.0_102]
at io.atomix.primitive.service.impl.DefaultServiceExecutor.tick(DefaultServiceExecutor.java:110) ~[atomix-primitive-3.0.0-SNAPSHOT.jar:na]
wait, isn't that the same one I looked into
pretty sure I saw a race
two different stacktraces
aren't you lucky
could they both be places it's modified from?
Jordan Halterman
@kuujo
Aug 08 2018 07:58
they’re the same
just different log entries being applied
Johno Crawford
@johnou
Aug 08 2018 07:59
io.atomix.primitive.service.impl.DefaultServiceExecutor#scheduledTasks
Jordan Halterman
@kuujo
Aug 08 2018 07:59
applyCommand, applyOpenSession, just different method calls on the same thread
what about it
Johno Crawford
@johnou
Aug 08 2018 07:59
so that's the list
just pasting code pointers
io.atomix.primitive.service.impl.DefaultServiceExecutor.ScheduledTask#schedule modifies the list
io.atomix.primitive.service.impl.DefaultServiceExecutor.ScheduledTask#cancel modifies the list
which is where I think the problem was
and in tick
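condensed, the race looks something like this (names illustrative, not the actual DefaultServiceExecutor code):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class TickRaceSketch {
    final List<Runnable> scheduledTasks = new ArrayList<>();

    // Unsafe: if task.run() schedules or cancels another task (structurally
    // modifying scheduledTasks), the live iterator throws
    // ConcurrentModificationException on its next next() or remove() call.
    void tickUnsafe() {
        Iterator<Runnable> it = scheduledTasks.iterator();
        while (it.hasNext()) {
            Runnable task = it.next();
            it.remove();
            task.run();
        }
    }

    // One conventional fix: snapshot the list, clear it, then run the tasks,
    // so callbacks are free to (re)schedule or cancel without a live iterator.
    void tickSafe() {
        List<Runnable> due = new ArrayList<>(scheduledTasks);
        scheduledTasks.clear();
        for (Runnable task : due) {
            task.run();
        }
    }
}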
Jordan Halterman
@kuujo
Aug 08 2018 08:01
there’s no code that cancels a Scheduled object inside of a timer callback
Johno Crawford
@johnou
Aug 08 2018 08:14
hm
looking at it now, maybe I searched for usages of the base method
then got confused
Jordan Halterman
@kuujo
Aug 08 2018 08:15
it’s strange
I went through all the usages of timers in all the primitive services
none of them should cause this, but I don’t really see what else could
@aruanruan do you have a custom primitive?
Junbo Ruan
@aruanruan
Aug 08 2018 08:18
@kuujo yeah
Jordan Halterman
@kuujo
Aug 08 2018 08:18
it uses timers?
Junbo Ruan
@aruanruan
Aug 08 2018 08:18
but I can not connect to it from the client
Jordan Halterman
@kuujo
Aug 08 2018 08:18
can you paste the timer code?
Junbo Ruan
@aruanruan
Aug 08 2018 08:19
ok
    @SuppressWarnings({ "unchecked", "rawtypes" })
    @Override
    public UdoResult<String> addVertex(DefaultUdoGraphVertex node, int feature, Map<String, Object> options) {
        if (node == null) {
            return UdoResult.NOOP;
        }
        try {
            Versioned<Vertex> old = dataCache.getVertexIfPresent(node.getId());
            Versioned<Vertex> value = dataCache.addVertex(node);
            Vertex n = value.value();
            n.removeAllProps();
            feature &= GraphFeatureUtils.FEATURE_MASK;
            if (feature != 0) {
                n.setProperty(GraphFeatureUtils.FEATURES, feature);
            }

            // cancel old timer
            timers.computeIfPresent(n.id(), (k, v) -> {
                v.cancel();
                return null;
            });
            if ((feature & GraphFeature.HEALTHY.value()) != 0) {
                SimplePropertyAccessor props = new SimplePropertyAccessor();
                props.setProps(options);

                int timeout = props.getPropertyInt(GraphFeatureUtils.HEALTHY_EXPIRE_TIMEOUT,
                        GraphFeatureUtils.DEFAULT_HEALTHY_EXPIRE_TIMEOUT);
                String statusKey = props.getPropertyString(GraphFeatureUtils.HEALTHY_STATUS_KEY,
                        GraphFeatureUtils.DEFAULT_HEALTHY_STATUS_KEY);
                int unhealthy = props.getPropertyInt(GraphFeatureUtils.HEALTHY_UNHEALTHY_VALUE,
                        GraphFeatureUtils.DEFAULT_HEALTHY_UNHEALTHY_VALUE);
                int status = props.getPropertyInt(GraphFeatureUtils.HEALTHY_CURRENT_VALUE,
                        GraphFeatureUtils.DEFAULT_HEALTHY_CURRENT_VALUE);

                String updateStatusExpr = props.getPropertyString(GraphFeatureUtils.HEALTHY_UPDATE_STATUS_EXPR, "$ != " + unhealthy);

                n.setPropertyInt(GraphFeatureUtils.HEALTHY_EXPIRE_TIMEOUT, timeout);
                n.setPropertyInt(GraphFeatureUtils.HEALTHY_UNHEALTHY_VALUE, unhealthy);
                n.setPropertyString(GraphFeatureUtils.HEALTHY_STATUS_KEY, statusKey);
                n.setPropertyInt(statusKey, status);
                n.setPropertyString(GraphFeatureUtils.HEALTHY_UPDATE_STATUS_EXPR, updateStatusExpr);

                GraphFeatureUtils.touch(n);
                if (timeout > 0 && GraphFeatureUtils.needUnhealthyChecking(n)) {    
                    // track the status
//                    log.info("create timer for: {} with {} seconds", n.id(), timeout);
                    timers.put(n.id(), getScheduler().schedule(Duration.ofMillis(timeout * 1000),
                            new HealthyRunnable(n.id(), this::onNodeHealthyTimeout)));                     <------------- here

                }
            }
            // log.debug("{} -> {}", value, val);
            if (old == null) {
                // create
                notifyListeners(new UdoGraphEvent(EventType.CREATED, value.value().id(), GraphElementType.VERTEX,
                        Optional.of(dataCache.toVersionedUdoGraphVertex(value)), Optional.empty()));
            } else {
                // update
                notifyListeners(new UdoGraphEvent(EventType.UPDATED, value.value().id(), GraphElementType.VERTEX,
                        Optional.of(dataCache.toVersionedUdoGraphVertex(value)),
                        Optional.of(dataCache.toVersionedUdoGraphVertex(old))));
            }
            this.syncSaveNode(value);
            return UdoResult.ok(value.value().id());
        } catch (Exception e) {
            return UdoResult.catchException(e);
        }
    }
Jordan Halterman
@kuujo
Aug 08 2018 08:21
You mean you can’t create the primitive inside a cluster of agents? You’d have to include the jar with the primitive code in the agent’s lib/ext directory to ensure it can create the primitive services
Junbo Ruan
@aruanruan
Aug 08 2018 08:22
I can not use AtomicMap either
Jordan Halterman
@kuujo
Aug 08 2018 08:22
hmm
what do you mean?
Junbo Ruan
@aruanruan
Aug 08 2018 08:22
AsyncAtomicMap<Object, Object> map = atomix().atomicMapBuilder(UUID.randomUUID().toString())
        .withProtocol(this.protocol())
        .build()
        .async();
map.put("hell", "test").get(30, TimeUnit.SECONDS);
it does not work
Jordan Halterman
@kuujo
Aug 08 2018 08:23
but the client starts successfully?
Junbo Ruan
@aruanruan
Aug 08 2018 08:23
it always times out
I can see the client join the cluster
Jordan Halterman
@kuujo
Aug 08 2018 08:23
well if the client didn’t start successfully then the map isn’t going to work
Junbo Ruan
@aruanruan
Aug 08 2018 08:23
and I see the OpenSession log
Jordan Halterman
@kuujo
Aug 08 2018 08:23
meaning if Atomix#start() never completed
Johno Crawford
@johnou
Aug 08 2018 08:24
@kuujo wasn't there a bug where atomix tried to assign a partition to a client
Junbo Ruan
@aruanruan
Aug 08 2018 08:24
yes!
Jordan Halterman
@kuujo
Aug 08 2018 08:24
no
it was a bug in my test
Johno Crawford
@johnou
Aug 08 2018 08:24
ah
Jordan Halterman
@kuujo
Aug 08 2018 08:25
nothing weird about the timers there
I see nothing that should cause concurrent modification anywhere
it’s gotta be there somewhere
@aruanruan the client did complete start()?
Junbo Ruan
@aruanruan
Aug 08 2018 08:28
yesterday I was using the rc6 version and my client could connect to the cluster; today I upgraded to rc7 and my client can not connect to the cluster any more
@kuujo yes, it always times out
Jordan Halterman
@kuujo
Aug 08 2018 08:29
should probably just get the DEBUG logs and share them
Junbo Ruan
@aruanruan
Aug 08 2018 08:30
on the client?
Jordan Halterman
@kuujo
Aug 08 2018 08:30
there wasn’t really anything new in rc7 that should affect startup, except for some retries, which should make it more, not less, likely to succeed
yeah
hmm
actually atomix-test was using an old image; need to try to reproduce again with the new image
maybe it’s reproducible
hmm, no luck, it works fine with the rc7 image
Junbo Ruan
@aruanruan
Aug 08 2018 08:34
16:31:29.749 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.d.BootstrapDiscoveryProvider - Joined
16:31:29.749 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.i.DefaultClusterMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Member activated: Member{id=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, address=xxx.com:5689, properties={}}
16:31:29.749 [atomix-bootstrap-heartbeat-receiver] INFO  i.a.c.i.DefaultClusterMembershipService - Started
16:31:29.749 [atomix-cluster-0] INFO  i.a.c.m.i.DefaultClusterCommunicationService - Started
16:31:29.749 [atomix-cluster-heartbeat-sender] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.85.253:5000
16:31:29.749 [atomix-cluster-0] INFO  i.a.c.m.i.DefaultClusterEventService - Started
16:31:29.749 [atomix-cluster-heartbeat-sender] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.76.228:5000
16:31:29.749 [atomix-cluster-heartbeat-sender] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.79.103:5000
16:31:29.749 [atomix-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapping from member Member{id=node-1, address=10.93.85.253:5000, properties={}}
16:31:29.749 [atomix-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.85.253:5000
16:31:29.764 [atomix-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapping from member Member{id=node-2, address=10.93.76.228:5000, properties={}}
16:31:29.764 [atomix-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.76.228:5000
16:31:29.764 [atomix-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapping from member Member{id=node-3, address=10.93.79.103:5000, properties={}}
16:31:29.764 [atomix-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.79.103:5000
16:31:29.998 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Member unreachable: Member{id=node-3, address=10.93.79.103:5000, properties={}}
16:31:29.998 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - node-3 - Member reachable: Member{id=node-3, address=10.93.79.103:5000, properties={}}
16:31:30.484 [atomix-partition-group-membership-service-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapped system group PartitionGroupMembership{group=system, members=[node-1, node-2, node-3]} from node-2
16:31:30.484 [atomix-partition-group-membership-service-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapped partition group PartitionGroupMembership{group=data, members=[node-1, node-2, node-3]} from node-2
16:31:30.484 [atomix-partition-group-membership-service-0] DEBUG i.a.p.p.i.DefaultPartitionGroupMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Bootstrapped partition group PartitionGroupMembership{group=consensus, members=[node-1, node-2, node-3]} from node-2
16:31:30.757 [atomix-cluster-event-executor-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.76.228:5000
16:31:31.760 [atomix-cluster-event-executor-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.79.103:5000
16:31:34.518 [atomix-partition-group-membership-service-0] INFO  i.a.p.p.i.DefaultPartitionGroupMembershipService - Started
16:31:34.519 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.primitive.partition.GroupMember already registered as 502. Skipping 553.
16:31:34.519 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.cluster.MemberId already registered as 503. Skipping 554.
16:31:34.569 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - RaftProtocol: class java.util.Collections$EmptyList already registered as 41. Skipping 553.
16:31:34.569 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - RaftProtocol: class java.util.HashSet already registered as 29. Skipping 554.
16:31:34.569 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - RaftStorage: class java.util.ArrayList already registered as 27. Skipping 612.
16:31:34.569 [atomix-partition-group-membership-service-0] DEBUG io.atomix.utils.serializer.Namespace - RaftStorage: class java.util.HashSet already registered as 29. Skipping 613.
16:31:34.585 [atomix-partition-group-membership-service-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=1, group=system}
16:31:34.585 [atomix-partition-group-membership-service-0] INFO  i.a.p.r.partition.RaftPartitionGroup - Started
16:31:34.585 [atomix-partition-group-membership-service-0] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{system-partition-1} - Opening proxy session
16:31:34.585 [atomix-partition-group-membership-service-0] DEBUG i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Opening session; name: atomix-primary-elector, type: PrimaryElectorType{name=PRIMARY_ELECTOR}
16:31:35.756 [atomix-cluster-event-executor-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.85.253:5000
16:31:39.136 [raft-client-system-partition-1-0] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.79.103:5000
16:31:39.152 [raft-client-system-partition-1-2] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{192}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - State changed: CONNECTED
16:31:39.153 [raft-client-system-partition-1-3] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{system-partition-1} - Opening proxy session
16:31:39.153 [raft-client-system-partition-1-3] DEBUG i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Opening session; name: session-id, type: SessionIdGeneratorType{name=SESSION_ID_GENERATOR}
16:31:39.184 [raft-client-system-partition-1-2] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{193}{type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id} - State changed: CONNECTED
16:31:39.184 [raft-client-system-partition-1-6] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.primitive.partition.GroupMember already registered as 502. Skipping 553.
16:31:39.199 [raft-client-system-partition-1-6] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.cluster.MemberId already registered as 503. Skipping 554.
16:31:39.199 [raft-client-system-partition-1-6] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.primitive.partition.MemberGroupId already registered as 504. Skipping 555.
16:31:39.199 [raft-client-system-partition-1-6] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.primitive.partition.PartitionId already registered as 505. Skipping 556.
16:31:39.199 [raft-client-system-partition-1-6] DEBUG i.a.p.b.p.i.PrimaryBackupPartitionClient - Successfully started client for PartitionId{id=29, group=data}
16:31:39.215 [raft-client-system-partition-1-6] DEBUG i.a.p.b.p.i.PrimaryBackupPartitionClient - Successfully started client for PartitionId{id=31, group=data}
16:31:39.215 [raft-client-system-partition-1-6] DEBUG i.a.p.b.p.i.PrimaryBackupPartitionClient - Successfully started client for PartitionId{id=2, group=data}
16:31:39.215 [raft-client-system-partition-1-6] DEBUG i.a.p.b.p.i.PrimaryBackupPartitionClient - Successfully started client for PartitionId{id=4, group=data}
16:31:39.215 [raft-client-system-partition-1-6] DEBUG i.a.p.b.p.i.PrimaryBackupPartitionClient - Successfully started client for PartitionId{id=6, group=data}
16:31:39.215 [raft-client-consensus-partition-4-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=4, group=consensus}
16:31:39.215 [raft-client-consensus-partition-3-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=3, group=consensus}
16:31:39.215 [raft-client-consensus-partition-2-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=2, group=consensus}
16:31:39.215 [raft-client-consensus-partition-1-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=1, group=consensus}
16:31:39.215 [raft-client-consensus-partition-6-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=6, group=consensus}
16:31:39.215 [raft-client-consensus-partition-7-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=7, group=consensus}
16:31:39.215 [raft-client-consensus-partition-5-0] DEBUG i.a.p.r.p.impl.RaftPartitionClient - Successfully started client for partition PartitionId{id=5, group=consensus}
16:31:39.215 [raft-client-consensus-partition-5-0] INFO  i.a.p.r.partition.RaftPartitionGroup - Started
16:31:39.215 [raft-client-consensus-partition-5-0] INFO  i.a.p.p.impl.DefaultPartitionService - Started
16:31:39.231 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:39.231 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:39.231 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:39.232 [atomix-0] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{system-partition-1} - Opening proxy session
16:31:39.232 [atomix-0] DEBUG i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Opening session; name: primitives, type: AtomicMapType{name=atomic-map}
16:31:39.232 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:39.232 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:39.232 [atomix-0] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:39.278 [raft-client-system-partition-1-13] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{196}{type=AtomicMapType{name=atomic-map}, name=primitives} - State changed: CONNECTED
16:31:39.278 [raft-client-system-partition-1-13] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:39.294 [raft-client-system-partition-1-13] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:39.294 [raft-client-system-partition-1-13] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:39.294 [raft-client-system-partition-1-9] DEBUG i.a.c.m.impl.NettyMessagingService - Established a new connection to 10.93.76.228:5000
16:31:43.947 [raft-client-system-partition-1-3] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{system-partition-1} - Opening proxy session
16:31:43.947 [raft-client-system-partition-1-3] DEBUG i.a.p.r.s.impl.RaftSessionManager - RaftClient{system-partition-1} - Opening session; name: atomix-transactions, type: AtomicMapType{name=atomic-map}
16:31:43.947 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.947 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.947 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.948 [raft-client-system-partition-1-3] INFO  i.a.core.impl.CoreTransactionService - Started
16:31:43.948 [raft-client-system-partition-1-3] INFO  i.a.core.impl.CorePrimitivesService - Started
16:31:43.948 [atomix-cluster-0] INFO  io.atomix.cluster.AtomixCluster - Started
16:31:43.948 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.948 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.948 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.964 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - Ignoring Namespace{friendlyName=BASIC}, already registered.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class io.atomix.utils.time.Versioned already registered as 509. Skipping 513.
16:31:43.980 [main] DEBUG io.atomix.utils.serializer.Namespace - (no name): class [B already registered as 16. Skipping 514.
Jordan Halterman
@kuujo
Aug 08 2018 08:36
it says it started successfully
Junbo Ruan
@aruanruan
Aug 08 2018 08:36
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class [B already registered as 16. Skipping 500.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.concurrent.atomic.AtomicBoolean already registered as 17. Skipping 501.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.concurrent.atomic.AtomicInteger already registered as 18. Skipping 502.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.concurrent.atomic.AtomicLong already registered as 19. Skipping 503.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.Collections$UnmodifiableSet already registered as 23. Skipping 507.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.HashMap already registered as 24. Skipping 508.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.concurrent.ConcurrentHashMap already registered as 25. Skipping 509.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.concurrent.CopyOnWriteArraySet already registered as 26. Skipping 510.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.ArrayList already registered as 27. Skipping 511.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.LinkedList already registered as 28. Skipping 512.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.HashSet already registered as 29. Skipping 513.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.LinkedHashSet already registered as 30. Skipping 514.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.ArrayDeque already registered as 31. Skipping 515.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class com.google.common.collect.HashMultiset already registered as 32. Skipping 516.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class com.google.common.collect.Multisets$ImmutableEntry already registered as 33. Skipping 517.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class com.google.common.collect.Sets already registered as 34. Skipping 518.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class com.google.common.collect.ImmutableEntry already registered as 35. Skipping 519.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.util.Collections$SingletonList already registered as 37. Skipping 521.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.time.Duration already registered as 38. Skipping 522.
16:31:44.058 [raft-client-system-partition-1-3] DEBUG io.atomix.utils.serializer.Namespace - 136db184-4239-4bbc-b26b-4ccc1bc0832e: class java.
Johno Crawford
@johnou
Aug 08 2018 08:36
@aruanruan maybe use a gist?
Junbo Ruan
@aruanruan
Aug 08 2018 08:36
16:31:46.356 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Member unreachable: Member{id=node-2, address=10.93.76.228:5000, properties={}}
16:31:48.591 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - node-2 - Member reachable: Member{id=node-2, address=10.93.76.228:5000, properties={}}
16:31:48.592 [atomix-data-0] DEBUG i.a.p.s.impl.RecoveringSessionClient - SessionClient{1}{type=AtomicMapType{name=atomic-map}, name=136db184-4239-4bbc-b26b-4ccc1bc0832e} - State changed: CONNECTED
16:31:49.036 [raft-client-system-partition-1-2] DEBUG i.a.p.r.s.impl.RaftSessionConnection - SessionClient{199}{type=AtomicMapType{name=atomic-map}, name=atomix-transactions} - CommandRequest{session=199, sequence=2, operation=PrimitiveOperation{id=DefaultOperationId{id=listen, type=COMMAND}, value=null}} failed! Reason: {}
java.util.concurrent.TimeoutException: Request timed out in 5041 milliseconds
    at io.atomix.cluster.messaging.impl.NettyMessagingService$AbstractClientConnection.timeoutCallbacks(NettyMessagingService.java:919)
    at io.atomix.cluster.messaging.impl.NettyMessagingService.timeoutAllCallbacks(NettyMessagingService.java:338)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16:31:49.036 [netty-messaging-event-nio-client-10] DEBUG i.a.c.m.impl.NettyMessagingService - Received a reply for message id:[520] but was unable to locate the request handle
ok, i'll try
Jordan Halterman
@kuujo
Aug 08 2018 08:37
16:31:43.948 [atomix-cluster-0] INFO  io.atomix.cluster.AtomixCluster - Started
that’s the last message logged for startup
Junbo Ruan
@aruanruan
Aug 08 2018 08:38
i can see the client joined the cluster
  /**
   * Handles an execute request.
   *
   * @param request the execute request
   * @return future to be completed with an execute response
   */
  public CompletableFuture<ExecuteResponse> execute(ExecuteRequest request) {
    log.info("execute {}", request);
    ComposableFuture<ExecuteResponse> future = new ComposableFuture<>();
    // hand the request off to the partition's single-threaded context;
    // the current role is expected to handle the request from there
    threadContext.execute(() -> {
      role.execute(request).whenComplete(future);
    });
    return future;
  }
execute in PrimaryBackupServiceContext.java is executed
but role.execute may not be executed
server log:
16:31:38.867 [atomix-cluster-heartbeat-sender] INFO  i.a.c.i.DefaultClusterMembershipService - f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7 - Member reachable: Member{id=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, address=10.78.10.219:5689, properties={}}
16:31:48.349 [netty-messaging-event-epoll-client-2] INFO  i.a.p.b.i.PrimaryBackupServerContext - execute ExecuteRequest{session=1, node=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, primitive=PrimitiveDescriptor{name=136db184-4239-4bbc-b26b-4ccc1bc0832e, type=atomic-map, backups=1, replication=ASYNCHRONOUS}, operation=PrimitiveOperation{id=DefaultOperationId{id=putWithTtl, type=COMMAND}, value=byte[]{length=28, hash=2012739193}}}
16:31:53.490 [netty-messaging-event-epoll-client-2] INFO  i.a.p.b.i.PrimaryBackupServerContext - execute ExecuteRequest{session=1, node=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, primitive=PrimitiveDescriptor{name=136db184-4239-4bbc-b26b-4ccc1bc0832e, type=atomic-map, backups=1, replication=ASYNCHRONOUS}, operation=PrimitiveOperation{id=DefaultOperationId{id=putWithTtl, type=COMMAND}, value=byte[]{length=28, hash=2012739193}}}
16:31:58.615 [netty-messaging-event-epoll-client-2] INFO  i.a.p.b.i.PrimaryBackupServerContext - execute ExecuteRequest{session=1, node=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, primitive=PrimitiveDescriptor{name=136db184-4239-4bbc-b26b-4ccc1bc0832e, type=atomic-map, backups=1, replication=ASYNCHRONOUS}, operation=PrimitiveOperation{id=DefaultOperationId{id=putWithTtl, type=COMMAND}, value=byte[]{length=28, hash=2012739193}}}
16:32:03.774 [netty-messaging-event-epoll-client-2] INFO  i.a.p.b.i.PrimaryBackupServerContext - execute ExecuteRequest{session=1, node=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, primitive=PrimitiveDescriptor{name=136db184-4239-4bbc-b26b-4ccc1bc0832e, type=atomic-map, backups=1, replication=ASYNCHRONOUS}, operation=PrimitiveOperation{id=DefaultOperationId{id=putWithTtl, type=COMMAND}, value=byte[]{length=28, hash=2012739193}}}
16:32:08.998 [netty-messaging-event-epoll-client-2] INFO  i.a.p.b.i.PrimaryBackupServerContext - execute ExecuteRequest{session=1, node=f2afa6e6-2c2a-4289-90cf-6fda5db0cfb7, primitive=PrimitiveDescriptor{name=136db184-4239-4bbc-b26b-4ccc1bc0832e, type=atomic-map, backups=1, replication=ASYNCHRONOUS}, operation=PrimitiveOperation{id=DefaultOperationId{id=putWithTtl, type=COMMAND}, value=byte[]{length=28, hash=2012739193}}}
it executes 5 times, matching the retry count
Junbo Ruan
@aruanruan
Aug 08 2018 08:44
PrimaryBackupRole.execute(ExecuteRequest request) may not be executed
  /**
   * Handles an execute request.
   *
   * @param request the execute request
   * @return future to be completed with the execute response
   */
  public CompletableFuture<ExecuteResponse> execute(ExecuteRequest request) {
    log.info("execute {}", request);
    logRequest(request);
    // the default role implementation responds with an error immediately;
    // concrete roles override this method
    return CompletableFuture.completedFuture(logResponse(ExecuteResponse.error()));
  }
i added a log in this function, but there is no log output
Jordan Halterman
@kuujo
Aug 08 2018 08:46
that’s the primary backup protocol
you’re using Raft or primary-backup?
Junbo Ruan
@aruanruan
Aug 08 2018 08:49
primary-backup
Jordan Halterman
@kuujo
Aug 08 2018 08:49
I guess you’d have to enable TRACE logging for the primary-backup protocol to see replication
Junbo Ruan
@aruanruan
Aug 08 2018 08:49
ok
Jordan Halterman
@kuujo
Aug 08 2018 08:50
wait
seems strange that PrimaryBackupRole.execute would not be called
that’s just a queue and thread
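roughly this shape — a minimal sketch with illustrative names (ThreadContextSketch is not the actual Atomix internals):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class ThreadContextSketch {
  // every request for a partition is funneled through one thread
  private final Executor executor = Executors.newSingleThreadExecutor();

  // queue the task and bridge its result back to the caller's future
  <T> CompletableFuture<T> execute(Supplier<CompletableFuture<T>> task) {
    CompletableFuture<T> future = new CompletableFuture<>();
    executor.execute(() -> task.get().whenComplete((result, error) -> {
      if (error == null) {
        future.complete(result);
      } else {
        future.completeExceptionally(error);
      }
    }));
    return future;
  }
}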
are you sure it’s not called?
there are several different roles
Junbo Ruan
@aruanruan
Aug 08 2018 08:50
not sure
Jordan Halterman
@kuujo
Aug 08 2018 08:51
the implementation in PrimaryBackupRole is just the default and is overridden by other roles
every request inside the protocols is logged internally, so you can just enable TRACE logging for io.atomix.protocols.raft or io.atomix.protocols.backup to see all the requests
and responses
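for example, assuming Logback is the SLF4J backend (an assumption — adjust to your own logging setup), the levels can be raised programmatically:
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

class TraceLogging {
  public static void main(String[] args) {
    // raise the protocol loggers to TRACE so every request and response is printed
    ((Logger) LoggerFactory.getLogger("io.atomix.protocols.raft")).setLevel(Level.TRACE);
    ((Logger) LoggerFactory.getLogger("io.atomix.protocols.backup")).setLevel(Level.TRACE);
  }
}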
Junbo Ruan
@aruanruan
Aug 08 2018 08:54
got it
Junbo Ruan
@aruanruan
Aug 08 2018 09:02
16:58:17.397 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{140}{type=AtomicMapType{name=atomic-map}, name=primitives} - Opening session 556
16:58:17.397 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{556}{type=AtomicMapType{name=atomic-map}, name=primitives} - State changed: OPEN
16:58:17.514 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{143}{type=AtomicMapType{name=atomic-map}, name=atomix-transactions} - Opening session 559
16:58:17.514 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{559}{type=AtomicMapType{name=atomic-map}, name=atomix-transactions} - State changed: OPEN
16:58:42.225 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{4}{type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id, id=4}, session=106, timestamp=2018-08-08 04:58:05,335}
16:58:42.225 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{106}{type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id} - State changed: EXPIRED
16:58:42.226 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{4}{type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id, id=4}, session=39, timestamp=2018-08-08 04:58:05,335}
16:58:42.226 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{39}{type=SessionIdGeneratorType{name=SESSION_ID_GENERATOR}, name=session-id} - State changed: EXPIRED
16:58:42.227 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{2}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector, id=2}, session=104, timestamp=2018-08-08 04:58:05,335}
16:58:42.227 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{104}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - State changed: EXPIRED
16:58:42.230 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{2}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector, id=2}, session=38, timestamp=2018-08-08 04:58:05,335}
16:58:42.230 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{38}{type=PrimaryElectorType{name=PRIMARY_ELECTOR}, name=atomix-primary-elector} - State changed: EXPIRED
16:58:42.231 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{140}{type=AtomicMapType{name=atomic-map}, name=primitives} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1, type=AtomicMapType{name=atomic-map}, name=primitives, id=140}, session=147, timestamp=2018-08-08 04:58:05,335}
16:58:42.231 [raft-server-system-partition-1-state] DEBUG i.a.p.raft.session.RaftSession - Session{147}{type=AtomicMapType{name=atomic-map}, name=primitives} - State changed: EXPIRED
16:58:42.232 [raft-server-system-partition-1-state] DEBUG i.a.p.r.service.RaftServiceContext - PrimitiveService{143}{type=AtomicMapType{name=atomic-map}, name=atomix-transactions} - Session expired in 36546 milliseconds: RaftSession{RaftServiceContext{server=system-partition-1,
Jordan Halterman
@kuujo
Aug 08 2018 09:03
did you stop a node?
Junbo Ruan
@aruanruan
Aug 08 2018 09:03
client will stop
after timeout
Jordan Halterman
@kuujo
Aug 08 2018 09:05
after which timeout?
Junbo Ruan
@aruanruan
Aug 08 2018 09:05
the client will exit the JUnit test when the TimeoutException is thrown
Jordan Halterman
@kuujo
Aug 08 2018 09:06
oh gotcha
Junbo Ruan
@aruanruan
Aug 08 2018 09:06
the ExecuteRequest will time out
Jordan Halterman
@kuujo
Aug 08 2018 09:06
well in that case it’s correct for the sessions to be expired
do you think any of this can be reproduced in a gist?
Junbo Ruan
@aruanruan
Aug 08 2018 09:07
ok, i'll try it later
Jordan Halterman
@kuujo
Aug 08 2018 09:07
All our tests use clients, and basically every operation of every primitive is tested using a client and both the Raft and primary-backup protocols, so I feel like this is just going to be fruitless without a good piece of code that reproduces this
maybe a configuration for an agent cluster plus client code would work
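something like this minimal sketch, assuming the 3.0 client API — the atomix argument is assumed to be a started client already joined to the agent cluster, and the map name, key/value, and TTL are illustrative:
import io.atomix.core.Atomix;
import io.atomix.core.map.AtomicMap;
import io.atomix.protocols.backup.MultiPrimaryProtocol;
import java.time.Duration;

class Repro {
  static void run(Atomix atomix) {
    // primary-backup map with one backup, matching the PrimitiveDescriptor in the logs
    AtomicMap<String, String> map = atomix.<String, String>atomicMapBuilder("repro-map")
        .withProtocol(MultiPrimaryProtocol.builder()
            .withBackups(1)
            .build())
        .build();
    // the failing operation in the logs above was a putWithTtl command
    map.put("key", "value", Duration.ofSeconds(10));
  }
}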
Junbo Ruan
@aruanruan
Aug 08 2018 09:09
now i have to leave, it is our badminton time
Jordan Halterman
@kuujo
Aug 08 2018 09:09
Awesome!
man I love badminton
wish people played it more in the US
man Maven Central sucks
it’s so frequently down
Johno Crawford
@johnou
Aug 08 2018 09:19
squash is where it's at
Jordan Halterman
@kuujo
Aug 08 2018 09:23
if I go to sleep the release might be uploaded when I wake up
might
Johno Crawford
@johnou
Aug 08 2018 13:35
shit, that slow?
Jordan Halterman
@kuujo
Aug 08 2018 16:26
still not fixed