These are chat archives for atomix/atomix

29th Dec 2017
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 11:40

hi, I am playing with the newest version again, as I see many things have changed. I am a bit lost when testing adding and removing nodes from the cluster. I see that atomix requires initial bootstrap nodes to form the cluster, but I cannot add a new one later, and I cannot see in the documentation how to do that. Previously there was a #join() method on the Atomix instance, and now it is gone :(
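The 1.x pattern I have in mind looked roughly like this, writing it from memory, so the exact old names and signatures may be off:

    AtomixReplica replica = AtomixReplica.builder(localAddress).build();
    replica.bootstrap(initialCluster).join();       // on the initial members
    replica.join(existingClusterAddresses).join();  // on a node added later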

I get an error like this

12:23:31.292 [raft-server-core-partition-1] WARN  MDC[] io.atomix.protocols.raft.impl.DefaultRaftServer - RaftServer{core-partition-1} - Failed to start server!
Exception in thread "Thread-263" java.util.concurrent.CompletionException: java.lang.IllegalStateException: not a member of the cluster
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:769)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
    at io.atomix.protocols.raft.cluster.impl.RaftClusterContext.lambda$join$10(RaftClusterContext.java:413)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: not a member of the cluster
    ... 11 more
12:23:33.908 [raft-server-coordination-partition-4] WARN  MDC[] io.atomix.protocols.raft.roles.FollowerRole - RaftServer{coordination-partition-4}{role=FOLLOWER} - io.atomix.messaging.MessagingException$NoRemoteHandler: No remote message handler registered for this message
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 11:53
and .. :) when I form a cluster from 5 nodes and, just after forming it, randomly remove one of them, I get
12:40:43.822 [netty-messaging-event-nio-client-15] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:53470 failed!
12:40:44.825 [netty-messaging-event-nio-client-13] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:53470 failed!
12:40:45.828 [netty-messaging-event-nio-client-13] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:53470 failed!
12:40:45.829 [netty-messaging-event-nio-client-15] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:53470 failed!
12:40:47.837 [netty-messaging-event-nio-client-0] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:53470 failed!
12:40:48.281 [atomix-data-3] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 java.lang.NoSuchMethodError: io.atomix.storage.buffer.BufferOutput.writeObject(Ljava/lang/Object;Ljava/util/function/Function;)Lio/atomix/storage/buffer/BufferOutput;
    at io.atomix.core.map.impl.ConsistentMapService.backup(ConsistentMapService.java:138) ~[atomix-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.roles.PrimaryRole.restore(PrimaryRole.java:157) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.service.impl.PrimaryBackupServiceContext.lambda$restore$4(PrimaryBackupServiceContext.java:425) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?] 
        ...
12:40:48.281 [atomix-data-2] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 java.lang.NoSuchMethodError: io.atomix.storage.buffer.BufferOutput.writeObject(Ljava/lang/Object;Ljava/util/function/Function;)Lio/atomix/storage/buffer/BufferOutput;
    at io.atomix.core.map.impl.ConsistentMapService.backup(ConsistentMapService.java:138) ~[atomix-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.roles.PrimaryRole.restore(PrimaryRole.java:157) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.service.impl.PrimaryBackupServiceContext.lambda$restore$4(PrimaryBackupServiceContext.java:425) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]
    ...
12:40:48.281 [atomix-data-9] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 java.lang.NoSuchMethodError: io.atomix.storage.buffer.BufferOutput.writeObject(Ljava/lang/Object;Ljava/util/function/Function;)Lio/atomix/storage/buffer/BufferOutput;
    at io.atomix.core.map.impl.ConsistentMapService.backup(ConsistentMapService.java:138) ~[atomix-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.roles.PrimaryRole.restore(PrimaryRole.java:157) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]
    at io.atomix.protocols.backup.service.impl.PrimaryBackupServiceContext.lambda$restore$4(PrimaryBackupServiceContext.java:425) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]

12:40:50.609 [raft-client-coordination-partition-3-7] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 io.atomix.primitive.PrimitiveException$Unavailable: null
    at io.atomix.protocols.backup.proxy.PrimaryBackupProxy.lambda$null$3(PrimaryBackupProxy.java:164) ~[atomix-primary-backup-2.1.0-SNAPSHOT.jar:?]
        ....
Johno Crawford
@johnou
Dec 29 2017 15:04
@pawel-kaminski-krk looks like you have some bad snapshots / clashing jars on your classpath
in fact, if you're building from master
I would recommend setting your own version
then installing that to your local repo
mvn versions:set -DnewVersion=2.1.0-pawel
something like this from the root dir
then mvn install (you could add -DskipTests if you are in a rush)
and use 2.1.0-pawel in your project
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 16:44
yes, I am building from master, as I am not aware of any nightly build published with a 2.1.x version. I see now that you are still modifying the structure of the project, and several modules were removed but were still referenced from my local repo.
Johno Crawford
@johnou
Dec 29 2017 16:48
pretty sure there are 2.1.0 snapshots published to sonatype
should I look in another repo?
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 17:04

I don't want to bother you, as 2.1.x might still be a work in progress, but maybe my experiments are useful to you.

in the scenario where I want to remove a box from the cluster (5 boxes - 1 out) I still get errors like this

18:00:24.890 [netty-messaging-event-nio-client-12] WARN  MDC[] io.atomix.cluster.impl.DefaultClusterMetadataService - Anti-entropy advertisement to 127.0.0.1:57295 failed!
18:00:25.584 [raft-server-coordination-partition-4] WARN  MDC[] io.atomix.protocols.raft.roles.CandidateRole - RaftServer{coordination-partition-4}{role=CANDIDATE} - io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /127.0.0.1:57295
18:00:25.588 [raft-server-coordination-partition-4] WARN  MDC[] io.atomix.protocols.raft.roles.LeaderAppender - RaftServer{coordination-partition-4} - Failed to configure serviceN0
18:00:25.589 [raft-server-coordination-partition-4] WARN  MDC[] io.atomix.protocols.raft.roles.LeaderAppender - RaftServer{coordination-partition-4} - Failed to configure serviceN0
18:00:25.840 [raft-server-coordination-partition-4] WARN  MDC[] io.atomix.protocols.raft.roles.LeaderAppender - RaftServer{coordination-partition-4} - Failed to configure serviceN0
18:00:25.878 [atomix-data-5] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 java.lang.ClassCastException: io.atomix.protocols.backup.protocol.ExpireOperation cannot be cast to io.atomix.protocols.backup.protocol.CloseOperation
    at io.atomix.protocols.backup.roles.BackupRole.applyOperations(BackupRole.java:91) ~[atomix-primary-backup-2.1.1-pawel.jar:?]
    at io.atomix.protocols.backup.roles.BackupRole.lambda$backup$0(BackupRole.java:64) ~[atomix-primary-backup-2.1.1-pawel.jar:?]

18:00:30.726 [raft-client-coordination-partition-3-6] ERROR MDC[] io.atomix.utils.concurrent.ThreadPoolContext - An uncaught exception occurred
 io.atomix.primitive.PrimitiveException$Unavailable: null
    at io.atomix.protocols.backup.proxy.PrimaryBackupProxy.lambda$null$3(PrimaryBackupProxy.java:164) ~[atomix-primary-backup-2.1.1-pawel.jar:?]
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 17:13

in the scenario where I add more boxes to the cluster (3 running + 2 new), atomix simply hangs in the #start() method of the new machine

service = Atomix.builder()
                .withClusterName("Clstr")
                .withLocalNode(localNode)            // this node's id and endpoint
                .withBootstrapNodes(bootstrapNodes)  // all 5 nodes, see below
                .withDataDirectory(dataDirectory)    // a separate directory per node
                .build()
                .start()                             // returns a CompletableFuture
                .join();                             // blocks until startup completes

where bootstrapNodes consists of all primary nodes (3 nodes) + the 2 new ones, as sketched below. Those 3 primary machines are running and active when I try to add the new ones. The code above is run from a separate thread, so #join() is not blocking the progress of the other Atomix nodes. I guess it is all right for the 3 running machines to have a different configuration than the 2 joining ones.
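To be concrete, the bootstrap list is built roughly like this (node ids and ports are made up for the example, and I am paraphrasing the builder calls from memory, so treat them as approximate):

    List<Node> bootstrapNodes = Arrays.asList(
            Node.builder().withId(NodeId.from("node1")).withEndpoint(Endpoint.from("127.0.0.1", 5001)).build(),  // running
            Node.builder().withId(NodeId.from("node2")).withEndpoint(Endpoint.from("127.0.0.1", 5002)).build(),  // running
            Node.builder().withId(NodeId.from("node3")).withEndpoint(Endpoint.from("127.0.0.1", 5003)).build(),  // running
            Node.builder().withId(NodeId.from("node4")).withEndpoint(Endpoint.from("127.0.0.1", 5004)).build(),  // new
            Node.builder().withId(NodeId.from("node5")).withEndpoint(Endpoint.from("127.0.0.1", 5005)).build()); // new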

18:04:15.359 [raft-server-coordination-partition-1] WARN  MDC[] io.atomix.protocols.raft.roles.FollowerRole - RaftServer{coordination-partition-1}{role=FOLLOWER} - io.atomix.messaging.MessagingException$NoRemoteHandler: No remote message handler registered for this message
Johno Crawford
@johnou
Dec 29 2017 17:19
the master branch has SNAPSHOT in the project version
and those snapshot artifacts might be pulled in from sonatype
so if your code references modules like atomix-netty it will still compile, as those exist in sonatype
so you might not be running the code you compiled from master
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 17:41
@johnou ok. I reran my code with the new version as you advised, and I still cannot make it work. maybe there is something obvious I am missing :/
Johno Crawford
@johnou
Dec 29 2017 18:21
have you defined different data directories for all the nodes?
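e.g. each node pointed at its own directory, something like this (the path and accessor here are just an illustration):

    .withDataDirectory(new File("/tmp/atomix/" + localNode.id()))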
Johno Crawford
@johnou
Dec 29 2017 18:28
@jhalterman I am almost at the point where I can start profiling, but the phi failure detection is not playing nicely with my tests
atomix/atomix#366
Himanshu
@himanshug
Dec 29 2017 18:35
@kuujo getting back to this after about a year. in the older world it was possible to use just copycat to have a replicated state machine, and the user could keep any arbitrary state in there. in the current code, is it ok for a user to use atomix-raft standalone the way I could use copycat before, or is it necessary to use atomix-cluster and other things on top of raft? I'm looking to use just the raft implementation to construct a replicated state machine in my application, and I can't see StateMachine anymore. roughly what I mean is sketched below.
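This is the old copycat 1.x pattern I am referring to, written from memory, so names and signatures are approximate (SetCommand is a user-defined Command):

    public class ValueStateMachine extends StateMachine {
        private Object value;

        // copycat dispatched committed operations to methods like this
        public void set(Commit<SetCommand> commit) {
            try {
                value = commit.operation().value();
            } finally {
                commit.close();
            }
        }
    }

    // and it was registered on the server builder, roughly:
    // CopycatServer.builder(address).withStateMachine(ValueStateMachine::new).build();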
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 22:46
@johnou yes I did. I guess until there is some documentation covering these kinds of scenarios, it will just be a pain for me to guess how to configure and manage the cluster :(
Paweł Kamiński
@pawel-kaminski-krk
Dec 29 2017 23:05
@johnou but if you have time I can send you a link to a sample repo. I was planning to present atomix at a minor conf next year, but it is really hard to make progress with the new version, and the old one is a waste of time.