These are chat archives for atomix/atomix

25th
Jul 2018
Jordan Halterman
@kuujo
Jul 25 2018 00:19
I’m home now

I’ve been working on https://github.com/atomix/atomix-test. I still need to document it, but it’s pretty fun to play with.

Set up a 3 node cluster with a consensus configuration:

atomix-test cluster foo setup consensus -n 3

Add a stateless node to the cluster

atomix-test cluster foo add-node client

Tear down the cluster

atomix-test cluster foo teardown

The script can take any configuration file and will set up a Docker network/nodes to form a cluster. It can also be used to mess with the cluster: killing nodes, causing partitions, and so on.

toni
@digevil
Jul 25 2018 02:29
hello, i'm new to atomix. i tried to build an atomix document tree on a 1-node raft cluster, but i fail to get it after the build
toni
@digevil
Jul 25 2018 03:18
i am using rc3 btw
toni
@digevil
Jul 25 2018 05:58

java.util.concurrent.CompletionException: io.atomix.primitive.PrimitiveException$Unavailable: Failed to reach consensus

at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:769)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Caused by: io.atomix.primitive.PrimitiveException$Unavailable: Failed to reach consensus
at io.atomix.protocols.raft.RaftError$Type$10.createException(RaftError.java:224)
at io.atomix.protocols.raft.RaftError$Type$10.createException(RaftError.java:219)
at io.atomix.protocols.raft.RaftError$Type$10.createException(RaftError.java:224)
at io.atomix.protocols.raft.RaftError.createException(RaftError.java:62)
at io.atomix.protocols.raft.session.impl.RaftSessionInvoker$CommandAttempt.accept(RaftSessionInvoker.java:362)
at io.atomix.protocols.raft.session.impl.RaftSessionInvoker$CommandAttempt.accept(RaftSessionInvoker.java:324)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at io.atomix.protocols.raft.session.impl.RaftSessionConnection.handleResponse(RaftSessionConnection.java:260)
at io.atomix.protocols.raft.session.impl.RaftSessionConnection.lambda$sendRequest$7(RaftSessionConnection.java:227)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at io.atomix.utils.concurrent.ThreadPoolContext.lambda$new$0(ThreadPoolContext.java:81)
... 7 more

this is the exception i got. i created 1 raft partition group and 1 primary group, and i want to add an atomic doc tree to the raft group
thanks to anybody who knows what's wrong. i tried an atomic map and it works with the same cluster setup code
Jordan Halterman
@kuujo
Jul 25 2018 08:09

@digevil can you copy the node configuration?

One thing you have to be careful of with Raft is that you delete the data directory if you’re trying to start a new fresh cluster. Sometimes when changing versions or even just configurations, the cluster will pick up the old configuration and/or logs from the data directory and behave erratically because of it. This seems like pretty erratic behavior because it implies a Raft session was successfully created but the leader wasn’t able to commit a write.

The data directory is configured via withDataDirectory in code, or the data-directory key in configuration files, for the Raft partition group.

By default I think it’s System.getProperty("user.dir") + "/.data" or something
Need to add that tidbit to the website
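For reference, a sketch of what that might look like in a configuration file. The data-directory key is the one referenced above; the surrounding key names and values are assumptions for illustration only:

```hocon
# Hypothetical Raft partition group configuration (illustrative keys).
partition-groups.raft {
  type: raft
  partitions: 3
  members: [node1, node2, node3]
  # Delete this directory before starting a fresh cluster, so the nodes
  # don't pick up stale configuration or logs.
  data-directory: /tmp/atomix/raft-data
}
```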
toni
@digevil
Jul 25 2018 08:50
@kuujo thanks for the reply, yes as you mentioned i didn't remove the data directory before running the code
toni
@digevil
Jul 25 2018 09:07
i got a class cast exception now
ConsistencyConfig consistencyConfig = new ConsistencyConfig();
consistencyConfig.setClusterId("sortingHat");
consistencyConfig.setManagementId("management");
consistencyConfig.setManagementPartition(1);
consistencyConfig.setManagementGroup(Sets.newHashSet("node1"));
consistencyConfig.setManagementStorageLevel(StorageLevel.MEMORY.name());
consistencyConfig.setRaftId("raft");
consistencyConfig.setRaftPartition(1);
consistencyConfig.setRaftGroup(Sets.newHashSet("node1"));
consistencyConfig.setRaftStorageLevel(StorageLevel.MEMORY.name());
here's the config code
.withClusterId(consistencyConfig.getClusterId())
.withMemberId(nodeConfig.getId().id())
.withAddress(nodeConfig.getAddress())
.withProperties(consistencyConfig.getProperties())
.withMulticastEnabled()
.withMembershipProvider(new MulticastDiscoveryProvider())
.withManagementGroup(RaftPartitionGroup.builder(consistencyConfig.getManagementId())
    .withNumPartitions(consistencyConfig.getManagementPartition())
    .withMembers(consistencyConfig.getManagementGroup())
    .withStorageLevel(StorageLevel.valueOf(consistencyConfig.getManagementStorageLevel()))
    .build())
.addPartitionGroup(RaftPartitionGroup.builder(consistencyConfig.getRaftId())
    .withNumPartitions(consistencyConfig.getRaftPartition())
    .withMembers(consistencyConfig.getRaftGroup())
    .withStorageLevel(StorageLevel.valueOf(consistencyConfig.getRaftStorageLevel()))
    .build())
.build();
here goes the atomix builder code
Jordan Halterman
@kuujo
Jul 25 2018 09:09
Where is the ClassCastException? In getDocumentTree? There was an exception in there that was fixed, I think, in rc4
toni
@digevil
Jul 25 2018 09:09
now in getAtomicMap
java.lang.ClassCastException: io.atomix.core.map.impl.NotNullAsyncAtomicMap cannot be cast to io.atomix.primitive.SyncPrimitive
at io.atomix.core.PrimitivesService.getPrimitive(PrimitivesService.java:763) ~[atomix-3.0.0-rc3.jar:na]
at io.atomix.core.impl.CorePrimitivesService.getAtomicMap(CorePrimitivesService.java:187) ~[atomix-3.0.0-rc3.jar:na]
at io.atomix.core.Atomix.getAtomicMap(Atomix.java:377) ~[atomix-3.0.0-rc3.jar:na]
at com.paic.isic.sortinghat.service.consistency.ConsistencyService.getEnvMap(ConsistencyService.java:115) [classes/:na]
at com.paic.isic.sortinghat.service.consistency.ConsistencyService.addGroup(ConsistencyService.java:181) [classes/:na]
at com.paic.isic.sortinghat.service.consistency.ConsistencyServiceTest.lambda$start$0(ConsistencyServiceTest.java:29) [test-classes/:na]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) ~[na:1.8.0_131]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) ~[na:1.8.0_131]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_131]
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561) ~[na:1.8.0_131]
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929) ~[na:1.8.0_131]
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) ~[na:1.8.0_131]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
Jordan Halterman
@kuujo
Jul 25 2018 09:10
Same
Use rc4/5
Or a builder
toni
@digevil
Jul 25 2018 09:10
my logic is: when a node starts up, it tries to get a config atomic map; if it doesn't exist, then it tries to build one
when the partition doesn't have the map and the client tries to get it, it gets the exception, i think
Jordan Halterman
@kuujo
Jul 25 2018 09:11
atomix/atomix@839d487
atomix.atomicMapBuilder("foo").build()
Will also work
toni
@digevil
Jul 25 2018 09:13
oh, you mean i don't need to actually getAtomicMap before build
build() will also check if the map exists; if not it builds one, otherwise it just returns the existing map to me?
Jordan Halterman
@kuujo
Jul 25 2018 09:16
The builder creates a new map instance, but each instance still logically points to the same state, which is keyed by the String name. In other words, getAtomicMap("foo") and atomicMapBuilder("foo").build() return instances of the same distributed map, but the getter loads the configuration from configuration files and creates only a single client instance per node, while the builder creates a new instance on every build() call, where each instance has its own set of threads and appears to the replicated state machines as a distinct session.
The getters basically return singletons, and builders new instances, but both refer to the same distributed state
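A minimal sketch of the two access patterns as described above (assuming the rc-era Atomix 3.0 API used in this thread; illustrative only, not tested here):

```java
import io.atomix.core.Atomix;
import io.atomix.core.map.AtomicMap;

// Sketch: illustrates the getter-vs-builder distinction described above.
public class MapAccess {
  void example(Atomix atomix) {
    // Getter: returns the per-node singleton instance, configured from files.
    AtomicMap<String, String> shared = atomix.getAtomicMap("foo");

    // Builder: every build() call creates a distinct client instance (its own
    // threads and session), but it still refers to the same distributed state.
    AtomicMap<String, String> a = atomix.<String, String>atomicMapBuilder("foo").build();
    AtomicMap<String, String> b = atomix.<String, String>atomicMapBuilder("foo").build();
    // a and b are different Java objects and sessions, yet an entry put
    // through a is visible through b.
  }
}
```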
toni
@digevil
Jul 25 2018 09:19
i c
so the safe choice might be build() first, then always get(), am i right
for any node
Jordan Halterman
@kuujo
Jul 25 2018 09:44

No, you should use one or the other. The getters create one Java object and one session for each primitive. The builders create one for each build() call. Using a builder and then a getter would create two instances: one distinct instance via the builder and then the singleton getter instance.

If you need to programmatically configure a primitive, you have to use a builder. If you have a large multithreaded application (which we do) then you can also benefit from using builders to create many instances since more instances of a primitive means more parallelism. But if you just need a single instance with strong consistency guarantees then either build one object or use getters if you’re able to configure the primitive in configuration files.

The biggest benefit to getters is just simplicity and to builders is customizability and parallelism. Each instance of a primitive has a separate logical thread pool and orders requests/responses/events independently of other primitives.

The best way to start is probably to use getters to get/create a primitive.

You most certainly should try to hold on to a reference to the primitive regardless. There’s a cost to calling getters repeatedly, and a major cost to calling builders. So just choose either method and assign it to a field
AtomicMap<String, String> map = atomix.<String, String>atomicMapBuilder("foo")
  .withProtocol(MultiRaftProtocol.builder()
    ...
    .build())
  .build();
Jordan Halterman
@kuujo
Jul 25 2018 09:50
In ONOS we almost always use builders and assign them to a field at startup and hold on to those references for the lifetime of the process. Different components may reference the same named primitive, in which case using builders gives us parallelism across services. And each service can configure its instance how it likes.
Where sharing a primitive across code that’s otherwise encapsulated becomes useful is with e.g. caching. Creating multiple instances of a cached primitive will create multiple caches, but if the getter is used instead in that case then multiple services will share the cache.
So, there are many trade-offs with respect to how partitions are configured and how primitives are configured and constructed. Atomix allows very fine-grained control over how memory and threads are used, but that creates challenges in the API.
That’s why it’s best to start to just either get or build a single reference to a primitive and use it.
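The hold-a-reference pattern described above might look like this (a sketch; the ConfigStore component name and the absence of protocol configuration are hypothetical):

```java
import io.atomix.core.Atomix;
import io.atomix.core.map.AtomicMap;
import io.atomix.utils.time.Versioned;

// Sketch: build the primitive once at startup, assign it to a field, and
// keep that reference for the lifetime of the process, instead of calling
// the getter or builder repeatedly on every access.
public class ConfigStore {  // hypothetical component
  private final AtomicMap<String, String> configMap;

  public ConfigStore(Atomix atomix) {
    this.configMap = atomix.<String, String>atomicMapBuilder("config")
        .build();
  }

  public Versioned<String> get(String key) {
    // Reuses the single held instance; no per-call getter/builder cost.
    return configMap.get(key);
  }
}
```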
toni
@digevil
Jul 25 2018 12:34
@kuujo thank you so much
last question: now i have a 3 node cluster with a 3-partition raft partition group. i built an atomic map, an atomic lock, and an atomic document tree one after another, and all looked good. then i tried to operate the 3 primitives one by one; the map and lock work, but the tree doesn't, and i got an exception like this:
20:32:31.276 [raft-server-raft-partition-1-state] WARN i.a.protocols.raft.roles.LeaderRole - RaftServer{raft-partition-1}{role=LEADER} - An unexpected error occurred: {}
io.atomix.primitive.PrimitiveException$ServiceException: null
at io.atomix.primitive.service.impl.DefaultServiceExecutor.apply(DefaultServiceExecutor.java:185) ~[atomix-primitive-3.0.0-rc3.jar:na]
at io.atomix.primitive.service.AbstractPrimitiveService.apply(AbstractPrimitiveService.java:112) ~[atomix-primitive-3.0.0-rc3.jar:na]
at io.atomix.protocols.raft.service.RaftServiceContext.applyCommand(RaftServiceContext.java:492) ~[atomix-raft-3.0.0-rc3.jar:na]
at io.atomix.protocols.raft.service.RaftServiceContext.executeCommand(RaftServiceContext.java:464) ~[atomix-raft-3.0.0-rc3.jar:na]
at io.atomix.protocols.raft.impl.RaftServiceManager.applyCommand(RaftServiceManager.java:756) ~[atomix-raft-3.0.0-rc3.jar:na]
at io.atomix.protocols.raft.impl.RaftServiceManager.lambda$apply$12(RaftServiceManager.java:400) ~[atomix-raft-3.0.0-rc3.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_131]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
Johno Crawford
@johnou
Jul 25 2018 13:47
@digevil did you try with rc5?
Jordan Halterman
@kuujo
Jul 25 2018 17:34
@digevil code? Which DocumentTree operation failed? This is a warning in a Raft state machine logging an exception, but it should have been propagated back to the client and caused an exception there. There are no known bugs in any of the primitives in rc3 or later. Wondering if there’s an AtomicDocumentTree method that’s not tested.

Sometimes state machines may also throw exceptions that are handled by the client. What matters is what the client sees.

This should be reproducible with the code that caused it.