These are chat archives for atomix/atomix

13th
Aug 2018
suimi
@suimi
Aug 13 2018 07:15
I run an Atomix cluster; one node went offline for some reason, and it can't recover after being restarted:
2018-08-13 10:51:03,223 [raft-server-raft-group-partition-1] DEBUG RaftClusterContext - RaftServer{raft-group-partition-1} - Successfully joined via 4 [,,]
2018-08-13 10:51:03,442 [raft-server-raft-group-partition-1] INFO  RaftContext - RaftServer{raft-group-partition-1} - Found leader 1 [,,]
2018-08-13 10:51:03,446 [raft-server-raft-group-partition-1] DEBUG FollowerRole - RaftServer{raft-group-partition-1}{role=FOLLOWER} - Rejected AppendRequest{term=12, leader=1, prevLogIndex=271504, prevLogTerm=12, entries=0, commitIndex=273836}: Previous index (271504) is greater than the local log's last index (19545) [,,]
2018-08-13 10:51:03,662 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,663 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,663 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,663 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Unknown session: 16977 [,,]
2018-08-13 10:51:03,664 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Installing snapshot FileSnapshot{index=18807} [,,]
2018-08-13 10:51:03,666 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceManager - RaftServer{raft-group-partition-1} - Installing service 2 CommandPrimitiveType [,,]
2018-08-13 10:51:03,738 [raft-server-raft-group-partition-1-state] DEBUG DefaultServiceExecutor - PrimitiveService{2}{type=com.higgs.trust.consensus.atomix.core.primitive.CommandPrimitiveType@72557746, name=CommandPrimitiveType} - Registered operation callback DefaultOperationId{id=submit, type=COMMAND} [,,]
2018-08-13 10:51:03,738 [raft-server-raft-group-partition-1-state] DEBUG RaftServiceContext - PrimitiveService{2}{type=com.higgs.trust.consensus.atomix.core.primitive.CommandPrimitiveType@72557746, name=CommandPrimitiveType} - Installing snapshot 18807 [,,]
2018-08-13 10:51:03,779 [raft-server-raft-group-partition-1-state] WARN  RaftServiceContext - PrimitiveService{2}{type=com.higgs.trust.consensus.atomix.core.primitive.CommandPrimitiveType@72557746, name=CommandPrimitiveType} - Session not open: RaftSession{RaftServiceContext{server=raft-group-partition-1, type=com.higgs.trust.consensus.atomix.core.primitive.CommandPrimitiveType@72557746, name=CommandPrimitiveType, id=2}, session=16977, timestamp=2018-08-10 10:09:06,775} [,,]
Mark de Jong
@Fristi
Aug 13 2018 08:21
@kuujo Thanks for your answer. I've got a running setup atm with Profile.dataGrid and no management or partition groups. I see it's forming a cluster in Docker containers, so that's good :-) What primitives/architecture would you recommend for partitioning messages to processes on nodes? Each message has an identifier which can be used to shard/partition it to a specific node. I want to route each message to a specific node so that node is the only writer for that specific identifier.
Jordan Halterman
@kuujo
Aug 13 2018 10:07
Hmm... in ONOS we actually have a service that does this. It’s called WorkPartitionService or something. You give it a key and an object and it assigns the object to be processed by the node assigned that key. That implementation uses a LeaderElector primitive to elect a leader for each key. That’s a pretty good solution depending on your requirements. It has the benefit of only reassigning keys when the leader for a key goes down. But leader balancing in the LeaderElector is currently disabled. We should probably make it configurable.
To be clear, it uses a LeaderElector to elect a leader for each key and then uses the ClusterCommunicationService to send the value to the leader to be processed.
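
A rough sketch of that pattern in Java (this is only an illustration, not the actual ONOS WorkPartitionService; the KeyRouter class, the "work-partitions" primitive name and the "process-message" subject are made up, and the exact Atomix 3.x signatures should be checked against the Javadocs):

// Hypothetical sketch: one LeaderElector primitive, each key is an election topic,
// and messages are routed to whichever member currently leads that topic.
import io.atomix.cluster.MemberId;
import io.atomix.core.Atomix;
import io.atomix.core.election.LeaderElector;
import io.atomix.core.election.Leadership;
import io.atomix.protocols.raft.MultiRaftProtocol;

public class KeyRouter {
  private final Atomix atomix;
  private final LeaderElector<MemberId> elector;

  public KeyRouter(Atomix atomix) {
    this.atomix = atomix;
    // "work-partitions" is an invented primitive name for this sketch.
    this.elector = atomix.<MemberId>leaderElectorBuilder("work-partitions")
        .withProtocol(MultiRaftProtocol.builder().build())
        .withCacheEnabled()
        .build();
  }

  // Registers the local node as a candidate for the key and returns the current leader.
  public MemberId leaderFor(String key) {
    MemberId self = atomix.getMembershipService().getLocalMember().id();
    Leadership<MemberId> leadership = elector.run(key, self);
    return leadership.leader() == null ? null : leadership.leader().id();
  }

  // Sends the message to the node currently leading the key's election.
  public void route(String key, byte[] message) {
    MemberId leader = leaderFor(key);
    if (leader != null) {
      // Signature approximate: the communication service has several unicast/send overloads.
      atomix.getCommunicationService().unicast("process-message", message, leader);
    }
  }
}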
Mark de Jong
@Fristi
Aug 13 2018 12:47
Nice, how well does that scale with a lot of identifiers/keys?
Jordan Halterman
@kuujo
Aug 13 2018 19:58

Yep that. There are a bunch of layers to ONOS services, but underneath all that there’s a LeaderElector primitive. It scales well as long as caching is enabled.

atomix.<String>leaderElectorBuilder("foo")
  .withProtocol(MultiRaftProtocol.builder()...build())
  .withCacheEnabled()
  .build();

or in a configuration file:

primitives.foo {
  type: leader-elector
  protocol {
    type: multi-raft
    ...
  }
  cache.enabled: true
}

When caching is enabled, all reads (i.e. leader lookups) will be local, and assuming Raft is used, the consistency model is still sequential (strong ordering) because of how we implement Raft clients. Also, the leader elections will be partitioned, so even writes (i.e. leader changes) will scale well. This is probably the most common distributed systems pattern we use in ONOS, so we know it scales.
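
For example, assuming the elector built above is assigned to a variable named elector, a cached leader lookup might look like this (method names as I recall the Atomix 3.x API; verify against the version you run):

// Leader lookup: answered from the local cache when caching is enabled.
Leadership<String> leadership = elector.getLeadership("some-key");
String leader = leadership.leader() == null ? null : leadership.leader().id();
// Only leader changes (run/withdraw) go through the partitioned Raft groups.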

Mark de Jong
@Fristi
Aug 13 2018 21:28
Thank you very much for your answer :-) Gonna have a stab at it