These are chat archives for atomix/atomix

23rd Apr 2018
Johno Crawford
@johnou
Apr 23 2018 08:30
@kuujo working on like 4/5 feature branches?
Jordan Halterman
@kuujo
Apr 23 2018 08:30
at least :-P
actually they all build on each other
Johno Crawford
@johnou
Apr 23 2018 08:31
merge conflict ahoy
Jordan Halterman
@kuujo
Apr 23 2018 08:41

Just about done with all the changes. Looking really good.

The only problem left is that the configuration for client nodes is pretty tedious. Really, a client node is just a node that stores no partitions, so I’m wondering if the PartitionGroups should only have to be defined by the nodes that participate in them, e.g. if nodes a, b, and c are configured with partition group foo then they’re the only nodes that replicate that group. All other nodes would then discover the existence of the groups via gossip. So, a “client” node would just be a node that’s not configured with any partition groups.

Hmm that may complicate configuration for data nodes actually
Actually maybe that’s not true
That may work
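A rough sketch of what that per-node group configuration could look like with the builder API (the class and method names here are assumptions for illustration, not the final API):

// Sketch only: builder/class names are assumptions, not the final Atomix API.
// Nodes a, b, and c declare the "foo" partition group, so only they replicate it.
Atomix dataNode = Atomix.builder()
    .withLocalNode(Node.builder("a").withAddress("localhost:5001").build())
    .withPartitionGroups(RaftPartitionGroup.builder("foo")
        .withMembers("a", "b", "c")
        .withNumPartitions(7)
        .build())
    .build();

// A "client" node declares no partition groups and discovers "foo" via gossip.
Atomix clientNode = Atomix.builder()
    .withLocalNode(Node.builder("client").withAddress("localhost:6001").build())
    .build();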
Johno Crawford
@johnou
Apr 23 2018 08:46
is there a default partition group
or do they always need to be defined
and how's that tie in with defining say, a distributed map
Jordan Halterman
@kuujo
Apr 23 2018 08:54

Well there’s a “system” partition group that is always required by nodes that store data. The system partition group is used for storing information about primitives and electing primaries in the primary-backup protocol.

In addition to the system group (which is usually just one partition), additional Raft or primary-backup groups can be added. Raft groups have to be on persistent nodes, and primary-backup groups can be on any node.
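
On the node side that could look roughly like this (method names such as withSystemPartitionGroup are assumptions for illustration):

// Sketch only: a single-partition Raft "system" group for primitive metadata and
// primary election, plus a Raft data group and an in-memory primary-backup group.
Atomix atomix = Atomix.builder()
    .withLocalNode(Node.builder("a").withAddress("localhost:5001").build())
    .withSystemPartitionGroup(RaftPartitionGroup.builder("system")
        .withNumPartitions(1)
        .withMembers("a", "b", "c")
        .build())
    .withPartitionGroups(
        RaftPartitionGroup.builder("raft")
            .withNumPartitions(7)
            .withMembers("a", "b", "c")
            .build(),
        PrimaryBackupPartitionGroup.builder("data")
            .withNumPartitions(32)
            .build())
    .build();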

When a primitive is created, it’s replicated within some partition group. To create a persistent Raft-replicated primitive, configure the primitive with a RaftProtocol that points to the desired Raft partition group. To create an in-memory primary-backup primitive, configure it with a MultiPrimaryProtocol that points to the desired primary-backup partition group. The configured PrimitiveProtocol is a client-level configuration (e.g. timeouts, retries, etc.) and is used to create a PrimitiveProxy for each partition via the indicated PartitionGroup. This is how primitives are decoupled from the replication protocol (Raft or primary-backup).
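
Concretely, configuring a primitive with a protocol that points at a partition group might look something like this (builder and group names are assumptions for illustration, and atomix is an already-built Atomix instance):

// Sketch only: a persistent primitive replicated in the "raft" Raft partition group.
ConsistentMap<String, String> persistentMap = atomix.<String, String>consistentMapBuilder("my-map")
    .withProtocol(RaftProtocol.builder("raft")         // points at the Raft partition group
        .withReadConsistency(ReadConsistency.LINEARIZABLE)
        .build())
    .build();

// Sketch only: an in-memory primitive replicated in the "data" primary-backup group.
ConsistentMap<String, String> inMemoryMap = atomix.<String, String>consistentMapBuilder("my-cache")
    .withProtocol(MultiPrimaryProtocol.builder("data") // points at the primary-backup group
        .withBackups(2)
        .build())
    .build();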

I actually think it makes a lot of sense to allow nodes to configure which partition groups they participate in. That will simplify a lot of configuration actually.

One of those branches adds profiles, so rather than configuring partition groups you can just do:

cluster:
  local-node:
    id: foo
    address: localhost:1234
profiles:
  - consensus
  - data-grid

That configures a Raft partition group and a primary-backup partition group with a Raft system group if persistent nodes are defined.
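
The programmatic equivalent of that YAML might look roughly like this (the Profile constants and withProfiles method are assumptions for illustration):

// Sketch only: equivalent of the YAML profiles above.
Atomix atomix = Atomix.builder()
    .withLocalNode(Node.builder("foo").withAddress("localhost:1234").build())
    .withProfiles(Profile.CONSENSUS, Profile.DATA_GRID)
    .build();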

Johno Crawford
@johnou
Apr 23 2018 08:58
so in this case because it's in the consensus group it's explicitly a persistent node type, correct?
Jordan Halterman
@kuujo
Apr 23 2018 09:00
Yep
The challenge with node types and primitives being decoupled from replication protocols is the configuration. Need to preserve the ability to use the low-level API while eliminating the need to use it for most users. It’s getting a little closer with these changes but not quite there yet.
Jordan Halterman
@kuujo
Apr 23 2018 09:15
I’ll hack out the configuration changes tomorrow, then I’ll have to start on the docs to explain the architecture of partition groups and primitives
rbondar
@rbondar
Apr 23 2018 21:48
Noticed that once there is a little more latency between nodes, some nodes are marked as deactivated in the cluster. I handle these state changes and don't use the messaging service with nodes that are deactivated or removed. Eventually I end up in a situation where the JVMs are alive and a bit loaded, but the list of active nodes I can send a request to is empty. Is this expected behavior? Should I ignore the "deactivated" state and keep passing requests to nodes until they are in the "removed" state?
Jordan Halterman
@kuujo
Apr 23 2018 21:55

Need to tune the failure detector parameters.

There are two types of nodes: persistent and ephemeral (currently CORE and DATA respectively). Persistent nodes will always be present in the configuration unless explicitly removed, but they may be activated/deactivated. Ephemeral nodes are removed from the cluster configuration when they become unavailable.

We need to add the failure detection parameters to ClusterConfig to make them tunable. Same goes for messaging/timeouts
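
For the question above, a rough sketch of reacting to membership changes while treating deactivation as potentially transient and only removal as terminal (the service and event type names are assumptions for illustration):

// Sketch only: service and event type names are assumptions.
atomix.clusterService().addListener(event -> {
  switch (event.type()) {
    case NODE_DEACTIVATED:
      // Likely a failure-detector timeout; the node may come back, so keep it
      // as a candidate but deprioritize it rather than dropping it entirely.
      break;
    case NODE_REMOVED:
      // The node has left the cluster configuration; stop routing requests to it.
      break;
    default:
      break;
  }
});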