These are chat archives for atomix/atomix

9th
Oct 2018
Jordan Halterman
@kuujo
Oct 09 2018 00:34 UTC
More good questions...
Jordan Halterman
@kuujo
Oct 09 2018 00:47 UTC

So, when you create a new Raft primitive you’re opening a new logical session. The Raft implementation is optimized for many primitives to be active at the same time. Essentially, each node has one Raft client per Raft partition, and all primitives share those clients. This means the overhead of many primitives is fairly low. Each primitive has a distinct Raft session, but the client piggybacks the keep-alives for all of its sessions in a single request. Basically, what you’re adding when you open a new session is up to 16 bytes on existing keep-alives. So, if you keep creating primitives without ever closing them, you will have a very slow memory leak and will slowly overload the network, but it will take a long time.

But all this means that the cluster can handle a large number of primitives. ONOS probably has hundreds of active primitives and, in large clusters, thousands of sessions that it never closes. You’ll still only have a single keep-alive per partition per node, with each keep-alive containing metadata (Raft log/event indexes) for each active session.
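
A rough sketch of the piggybacking described above, as a toy model rather than Atomix code. Every name here (`PartitionClientSketch`, `KeepAlive`, `openSession`, `buildKeepAlive`) is hypothetical, and the split of the 16 bytes into two 8-byte indexes is an assumption made for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: class and method names are hypothetical, not the Atomix API.
final class PartitionClientSketch {
    // Assumed per-session keep-alive metadata: two 8-byte indexes (16 bytes).
    static final int BYTES_PER_SESSION = 2 * Long.BYTES;

    // One keep-alive request carrying the metadata of every open session.
    static final class KeepAlive {
        final int partitionId;
        final List<long[]> sessionIndexes; // each entry: {logIndex, eventIndex}

        KeepAlive(int partitionId, List<long[]> sessionIndexes) {
            this.partitionId = partitionId;
            this.sessionIndexes = sessionIndexes;
        }

        int sessionCount() {
            return sessionIndexes.size();
        }

        int metadataBytes() {
            return sessionIndexes.size() * BYTES_PER_SESSION;
        }
    }

    private final int partitionId;
    private final Map<Long, long[]> sessions = new LinkedHashMap<>();
    private long nextSessionId = 1;

    PartitionClientSketch(int partitionId) {
        this.partitionId = partitionId;
    }

    // Each new primitive opens a logical session on the shared client.
    long openSession() {
        long id = nextSessionId++;
        sessions.put(id, new long[] {0L, 0L});
        return id;
    }

    // Closing a primitive removes its entry from future keep-alives.
    void closeSession(long sessionId) {
        sessions.remove(sessionId);
    }

    // A single request per partition, no matter how many sessions are open.
    KeepAlive buildKeepAlive() {
        return new KeepAlive(partitionId, new ArrayList<>(sessions.values()));
    }
}
```

In this model, opening three sessions yields one keep-alive with 48 bytes of session metadata; sessions that are never closed keep adding 16 bytes each, which is the slow growth mentioned above.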

To answer your other questions, if a node goes down the primitives’ sessions will be expired and the state machines will treat them the same way as if they were closed. For example, if you acquire a lock and then the node crashes, the lock primitive’s session will expire and the lock will be released.
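
To make the lock example concrete, here is a minimal sketch of a state machine that treats an expired session the same way as a closed one. It is an illustration only: `LockStateMachineSketch`, its methods, and the explicit `now` timestamps are all hypothetical and do not reflect how Atomix implements session expiry internally:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model only: names and the timeout mechanism are hypothetical, not Atomix internals.
final class LockStateMachineSketch {
    private final long sessionTimeoutMillis;
    private final Map<Long, Long> lastKeepAlive = new HashMap<>();
    private Long holder; // session currently holding the lock, or null

    LockStateMachineSketch(long sessionTimeoutMillis) {
        this.sessionTimeoutMillis = sessionTimeoutMillis;
    }

    void openSession(long sessionId, long now) {
        lastKeepAlive.put(sessionId, now);
    }

    // Keep-alives only refresh sessions that are still registered.
    void keepAlive(long sessionId, long now) {
        if (lastKeepAlive.containsKey(sessionId)) {
            lastKeepAlive.put(sessionId, now);
        }
    }

    boolean tryLock(long sessionId) {
        if (!lastKeepAlive.containsKey(sessionId) || holder != null) {
            return false;
        }
        holder = sessionId;
        return true;
    }

    // Explicit close: drop the session and release anything it holds.
    void closeSession(long sessionId) {
        lastKeepAlive.remove(sessionId);
        release(sessionId);
    }

    // Expiry: sessions without a recent keep-alive are handled exactly like a close.
    void expireSessions(long now) {
        Iterator<Map.Entry<Long, Long>> it = lastKeepAlive.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (now - entry.getValue() > sessionTimeoutMillis) {
                it.remove();
                release(entry.getKey());
            }
        }
    }

    Long holder() {
        return holder;
    }

    private void release(long sessionId) {
        if (holder != null && holder.longValue() == sessionId) {
            holder = null;
        }
    }
}
```

If session 1 takes the lock and its node then stops sending keep-alives, the next `expireSessions` call past the timeout removes the session and frees the lock, so another live session can acquire it.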

There are some APIs on Atomix that will show registered primitives, but not active primitive sessions IIRC. Seems like that could be useful, though, especially as a metric that can be read via the REST API.

@echavez

In other news, sorry I haven’t gotten to PRs yet today. I have been finishing the SWIM spec, but it’s pretty good now:
https://github.com/atomix/atomix-tlaplus/blob/master/SWIM/SWIM.pdf

Probably will finish up the implementation tonight

Eric Chavez
@echavez
Oct 09 2018 03:06 UTC
Ok thank you. That was very helpful.
Jordan Halterman
@kuujo
Oct 09 2018 23:34 UTC
Just finished up the SWIM implementation, and I like it! It easily tolerates some network partitions that I’ve seen in production that aren’t tolerated by the simple heartbeat/phi membership protocol. I’m going to be cleaning up, reviewing, and merging PRs for the rest of the evening.