These are chat archives for atomix/atomix
We are no longer monitoring this channel, please join Slack! https://join.slack.com/t/atomixio/shared_invite/enQtNDgzNjA5MjMyMDUxLTVmMThjZDcxZDE3ZmU4ZGYwZTc2MGJiYjVjMjFkOWMyNmVjYTc5YjExYTZiOWFjODlkYmE2MjNjYzZhNjU2MjY
Thanks for the quick and detailed reply. Most everything you said makes sense to me (except the bounded clock drift part as I think you don't really need that assumption to implement leader leases).
Can I get @madjam 's benchmarking code? Or could you give me a rough idea of copycat's single-node or 3-node throughput with say 1KB requests where each node is a single-core machine (assuming that more cores will roughly linearly scale throughput until serialization or the NIC becomes a bottleneck)? ZooKeeper can easily get over 10K/s. We have benchmarked Raft's LogCabin implementation before and barely gotten 2K/s.
Another question: is there a formal definition of "fault-tolerant sessions" in Atomix that I can read about or look at code somewhere? Are you basically caching requestIDs on the server to prevent duplicate executions and simply having the client retransmit uncommitted requests upon a connection failure or a switch to a different server?
(Throttling for log compaction is a good idea. ZooKeeper runs out of storage in minutes when pushed to capacity but doesn't allow reducing its log garbage collection interval to less than an hour.)
DistributedValue. The reason I suggest these two is that (a) they are simple and (b) they cover all the bases with regard to log compaction that @kuujo alluded to. From my experience running on a 3-node bare-metal cluster with a FILE-based log and 3 clients (co-located on the same 3 hosts) submitting operations, I could hit 13-14K operations per second. In my test each client calls DistributedLong. Now if I use a MEMORY-based log I can easily push the system throughput to ~45K operations per second. While not many people might use the system with a MEMORY-based log in production, the reason for testing that configuration is to identify hot spots elsewhere in the system. So one conclusion is that as the storage layer gets faster, the system can get faster as well.
start + election timeout / clock drift bound since the followers shouldn’t time out before then."
it would extends
@madjam thanks, that's helpful. We'll check those out.
@kuujo thanks for the pointer. I just meant that there are ways to implement leases without relying on synchrony, but I understand what you are doing now.
@v-arun sorry I missed one of your questions. All the algorithms Copycat/Atomix uses are described on the website: http://atomix.io/copycat/docs/session-events/ I’ve been working on a spec in TLA+ but that sort of went by the wayside a while back.
The gist of how sessions in Copycat work is basically like you said. Clients attach a sequence number to each request. If a request times out on the client, it resends it. When a command from a client is applied to the state machine, the command is cached in the state machine by its sequence number. When clients send keep-alives, they send the highest sequence number for which they’ve received a response. When a server applies a keep-alive for a client, the server clears all cached outputs up to that point. This caching is done on all servers so that if a client switches servers and resubmits a command it’s only applied once.
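To make the command path above concrete, here is a minimal sketch of the idea, not Copycat's actual code. The names SessionResultCache, apply, and keepAlive are hypothetical, and it assumes a single session whose outputs are cached in memory by sequence number.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-session cache of command outputs, keyed by the client's sequence number. */
class SessionResultCache<T> {
    private final Map<Long, T> results = new HashMap<>();

    /** Applies the command once; a resend with the same sequence number gets the cached output. */
    T apply(long sequence, Supplier<T> command) {
        if (results.containsKey(sequence)) {
            return results.get(sequence);
        }
        T output = command.get();
        results.put(sequence, output);
        return output;
    }

    /** Keep-alive: the client has received responses up to ackedSequence, so drop those outputs. */
    void keepAlive(long ackedSequence) {
        results.keySet().removeIf(seq -> seq <= ackedSequence);
    }

    int cachedCount() {
        return results.size();
    }
}

class SessionResultCacheDemo {
    public static void main(String[] args) {
        long[] counter = {0};
        SessionResultCache<Long> cache = new SessionResultCache<>();
        Long first = cache.apply(1, () -> ++counter[0]);
        Long retry = cache.apply(1, () -> ++counter[0]); // resend: not applied again
        System.out.println(first + " " + retry + " " + counter[0]); // prints "1 1 1"
        cache.keepAlive(1);
        System.out.println(cache.cachedCount()); // prints "0"
    }
}
```

Because every server applies the same log, each one would build the same cache, which is why a resend to a different server still returns the original output instead of executing twice.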
Session events work in a similar way. All state machines cache session events (published by the state machine to a specific session) with the index at which the event was published. The server to which a client is connected actually sends the event to the client. It’s the responsibility of the client to ensure events are received in order. As with commands, clients send the highest index for which events have been received in their keep-alive requests. If a client switches servers, the new server will send any events the client missed.
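The event side can be sketched the same way. This is only an illustration under assumptions: SessionEventLog, publish, keepAlive, and eventsAfter are made-up names, events are plain strings, and there is at most one event per index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical per-session event cache, ordered by the log index at which each event was published. */
class SessionEventLog {
    private final TreeMap<Long, String> events = new TreeMap<>();

    /** Called by the state machine on every server when it publishes an event to this session. */
    void publish(long index, String event) {
        events.put(index, event);
    }

    /** Keep-alive: the client has received events up to ackedIndex, so they can be dropped. */
    void keepAlive(long ackedIndex) {
        events.headMap(ackedIndex, true).clear();
    }

    /** Events the client has not seen yet, in index order (e.g. after it switches servers). */
    List<String> eventsAfter(long lastReceivedIndex) {
        return new ArrayList<>(events.tailMap(lastReceivedIndex, false).values());
    }

    int cachedCount() {
        return events.size();
    }
}

class SessionEventLogDemo {
    public static void main(String[] args) {
        SessionEventLog log = new SessionEventLog();
        log.publish(10, "a");
        log.publish(12, "b");
        log.publish(15, "c");
        // The client saw events up to index 10 before reconnecting to this server.
        System.out.println(log.eventsAfter(10)); // prints "[b, c]"
        log.keepAlive(12);
        System.out.println(log.cachedCount()); // prints "1"
    }
}
```

Keying the cache by log index gives the new server an unambiguous point to resume from, and lets the client check ordering on its side.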
All of those algorithms are described in the architecture documentation. Most of this state is tracked in the ServerSessionContext class if you’re interested in reading code.