These are chat archives for atomix/atomix

10th
Jul 2017
terrytan
@txm119161336_twitter
Jul 10 2017 02:50
@kuujo During a leader election, if the old leader has some operations (commands or queries) cached locally, those cached operations will be lost when the new leader takes over, right?
Jordan Halterman
@kuujo
Jul 10 2017 03:12
Nope. All operations are ultimately applied on all nodes, and all nodes do the same caching of command output, same storing of events and other information that needs to be seen by clients in case of leader changes. Otherwise, I suppose state machines would have to be rebuilt after a leader change, which could be pretty expensive. Caches for command outputs and events are cleared on non-leader nodes when keep-alive entries are committed.
When a command is applied, its output is cached. When a client receives a command output, the next keep-alive will indicate that command has been completed and no longer needs to be held by servers.
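The caching-and-release cycle described above can be sketched roughly like this. This is an illustrative model only, not the actual Copycat/Atomix classes; all names here are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical sketch: a per-session cache of command outputs, as each server
// might maintain it. Outputs are cached when commands are applied and released
// when a keep-alive confirms the client has received them.
class SessionOutputCache {
  // command sequence number -> cached output
  private final TreeMap<Long, Object> outputs = new TreeMap<>();

  // When a command is applied, its output is cached under its sequence number.
  void cacheOutput(long sequence, Object output) {
    outputs.put(sequence, output);
  }

  // If the client resubmits a command, the cached output can be returned
  // instead of re-executing the command against the state machine.
  Object getOutput(long sequence) {
    return outputs.get(sequence);
  }

  // A keep-alive indicates the highest sequence number the client has
  // received; everything at or below it can be released on every server.
  void onKeepAlive(long completedSequence) {
    outputs.headMap(completedSequence, true).clear();
  }
}
```

Because every server applies every command and maintains this cache, a leader change doesn't lose the cached outputs.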
terrytan
@txm119161336_twitter
Jul 10 2017 03:18
I mean not the applied commands and queries, but queries with a greater command sequence number (greater than the session's commandSeq). These are cached locally and not yet replicated to followers, so they will be lost during a leader switch, right?
Jordan Halterman
@kuujo
Jul 10 2017 03:22
Gotcha. Yes. Madan and I actually had a discussion about this type of query information failing over and it was ultimately determined to be far too expensive for a variety of reasons relating to consistency guarantees. So Copycat/Atomix Raft just assumes the underlying transport will time out and queries can be retried by the client (atomix-raft now supports query retries, but they're still disabled by default because they break guarantees for asynchronous operations). Really, leaders should reject pending queries after a leader change.
Actually, nothing needs to change
A LINEARIZABLE query that's pending when a leader change occurs will ultimately be failed when the leader can't verify its leadership after the query has been applied
SEQUENTIAL queries will still eventually be completed
Perhaps after the new leader begins replicating entries, but eventually
Jordan Halterman
@kuujo
Jul 10 2017 03:29

Commands that are pending may not be completed. They should eventually be rejected once the leader figures out there's a new leader.

Linearizable queries that are pending will be applied, but will be failed once the leader attempts the next heartbeat.

Sequential queries that are pending will eventually succeed as the state machine on the follower (former leader) continues to progress
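The three outcomes above can be summed up in a small sketch. The enum and the outcome strings are purely illustrative, not anything from the Copycat/Atomix API.

```java
// Hypothetical summary of how a former leader resolves operations that were
// still pending when it lost leadership. Names are illustrative only.
enum PendingOp { COMMAND, LINEARIZABLE_QUERY, SEQUENTIAL_QUERY }

class LeaderChangeOutcome {
  static String resolve(PendingOp op) {
    switch (op) {
      case COMMAND:
        // Rejected once this node learns of the new leader; the client retries.
        return "reject";
      case LINEARIZABLE_QUERY:
        // Applied, but fails when the next heartbeat can't confirm leadership.
        return "fail-on-heartbeat";
      case SEQUENTIAL_QUERY:
        // Completes once the local state machine advances to the query index.
        return "complete-eventually";
      default:
        throw new AssertionError();
    }
  }
}
```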

terrytan
@txm119161336_twitter
Jul 10 2017 03:37
I think keeping operations on the client will be good. We can have strategies like persisting the operations or not; the Raft cluster won't bear a heavy load for persisting these operations, so it's better to do it on the client side.
BTW, I found the current logic for commands is to cache commands whose requestSeq is greater than the session's requestSeq; for commands with a lower requestSeq, it will continue to apply them rather than reject them, right?
terrytan
@txm119161336_twitter
Jul 10 2017 03:49
[screenshot]
Jordan Halterman
@kuujo
Jul 10 2017 03:52

Yeah. Commands with a sequence number greater than the next sequence number are held until they can be written to the log in sequential order.
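That hold-and-release behavior amounts to a small reordering buffer. Here's a rough sketch under assumed names, not the real Copycat/Atomix code; note that commands at or below the next expected sequence number are still written (the duplicate case is explained below).

```java
import java.util.TreeMap;

// Hypothetical sketch: commands arriving ahead of the next expected sequence
// number are held; once the gap fills, they're released to the log in order.
class CommandSequencer {
  private long nextSequence = 1;
  private final TreeMap<Long, Runnable> pending = new TreeMap<>();

  // `writeToLog` appends the command to the log.
  void receive(long sequence, Runnable writeToLog) {
    if (sequence > nextSequence) {
      pending.put(sequence, writeToLog);   // hold until predecessors arrive
      return;
    }
    // In-order (or duplicate) command: write it through.
    writeToLog.run();
    nextSequence = Math.max(nextSequence, sequence + 1);
    // Drain any held commands that are now in order.
    Runnable next;
    while ((next = pending.remove(nextSequence)) != null) {
      next.run();
      nextSequence++;
    }
  }
}
```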

The reason for also logging commands that have already been logged is the following scenario:
• Client submits command 1 to leader A
• Leader A commits command 1 and then crashes before responding
• Leader B is elected
• Client resubmits command 1

In this scenario, the second command needs to return cached output from the first time the command was committed. So, it's logged and committed and applied, but the second application of the command just returns the cached output from the first time.

This could actually be further optimized by not writing duplicate commands to the log at all, but holding on to the request until the first command's output is available (it's committed and applied to the state machine). These are optimizations I'm working on in Atomix this weekend
atomix-raft has part of that optimization already
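The apply-time deduplication described in that scenario might look roughly like this. A sketch with hypothetical names only; the real state machine logic differs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: a resubmitted command is still logged and committed,
// but applying it a second time returns the output cached the first time
// instead of mutating the state machine again.
class DedupingStateMachine {
  private long lastAppliedSequence = 0;
  private final Map<Long, Object> cachedOutputs = new HashMap<>();

  Object apply(long sequence, Supplier<Object> operation) {
    if (sequence <= lastAppliedSequence) {
      // Duplicate: the client never saw the response, so replay the output.
      return cachedOutputs.get(sequence);
    }
    Object output = operation.get();   // actually execute the command
    cachedOutputs.put(sequence, output);
    lastAppliedSequence = sequence;
    return output;
  }
}
```

This is why the resubmitted command in the crash scenario returns the same output as the first commit without applying its effects twice.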
FYI you can link to specific lines/blocks in GitHub rather than taking screenshots
terrytan
@txm119161336_twitter
Jul 10 2017 06:57
If the client submits a lot of commands, some of them will be kept in the leader's cache. If the leader crashes, the new leader will have no idea about the commands cached on the old leader, so in that scenario the cached commands will be lost, right?
If the client times out and then retries them one by one, those commands won't be lost.
Jordan Halterman
@kuujo
Jul 10 2017 20:10
Correct
Those retries are done internally though
The transport just needs to time out requests
The leader could (and probably should) just reject requests when it loses its leadership, but the transport still needs to either time out or fail requests when a connection is lost, in case the leader actually crashes or there's a partition between the client and leader.
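In other words, the client's safety net is a transport-level deadline rather than an explicit rejection from the old leader. A minimal sketch, assuming the transport exposes responses as futures (the class and method here are illustrative, not the Atomix transport API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the transport fails a request that gets no response in
// time (leader crash, partition, or the leader silently dropping it), which
// lets the client retry against the new leader.
class TimingOutTransport {
  static <T> CompletableFuture<T> send(CompletableFuture<T> response,
                                       long timeoutMillis) {
    // orTimeout (Java 9+) completes the future exceptionally with a
    // TimeoutException if no response arrives before the deadline.
    return response.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
  }
}
```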