These are chat archives for atomix/atomix

5th Apr 2016
Jordan Halterman
@kuujo
Apr 05 2016 06:08
So, here’s the algorithm I came up with. Clients now submit eventIndex and completeIndex with their KeepAliveRequests. I believe that alone can be used to determine whether a faulty client is blocked on events. When a KeepAliveEntry is applied to the state machine, the entry’s eventIndex and completeIndex are stored in the session. If a client submits two keep-alives in a row where the eventIndex is greater than the completeIndex and the completeIndex of both keep-alives is identical (completeIndex has not increased as it should have), we assume the client is blocked and expire the session. Because we know the client has received events up to eventIndex, and a completeIndex less than eventIndex means the client is still processing those events, two consecutive keep-alives with the same completeIndex indicate a faulty client when the completeIndex should have increased. This effectively gives event listeners a session timeout to complete handling of events.
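A minimal sketch of that check, with illustrative names (ServerSession and keepAliveIndicatesBlockedClient are assumptions, not the actual Copycat classes):

```java
/**
 * Minimal sketch of the blocked-client check described above. The class and
 * method names are illustrative assumptions, not the actual Copycat API.
 */
class ServerSession {
  private long lastCompleteIndex = -1;

  /**
   * Applies the indexes from a KeepAliveRequest and returns true if the
   * session should be expired because the client appears blocked on events.
   */
  boolean keepAliveIndicatesBlockedClient(long eventIndex, long completeIndex) {
    // Blocked: events are outstanding (eventIndex > completeIndex) and
    // completeIndex has not advanced since the previous keep-alive.
    boolean blocked = eventIndex > completeIndex && completeIndex == lastCompleteIndex;
    lastCompleteIndex = completeIndex;
    return blocked;
  }
}
```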
Madan Jampani
@madjam
Apr 05 2016 06:11
This seems good to me.
So the recommendation to clients that may potentially have long event processing times is to move event handling onto a separate thread?
Jordan Halterman
@kuujo
Apr 05 2016 06:18
Well... The way I did this is to actually allow asynchronous handling of events by calling listeners with an Event object rather than the event value. I'm still not totally sure about this change. What the Event interface has is a complete method that's basically an ack. So, the completeIndex is increased when all listeners for an event complete() the event. A long-running listener can immediately complete() an event to prevent the cluster from being blocked and can continue doing whatever it wants in another thread, or a thread pool can be set on the client for event listeners. This may force the user to think about an event listener's impact on the progression of the cluster. If a command is submitted with linearizable consistency for events, the command is effectively blocked until the events are acked. I suppose the alternative is removing the complete() method and considering events to be acknowledged once the listener is called, but having an ack method allows clients to do other things with the event before it's considered completed.
but the complete() method does make it easier to write a faulty client simply by not acking events
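A sketch of what that ack-based listener API could look like; the Event interface and helper names here are assumptions for illustration, not the actual Copycat API:

```java
import java.util.concurrent.Executor;

/** Hypothetical event wrapper with an explicit ack, as described above. */
interface Event<T> {
  T value();        // the event payload
  void complete();  // ack; completeIndex advances once all listeners complete
}

class Listeners {
  /** A long-running listener can ack immediately and keep working off-thread. */
  static <T> void handleAsync(Event<T> event, Executor executor) {
    event.complete();                               // unblock the cluster now
    executor.execute(() -> process(event.value())); // continue asynchronously
  }

  static <T> void process(T value) {
    // ... long-running work ...
  }
}
```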
Madan Jampani
@madjam
Apr 05 2016 06:21
There is also the potential for deadlock if the listener turns around and makes a query with LINEARIZABLE consistency...
query or command
Jordan Halterman
@kuujo
Apr 05 2016 06:32
this is true
Madan Jampani
@madjam
Apr 05 2016 06:32
If we take the definition of linearizability as “once an operation is complete, everyone must see it”, the system just needs to make a listener aware of the event and not necessarily wait for said listener to complete some arbitrary handling...
Jordan Halterman
@kuujo
Apr 05 2016 06:32
right
Madan Jampani
@madjam
Apr 05 2016 06:33
Are you familiar with how zookeeper deals with this?
Jordan Halterman
@kuujo
Apr 05 2016 06:34
nope
removing it is simpler and I think is fine
but this can still happen if events are considered completed after calling event listeners… if a listener submits a command and blocks on the future
this was actually what happened to me last night
so the only other option is to consider the event completed right before the listener is called
which in practice won’t make much of a difference
for linearizability… it’s just a difference in what is occurring between request and response
the client received the event? the event listener was called? the event listener did something with the event?
all could be called linearizable…. they’re just different guarantees
Jordan Halterman
@kuujo
Apr 05 2016 06:41
in practice calling the event listener immediately after completing the event should not make a difference… it would take a really long GC pause at exactly that point to break the guarantee, if the guarantee is that a listener will be called between the command’s invocation and completion
Madan Jampani
@madjam
Apr 05 2016 06:41
My preference is to make the default guarantee “client receives the event”. That is the only option for which Copycat can definitely enforce the guarantee. All others rely on a well-behaved listener.
Jordan Halterman
@kuujo
Apr 05 2016 06:41
yep
I’ll update it
Madan Jampani
@madjam
Apr 05 2016 06:42
Great!
Jordan Halterman
@kuujo
Apr 05 2016 06:45
It will just have to be that the listener is called immediately after the event is completed, from the listener’s thread… that ensures that in practice the event listener is called “when” the event is completed. The reason it’s important is that otherwise a blocked event thread can mean e.g. a DistributedGroup join command can complete without all nodes actually being notified of the new member, and that can produce other consistency issues. At least if it’s completed in the event thread we have ensured the thread is not blocked at that point.
hmm...
Jordan Halterman
@kuujo
Apr 05 2016 06:51
Well… actually there’s not really any difference between completing an event before or after the listener. The only real option is completing an event when it gets to the client or completing it after it’s been received by event listeners. If an event is completed when it gets to the client, there’s no need for the completeIndex at all and no need to expire sessions that have blocked event listeners.
Even if an event is completed before a listener, if that listener blocks then it will block the next event anyways
I just can’t help but feel like linearizability is not particularly useful to the user if the guarantee only extends to some code that’s internal to the client but not the user’s code
If the guarantee extends to the user’s code then there will be some potential for deadlock, and in that case at least the complete() method allows user code to handle events asynchronously if necessary
Effectively, the guarantee for event listeners becomes no different than with sequential consistency
Madan Jampani
@madjam
Apr 05 2016 06:55
I agree. It does dilute the contract significantly.
Jordan Halterman
@kuujo
Apr 05 2016 06:58

here it is:

This implies that an event is on the way to the client, but may not reach the client before the successful return code to the change operation reaches the client that initiated the change. Watches are sent asynchronously to watchers. ZooKeeper provides an ordering guarantee: a client will never see a change for which it has set a watch until it first sees the watch event. Network delays or other factors may cause different clients to see watches and return codes from updates at different times. The key point is that everything seen by the different clients will have a consistent order.

This seems to imply ZooKeeper does not attempt to actually make events linearizable
Madan Jampani
@madjam
Apr 05 2016 07:00
Exactly. It's basically sequential.
Jordan Halterman
@kuujo
Apr 05 2016 07:00
Note that if there is a change to /a between the asynchronous read and the synchronous read, the client library will receive the watch event saying /a changed before the response for the synchronous read, but because the completion callback is blocking the event queue, the synchronous read will return with the new value of /a before the watch event is processed.
yeah
Madan Jampani
@madjam
Apr 05 2016 07:00
I can see why this is so. In real deployments where not all clients are in the same trust domain, the strongest guarantee (where we wait till the listener handling completes) can be exploited to significantly reduce system throughput. Basically a listener never acks and the system cannot make progress until the client session times out.
Jordan Halterman
@kuujo
Apr 05 2016 07:00
hmm
yep
well… it is configurable
and SEQUENTIAL can be made the default
Jordan Halterman
@kuujo
Apr 05 2016 07:07
I think sequential consistency is totally fine if events are sequentially consistent with commands from a client to ensure logical time still progresses at the same rate on all nodes. This isn’t currently the case with SEQUENTIAL events. Commands and events are sequenced separately. In other words, if client A submits command 1 to the cluster which triggers an event 1, and client B submits command 2 to the cluster, client B may see command 2 complete before it receives event 1
I think it’s the case now… but I might not be making sense this late at night
actually that maybe doesn’t make sense
Madan Jampani
@madjam
Apr 05 2016 07:10
So basically events need to be observed by all clients in the same order
What you describe is in line with the contract Zookeeper offers: because the completion callback is blocking the event queue, the synchronous read will return with the new value of /a before the watch event is processed.
Jordan Halterman
@kuujo
Apr 05 2016 07:14
right… that makes sense
Madan Jampani
@madjam
Apr 05 2016 07:16
I feel it is better we pick this up tomorrow morning when we are a bit more fresh :)
Jordan Halterman
@kuujo
Apr 05 2016 07:17

if there is a change to /a between the asynchronous read and the synchronous read, the client library will receive the watch event saying /a changed before the response for the synchronous read

this

Madan Jampani
@madjam
Apr 05 2016 07:17
Event notifications are always the head scratcher!
Jordan Halterman
@kuujo
Apr 05 2016 07:17
definitely
gonna have to think about that one a bit, but it can replace linearizable events if events are sequenced with other operations
see you tomorrow
Madan Jampani
@madjam
Apr 05 2016 07:18
Well, this can be easily enforced right?
if the events are already queued at the client, all we need to do before completing the sync read is to flush the local event queue, right?
Jordan Halterman
@kuujo
Apr 05 2016 07:25
if the client has received the events… the problem I think is the client doesn’t know what events it should have until it has received them. So, my example is client B submits command 2 to the cluster and it shouldn’t see command 2 complete before it has received event 1. I’m not sure what coordination is necessary for the client to know it should have received event 1. It seems to be a contract that can be enforced on a client with some feedback from the cluster, perhaps in the command response.
maybe the command response contains the last event index published prior to the command, and the client can wait for that event to be received before completing the command. In most cases the client will indeed have already received the event
I think that makes sense but I probably have to sleep it off
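A rough sketch of that idea; the class and method names are illustrative assumptions rather than the actual Copycat client internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Sketch: the command response carries the index of the last event published
 * before the command, and the client defers completing the command until it
 * has received events up to that index.
 */
class ClientEventSequencer {
  private long receivedEventIndex;
  private final NavigableMap<Long, List<Runnable>> waiting = new TreeMap<>();

  /** Called when an event arrives from the cluster. */
  void onEvent(long eventIndex) {
    receivedEventIndex = eventIndex;
    // Release command completions waiting on events up to this index.
    waiting.headMap(eventIndex, true).values()
        .forEach(callbacks -> callbacks.forEach(Runnable::run));
    waiting.headMap(eventIndex, true).clear();
  }

  /**
   * Called with a command response; completes the command only once all
   * events published before it have been received.
   */
  void onCommandResponse(long lastPublishedEventIndex, Runnable complete) {
    if (lastPublishedEventIndex <= receivedEventIndex) {
      complete.run(); // usual case: the event has already arrived
    } else {
      waiting.computeIfAbsent(lastPublishedEventIndex, i -> new ArrayList<>())
          .add(complete);
    }
  }
}
```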
Madan Jampani
@madjam
Apr 05 2016 07:28
I think that makes sense. Lets pick this up tomorrow!
good nite!
Jordan Halterman
@kuujo
Apr 05 2016 07:28
cool adios
Jordan Halterman
@kuujo
Apr 05 2016 07:50
Glad we figured this out :-) would love to replace LINEARIZABLE with something more practical than adding a bunch of synchronous round trips to operations.
Jordan Halterman
@kuujo
Apr 05 2016 17:05
I'll take a stab at this today and see what I come up with. Something like the above will likely work fine
Madan Jampani
@madjam
Apr 05 2016 17:34
Great. I’ll be around if you want to chat about it.
Jordan Halterman
@kuujo
Apr 05 2016 18:37
Removing the linearizable events removes a ton of expensive code which is awesome
Madan Jampani
@madjam
Apr 05 2016 18:42
With the existing SEQUENTIAL model, the guarantee is events will be observed in the same order at all clients. But no guarantees are provided regarding the timing, right?
So if client A submits command 1 and client B submits command 2, then B’s command can complete before it sees the event for A’s command. Is that right?
Jordan Halterman
@kuujo
Apr 05 2016 19:25
Right
I think I am getting there in terms of how this can be implemented...
Jordan Halterman
@kuujo
Apr 05 2016 20:09
Organizing my thoughts :-) I think I discovered some related issues that can be tackled
Madan Jampani
@madjam
Apr 05 2016 20:13
Even better :)
Jordan Halterman
@kuujo
Apr 05 2016 21:55
Alright so...
Jordan Halterman
@kuujo
Apr 05 2016 22:26
This message was deleted
ugh
Jordan Halterman
@kuujo
Apr 05 2016 23:37
What I realized is that queries do not currently have some of the consistency guarantees they think they have. Commands have linearizable semantics by being submitted with sequence numbers and making commands idempotent in the state machine. Clients retry commands until success or until the session expires, indicating a partial completion of a command may have occurred. But I think the same types of retries for queries need to be rethought. The problem with retrying queries is there's nothing in the Raft log to ensure queries occur in sequential order when retried, as is the case with commands, and there is likely no way to do it without writing queries to the Raft log. So, I'm wondering if the ordering of the protocol should be relied on for queries, and queries should not be retried internally as is the case with commands. In that case, a failed query would just be failed. Alternatively, failed command retries could be handled internally and retry policies could apply only to queries.
I think my brain is hurting
I guess all this means is I'm not sure retries for queries are great, and maybe it's a problem that the retry policy is configured for both commands and queries but retries for each can behave a lot differently
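For contrast, a minimal sketch of the command-side idempotence mentioned above; the class name and structure are assumptions for illustration, not the actual Copycat state machine API. Queries have no corresponding entry in the Raft log, which is why the same retry trick doesn't carry over:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch of per-session command deduplication by sequence number. */
class SessionCommands {
  private long lastAppliedSequence;
  private final Map<Long, Object> cachedResponses = new HashMap<>();

  Object apply(long sequence, Supplier<Object> operation) {
    if (sequence <= lastAppliedSequence) {
      // Retry of an already-applied command: return the cached response
      // rather than applying the command a second time.
      return cachedResponses.get(sequence);
    }
    Object response = operation.get(); // apply the command exactly once
    cachedResponses.put(sequence, response);
    lastAppliedSequence = sequence;
    return response;
  }
}
```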
Madan Jampani
@madjam
Apr 05 2016 23:43
So the problem is client A issues two queries, 1 and 2, in that order, and it sees 2 complete before 1?