These are chat archives for atomix/atomix

18th
May 2016
David Moravek
@dmvk
May 18 2016 13:55
Hi, is there any way to throw an exception inside an Atomix resource's state machine (so that the future completes exceptionally on the client side)? I wasn't able to find any error handling in the state machine executor code.
Jonathan Halterman
@jhalterman
May 18 2016 16:08
@kedaly Missed this yesterday, but regarding your comment "this just simply seems better" and "Is there anything I can do to help?", the answer is YES! Spread the word! We haven't had much blogging, etc., about Atomix yet, and could use help spreading the word in any way. There have been some talk proposals for later this year, though, so we'll see what comes of that.
@kedaly love your cost vs SLA spreadsheet idea. will store that one away!
Jordan Halterman
@kuujo
May 18 2016 16:51
@davidmoravek there is exception handling in Copycat's ServerStateMachineExecutor. It catches any exception thrown by the state machine and rethrows it as an ApplicationException, which is returned to the client as an ERROR of type APPLICATION_ERROR. When ClientConnection receives a failed response with that error (among others), it fails the request immediately instead of retrying, and ClientSessionSubmitter then fails the command with that exception. So an exception thrown in the state machine should cause the client-side future to be completed with an ApplicationException. But TBH, I think this needs some work. I'm not opposed to serializing the exception and letting the state machine decide whether to throw an application-specific exception that has to be deserialized on the client, or a generic exception like ApplicationException.
Right now, Copycat's protocol favors small size. But I think exceptions need to be made for... exceptions, at least to get more useful error messages. An error message should be added to responses to make client-side exceptions more useful.
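A minimal sketch of the error path described above, assuming simplified stand-in types (ApplicationException, Response, ServerSide and ClientSide here are hypothetical, not Copycat's actual classes): the server catches whatever the state machine throws and returns an application error carrying the exception's message, and the client completes its future exceptionally without retrying.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for Copycat's ApplicationException; this version
// carries the original message, as suggested in the discussion.
class ApplicationException extends RuntimeException {
  ApplicationException(String message) {
    super(message);
  }
}

// Hypothetical response type: either a result or an application error message.
final class Response<T> {
  final T result;
  final String error; // null on success

  private Response(T result, String error) {
    this.result = result;
    this.error = error;
  }

  static <V> Response<V> ok(V result) { return new Response<>(result, null); }

  static <V> Response<V> applicationError(String message) { return new Response<>(null, message); }
}

final class ServerSide {
  // Applies an operation to the state machine; any exception it throws is
  // turned into an APPLICATION_ERROR-style response that keeps the message.
  static <T, R> Response<R> apply(Function<T, R> stateMachine, T operation) {
    try {
      return Response.ok(stateMachine.apply(operation));
    } catch (RuntimeException e) {
      return Response.applicationError(String.valueOf(e.getMessage()));
    }
  }
}

final class ClientSide {
  // Application errors fail the future immediately (no retry) with an
  // ApplicationException; successful responses complete it normally.
  static <R> CompletableFuture<R> complete(Response<R> response) {
    CompletableFuture<R> future = new CompletableFuture<>();
    if (response.error != null) {
      future.completeExceptionally(new ApplicationException(response.error));
    } else {
      future.complete(response.result);
    }
    return future;
  }
}
```

Serializing the full exception type, as also floated above, would replace the message string with an encoded exception; the sketch keeps only the message.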
Jonathan Halterman
@jhalterman
May 18 2016 18:24
@kuujo The ApplicationException wrapper is probably still good since it distinguishes user space errors from other things like unknown state machine operations
Perhaps though, ApplicationException -> StateMachineException?
which is basically how the javadoc describes it
Kevin Daly
@kedaly
May 18 2016 20:12
@jhalterman I could maybe give a talk at the JUG here in Toronto about Atomix as I learn more about it.
Jonathan Halterman
@jhalterman
May 18 2016 20:13
@kedaly that would be great!
Jordan Halterman
@kuujo
May 18 2016 21:09
Yeah, ApplicationException is fine, but it still needs a message
That would indeed be amazing
Madan Jampani
@madjam
May 18 2016 21:10
@kuujo quick question regarding client sequencing...
Jordan Halterman
@kuujo
May 18 2016 21:10
I'm supposed to be doing a talk very soon... The only problem is finding the time to make decent slides. But I'll share them when I have them
Indeed
Madan Jampani
@madjam
May 18 2016 21:11
Is there any reason why we treat query attempt failures differently from command attempt failures?
I’m seeing some erratic behavior under failures (commands would stall at the sequencer)
I need to explain this better. But I am myself trying to piece together some forensic evidence :)
Jordan Halterman
@kuujo
May 18 2016 21:17
We had some conversations about this so there's probably more in the history here, but the gist was: because commands are always submitted to the leader and written to the Raft log, we can easily sequence them in proper order when they're retried even when the leader changes. But because queries can be handled on followers and their sequencing is not shared between nodes, switching servers and maintaining the same sequencing with query retries is very difficult and impractically expensive. So, failed queries are simply failed, while commands are retried.
I think I have seen what you're referring to and it absolutely needs to be figured out. I haven't been able to reproduce it consistently
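The retry rule above fits in a few lines. This is a hypothetical sketch (OperationType and RetryPolicy are illustrative names, not Copycat's API) that also folds in the earlier point that application errors are failed immediately rather than retried:

```java
// Hypothetical operation kinds for illustration.
enum OperationType { COMMAND, QUERY }

final class RetryPolicy {
  // Commands go to the leader and into the Raft log, so a retried command can
  // still be sequenced correctly across leader changes. Queries may be served
  // by followers whose sequencing is not shared, so a failed query is failed.
  static boolean shouldRetry(OperationType type, boolean applicationError) {
    if (applicationError) {
      return false; // state machine errors go straight back to the caller
    }
    return type == OperationType.COMMAND;
  }
}
```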
Madan Jampani
@madjam
May 18 2016 21:21
I understand the rationale and remember our conversation. This appears to be an issue with sequencing responses (or failures) correctly. I suspect the stall happens because the client is expecting a response (or failure) for sequence x that never arrives. Subsequent responses are (rightly) held up to preserve the sequential consistency semantics.
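A rough sketch of why one missing response stalls everything after it, assuming a hypothetical ResponseSequencer (not Copycat's actual class) that runs response callbacks strictly in request order and buffers anything that arrives early:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side sequencer: callbacks for responses (or failures)
// run in request-sequence order. An early arrival is buffered until every
// earlier sequence number has completed.
final class ResponseSequencer {
  private long requestSequence;  // last sequence number handed out
  private long responseSequence; // last sequence number completed in order
  private final Map<Long, Runnable> pending = new HashMap<>();

  long nextRequest() {
    return ++requestSequence;
  }

  // Called when the response or failure for `sequence` arrives.
  void complete(long sequence, Runnable callback) {
    pending.put(sequence, callback);
    // Run as many consecutive callbacks as are now available.
    Runnable next;
    while ((next = pending.remove(responseSequence + 1)) != null) {
      responseSequence++;
      next.run();
    }
  }

  long responseSequence() { return responseSequence; }

  int buffered() { return pending.size(); }
}
```

If the response or failure for sequence 1 never arrives, complete() keeps buffering sequences 2, 3 and so on indefinitely, which is the stall described above.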
Jordan Halterman
@kuujo
May 18 2016 21:25
Right… so, the NettyClient uses a timeout to ensure that requests are always failed. The two possible scenarios are:
A: some request isn’t being timed out (NettyClient is broken)
B: some part of the sequencing logic in the sequencer code is broken
Madan Jampani
@madjam
May 18 2016 21:27
Right. And this could quite likely be due to A. I do not use NettyTransport in this setup.
Jordan Halterman
@kuujo
May 18 2016 21:28
Ahh yeah that seems like the likely culprit then. I think this is when I’ve seen it happen as well — in tests when using the LocalTransport which doesn’t do timeouts. I should write a TestTransport that does timeouts properly since LocalTransport is also used in AtomixReplica and I don’t want to slow it down with timeouts
it would also be possible to just time out requests at a higher level too
not sure what the arguments for/against that are
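Timing out at a higher level could look roughly like this, assuming a hypothetical RequestTimeouts helper wrapped around whatever future a transport returns; it uses only java.util.concurrent and leaves the timeout value to the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: guarantees every request future eventually completes,
// even if the underlying transport never times requests out.
final class RequestTimeouts {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "request-timeouts");
        t.setDaemon(true);
        return t;
      });

  // Completes like `request`, or fails with a TimeoutException if `request`
  // has not completed within the given timeout.
  <T> CompletableFuture<T> withTimeout(CompletableFuture<T> request, long timeout, TimeUnit unit) {
    CompletableFuture<T> result = new CompletableFuture<>();
    ScheduledFuture<?> timer = scheduler.schedule(() -> {
      result.completeExceptionally(new TimeoutException("request timed out"));
    }, timeout, unit);
    request.whenComplete((value, error) -> {
      timer.cancel(false); // no-op if the timeout already fired
      if (error != null) {
        result.completeExceptionally(error);
      } else {
        result.complete(value);
      }
    });
    return result;
  }

  void shutdown() {
    scheduler.shutdownNow();
  }
}
```

Because the wrapper sits above the transport, requests sent through a transport that never times out (like LocalTransport) would still be failed eventually, so the sequencer would not wait on them forever.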
Madan Jampani
@madjam
May 18 2016 22:38
Perhaps the argument for is that people writing custom transports won't shoot themselves in the foot, and the argument against might be that it could be tricky to pick a reasonable timeout. That argument against is weak IMO.
In any case I see one piece of logic that is a bit problematic...
When a query fails, we don’t bother incrementing the responseSequence in order to skip over the failed query
Jordan Halterman
@kuujo
May 18 2016 23:11
That does seem like a problem
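A sketch of the fix this implies, assuming a hypothetical QuerySequencer (not Copycat's code): a failed query is recorded like any other outcome, so responseSequence moves past its slot and later responses are released.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every sequence slot must be consumed, whether the
// query succeeded or failed; the sequencer only advances over contiguous slots.
final class QuerySequencer {
  private long responseSequence; // highest contiguous completed slot
  private final Map<Long, String> outOfOrder = new HashMap<>();
  private final StringBuilder delivered = new StringBuilder();

  void onResponse(long sequence) {
    record(sequence, "ok");
  }

  // A failed query still advances responseSequence; otherwise every later
  // response would wait for this slot forever.
  void onQueryFailure(long sequence) {
    record(sequence, "failed");
  }

  private void record(long sequence, String outcome) {
    outOfOrder.put(sequence, outcome);
    String next;
    while ((next = outOfOrder.remove(responseSequence + 1)) != null) {
      responseSequence++;
      delivered.append(responseSequence).append(':').append(next).append(' ');
    }
  }

  long responseSequence() { return responseSequence; }

  String delivered() { return delivered.toString().trim(); }
}
```

If onQueryFailure did nothing, as in the behavior described above, responseSequence would stay at 0 after query 1 fails and the response for sequence 2 would remain buffered indefinitely.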